naginterfaces.library.contab.condl_logistic

naginterfaces.library.contab.condl_logistic(ns, z, isz, ic, isi, b, tol, maxit, iprint=0, io_manager=None)

condl_logistic returns parameter estimates for the conditional logistic analysis of stratified data, for example, data from case-control studies and survival analyses.

For full information, please refer to the NAG Library document for g11ca:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g11/g11caf.html

Parameters

ns : int

The number of strata, $n_{s}$.

z : float, array-like, shape $(n, m)$

The $i$th row must contain the covariates associated with the $i$th observation.

isz : int, array-like, shape $(m)$

Indicates which subset of covariates is to be included in the model.

If $isz[j - 1] \geq 1$, the $j$th covariate is included in the model.

If $isz[j - 1] = 0$, the $j$th covariate is excluded from the model and not referenced.

ic : int, array-like, shape $(n)$

Indicates whether the $i$th observation is a case or a control.

If $ic[i - 1] = 0$, the $i$th observation is a case.

If $ic[i - 1] = 1$, the $i$th observation is a control.

isi : int, array-like, shape $(n)$

Stratum indicators, which also allow data points to be excluded from the analysis.

If $isi[i - 1] = k$, the $i$th observation is from the $k$th stratum, where $k = 1, 2, \dots, ns$.

If $isi[i - 1] = 0$, the $i$th observation is omitted from the analysis.

b : float, array-like, shape $(ip)$

Initial estimates of the covariate coefficient parameters $β$. $b[j - 1]$ must contain the initial estimate of the coefficient of the covariate in $z$ corresponding to the $j$th nonzero value of $isz$.

Suggested value: in many cases an initial value of zero for $b[j - 1]$ may be used. For another suggestion see Further Comments.

tol : float

Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than $tol \times (1.0 + \text{CurrentDeviance})$. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.

maxit : int

The maximum number of iterations required for computing the estimates. If $maxit$ is set to $0$, the standard errors, the score functions and the variance-covariance matrix are computed for the input value of $β$ in $b$, but $β$ is not updated.

iprint : int, optional

Indicates whether information on the iterations is to be printed.

$iprint \leq 0$

No printing.

$iprint \geq 1$

The deviance and the current estimates are printed every $iprint$ iterations. When printing occurs, the output is directed to the file object associated with the advisory I/O unit (see FileObjManager).

io_manager : FileObjManager, optional

Manager for I/O in this routine.
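As a rough illustration of how the arguments above fit together, the sketch below assembles them for a small made-up data set. The helper names (`count_included`, `build_args`, `fit`), the data values, and the defaults `tol=1e-5` and `maxit=10` are assumptions chosen for the example, not part of the library; taking `ns` as the largest stratum label is also an assumption. The call inside `fit` needs a licensed naginterfaces installation and is not executed here.

```python
def count_included(isz):
    """Number of covariates selected by isz (entries >= 1), i.e. ip."""
    return sum(1 for flag in isz if flag >= 1)


def build_args(z, isz, ic, isi, tol=1e-5, maxit=10):
    """Assemble keyword arguments for condl_logistic (hypothetical helper)."""
    ns = max(isi)  # assumption: strata are labelled 1..ns, 0 marks omitted rows
    ip = count_included(isz)
    return {
        "ns": ns, "z": z, "isz": isz, "ic": ic, "isi": isi,
        "b": [0.0] * ip,  # suggested starting value: all zeros
        "tol": tol, "maxit": maxit,
    }


def fit(args):
    """Call the NAG routine; requires a licensed naginterfaces installation."""
    from naginterfaces.library import contab
    return contab.condl_logistic(**args)


# Made-up data: 2 strata, 6 observations, 2 covariates (the second is excluded).
z = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [1.0, 1.0], [0.0, 0.0], [3.0, 1.0]]
isz = [1, 0]
ic = [0, 1, 1, 0, 1, 1]   # 0 = case, 1 = control
isi = [1, 1, 1, 2, 2, 2]  # stratum label of each observation
args = build_args(z, isz, ic, isi)
```

With the library available, `fit(args)` would be expected to return the values listed under Returns, in that order.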

Returns

dev : float

The deviance, that is, minus twice the maximized log-likelihood.

b : float, ndarray, shape $(ip)$

$b[j - 1]$ contains the estimate $\hat{β}_{i}$ of the coefficient of the covariate stored in the $i$th column of $z$, where $i$ is the $j$th nonzero value in the array $isz$.

se : float, ndarray, shape $(ip)$

$se[j - 1]$ is the asymptotic standard error of the estimate contained in $b[j - 1]$ and score function in $sc[j - 1]$, for $j = 1, 2, \dots, ip$.

sc : float, ndarray, shape $(ip)$

$sc[j - 1]$ is the value of the score function $U_{j}(β)$ for the estimate contained in $b[j - 1]$.

cov : float, ndarray, shape $(ip \times (ip + 1) / 2)$

The variance-covariance matrix of the parameter estimates in $b$, stored in packed form by column, i.e., the covariance between the parameter estimates given in $b[i - 1]$ and $b[j - 1]$, $j \geq i$, is given in $cov[j(j - 1)/2 + i - 1]$.

nca : int, ndarray, shape $(ns)$

$nca[i - 1]$ contains the number of cases in the $i$th stratum, for $i = 1, 2, \dots, ns$.

nct : int, ndarray, shape $(ns)$

$nct[i - 1]$ contains the number of controls in the $i$th stratum, for $i = 1, 2, \dots, ns$.
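The packed, column-wise storage of the variance-covariance matrix can be awkward to read directly. The following sketch expands it into a full symmetric matrix, assuming that with zero-based indices $i \leq j$ the element sits at position $j(j + 1)/2 + i$ of the packed array. The helper names `unpack_cov` and `std_errors` are hypothetical, and the packed values are invented for illustration.

```python
import math


def unpack_cov(cov, ip):
    """Expand a column-packed upper triangle into a full ip x ip matrix."""
    full = [[0.0] * ip for _ in range(ip)]
    for j in range(ip):          # zero-based column
        for i in range(j + 1):   # zero-based row, i <= j
            value = cov[j * (j + 1) // 2 + i]
            full[i][j] = value
            full[j][i] = value   # mirror into the lower triangle
    return full


def std_errors(cov, ip):
    """Square roots of the diagonal of the unpacked matrix."""
    full = unpack_cov(cov, ip)
    return [math.sqrt(full[j][j]) for j in range(ip)]


# Invented packed values for ip = 3.
packed = [4.0, 1.0, 9.0, 0.5, 2.0, 16.0]
full = unpack_cov(packed, 3)
```

The square roots of the diagonal entries should agree with the standard errors returned in `se`, which gives a quick consistency check on the unpacking.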

Raises

NagValueError

(errno $1$)

On entry, $tol = ⟨value⟩$.

Constraint: $tol \geq 10 \times \text{machine precision}$.

(errno $1$)

On entry, $ip = ⟨value⟩$.

Constraint: $ip \geq 1$.

(errno $1$)

On entry, $maxit = ⟨value⟩$.

Constraint: $maxit \geq 0$.

(errno $1$)

On entry, $ns = ⟨value⟩$.

Constraint: $ns \geq 1$.

(errno $1$)

On entry, $n = ⟨value⟩$.

Constraint: $n \geq 2$.

(errno $1$)

On entry, $m = ⟨value⟩$.

Constraint: $m \geq 1$.

(errno $2$)

On entry, there are not $ip$ values of $isz > 0$.

(errno $2$)

On entry, $i = ⟨value⟩$ and $isz[i - 1] < ⟨value⟩$.

Constraint: $isz[i - 1] \geq 0$.

(errno $2$)

On entry, too few observations included in model.

(errno $2$)

On entry, $i = ⟨value⟩$, $isi[i - 1] = ⟨value⟩$ and $ns = ⟨value⟩$.

Constraint: $0 \leq isi[i - 1] \leq ns$.

(errno $2$)

On entry, $i = ⟨value⟩$ and $ic[i - 1] = ⟨value⟩$.

Constraint: $ic[i - 1] = 0$ or $1$.

(errno $4$)

Overflow in calculations.

(errno $5$)

The matrix of second partial derivatives is singular.
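Most of the input constraints listed above can be checked before calling the routine. The sketch below is one hypothetical way to do so. The function name `check_inputs` and its messages are invented for illustration, `sys.float_info.epsilon` is only a stand-in for the library's machine precision, and the "too few observations" condition is not covered because its exact rule is not stated here.

```python
import sys


def check_inputs(ns, z, isz, ic, isi, b, tol, maxit):
    """Return a list of constraint violations (empty if none were found)."""
    n, m, ip = len(z), len(isz), len(b)
    problems = []
    # Assumption: sys.float_info.epsilon stands in for NAG's machine precision.
    if tol < 10 * sys.float_info.epsilon:
        problems.append("tol must be >= 10 * machine precision")
    if ip < 1:
        problems.append("ip must be >= 1")
    if maxit < 0:
        problems.append("maxit must be >= 0")
    if ns < 1:
        problems.append("ns must be >= 1")
    if n < 2:
        problems.append("n must be >= 2")
    if m < 1:
        problems.append("m must be >= 1")
    if any(flag < 0 for flag in isz):
        problems.append("isz entries must be >= 0")
    if sum(1 for flag in isz if flag > 0) != ip:
        problems.append("number of isz entries > 0 must equal ip")
    if any(not 0 <= k <= ns for k in isi):
        problems.append("isi entries must lie in 0..ns")
    if any(c not in (0, 1) for c in ic):
        problems.append("ic entries must be 0 or 1")
    return problems
```

A call such as `check_inputs(2, z, isz, ic, isi, b, 1e-5, 10)` returns an empty list when none of these conditions is violated.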

Warns

NagAlgorithmicWarning

(errno $6$): Convergence not achieved in $⟨value⟩$ iterations.
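The convergence rule described for tol can be written as a one-line test. The sketch below assumes that "CurrentDeviance" means the deviance after the latest iteration; the function name `has_converged` is hypothetical and this is not the routine's internal code.

```python
def has_converged(previous_deviance, current_deviance, tol):
    """True when the decrease in deviance falls below tol * (1 + deviance)."""
    decrease = previous_deviance - current_deviance
    return decrease < tol * (1.0 + current_deviance)
```

For a large deviance the threshold scales with the deviance itself (a relative test), while for a deviance near zero it is close to `tol` (an absolute test).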

Notes

In the analysis of binary data, the logistic model is commonly used. This relates the probability of one of the outcomes, say $y = 1$, to $p$ explanatory variates or covariates by

$$\mathrm{Prob}(y = 1) = \frac{\exp(α + z^{T} β)}{1 + \exp(α + z^{T} β)},$$

where $β$ is a vector of unknown coefficients for the covariates $z$ and $α$ is a constant term. If the observations come from different strata or groups, $α$ would vary from stratum to stratum. If the observed outcomes are independent, then the $y$s follow a Bernoulli distribution, i.e., a binomial distribution with sample size one, and the model can be fitted as a generalized linear model with binomial errors.
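The logistic probability above is straightforward to evaluate. The following is a minimal sketch in plain Python; the function name `logistic_probability` is hypothetical, and the two branches are only there to avoid overflow for large negative linear predictors.

```python
import math


def logistic_probability(alpha, beta, z):
    """Prob(y = 1) for constant term alpha, coefficients beta, covariates z."""
    eta = alpha + sum(b * x for b, x in zip(beta, z))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)  # safe: eta < 0, so e lies in (0, 1)
    return e / (1.0 + e)
```

With all coefficients and the constant term equal to zero the probability is 0.5, whatever the covariates.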

In some situations the number of observations for which $y = 1$ may not be independent. For example, in epidemiological research, case-control studies are widely used in which one or more observed cases are matched with one or more controls. The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to determine the effect of other variables more accurately. Each case-control group can be considered as a stratum. In this type of study the binomial model is not appropriate, except if the strata are large, and a conditional logistic model is used. This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the stratum. In the situation of one case per stratum, the conditional likelihood for $n_{s}$ strata can be written as

$$L = \prod_{i = 1}^{n_{s}} \frac{\exp(z_{i}^{T} β)}{\left[\sum_{l \in S_{i}} \exp(z_{l}^{T} β)\right]},$$

where $S_{i}$ is the set of observations in the $i$th stratum, with associated vectors of covariates $z_{l}$, $l \in S_{i}$, and $z_{i}$ is the vector of covariates of the case in the $i$th stratum. In the general case of $c_{i}$ cases per stratum, the full conditional likelihood is

$$L = \prod_{i = 1}^{n_{s}} \frac{\exp(s_{i}^{T} β)}{\left[\sum_{l \in C_{i}} \exp(s_{l}^{T} β)\right]},$$

where $s_{i}$ is the sum of the vectors of covariates for the cases in the $i$th stratum and $s_{l}$, $l \in C_{i}$, are the sums of the vectors of covariates for all distinct sets of $c_{i}$ observations drawn from the $i$th stratum. The conditional likelihood can be maximized by a Newton–Raphson procedure. The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function, $U_{j}(β)$, for $j = 1, 2, \dots, p$, which can be used for testing the significance of parameters.
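To make the general conditional likelihood concrete, the sketch below evaluates its logarithm by brute-force enumeration of every distinct set of $c_{i}$ observations in each stratum. This is for illustration on tiny data sets only and is not the algorithm used by the routine. The function names and the input layout (a list of `(cases, controls)` pairs of covariate vectors) are assumptions.

```python
import math
from itertools import combinations


def conditional_log_likelihood(beta, strata):
    """log L, where strata is a list of (cases, controls) covariate lists."""
    total = 0.0
    for cases, controls in strata:
        members = cases + controls
        # Linear predictor z_l^T beta for every observation in the stratum.
        scores = [sum(b * x for b, x in zip(beta, row)) for row in members]
        numerator = sum(scores[:len(cases)])  # s_i^T beta for the actual cases
        # Sum over every distinct set of c_i observations in the stratum.
        denominator = sum(
            math.exp(sum(subset))
            for subset in combinations(scores, len(cases))
        )
        total += numerator - math.log(denominator)
    return total


def deviance_at(beta, strata):
    """Minus twice the conditional log-likelihood evaluated at beta."""
    return -2.0 * conditional_log_likelihood(beta, strata)
```

With one case per stratum the enumeration reduces to the simpler one-case formula, since each set then contains a single observation.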

If the strata are not small, $C_{i}$ can be large. To improve the speed of computation, the algorithm of Howard (1972), as described by Krailo and Pike (1984), is used.

A second situation in which the above conditional likelihood arises is in fitting Cox’s proportional hazards model (see surviv.coxmodel), in which the strata refer to the risk sets for each failure time and the failures are the cases. When ties are present in the data, surviv.coxmodel uses an approximation. For an exact estimate, the data can be expanded using surviv.coxmodel_risksets to create the risk sets/strata, and condl_logistic can then be used.
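The idea of turning survival data into strata can be sketched as follows: each distinct failure time defines a stratum containing every observation still at risk, with the failures at that time marked as cases. This is a conceptual illustration only, not the behaviour of surviv.coxmodel_risksets. The function name `expand_to_risk_sets`, the output layout, and the convention that observations censored at a failure time remain in that risk set are assumptions.

```python
def expand_to_risk_sets(times, events):
    """Return (observation index, ic, isi) rows, one per risk-set membership.

    events[i] is True for a failure and False for a censored observation.
    """
    failure_times = sorted({t for t, e in zip(times, events) if e})
    rows = []
    for stratum, failure_time in enumerate(failure_times, start=1):
        for index, (t, e) in enumerate(zip(times, events)):
            if t >= failure_time:  # still at risk at this failure time
                is_case = e and t == failure_time
                rows.append((index, 0 if is_case else 1, stratum))
    return rows


# Invented example: four subjects, one censored at time 2.
rows = expand_to_risk_sets([1, 2, 2, 3], [True, True, False, True])
```

Each row gives the original observation index together with the `ic` and `isi` values it would take in the expanded data set, so covariate rows can be copied across by index.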

References

Cox, D R, 1972, Regression models and life tables (with discussion), J. Roy. Statist. Soc. Ser. B (34), 187–220

Cox, D R and Hinkley, D V, 1974, Theoretical Statistics, Chapman and Hall

Howard, S, 1972, Remark on the paper by Cox, D R (1972): Regression methods and life tables, J. R. Statist. Soc. (B 34), 187–220

Krailo, M D and Pike, M C, 1984, Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies, Appl. Statist. (33), 95–103

Smith, P G, Pike, M C, Hill, P, Breslow, N E and Day, N E, 1981, Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies, Appl. Statist. (30), 190–197
