The function may be called by the names: g11cac, nag_contab_condl_logistic or nag_condl_logistic.
In the analysis of binary data, the logistic model is commonly used. This relates the probability of one of the outcomes, say $y=1$, to $p$ explanatory variates or covariates by

$$\operatorname{Prob}(y=1)=\frac{\exp(\alpha+z^{\mathrm{T}}\beta)}{1+\exp(\alpha+z^{\mathrm{T}}\beta)},$$

where $\beta$ is a vector of unknown coefficients for the covariates $z$ and $\alpha$ is a constant term. If the observations come from different strata or groups, $\alpha$ would vary from stratum to stratum. If the observed outcomes are independent, the $y$s follow a Bernoulli distribution, i.e., a binomial distribution with sample size one, and the model can be fitted as a generalized linear model with binomial errors.
In some situations the observations for which $y=1$ may not be independent. For example, in epidemiological research, case-control studies are widely used, in which one or more observed cases are matched with one or more controls. The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to determine more accurately the effect of other variables. Each case-control group can be considered as a stratum. In this type of study the binomial model is not appropriate, except if the strata are large, and a conditional logistic model is used instead. This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the stratum. In the situation of one case per stratum, the conditional likelihood for $n_s$ strata can be written as

$$L=\prod_{i=1}^{n_s}\frac{\exp(z_i^{\mathrm{T}}\beta)}{\sum_{l\in S_i}\exp(z_l^{\mathrm{T}}\beta)},$$

where $S_i$ is the set of observations in the $i$th stratum, with associated vectors of covariates $z_l$, $l\in S_i$, and $z_i$ is the vector of covariates of the case in the $i$th stratum. In the general case of $c_i$ cases per stratum, the full conditional likelihood is

$$L=\prod_{i=1}^{n_s}\frac{\exp(s_i^{\mathrm{T}}\beta)}{\sum_{l\in C_i}\exp(s_l^{\mathrm{T}}\beta)},$$

where $s_i$ is the sum of the vectors of covariates for the cases in the $i$th stratum and $s_l$, $l\in C_i$, refer to the sums of vectors of covariates for all distinct sets of $c_i$ observations drawn from the $i$th stratum. The conditional likelihood can be maximized by a Newton–Raphson procedure. The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function, $U_j(\beta)$, for $j=1,2,\dots,p$, which can be used for testing the significance of parameters.
A second situation in which the above conditional likelihood arises is in fitting Cox's proportional hazards model (see g12bac), in which the strata refer to the risk sets for each failure time and the failures are the cases. When ties are present in the data, g12bac uses an approximation. For an exact estimate, the data can be expanded using g12zac to create the risk sets/strata, and g11cac can then be used.
Cox D R (1972) Regression models and life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Cox D R and Hinkley D V (1974) Theoretical Statistics Chapman and Hall
Howard S (1972) Remark on the paper by Cox D R (1972): Regression models and life tables J. Roy. Statist. Soc. Ser. B 34 187–220
Krailo M D and Pike M C (1984) Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies Appl. Statist. 33 95–103
Smith P G, Pike M C, Hill P, Breslow N E and Day N E (1981) Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies Appl. Statist. 30 190–197
1: order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.1.3 in the Introduction to the NAG Library CL Interface for a more detailed explanation of the use of this argument.
On exit: the deviance, that is, minus twice the maximized log-likelihood.
12: b – double Input/Output
On entry: initial estimates of the covariate coefficient parameters $\beta$. b[j−1] must contain the initial estimate of the coefficient of the covariate in z corresponding to the $j$th nonzero value of isz.
In many cases an initial value of zero for $\beta$ may be used. For another suggestion see Section 9.
On exit: b[j−1] contains the estimate $\hat{\beta}_i$ of the coefficient of the covariate stored in the $i$th column of z, where isz[i−1] is the $j$th nonzero value in the array isz.
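The correspondence between b and the columns of z can be sketched as a small lookup, assuming isz is an array with one entry per column of z in which a nonzero value marks an included covariate. The function name `coef_column`, the column count `m`, and the use of `int` in place of the NAG `Integer` type are assumptions for illustration. It returns a 0-based column index, so the document's 1-based "$i$th column" is the returned value plus one.

```c
/* Return the 0-based column of z whose coefficient is stored in b[j-1]
 * (j is 1-based), i.e. the position of the j-th nonzero entry of isz.
 * Returns -1 if isz has fewer than j nonzero entries. Hypothetical helper. */
int coef_column(const int isz[], int m, int j)
{
    int count = 0;
    for (int i = 0; i < m; i++) {
        if (isz[i] != 0 && ++count == j)
            return i;
    }
    return -1;
}
```

For example, with four columns and isz = {1, 0, 1, 1}, b[1] holds the coefficient of the third column of z (0-based index 2).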
13: – double Output
On exit: element $j-1$ contains the asymptotic standard error of the estimate contained in b[j−1], whose score function is returned in the corresponding element of argument 14, for $j=1,2,\dots,p$.
14: – double Output
On exit: element $j-1$ contains the value of the score function $U_j(\beta)$ for the estimate contained in b[j−1], for $j=1,2,\dots,p$.
15: – double Output
On exit: the variance-covariance matrix of the parameter estimates in b, stored in packed form by column; i.e., the covariance between the parameter estimates given in b[i−1] and b[j−1], $j\ge i$, is given in element $j(j-1)/2+i-1$.
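Reading an element of a column-packed symmetric matrix is easy to get wrong by one, so here is a short C sketch. It assumes the layout stated above: the upper triangle packed column by column, so that the entry for 1-based indices $i\le j$ sits at 0-based position $j(j-1)/2+i-1$. The helper names `packed_cov` and `print_cov` are hypothetical.

```c
#include <stdio.h>

/* Covariance between the estimates in b[i-1] and b[j-1] (1-based i, j),
 * read from an upper triangle packed by column: element j(j-1)/2 + i - 1
 * for i <= j. The matrix is symmetric, so i and j are swapped if i > j. */
double packed_cov(const double cov[], int i, int j)
{
    if (i > j) {
        int t = i;
        i = j;
        j = t;
    }
    return cov[j * (j - 1) / 2 + i - 1];
}

/* Print the full p x p variance-covariance matrix, one row per line. */
void print_cov(const double cov[], int p)
{
    for (int i = 1; i <= p; i++) {
        for (int j = 1; j <= p; j++)
            printf("%12.4e", packed_cov(cov, i, j));
        putchar('\n');
    }
}
```

For $p=3$ the packed order is (1,1), (1,2), (2,2), (1,3), (2,3), (3,3); a standard error is the square root of the diagonal entry `packed_cov(cov, j, j)`.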
16: – Integer Output
On exit: element $i-1$ contains the number of cases in the $i$th stratum, for $i=1,2,\dots,n_s$.
17: – Integer Output
On exit: element $i-1$ contains the number of controls in the $i$th stratum, for $i=1,2,\dots,n_s$.
18: – double Input
On entry: indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than this tolerance multiplied by $(1.0+D)$, where $D$ is the current deviance. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.
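The convergence test can be written as a one-line predicate. This C sketch assumes that "current deviance" means the deviance after the latest iteration, and the function name `deviance_converged` is hypothetical. Because the threshold is $\text{tol}\times(1+D)$, it is close to tol when $D$ is small (an absolute test) and close to $\text{tol}\times D$ when $D$ is large (a relative test).

```c
#include <stdbool.h>

/* Deviance-based stopping rule (a sketch): converged when the decrease in
 * deviance from dev_old to dev_new is less than tol * (1 + dev_new).
 * Small deviance: threshold ~ tol (absolute). Large: ~ tol * dev_new (relative). */
bool deviance_converged(double dev_old, double dev_new, double tol)
{
    return (dev_old - dev_new) < tol * (1.0 + dev_new);
}
```

With tol = 1e-5, a decrease of 5 counts as converged at a deviance near $10^6$ but a decrease of 0.005 does not at a deviance near 0.005.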
19: maxit – Integer Input
On entry: the maximum number of iterations allowed for computing the estimates. If maxit is set to 0, the standard errors, the score functions and the variance-covariance matrix are computed for the input value of $\beta$ in b, but $\beta$ is not updated.
20: iprint – Integer Input
On entry: indicates whether information on the iterations is to be printed.
If printing is requested, the deviance and the current estimates are printed every iprint iterations.
21: outfile – const char * Input
On entry: the name of a file to which diagnostic output will be directed. If outfile is NULL, the diagnostic output will be directed to standard output.
22: – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
6 Error Indicators and Warnings
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
On entry, an argument had an illegal value.
Convergence not achieved in maxit iterations. Progress towards convergence can be examined by using a nonzero value of iprint. Any non-convergence may be due to a linear combination of covariates being monotonic with time. Full results are returned.
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
Cannot close the file named by outfile.
Cannot open the file named by outfile for writing.
On entry, too few observations included in model.
Overflow in calculations. Try using different starting values.
On entry, .
The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
g11cac is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g11cac makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
The other models described in Section 3 can be fitted using the generalized linear modelling functions g02gbc and g02gcc.
The situation with one case per stratum can be analysed by defining a dummy response variable $y$ such that $y=1$ for a case and $y=0$ for a control, and fitting a Poisson generalized linear model with a log link and a factor with one level for each stratum. These models can be fitted by using g02gcc.
g11cac uses mean centering, which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence. In order to reduce the risk of the sums computed by Howard's algorithm becoming too large, the scaling factor described in Krailo and Pike (1984) is used.
If the initial estimates are poor, there may be a problem with overflow in calculating $\exp(z_i^{\mathrm{T}}\beta)$, or there may be non-convergence. Reasonable estimates can often be obtained by fitting an unconditional model.
The data was used for illustrative purposes by Smith et al. (1981) and consists of two strata and two covariates. The data is input, the model is fitted and the results are printed.