

NAG Toolbox: nag_contab_condl_logistic (g11ca)

Purpose

nag_contab_condl_logistic (g11ca) returns parameter estimates for the conditional logistic analysis of stratified data, for example, data from case-control studies and survival analyses.

Syntax

[dev, b, se, sc, cov, nca, nct, ifail] = g11ca(ns, z, isz, ic, isi, b, tol, maxit, 'n', n, 'm', m, 'ip', ip, 'iprint', iprint)
[dev, b, se, sc, cov, nca, nct, ifail] = nag_contab_condl_logistic(ns, z, isz, ic, isi, b, tol, maxit, 'n', n, 'm', m, 'ip', ip, 'iprint', iprint)

Description

In the analysis of binary data, the logistic model is commonly used. This relates the probability of one of the outcomes, say $y = 1$, to $p$ explanatory variates or covariates by
$$\mathrm{Prob}(y = 1) = \frac{\exp(\alpha + z^T\beta)}{1 + \exp(\alpha + z^T\beta)},$$
where $\beta$ is a vector of unknown coefficients for the covariates $z$ and $\alpha$ is a constant term. If the observations come from different strata or groups, $\alpha$ would vary from stratum to stratum. If the observed outcomes are independent then the $y$s follow a Bernoulli distribution, i.e., a binomial distribution with sample size one, and the model can be fitted as a generalized linear model with binomial errors.
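As a small illustration of the logistic model above, the following sketch evaluates $\mathrm{Prob}(y = 1)$ for a single observation. The values of `alpha`, `beta` and `z` are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from this page.

```matlab
% Hypothetical values, for illustration only.
alpha = -1.0;            % constant term
beta  = [0.5; -0.25];    % coefficients for p = 2 covariates
z     = [1; 2];          % covariate vector of one observation

eta = alpha + z' * beta;             % linear predictor alpha + z'*beta
p1  = exp(eta) / (1 + exp(eta));     % Prob(y = 1) under the logistic model
```

With these values the linear predictor is $-1$, so `p1` equals $1/(1 + e)$.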
In some situations the observed outcomes may not be independent. For example, in epidemiological research, case-control studies are widely used in which one or more observed cases are matched with one or more controls. The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to determine more accurately the effect of other variables. Each case-control group can be considered as a stratum. In this type of study the binomial model is not appropriate, unless the strata are large, and a conditional logistic model is used instead. This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the stratum. In the situation of one case per stratum, the conditional likelihood for $n_s$ strata can be written as
$$L = \prod_{i=1}^{n_s} \frac{\exp(z_i^T\beta)}{\sum_{l \in S_i} \exp(z_l^T\beta)}, \qquad (1)$$
where $S_i$ is the set of observations in the $i$th stratum, with associated vectors of covariates $z_l$, $l \in S_i$, and $z_i$ is the vector of covariates of the case in the $i$th stratum. In the general case of $c_i$ cases per stratum, the full conditional likelihood is
$$L = \prod_{i=1}^{n_s} \frac{\exp(s_i^T\beta)}{\sum_{l \in C_i} \exp(s_l^T\beta)}, \qquad (2)$$
where $s_i$ is the sum of the vectors of covariates for the cases in the $i$th stratum and $s_l$, $l \in C_i$, refer to the sums of vectors of covariates for all distinct sets of $c_i$ observations drawn from the $i$th stratum. The conditional likelihood can be maximized by a Newton–Raphson procedure. The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function, $U_j(\beta)$, for $j = 1, 2, \ldots, p$, which can be used for testing the significance of parameters.
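To make the full conditional likelihood (2) concrete, the sketch below evaluates it by brute-force enumeration of every set of $c_i$ observations in each stratum, using the data and the reported estimates from the Example section of this page. This is not the algorithm the function itself uses, and enumeration is only practical for very small strata. The variable names are illustrative.

```matlab
% Data from the Example section: covariates, case flag (0 = case), strata.
z    = [0 1; 1 2; 0 1; 1 3; 0 1; 1 0; 0 2];
ic   = [0; 0; 1; 1; 0; 1; 1];
isi  = [1; 1; 1; 1; 2; 2; 2];
beta = [-0.5223; -0.2674];      % estimates reported in the Example section

logL = 0;
for k = 1:max(isi)
    rows  = find(isi == k);                 % observations in stratum k
    cases = rows(ic(rows) == 0);            % cases in stratum k
    sCase = sum(z(cases, :), 1);            % s_i: summed covariates of the cases
    sets  = nchoosek(rows, numel(cases));   % all distinct sets of c_i observations
    denom = 0;
    for r = 1:size(sets, 1)
        sSet  = sum(z(sets(r, :), :), 1);   % s_l for this candidate set
        denom = denom + exp(sSet * beta);
    end
    logL = logL + sCase * beta - log(denom);
end
dev = -2 * logL;    % deviance, comparable with the dev output in the Example
```

At $\beta = 0$ every set is equally likely, so the likelihood reduces to $1/(6 \times 3) = 1/18$ for these data, which gives a quick sanity check on the enumeration.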
If the strata are not small, $C_i$ can be large, so to improve the speed of computation the algorithm of Howard (1972), as described by Krailo and Pike (1984), is used.
A second situation in which the above conditional likelihood arises is in fitting Cox's proportional hazard model (see nag_surviv_coxmodel (g12ba)) in which the strata refer to the risk sets for each failure time and where the failures are cases. When ties are present in the data nag_surviv_coxmodel (g12ba) uses an approximation. For an exact estimate, the data can be expanded using nag_surviv_coxmodel_risksets (g12za) to create the risk sets/strata and nag_contab_condl_logistic (g11ca) used.

References

Cox D R (1972) Regression models and life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Cox D R and Hinkley D V (1974) Theoretical Statistics Chapman and Hall
Howard S (1972) Remark on the paper by Cox, D R (1972): Regression methods and life tables J. Roy. Statist. Soc. Ser. B 34 187–220
Krailo M D and Pike M C (1984) Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies Appl. Statist. 33 95–103
Smith P G, Pike M C, Hill P, Breslow N E and Day N E (1981) Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies Appl. Statist. 30 190–197

Parameters

Compulsory Input Parameters

1:     ns – int64/int32/nag_int scalar
The number of strata, $n_s$.
Constraint: $\mathrm{ns} \geq 1$.
2:     z(ldz,m) – double array
ldz, the first dimension of the array, must satisfy the constraint $\mathrm{ldz} \geq \mathrm{n}$.
The $i$th row must contain the covariates which are associated with the $i$th observation.
3:     isz(m) – int64/int32/nag_int array
m, the dimension of the array, must satisfy the constraint $\mathrm{m} \geq 1$.
Indicates which subset of covariates is to be included in the model.
If $\mathrm{isz}(j) \geq 1$, the $j$th covariate is included in the model.
If $\mathrm{isz}(j) = 0$, the $j$th covariate is excluded from the model and not referenced.
Constraint: $\mathrm{isz}(j) \geq 0$ and at least one value must be nonzero.
4:     ic(n) – int64/int32/nag_int array
n, the dimension of the array, must satisfy the constraint $\mathrm{n} \geq 2$.
Indicates whether the $i$th observation is a case or a control.
If $\mathrm{ic}(i) = 0$, the $i$th observation is a case.
If $\mathrm{ic}(i) = 1$, the $i$th observation is a control.
Constraint: $\mathrm{ic}(i) = 0$ or $1$, for $i = 1, 2, \ldots, \mathrm{n}$.
5:     isi(n) – int64/int32/nag_int array
n, the dimension of the array, must satisfy the constraint $\mathrm{n} \geq 2$.
Stratum indicators which also allow data points to be excluded from the analysis.
If $\mathrm{isi}(i) = k$, the $i$th observation is from the $k$th stratum, where $k = 1, 2, \ldots, \mathrm{ns}$.
If $\mathrm{isi}(i) = 0$, the $i$th observation is to be omitted from the analysis.
Constraint: $0 \leq \mathrm{isi}(i) \leq \mathrm{ns}$ and more than ip values of $\mathrm{isi}(i) > 0$, for $i = 1, 2, \ldots, \mathrm{n}$.
6:     b(ip) – double array
ip, the dimension of the array, must satisfy the constraint $\mathrm{ip} \geq 1$ and $\mathrm{ip} =$ number of nonzero values of isz.
Initial estimates of the covariate coefficient parameters $\beta$. $\mathrm{b}(j)$ must contain the initial estimate of the coefficient of the covariate in z corresponding to the $j$th nonzero value of isz.
7:     tol – double scalar
Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than $\mathrm{tol} \times (1.0 + \mathrm{CurrentDeviance})$. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.
Constraint: $\mathrm{tol} \geq 10 \times$ machine precision.
8:     maxit – int64/int32/nag_int scalar
The maximum number of iterations allowed for computing the estimates. If maxit is set to $0$ then the standard errors, the score functions and the variance-covariance matrix are computed for the input value of $\beta$ in b but $\beta$ is not updated.
Constraint: $\mathrm{maxit} \geq 0$.
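The constraints listed above can be checked before calling the function. The following is a minimal sketch in plain MATLAB using the inputs from the Example section; it does not call the Toolbox. Two assumptions are made for illustration: plain doubles stand in for the integer types, and MATLAB's `eps` stands in for the machine precision referred to in the constraint on tol.

```matlab
% Inputs as in the Example section (doubles used here for simplicity;
% the actual call expects the integer arguments in the chosen integer type).
ns    = 2;
z     = [0 1; 1 2; 0 1; 1 3; 0 1; 1 0; 0 2];
isz   = [1; 1];
ic    = [0; 0; 1; 1; 0; 1; 1];
isi   = [1; 1; 1; 1; 2; 2; 2];
b     = [0; 0];
tol   = 1e-5;
maxit = 10;

[n, m] = size(z);       % n observations, m covariates
ip     = nnz(isz);      % number of covariates included in the model

% Each line mirrors one documented constraint.
ok = ns >= 1 && n >= 2 && m >= 1 ...
     && numel(isz) == m && all(isz >= 0) && ip >= 1 ...
     && numel(ic)  == n && all(ic == 0 | ic == 1) ...
     && numel(isi) == n && all(isi >= 0 & isi <= ns) ...
     && nnz(isi > 0) > ip ...            % more than ip observations retained
     && numel(b) == ip ...
     && tol >= 10 * eps && maxit >= 0;   % eps is an assumed stand-in
```

If `ok` is false, the corresponding call would be expected to return ifail = 1 or ifail = 2, as described under Error Indicators and Warnings.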

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: The dimension of the arrays ic, isi and the first dimension of the array z. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: $\mathrm{n} \geq 2$.
2:     m – int64/int32/nag_int scalar
Default: The dimension of the array isz and the second dimension of the array z. (An error is raised if these dimensions are not equal.)
The number of covariates in array z.
Constraint: $\mathrm{m} \geq 1$.
3:     ip – int64/int32/nag_int scalar
Default: The dimension of the array b.
$p$, the number of covariates included in the model as indicated by isz.
Constraint: $\mathrm{ip} \geq 1$ and $\mathrm{ip} =$ number of nonzero values of isz.
4:     iprint – int64/int32/nag_int scalar
Indicates if the printing of information on the iterations is required.
$\mathrm{iprint} \leq 0$
No printing.
$\mathrm{iprint} \geq 1$
The deviance and the current estimates are printed every iprint iterations. When printing occurs the output is directed to the current advisory message unit (see nag_file_set_unit_advisory (x04ab)).
Default: $0$

Input Parameters Omitted from the MATLAB Interface

ldz, wk, lwk

Output Parameters

1:     dev – double scalar
The deviance, that is, $-2 \times$ (maximized log marginal likelihood).
2:     b(ip) – double array
$\mathrm{b}(j)$ contains the estimate $\hat{\beta}_i$ of the coefficient of the covariate stored in the $i$th column of z, where $i$ is the index of the $j$th nonzero value in the array isz.
3:     se(ip) – double array
$\mathrm{se}(j)$ is the asymptotic standard error of the estimate contained in $\mathrm{b}(j)$ and score function in $\mathrm{sc}(j)$, for $j = 1, 2, \ldots, \mathrm{ip}$.
4:     sc(ip) – double array
$\mathrm{sc}(j)$ is the value of the score function $U_j(\beta)$ for the estimate contained in $\mathrm{b}(j)$.
5:     cov($\mathrm{ip} \times (\mathrm{ip} + 1)/2$) – double array
The variance-covariance matrix of the parameter estimates in b stored in packed form by column, i.e., the covariance between the parameter estimates given in $\mathrm{b}(i)$ and $\mathrm{b}(j)$, $j \geq i$, is given in $\mathrm{cov}(j(j-1)/2 + i)$.
6:     nca(ns) – int64/int32/nag_int array
$\mathrm{nca}(i)$ contains the number of cases in the $i$th stratum, for $i = 1, 2, \ldots, \mathrm{ns}$.
7:     nct(ns) – int64/int32/nag_int array
$\mathrm{nct}(i)$ contains the number of controls in the $i$th stratum, for $i = 1, 2, \ldots, \mathrm{ns}$.
8:     ifail – int64/int32/nag_int scalar
$\mathrm{ifail} = 0$ unless the function detects an error (see [Error Indicators and Warnings]).
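The packed storage used for cov can be unpacked into a full symmetric matrix with the index formula $j(j-1)/2 + i$ given above. The sketch below does this for the values printed in the Example section; the names `covPacked`, `C` and `seCheck` are illustrative only.

```matlab
% Packed variance-covariance values printed in the Example section (ip = 2).
covPacked = [1.9325; -0.2317; 0.7180];
ip = 2;

C = zeros(ip);
for j = 1:ip
    for i = 1:j
        C(i, j) = covPacked(j*(j-1)/2 + i);   % element (i,j), j >= i
        C(j, i) = C(i, j);                    % fill in the symmetric part
    end
end

seCheck = sqrt(diag(C));   % should agree with the se output to rounding
```

The square roots of the diagonal reproduce the standard errors shown in the Example (1.3901 and 0.8473) to the printed precision.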

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 1
On entry, $\mathrm{m} < 1$,
or $\mathrm{n} < 2$,
or $\mathrm{ns} < 1$,
or $\mathrm{ip} < 1$,
or $\mathrm{ldz} < \mathrm{n}$,
or $\mathrm{tol} < 10 \times$ machine precision,
or $\mathrm{maxit} < 0$.
  ifail = 2
On entry, $\mathrm{isz}(i) < 0$, for some $i$,
or the value of ip is incompatible with isz,
or $\mathrm{ic}(i) \neq 1$ or $0$,
or $\mathrm{isi}(i) < 0$ or $\mathrm{isi}(i) > \mathrm{ns}$,
or the number of values of $\mathrm{isz}(i) > 0$ is greater than or equal to $n_0$, the number of observations excluding any with $\mathrm{isi}(i) = 0$.
  ifail = 3
The value of lwk is too small.
  ifail = 4
Overflow has been detected. Try using different starting values.
  ifail = 5
The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
W ifail = 6
Convergence has not been achieved in maxit iterations. The progress towards convergence can be examined by using a nonzero value of iprint. Any non-convergence may be due to a linear combination of covariates being monotonic with time.
Full results are returned.

Accuracy

The accuracy is specified by tol.

Further Comments

The other models described in Section [Description] can be fitted using the generalized linear modelling functions nag_correg_glm_binomial (g02gb) and nag_correg_glm_poisson (g02gc).
The case with one case per stratum can be analysed by having a dummy response variable $y$ such that $y = 1$ for a case and $y = 0$ for a control, and fitting a Poisson generalized linear model with a log link and including a factor with a level for each stratum. These models can be fitted by using nag_correg_glm_poisson (g02gc).
nag_contab_condl_logistic (g11ca) uses mean centering, which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence. In order to reduce the risk of the sums computed by Howard's algorithm becoming too large, the scaling factor described in Krailo and Pike (1984) is used.
If the initial estimates are poor then there may be a problem with overflow in calculating $\exp(\beta^T z_i)$ or there may be non-convergence. Reasonable estimates can often be obtained by fitting an unconditional model.

Example

function nag_contab_condl_logistic_example
ns = int64(2);
z = [0, 1;
     1, 2;
     0, 1;
     1, 3;
     0, 1;
     1, 0;
     0, 2];
isz = [int64(1);1];
ic = [int64(0);0;1;1;0;1;1];
isi = [int64(1);1;1;1;2;2;2];
b = [0;
     0];
tol = 1e-05;
maxit = int64(10);
[dev, bOut, se, sc, covar, nca, nct, ifail] = ...
   nag_contab_condl_logistic(ns, z, isz, ic, isi, b, tol, maxit)
 

dev =

    5.4749


bOut =

   -0.5223
   -0.2674


se =

    1.3901
    0.8473


sc =

   1.0e-05 *

   -0.4794
   -0.7901


covar =

    1.9325
   -0.2317
    0.7180


nca =

                    2
                    1


nct =

                    2
                    2


ifail =

                    0


function g11ca_example
ns = int64(2);
z = [0, 1;
     1, 2;
     0, 1;
     1, 3;
     0, 1;
     1, 0;
     0, 2];
isz = [int64(1);1];
ic = [int64(0);0;1;1;0;1;1];
isi = [int64(1);1;1;1;2;2;2];
b = [0;
     0];
tol = 1e-05;
maxit = int64(10);
[dev, bOut, se, sc, covar, nca, nct, ifail] = g11ca(ns, z, isz, ic, isi, b, tol, maxit)
 

dev =

    5.4749


bOut =

   -0.5223
   -0.2674


se =

    1.3901
    0.8473


sc =

   1.0e-05 *

   -0.4794
   -0.7901


covar =

    1.9325
   -0.2317
    0.7180


nca =

                    2
                    1


nct =

                    2
                    2


ifail =

                    0




© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013