nag_surviv_cox_model (g12bac) : NAG C Library, Mark 26

Specification

#include <nag.h>
#include <nagg12.h>

void nag_surviv_cox_model (Integer n, Integer m, Integer ns, const double z[], Integer tdz, const Integer sz[], Integer ip, const double t[], const Integer ic[], const double omega[], const Integer isi[], double *dev, double b[], double se[], double sc[], double cov[], double res[], Integer *nd, double tp[], double sur[], Integer tdsur, Integer ndmax, double tol, Integer max_iter, Integer iprint, const char *outfile, NagError *fail)

Description

The proportional hazard model relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right censored, that is the exact time to failure is not known, only that it is greater than a known time.

Let $t_{i}$, for $i = 1, 2, \dots, n$, be the failure time or censored time for the $i$th observation with the vector of $p$ covariates $z_{i}$. It is assumed that censoring and failure mechanisms are independent. The hazard function, $λ (t, z)$, is the probability that an individual with covariates $z$ fails at time $t$ given that the individual survived up to time $t$. In the Cox proportional hazards model (Cox (1972)) $λ (t, z)$ is of the form:

$$λ (t, z) = λ_{0} (t) \exp (z^{T} β + ω)$$

where $λ_{0}$ is the base-line hazard function, an unspecified function of time, $β$ is a vector of unknown parameters and $ω$ is a known offset.

Assuming there are ties in the failure times giving $n_{d} < n$ distinct failure times, $t_{(1)} < \dots < t_{(n_{d})}$, such that $d_{i}$ individuals fail at $t_{(i)}$, it follows that the marginal likelihood for $β$ is well approximated (see Kalbfleisch and Prentice (1980)) by:

$$L = \prod_{i = 1}^{n_{d}} \frac{\exp (s_{i}^{T} β + ω_{i})}{{\left[ \sum_{l \in R (t_{(i)})} \exp (z_{l}^{T} β + ω_{l}) \right]}^{d_{i}}} \tag{1}$$

where $s_{i}$ is the sum of the covariates of individuals observed to fail at $t_{(i)}$ and $R (t_{(i)})$ is the set of individuals at risk just prior to $t_{(i)}$, that is, all individuals that fail or are censored at time $t_{(i)}$ along with all individuals that survive beyond time $t_{(i)}$. The maximum likelihood estimates (MLEs) of $β$, given by $\hat{β}$, are obtained by maximizing (1) using a Newton–Raphson iteration technique that includes step halving and utilizes the first and second partial derivatives of (1) which are given by equations (2) and (3) below:

$$U_{j} (β) = \frac{\partial \ln L}{\partial β_{j}} = \sum_{i = 1}^{n_{d}} [s_{j i} - d_{i} α_{j i} (β)] = 0 \tag{2}$$

for $j = 1, \dots, p$, where $s_{j i}$ is the $j$th element in the vector $s_{i}$ and

$$α_{j i} (β) = \frac{\sum_{l \in R (t_{(i)})} z_{j l} \exp (z_{l}^{T} β + ω_{l})}{\sum_{l \in R (t_{(i)})} \exp (z_{l}^{T} β + ω_{l})} .$$

Similarly,

$$I_{h j} (β) = - \frac{\partial^{2} \ln L}{\partial β_{h} \partial β_{j}} = \sum_{i = 1}^{n_{d}} d_{i} γ_{h j i} \tag{3}$$

where

$$γ_{h j i} = \frac{\sum_{l \in R (t_{(i)})} z_{h l} z_{j l} \exp (z_{l}^{T} β + ω_{l})}{\sum_{l \in R (t_{(i)})} \exp (z_{l}^{T} β + ω_{l})} - α_{h i} (β) α_{j i} (β), \quad h, j = 1, \dots, p .$$

$U_{j} (β)$ is the $j$th component of a score vector and $I_{h j} (β)$ is the $(h, j)$ element of the observed information matrix $I (β)$ whose inverse ${I (β)}^{- 1} = {[I_{h j} (β)]}^{- 1}$ gives the variance-covariance matrix of $β$.

It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time then one or more of the $β_{j}$'s will be infinite.

If $λ_{0} (t)$ varies across $ν$ strata, where the number of individuals in the $k$th stratum is $n_{k}$, $k = 1, \dots, ν$, with $n = \sum_{k = 1}^{ν} n_{k}$, then rather than maximizing (1) to obtain $\hat{β}$, the following marginal likelihood is maximized:

$$L = \prod_{k = 1}^{ν} L_{k}, \tag{4}$$

where $L_{k}$ is the contribution to likelihood for the $n_{k}$ observations in the $k$th stratum treated as a single sample in (1). When strata are included the covariate coefficients are constant across strata but there is a different base-line hazard function $λ_{0}$.

The base-line survivor function associated with a failure time $t_{(i)}$ is estimated as $\exp (- \hat{H} (t_{(i)}))$, where

$$\hat{H} (t_{(i)}) = \sum_{t_{(j)} \leq t_{(i)}} \left( \frac{d_{j}}{\sum_{l \in R (t_{(j)})} \exp (z_{l}^{T} \hat{β} + ω_{l})} \right), \tag{5}$$

where $d_{j}$ is the number of failures at time $t_{(j)}$. The residual for the $l$th observation is computed as:

$$r (t_{l}) = \hat{H} (t_{l}) \exp (- z_{l}^{T} \hat{β} + ω_{l})$$

where $\hat{H} (t_{l}) = \hat{H} (t_{(i)})$, $t_{(i)} \leq t_{l} < t_{(i + 1)}$. The deviance is defined as $- 2 \times$ (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with the appropriate $χ^{2}$-distribution; or, the asymptotic normality of the parameter estimates can be used to form $z$ tests by dividing the estimates by their standard errors, or the score function for the model under the null hypothesis can be used to form $z$ tests.

References

Cox D R (1972) Regression models and life-tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220

Gross A J and Clark V A (1975) Survival Distributions: Reliability Applications in the Biomedical Sciences Wiley

Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley

Arguments

1: n – Integer, Input

On entry: the number of data points, $n$.

Constraint: $n \geq 2$.

2: m – Integer, Input

On entry: the number of covariates in array z.

Constraint: $m \geq 1$.

3: ns – Integer, Input

On entry: the number of strata. If $ns > 0$ then the stratum for each observation must be supplied in isi.

Constraint: $ns \geq 0$.

4: z[n × tdz] – const double, Input

Note: the $(i, j)$th element of the matrix $Z$ is stored in $z [(i - 1) \times tdz + j - 1]$.

On entry: the $i$th row must contain the covariates which are associated with the $i$th failure time given in t.

5: tdz – Integer, Input

On entry: the stride separating matrix column elements in the array z.

Constraint: $tdz \geq m$.
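The storage rule for z can be written as a small accessor. This is an illustrative sketch only: `z_elem` is a hypothetical helper, not a NAG function, and it takes the 1-based indices $(i, j)$ used in the text.

```c
#include <stddef.h>

/* Hypothetical helper: covariate j (1-based) of observation i (1-based)
   in a row-major array whose consecutive rows are tdz elements apart. */
double z_elem(const double z[], size_t tdz, size_t i, size_t j)
{
    return z[(i - 1) * tdz + (j - 1)];
}
```

Choosing tdz larger than m leaves unused padding at the end of each row, which the accessor simply skips.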

6: sz[m] – const Integer, Input

On entry: indicates which subset of covariates is to be included in the model.

$sz [j - 1] \geq 1$: The $j$ th covariate is included in the model.
$sz [j - 1] = 0$: The $j$ th covariate is excluded from the model and not referenced.

Constraints:

$sz [j - 1] \geq 0$ ;
At least one and at most $n_{0} - 1$ elements of sz must be nonzero where $n_{0}$ is the number of observations excluding any with zero value of isi.

7: ip – Integer, Input

On entry: the number of covariates included in the model as indicated by sz.

Constraint: $ip =$ number of nonzero values of sz.
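Since ip must equal the number of nonzero entries of sz, it can be derived from sz rather than entered by hand. The sketch below is an assumption-laden illustration: `included_columns` is a hypothetical helper and plain `int` stands in for NAG's `Integer`. It also records which column of z each included covariate comes from, which is the mapping used to interpret the entries of b.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of nonzero entries of sz and
   stores in cols[k] the 1-based column of z for the (k+1)th included
   covariate. cols must have room for m entries. */
size_t included_columns(size_t m, const int sz[], size_t cols[])
{
    size_t ip = 0;
    for (size_t j = 0; j < m; ++j)
        if (sz[j] != 0)
            cols[ip++] = j + 1;
    return ip;
}
```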

8: t[n] – const double, Input

On entry: the vector of $n$ failure and censoring times.

9: ic[n] – const Integer, Input

On entry: the status of the individual at time $t$ given in t.

$ic [i - 1] = 0$: Indicates that the $i$ th individual has failed at time $t [i - 1]$ .
$ic [i - 1] = 1$: Indicates that the $i$ th individual has been censored at time $t [i - 1]$ .

Constraint: $ic [i - 1] = 0$ or $1$, for $i = 1, 2, \dots, n$.

10: omega[n] – const double, Input

On entry: if an offset is required then omega must contain the value of $ω_{i}$, for $i = 1, 2, \dots, n$. Otherwise omega must be set to NULL.

11: isi [\times] – const Integer, Input

On entry: if $ns > 0$, the stratum indicators, which also allow data points to be excluded from the analysis. If $ns = 0$, isi is not referenced and may be NULL.

$isi [i - 1] = k$: Indicates that the $i$ th data point is in the $k$ th stratum, for $k = 1, 2, \dots, ns$ .
$isi [i - 1] = 0$: Indicates that the $i$ th data point is omitted from the analysis.

Constraint: if $ns > 0$, $0 \leq isi [i - 1] \leq ns$, and more than ip values of $isi [i - 1] > 0$, for $i = 1, 2, \dots, n$.
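The constraint on isi can be checked before the call with a short pre-flight routine. This is a sketch under stated assumptions: `count_included` is a hypothetical helper, plain `int` stands in for NAG's `Integer`, and the treatment of `ns == 0` (all points included) follows from isi not being referenced in that case.

```c
#include <stddef.h>

/* Hypothetical helper: number of data points kept in the analysis,
   or -1 if some stratum indicator lies outside 0..ns.
   When ns == 0, isi is ignored and all n points count as included. */
long count_included(size_t n, const int isi[], int ns)
{
    if (ns == 0)
        return (long) n;
    long kept = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isi[i] < 0 || isi[i] > ns)
            return -1;
        if (isi[i] > 0)
            ++kept;
    }
    return kept;
}
```

A caller would then compare the result with ip and proceed only when it is strictly greater.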

12: dev – double *, Output

On exit: the deviance, that is $- 2 \times$ (maximized log marginal likelihood).

13: b[ip] – double, Input/Output

On entry: initial estimates of the covariate coefficient parameters $β$. $b [j - 1]$ must contain the initial estimate of the coefficient of the covariate in z corresponding to the $j$th nonzero value of sz.

Suggested value: in many cases an initial value of zero for $b [j - 1]$ may be used. For other suggestions see Section 9 (Further Comments).

On exit: $b [j - 1]$ contains the estimate ${\hat{β}}_{i}$, the coefficient of the covariate stored in the $i$th column of z where $i$ is the position of the $j$th nonzero value in the array sz.

14: se[ip] – double, Output

On exit: $se [j - 1]$ is the asymptotic standard error of the estimate contained in $b [j - 1]$ and score function in $sc [j - 1]$, for $j = 1, 2, \dots, ip$.
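The $z$ tests mentioned in the Description divide each estimate by its standard error. A minimal sketch, assuming the arrays b and se have already been filled by a successful call, is shown below; `wald_z` is a hypothetical helper name and does not belong to the NAG Library.

```c
#include <stddef.h>

/* Hypothetical helper: z statistic for each included covariate,
   zstat[j] = b[j] / se[j]. Assumes every se[j] is positive. */
void wald_z(size_t ip, const double b[], const double se[], double zstat[])
{
    for (size_t j = 0; j < ip; ++j)
        zstat[j] = b[j] / se[j];
}
```

Each statistic would then be compared with a standard Normal distribution, relying on the asymptotic normality of the estimates.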

15: sc[ip] – double, Output

On exit: $sc [j - 1]$ is the value of the score function, $U_{j} (β)$, for the estimate contained in $b [j - 1]$.

16: cov[ip × (ip + 1) / 2] – double, Output

On exit: the variance-covariance matrix of the parameter estimates in b stored in packed form by column, i.e., the covariance between the parameter estimates given in $b [i - 1]$ and $b [j - 1]$, $j \geq i$, is stored in $cov (j (j - 1) / 2 + i)$.
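The packed-by-column layout of cov can be captured in an index function. The sketch below rests on an assumption that should be checked against the library: it reads the position $j (j - 1) / 2 + i$ as 1-based, so the zero-based C subscript is one less. `cov_index` is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: zero-based subscript in cov of the covariance
   between the i-th and j-th estimates (1-based i, j).
   ASSUMPTION: position j(j-1)/2 + i in the text is 1-based. */
size_t cov_index(size_t i, size_t j)
{
    if (i > j) {                 /* the matrix is symmetric: use j >= i */
        size_t tmp = i;
        i = j;
        j = tmp;
    }
    return j * (j - 1) / 2 + i - 1;
}
```

Under this reading, the variance of the $j$th estimate sits at `cov[cov_index(j, j)]`.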

17: res[n] – double, Output

On exit: the residuals, $r (t_{l})$, $l = 1, 2, \dots, n$.

18: nd – Integer *, Output

On exit: the number of distinct failure times.

19: tp[ndmax] – double, Output

On exit: $tp [i - 1]$ contains the $i$th distinct failure time, for $i = 1, 2, \dots, nd$.

20: sur[ndmax × tdsur] – double, Output

Note: the $(i, j)$th element of the matrix is stored in $sur [(i - 1) \times tdsur + j - 1]$.

On exit: if $ns = 0$, $sur (i, 1)$ contains the estimated survival function for the $i$th distinct failure time. If $ns > 0$, $sur (i, k)$ contains the estimated survival function for the $i$th distinct failure time in the $k$th stratum.
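To work with one stratum's survivor curve as a contiguous vector, the corresponding column of sur can be copied out using the storage rule given in the note. `stratum_curve` is a hypothetical helper, not a NAG function; for the unstratified case the single curve is column $k = 1$.

```c
#include <stddef.h>

/* Hypothetical helper: copy column k (1-based) of the row-major array
   sur, with row stride tdsur, into out[0..nd-1]. */
void stratum_curve(const double sur[], size_t tdsur, size_t nd,
                   size_t k, double out[])
{
    for (size_t i = 0; i < nd; ++i)
        out[i] = sur[i * tdsur + (k - 1)];
}
```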

21: tdsur – Integer, Input

On entry: the stride separating matrix column elements in the array sur.

Constraint: $tdsur \geq \max (ns, 1)$.

22: ndmax – Integer, Input

On entry: the first dimension of the array sur.

Constraint: $ndmax \geq$ the number of distinct failure times. This is returned in nd.

23: tol – double, Input

On entry: indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than $tol \times (1.0 + CurrentDeviance)$. This corresponds approximately to an absolute precision if the deviance is small and a relative precision if the deviance is large.

Constraint: $tol \geq 10 \times$ machine precision.
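The convergence rule for tol translates directly into a one-line test. This is only a sketch of the stated criterion, not the library's internal code, and `deviance_converged` is a hypothetical name.

```c
/* Hypothetical helper: nonzero when the decrease in deviance from the
   previous iteration is below tol * (1.0 + current deviance). */
int deviance_converged(double prev_dev, double cur_dev, double tol)
{
    return (prev_dev - cur_dev) < tol * (1.0 + cur_dev);
}
```

With a deviance near 100 and `tol = 1e-3`, the threshold is roughly 0.1, so the test behaves like a relative tolerance. With a deviance near zero the threshold is close to tol itself, which is the absolute behaviour described above.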

24: max_iter – Integer, Input

On entry: the maximum number of iterations to be used for computing the estimates. If max_iter is set to 0 then the standard errors, score functions, variance-covariance matrix and the survival function are computed for the input value of $β$ in b but $β$ is not updated.

Constraint: max_iter $\geq 0$.

25: iprint – Integer, Input

On entry: indicates if the printing of information on the iterations is required.

$iprint \leq 0$: There is no printing.
$iprint \geq 1$: The deviance and the current estimates are printed every iprint iterations.

26: outfile – const char *, Input

On entry: the name of the file into which information is to be output. If outfile is set to NULL or to the string ‘stdout’, then the monitoring information is output to stdout.

27: fail – NagError *, Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

Error Indicators and Warnings

NE_2_INT_ARG_LT

On entry, $tdsur = 〈value〉$ while $ns = 〈value〉$. These arguments must satisfy $tdsur \geq ns$.

On entry, $tdz = 〈value〉$ while $m = 〈value〉$. These arguments must satisfy $tdz \geq m$.

NE_ALLOC_FAIL

Dynamic memory allocation failed.

NE_ARRAY_CONS

The contents of array ic are not valid.
Constraint: not all values of ic can be 1.

NE_G12BA_CONV

Convergence has not been achieved in max_iter iterations. The progress towards convergence can be examined by setting $iprint \geq 1$. Any non-convergence may be due to a linear combination of covariates being monotonic with time. Full results are returned.

NE_G12BA_DEV

In the current iteration 10 step halvings have been performed without decreasing the deviance from the previous iteration. Convergence is assumed.

NE_G12BA_MAT_SING

The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.

NE_G12BA_NDMAX

On entry, $ndmax = 〈value〉$ while the output value of $nd = 〈value〉$.
Constraint: $ndmax \geq nd$.

NE_G12BA_OVERFLOW

Overflow has been detected. Try different starting values.

NE_G12BA_SZ_IP

On entry, $ip = 〈value〉$ and the number of nonzero values of $sz = 〈value〉$.
Constraint: $ip =$ the number of nonzero values of sz.

NE_G12BA_SZ_ISI

On entry, the number of values of $sz [i] > 0$ is 〈value〉, $n = 〈value〉$ and excluded observations with $isi [i] = 0$ is 〈value〉.
Constraint: the number of nonzero values of sz must be less than $n -$ excluded observations.

NE_INT_ARG_LT

On entry, $m = 〈value〉$.
Constraint: $m \geq 1$.

On entry, max_iter must not be less than 0: max_iter $= 〈value〉$.

On entry, $n = 〈value〉$.
Constraint: $n \geq 2$.

On entry, $ns = 〈value〉$.
Constraint: $ns \geq 0$.

On entry, $tdsur = 〈value〉$.
Constraint: $tdsur \geq 1$.

NE_INT_ARRAY_CONS

On entry, $ic [〈value〉] = 〈value〉$.
Constraint: $ic [〈value〉] = 0$ or $1$.

On entry, $isi [〈value〉] = 〈value〉$.
Constraint: $0 \leq isi [〈value〉] \leq ns$.

On entry, $sz [〈value〉] = 〈value〉$.
Constraint: $sz [〈value〉] \geq 0$.

NE_INTERNAL_ERROR

An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.

NE_NOT_APPEND_FILE

Cannot open file outfile for appending.

NE_NOT_CLOSE_FILE

Cannot close file outfile.

NE_REAL_MACH_PREC

On entry, $tol = 〈value〉$, machine precision (nag_machine_precision) $= 〈value〉$.
Constraint: $tol \geq 10.0 \times$ machine precision.

Further Comments

nag_surviv_cox_model (g12bac) uses mean centering which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence.
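Mean centering as described here can be sketched in a few lines. The routine below is illustrative only: `center_columns` is a hypothetical helper, and it averages over all n rows, whereas the text does not say how the library treats excluded observations or strata when forming the means.

```c
#include <stddef.h>

/* Hypothetical helper: subtract each column mean from the n-by-m
   covariate matrix stored row-major in z with row stride tdz.
   ASSUMPTION: every row contributes to the mean. */
void center_columns(size_t n, size_t m, size_t tdz, double z[])
{
    for (size_t j = 0; j < m; ++j) {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += z[i * tdz + j];
        double mean = sum / (double) n;
        for (size_t i = 0; i < n; ++i)
            z[i * tdz + j] -= mean;
    }
}
```

Because the function performs its own centering, callers would not normally need to do this themselves. The sketch is only meant to make the operation concrete.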

If the initial estimates are poor then there may be a problem with overflow in calculating $\exp (β^{T} z_{i})$ or there may be non-convergence. Reasonable estimates can often be obtained by fitting an exponential model using nag_glm_poisson (g02gcc).
