The function may be called by the names: g12bac, nag_surviv_coxmodel or nag_surviv_cox_model.
The proportional hazard model relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right censored, that is the exact time to failure is not known, only that it is greater than a known time.
Let , for be the failure time or censored time for the th observation with the vector of covariates . It is assumed that censoring and failure mechanisms are independent. The hazard function, , is the probability that an individual with covariates fails at time given that the individual survived up to time . In the Cox proportional hazards model (Cox (1972)) is of the form:
where is the base-line hazard function, an unspecified function of time, is a vector of unknown arguments and is a known offset.
Assuming there are ties in the failure times giving distinct failure times, such that individuals fail at , it follows that the marginal likelihood for is well approximated (see Kalbfleisch and Prentice (1980)) by:
where is the sum of the covariates of individuals observed to fail at and is the set of individuals at risk just prior to , that is it is all individuals that fail or are censored at time along with all individuals that survive beyond time . The maximum likelihood estimates (MLEs) of , given by , are obtained by maximizing (1) using a Newton–Raphson iteration technique that includes step halving and utilizes the first and second partial derivatives of (1) which are given by equations (2) and (3) below:
for , where is the th element in the vector and
is the th component of a score vector and is the element of the observed information matrix whose inverse gives the variance-covariance matrix of .
It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time then one or more of the 's will be infinite.
If varies across strata, where the number of individuals in the th stratum is , with , then rather than maximizing (1) to obtain , the following marginal likelihood is maximized:
where is the contribution to likelihood for the observations in the th stratum treated as a single sample in (1). When strata are included the covariate coefficients are constant across strata but there is a different base-line hazard function .
The base-line survivor function associated with a failure time , is estimated as , where
where is the number of failures at time . The residual for the th observation is computed as:
where . The deviance is defined as (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with the appropriate -distribution; or, the asymptotic normality of the parameter estimates can be used to form tests by dividing the estimates by their standard errors or the score function for the model under the null hypothesis can be used to form tests.
Cox D R (1972) Regression models in life tables (with discussion) J. Roy. Statist. Soc. Ser. B34 187–220
Gross A J and Clark V A (1975) Survival Distributions: Reliability Applications in the Biomedical Sciences Wiley
Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley
the number of distinct failure times. This is returned in nd.
23: – doubleInput
On entry: indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than (CurrentDeviance). This corresponds approximately to an absolute precision if the deviance is small and a relative precision if the deviance is large.
24: – IntegerInput
On entry: the maximum number of iterations to be used for computing the estimates. If max_iter is set to 0 then the standard errors, score functions, variance-covariance matrix and the survival function are computed for the input value of in b but is not updated.
25: – IntegerInput
On entry: indicates if the printing of information on the iterations is required.
There is no printing.
The deviance and the current estimates are printed every iprint iterations.
26: – const char *Input
On entry: the name of the file into which information is to be output. If outfile is set to NULL or to the string ‘stdout’, then the monitoring information is output to stdout.
27: – NagError *Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
6Error Indicators and Warnings
On entry, while . These arguments must satisfy .
On entry, while . These arguments must satisfy .
Dynamic memory allocation failed.
The contents of array ic are not valid.
Constraint: not all values of ic can be 1.
Convergence has not been achieved in max_iter iterations. The progress towards convergence can be examined by using by setting iprint to . Any non-convergence may be due to a linear combination of covariates being monotonic with time. Full results are returned.
In the current iteration 10 step halvings have been performed without decreasing the deviance from the previous iteration. Convergence is assumed.
The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
On entry, ndmax is while the output value of .
Overflow has been detected. Try different starting values.
On entry, and the number of nonzero values of .
Constraint: the number of nonzero values of sz.
On entry, the number of values of is , and excluded observations with is .
Constraint: the number of values of nonzero sz must be less than excluded observations.
Background information to multithreading can be found in the Multithreading documentation.
g12bac is not threaded in any implementation.
g12bac uses mean centering which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence.
If the initial estimates are poor then there may be a problem with overflow in calculating or there may be non-convergence. Reasonable estimates can often be obtained by fitting an exponential model using g02gcc.
The data are the remission times for two groups of leukemia patients (see Gross and Clark (1975) p242). A dummy variable indicates which group they come from. An initial estimate is computed using the exponential model and then the Cox proportional hazard model is fitted and parameter estimates and the survival function are printed.