# NAG C Library Function Document

## 1 Purpose

nag_surviv_risk_sets (g12zac) creates the risk sets associated with the Cox proportional hazards model for fixed covariates.

## 2 Specification

 #include <nag.h>
 #include <nagg12.h>
 void nag_surviv_risk_sets (Nag_OrderType order, Integer n, Integer m, Integer ns, const double z[], Integer pdz, const Integer isz[], Integer ip, const double t[], const Integer ic[], const Integer isi[], Integer *num, Integer ixs[], Integer *nxs, double x[], Integer mxn, Integer id[], Integer *nd, double tp[], Integer irs[], NagError *fail)

## 3 Description

The Cox proportional hazards model (see Cox (1972)) relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right-censored, that is, the exact time to failure is not known, only that it is greater than a known time.
Let ${t}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$, be the failure time or censored time for the $i$th observation with the vector of $p$ covariates ${z}_{i}$. The covariate matrix $Z$ is constructed so that it contains $n$ rows with the $i$th row containing the $p$ covariates ${z}_{i}$. It is assumed that censoring and failure mechanisms are independent. The hazard function, $\lambda \left(t,z\right)$, is the probability that an individual with covariates $z$ fails at time $t$ given that the individual survived up to time $t$. In the Cox proportional hazards model, $\lambda \left(t,z\right)$ is of the form
 $\lambda \left(t,z\right)={\lambda }_{0}\left(t\right)\mathrm{exp}\left({z}^{\mathrm{T}}\beta \right),$
where ${\lambda }_{0}$ is the base-line hazard function, an unspecified function of time, and $\beta$ is a vector of unknown parameters. As ${\lambda }_{0}$ is unknown, the parameters $\beta$ are estimated using the conditional or marginal likelihood. This involves considering the covariate values of all subjects that are at risk at the time when a failure occurs. The probability that the subject that failed had their observed set of covariate values is computed.
The risk set at a failure time consists of those subjects that fail or are censored at that time and those who survive beyond that time. As risk sets are computed for every distinct failure time, it should be noted that the combined risk sets may be considerably larger than the original data. If the data can be considered as coming from different strata such that ${\lambda }_{0}$ varies from stratum to stratum but $\beta$ remains constant, then nag_surviv_risk_sets (g12zac) will return a factor that indicates to which risk set and stratum each member of the risk sets belongs, rather than just to which risk set.
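To give a feel for how quickly the combined risk sets grow, the following sketch counts their total number of rows for unstratified data, using the definition above (a subject is in the risk set for a failure time if its own time is greater than or equal to it). The name `combined_risk_set_size` is hypothetical, plain `int` stands in for the NAG `Integer` type, and it is an assumption that this count agrees with the value the library returns in every edge case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): counts the rows in the combined
   risk sets for unstratified data. t[i] is the failure or censoring time and
   ic[i] is 0 for a failure, 1 for a censored observation. */
size_t combined_risk_set_size(const double t[], const int ic[], size_t n)
{
    size_t total = 0;
    assert(t != NULL && ic != NULL);
    for (size_t i = 0; i < n; ++i) {
        int first = 1; /* is i the first failure seen at this time? */
        if (ic[i] != 0)
            continue; /* censoring times do not define risk sets */
        for (size_t k = 0; k < i; ++k)
            if (ic[k] == 0 && t[k] == t[i]) { first = 0; break; }
        if (!first)
            continue; /* this distinct failure time is already counted */
        for (size_t k = 0; k < n; ++k)
            if (t[k] >= t[i])
                ++total; /* subject k is still at risk at time t[i] */
    }
    return total;
}
```

For five observations with three distinct failure times the count can already exceed twice the number of observations, which is why the output arrays need to be sized generously.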
Given the risk sets, the Cox proportional hazards model can then be fitted using a Poisson generalized linear model (nag_glm_poisson (g02gcc) with nag_dummy_vars (g04eac) to compute dummy variables) using Breslow's approximation for ties (see Breslow (1974)). This will give the same fit as nag_surviv_cox_model (g12bac). If the exact treatment of ties in discrete time is required, as given by Cox (1972), then the model is fitted as a conditional logistic model using nag_condl_logistic (g11cac).

## 4 References

Breslow N E (1974) Covariate analysis of censored survival data Biometrics 30 89–99
Cox D R (1972) Regression models in life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Gross A J and Clark V A (1975) Survival Distributions: Reliability Applications in the Biomedical Sciences Wiley

## 5 Arguments

1:    $\mathbf{order}$  Nag_OrderType  Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by ${\mathbf{order}}=\mathrm{Nag_RowMajor}$. See Section 3.3.1.3 in How to Use the NAG Library and its Documentation for a more detailed explanation of the use of this argument.
Constraint: ${\mathbf{order}}=\mathrm{Nag_RowMajor}$ or $\mathrm{Nag_ColMajor}$.
2:    $\mathbf{n}$  Integer  Input
On entry: $n$, the number of data points.
Constraint: ${\mathbf{n}}\ge 2$.
3:    $\mathbf{m}$  Integer  Input
On entry: the number of covariates in array z.
Constraint: ${\mathbf{m}}\ge 1$.
4:    $\mathbf{ns}$  Integer  Input
On entry: the number of strata. If ${\mathbf{ns}}>0$ then the stratum for each observation must be supplied in isi.
Constraint: ${\mathbf{ns}}\ge 0$.
5:    $\mathbf{z}\left[\mathit{dim}\right]$  const double  Input
Note: the dimension, dim, of the array z must be at least
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{pdz}}×{\mathbf{m}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{n}}×{\mathbf{pdz}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
The $\left(i,j\right)$th element of the matrix $Z$ is stored in
• ${\mathbf{z}}\left[\left(j-1\right)×{\mathbf{pdz}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• ${\mathbf{z}}\left[\left(i-1\right)×{\mathbf{pdz}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
On entry: must contain the covariate values for the $n$ observations, stored in column-major or row-major order as specified by order.
6:    $\mathbf{pdz}$  Integer  Input
On entry: the stride separating row or column elements (depending on the value of order) in the array z.
Constraints:
• if ${\mathbf{order}}=\mathrm{Nag_ColMajor}$, ${\mathbf{pdz}}\ge {\mathbf{n}}$;
• if ${\mathbf{order}}=\mathrm{Nag_RowMajor}$, ${\mathbf{pdz}}\ge {\mathbf{m}}$.
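The two storage schemes for z can be summarised in a small indexing sketch. The enum `sketch_order` and the function `z_offset` are hypothetical stand-ins introduced so that the example compiles without the NAG headers; real code would use `Nag_OrderType` with `Nag_RowMajor` and `Nag_ColMajor`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType so the sketch compiles without
   the NAG headers. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_order;

/* Offset into z of the (i, j)th element of Z, with i and j counted from 1. */
size_t z_offset(sketch_order order, size_t i, size_t j, size_t pdz)
{
    assert(i >= 1 && j >= 1);
    return (order == SKETCH_COL_MAJOR) ? (j - 1) * pdz + (i - 1)
                                       : (i - 1) * pdz + (j - 1);
}
```

For example, with three observations and two covariates, the element in row 2, column 1 sits at offset 2 in row-major storage with pdz = 2, and at offset 1 in column-major storage with pdz = 3.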
7:    $\mathbf{isz}\left[{\mathbf{m}}\right]$  const Integer  Input
On entry: indicates which subset of covariates are to be included in the model.
${\mathbf{isz}}\left[j-1\right]\ge 1$
The $j$th covariate is included in the model.
${\mathbf{isz}}\left[j-1\right]=0$
The $j$th covariate is excluded from the model and not referenced.
Constraint: ${\mathbf{isz}}\left[j-1\right]\ge 0$ and at least one value must be nonzero.
8:    $\mathbf{ip}$  Integer  Input
On entry: $p$, the number of covariates included in the model as indicated by isz.
Constraint: ${\mathbf{ip}}=$ the number of nonzero values of isz.
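Because ip must agree with isz, it is often simplest to derive one from the other before the call. The helper `count_included` below is a hypothetical sketch of that step, with plain `int` standing in for `Integer`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): number of covariates selected
   by isz, i.e. the value expected for ip. */
int count_included(const int isz[], size_t m)
{
    int ip = 0;
    assert(isz != NULL);
    for (size_t j = 0; j < m; ++j) {
        assert(isz[j] >= 0); /* documented constraint on isz */
        if (isz[j] > 0)
            ++ip;
    }
    return ip;
}
```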
9:    $\mathbf{t}\left[{\mathbf{n}}\right]$  const double  Input
On entry: the vector of $n$ failure or censoring times.
10:  $\mathbf{ic}\left[{\mathbf{n}}\right]$  const Integer  Input
On entry: the status of the individual at time $t$ given in t.
${\mathbf{ic}}\left[i-1\right]=0$
Indicates that the $i$th individual has failed at time ${\mathbf{t}}\left[i-1\right]$.
${\mathbf{ic}}\left[i-1\right]=1$
Indicates that the $i$th individual has been censored at time ${\mathbf{t}}\left[i-1\right]$.
Constraint: ${\mathbf{ic}}\left[\mathit{i}-1\right]=0$ or $1$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
11:  $\mathbf{isi}\left[\mathit{dim}\right]$  const Integer  Input
Note: the dimension, dim, of the array isi must be at least
• ${\mathbf{n}}$ when ${\mathbf{ns}}>0$;
• $1$ otherwise.
On entry: if ${\mathbf{ns}}>0$, the stratum indicators which also allow data points to be excluded from the analysis.
If ${\mathbf{ns}}=0$, isi is not referenced.
${\mathbf{isi}}\left[i-1\right]=k$
Indicates that the $i$th data point is in the $k$th stratum, where $k=1,2,\dots ,{\mathbf{ns}}$.
${\mathbf{isi}}\left[i-1\right]=0$
Indicates that the $i$th data point is omitted from the analysis.
Constraint: if ${\mathbf{ns}}>0$, $0\le {\mathbf{isi}}\left[\mathit{i}-1\right]\le {\mathbf{ns}}$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
12:  $\mathbf{num}$  Integer *  Output
On exit: the number of values in the combined risk sets.
13:  $\mathbf{ixs}\left[{\mathbf{mxn}}\right]$  Integer  Output
On exit: the factor giving the risk sets/strata for the data in x and id.
If ${\mathbf{ns}}=0$ or $1$, ${\mathbf{ixs}}\left[i-1\right]=l$ for members of the $l$th risk set.
If ${\mathbf{ns}}>1$, ${\mathbf{ixs}}\left[i-1\right]=\left(j-1\right)×{\mathbf{nd}}+l$ for the observations in the $l$th risk set for the $j$th stratum.
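When ${\mathbf{ns}}>1$, the combined factor level can be split back into its stratum and risk set by inverting the formula above. The helper `decode_level` is a hypothetical sketch; it assumes $1\le l\le {\mathbf{nd}}$ and uses plain `int` in place of `Integer`.

```c
#include <assert.h>

/* Hypothetical helper (not a NAG routine): splits a factor level
   v = (j - 1) * nd + l into the stratum j and the risk set l,
   both counted from 1. */
void decode_level(int v, int nd, int *stratum, int *risk_set)
{
    assert(v >= 1 && nd >= 1);
    *stratum = (v - 1) / nd + 1;
    *risk_set = (v - 1) % nd + 1;
}
```

For instance, with nd = 4 a level of 7 corresponds to the third risk set of the second stratum.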
14:  $\mathbf{nxs}$  Integer *  Output
On exit: the number of levels for the risk sets/strata factor given in ixs.
15:  $\mathbf{x}\left[{\mathbf{mxn}}×{\mathbf{ip}}\right]$  double  Output
Note: the $\left(i,j\right)$th element of the matrix $X$ is stored in
• ${\mathbf{x}}\left[\left(j-1\right)×{\mathbf{mxn}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• ${\mathbf{x}}\left[\left(i-1\right)×{\mathbf{ip}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
On exit: the first num rows contain the values of the covariates for the members of the risk sets.
16:  $\mathbf{mxn}$  Integer  Input
On entry: the first dimension of the array x and the dimension of the arrays ixs and id.
Constraint: mxn must be sufficiently large for the arrays to contain the expanded risk sets. The size will depend on the pattern of failure times and censored times. The minimum value will be returned in num unless the function exits with ${\mathbf{fail}}\mathbf{.}\mathbf{code}=$ NE_INT.
17:  $\mathbf{id}\left[{\mathbf{mxn}}\right]$  Integer  Output
On exit: indicates if the member of the risk set given in x failed.
${\mathbf{id}}\left[i-1\right]=1$ if the member of the risk set failed at the time defining the risk set and ${\mathbf{id}}\left[i-1\right]=0$ otherwise.
18:  $\mathbf{nd}$  Integer *  Output
On exit: the number of distinct failure times, i.e., the number of risk sets.
19:  $\mathbf{tp}\left[{\mathbf{n}}\right]$  double  Output
On exit: ${\mathbf{tp}}\left[\mathit{i}-1\right]$ contains the $\mathit{i}$th distinct failure time, for $\mathit{i}=1,2,\dots ,{\mathbf{nd}}$.
20:  $\mathbf{irs}\left[{\mathbf{n}}\right]$  Integer  Output
On exit: indicates rows in x and elements in ixs and id corresponding to the risk sets. The first risk set, corresponding to failure time ${\mathbf{tp}}\left[0\right]$, is given by rows $1$ to ${\mathbf{irs}}\left[0\right]$. The $\mathit{l}$th risk set is given by rows ${\mathbf{irs}}\left[\mathit{l}-2\right]+1$ to ${\mathbf{irs}}\left[\mathit{l}-1\right]$, for $\mathit{l}=2,3,\dots ,{\mathbf{nd}}$.
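Since the description of irs counts rows from 1 while C arrays are indexed from 0, a small conversion helper can avoid off-by-one mistakes when walking the risk sets. The function `risk_set_rows` is a hypothetical sketch, with plain `int` in place of `Integer`; it returns a zero-based half-open range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): zero-based half-open row range
   [*first, *last) of the l-th risk set (l counted from 1) within x, ixs
   and id. */
void risk_set_rows(const int irs[], int nd, int l, int *first, int *last)
{
    assert(irs != NULL && l >= 1 && l <= nd);
    *first = (l == 1) ? 0 : irs[l - 2];
    *last = irs[l - 1];
}
```

A loop of the form `for (r = first; r < last; ++r)` then visits exactly the members of the chosen risk set.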
21:  $\mathbf{fail}$  NagError *  Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

## 6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_INT
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{ic}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ic}}\left[i-1\right]=0$ or $1$.
On entry, $i=〈\mathit{\text{value}}〉$, ${\mathbf{isi}}\left[i-1\right]=〈\mathit{\text{value}}〉$ and ${\mathbf{ns}}=〈\mathit{\text{value}}〉$.
Constraint: $0\le {\mathbf{isi}}\left[i-1\right]\le {\mathbf{ns}}$.
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{isz}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{isz}}\left[i-1\right]\ge 0$.
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{ns}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ns}}\ge 0$.
On entry, ${\mathbf{pdz}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdz}}>0$.
NE_INT_2
On entry, ${\mathbf{pdz}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdz}}\ge {\mathbf{m}}$.
NE_INT_ARRAY_ELEM_CONS
On entry, ${\mathbf{mxn}}=〈\mathit{\text{value}}〉$ and minimum value for ${\mathbf{mxn}}=〈\mathit{\text{value}}〉$.
Constraint: mxn must be sufficiently large for the arrays to contain the expanded risk set.
On entry, there are not ip values of ${\mathbf{isz}}>0$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.

## 7 Accuracy

Not applicable.

## 8 Parallelism and Performance

nag_surviv_risk_sets (g12zac) is not threaded in any implementation.

## 9 Further Comments

When there are strata present, i.e., ${\mathbf{ns}}>1$, not all the nxs groups may be present.

## 10 Example

The data are the remission times for two groups of leukemia patients (see page 242 of Gross and Clark (1975)). A dummy variable indicates which group they come from. The risk sets are computed using nag_surviv_risk_sets (g12zac) and the Cox proportional hazards model is fitted using nag_condl_logistic (g11cac).

### 10.1 Program Text

Program Text (g12zace.c)

### 10.2 Program Data

Program Data (g12zace.d)

### 10.3 Program Results

Program Results (g12zace.r)