Source code for naginterfaces.library.surviv

# -*- coding: utf-8 -*-
r"""
Module Summary
--------------
Interfaces for the NAG Mark 28.7 `surviv` Chapter.

``surviv`` - Survival Analysis

This module is concerned with statistical techniques used in the analysis of survival/reliability/failure time data.

Other modules contain functions which are also used to analyse this type of data. Submodule :mod:`~naginterfaces.library.correg` contains generalized linear models, submodule :mod:`~naginterfaces.library.univar` contains functions to fit distribution models, and submodule :mod:`~naginterfaces.library.nonpar` contains rank based methods.

See Also
--------
``naginterfaces.library.examples.surviv`` :
    This subpackage contains examples for the ``surviv`` module.
    See also the :ref:`library_surviv_ex` subsection.

Functionality Index
-------------------

**Cox's proportional hazard model**

  create the risk sets: :meth:`coxmodel_risksets`

  parameter estimates and other statistics: :meth:`coxmodel`

**Survival**

  Rank statistics: :meth:`logrank`

Survivor function: :meth:`kaplanmeier`

For full information please refer to the NAG Library document

https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12intro.html
"""

# NAG Copyright 2017-2022.

from ..library import machine

[docs]def kaplanmeier(t, ic, freq, ifreq): r""" ``kaplanmeier`` computes the Kaplan--Meier, (or product-limit), estimates of survival probabilities for a sample of failure times. .. _g12aa-py2-py-doc: For full information please refer to the NAG Library document for g12aa https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12aaf.html .. _g12aa-py2-py-parameters: **Parameters** **t** : float, array-like, shape :math:`\left(n\right)` The failure and censored times; these need not be ordered. **ic** : int, array-like, shape :math:`\left(n\right)` :math:`\mathrm{ic}[\textit{i}-1]` contains the censoring code of the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. :math:`\mathrm{ic}[i-1] = 0` The :math:`i`\ th observation is a failure time. :math:`\mathrm{ic}[i-1] = 1` The :math:`i`\ th observation is right-censored. **freq** : str, length 1 Indicates whether frequencies are provided for each time point. :math:`\mathrm{freq} = \texttt{'F'}` Frequencies are provided for each failure and censored time. :math:`\mathrm{freq} = \texttt{'S'}` The failure and censored times are considered as single observations, i.e., a frequency of :math:`1` is assumed. **ifreq** : int, array-like, shape :math:`\left(:\right)` Note: the required length for this argument is determined as follows: if :math:`\mathrm{freq}=\texttt{'F'}`: :math:`n`; if :math:`\mathrm{freq}=\texttt{'S'}`: :math:`1`; otherwise: :math:`0`. If :math:`\mathrm{freq} = \texttt{'F'}`, :math:`\mathrm{ifreq}[i]` must contain the frequency of the :math:`i`\ th observation. If :math:`\mathrm{ifreq} = \texttt{'S'}`, a frequency of :math:`1` is assumed and :math:`\mathrm{ifreq}` is not referenced. **Returns** **nd** : int The number of distinct failure times, :math:`n_d`. **tp** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{tp}[\textit{i}-1]` contains the :math:`\textit{i}`\ th ordered distinct failure time, :math:`t_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n_{\mathrm{d}}`. **p** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{p}[\textit{i}-1]` contains the Kaplan--Meier estimate of the survival probability, :math:`\hat{S}\left(t\right)`, for time :math:`\mathrm{tp}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n_d`. **psig** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{psig}[\textit{i}-1]` contains an estimate of the standard deviation of :math:`\mathrm{p}[\textit{i}-1]`, for :math:`\textit{i} = 1,2,\ldots,n_d`. .. _g12aa-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n \geq 2`. (`errno` :math:`2`) On entry, :math:`\mathrm{freq} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{freq} = \texttt{'F'}` or :math:`\texttt{'S'}`. (`errno` :math:`3`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0` or :math:`1`. (`errno` :math:`4`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ifreq}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ifreq}[i-1] \geq 0`. .. _g12aa-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` A survivor function, :math:`S\left(t\right)`, is the probability of surviving to at least time :math:`t` with :math:`S\left(t\right) = 1-F\left(t\right)`, where :math:`F\left(t\right)` is the cumulative distribution function of the failure times. The Kaplan--Meier or product limit estimator provides an estimate of :math:`S\left(t\right)`, :math:`\hat{S}\left(t\right)`, from sample of failure times which may be progressively right-censored. Let :math:`t_i`, :math:`i = 1,2,\ldots,n_d`, be the ordered distinct failure times for the sample of observed failure/censored times, and let the number of observations in the sample that have not failed by time :math:`t_i` be :math:`n_i`. If a failure and a loss (censored observation) occur at the same time :math:`t_i`, then the failure is treated as if it had occurred slightly before time :math:`t_i` and the loss as if it had occurred slightly after :math:`t_i`. The Kaplan--Meier estimate of the survival probabilities is a step function which in the interval :math:`t_i` to :math:`t_{{i+1}}` is given by .. math:: \hat{S}\left(t\right) = \prod_{{j = 1}}^i\left(\frac{{n_j-d_j}}{n_j}\right)\text{,} where :math:`d_j` is the number of failures occurring at time :math:`t_j`. ``kaplanmeier`` computes the Kaplan--Meier estimates and the corresponding estimates of the variances, :math:`\hat{\text{var}}\left(\hat{S}\left(t\right)\right)`, using Greenwood's formula, .. math:: \hat{\mathrm{var}}\left(\hat{S}\left(t\right)\right) = \hat{S}\left(t\right)^2\sum_{{j = 1}}^i\frac{d_j}{{n_j\left(n_j-d_j\right)}}\text{.} .. _g12aa-py2-py-references: **References** Gross, A J and Clark, V A, 1975, `Survival Distributions: Reliability Applications in the Biomedical Sciences`, Wiley Kalbfleisch, J D and Prentice, R L, 1980, `The Statistical Analysis of Failure Time Data`, Wiley """ raise NotImplementedError
[docs]def logrank(t, ic, grp, ngrp, ifreq=None, wt=None, ldn=None): r""" ``logrank`` calculates the rank statistics, which can include the logrank test, for comparing survival curves. .. _g12ab-py2-py-doc: For full information please refer to the NAG Library document for g12ab https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12abf.html .. _g12ab-py2-py-parameters: **Parameters** **t** : float, array-like, shape :math:`\left(n\right)` The observed failure and censored times; these need not be ordered. **ic** : int, array-like, shape :math:`\left(n\right)` :math:`\mathrm{ic}[\textit{i}-1]` contains the censoring code of the :math:`\textit{i}`\ th observation, for :math:`\textit{i} = 1,2,\ldots,n`. :math:`\mathrm{ic}[i-1] = 0` the :math:`i`\ th observation is a failure time. :math:`\mathrm{ic}[i-1] = 1` the :math:`i`\ th observation is right-censored. **grp** : int, array-like, shape :math:`\left(n\right)` :math:`\mathrm{grp}[\textit{i}-1]` contains a flag indicating which group the :math:`\textit{i}`\ th observation belongs in, for :math:`\textit{i} = 1,2,\ldots,n`. **ngrp** : int :math:`g`, the number of groups. **ifreq** : None or int, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{ifreq}\text{ is not }\mathbf{None}`: :math:`n`; otherwise: :math:`0`. Optionally, the frequency (number of observations) that each entry in :math:`\mathrm{t}` corresponds to. If :math:`\mathrm{ifreq}\text{ is }\mathbf{None}` then each entry in :math:`\mathrm{t}` is assumed to correspond to a single observation, i.e., a frequency of :math:`1` is assumed. **wt** : None or float, array-like, shape :math:`\left(:\right)`, optional Note: the required length for this argument is determined as follows: if :math:`\mathrm{wt}\text{ is not }\mathbf{None}`: :math:`\mathrm{ldn}`; otherwise: :math:`0`. Optionally, the :math:`n_d` weights, :math:`w_i`, where :math:`n_d` is the number of distinct failure times. If :math:`\mathrm{wt}\text{ is }\mathbf{None}` then :math:`w_i = 1` for all :math:`i`. **ldn** : None or int, optional Note: if this argument is **None** then a default value will be used, determined as follows: :math:`n`. The size of arrays :math:`\mathrm{di}` and :math:`\mathrm{ni}`. As :math:`n_d\leq n`, if :math:`n_d` is not known `a priori` then a value of :math:`\textit{n}` can safely be used for :math:`\mathrm{ldn}`. **Returns** **ts** : float :math:`T`, the test statistic. **df** : int :math:`\nu`, the degrees of freedom. **p** : float :math:`P\left(X\geq T\right)`, when :math:`X\sim \chi_{\nu }^2`, i.e., the probability associated with :math:`\mathrm{ts}`. **obsd** : float, ndarray, shape :math:`\left(\mathrm{ngrp}\right)` :math:`O_i`, the observed number of failures in each group. **expt** : float, ndarray, shape :math:`\left(\mathrm{ngrp}\right)` :math:`E_i`, the expected number of failures in each group. **nd** : int :math:`n_d`, the number of distinct failure times. **di** : int, ndarray, shape :math:`\left(\mathrm{ldn}\right)` The first :math:`\mathrm{nd}` elements of :math:`\mathrm{di}` contain :math:`d_i`, the number of failures, across all groups, at time :math:`t_i`. **ni** : int, ndarray, shape :math:`\left(\mathrm{ldn}\right)` The first :math:`\mathrm{nd}` elements of :math:`\mathrm{ni}` contain :math:`n_i`, the size of the risk set, across all groups, at time :math:`t_i`. .. _g12ab-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 2`. (`errno` :math:`2`) On entry, all the times in :math:`\mathrm{t}` are the same. (`errno` :math:`3`) On entry, :math:`\mathrm{ic}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0` or :math:`1`. (`errno` :math:`4`) On entry, :math:`\mathrm{grp}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ngrp} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`1\leq \mathrm{grp}[i-1]\leq \mathrm{ngrp}`. (`errno` :math:`5`) On entry, :math:`\mathrm{ngrp} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`2\leq \mathrm{ngrp}\leq n`. (`errno` :math:`6`) On entry, :math:`\textit{lfreq} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lfreq} = 0` or :math:`\textit{lfreq} \geq n`. (`errno` :math:`7`) On entry, :math:`\mathrm{ifreq}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ifreq}[i-1]\geq 0`. (`errno` :math:`8`) On entry, :math:`\textit{lwt} = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ldn} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{lwt} = 0` or :math:`\textit{lwt} \geq \mathrm{ldn}`. (`errno` :math:`9`) On entry, :math:`\mathrm{wt}[\langle\mathit{\boldsymbol{value}}\rangle] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{wt}[i-1]\geq 0.0`. (`errno` :math:`11`) The degrees of freedom are zero. (`errno` :math:`18`) On entry, :math:`\mathrm{ldn} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ldn}\geq \langle\mathit{\boldsymbol{value}}\rangle`. (`errno` :math:`31`) On entry, all observations are censored. (`errno` :math:`41`) On entry, group :math:`\langle\mathit{\boldsymbol{value}}\rangle` has no observations. .. _g12ab-py2-py-notes: **Notes** A survivor function, :math:`S\left(t\right)`, is the probability of surviving to at least time :math:`t`. Given a series of :math:`n` failure or right-censored times from :math:`g` groups ``logrank`` calculates a rank statistic for testing the null hypothesis :math:`H_0:S_1\left(t\right) = S_2\left(t\right) = \cdots = S_g\left(t\right),∀t\leq \tau` where :math:`\tau` is the largest observed time, against the alternative hypothesis :math:`H_1:` at least one of the :math:`S_i\left(t\right)` differ, for some :math:`t\leq \tau`. Let :math:`t_{{\textit{i}}}`, for :math:`\textit{i} = 1,2,\ldots,n_d`, denote the list of distinct failure times across all :math:`g` groups and :math:`w_i` a series of :math:`n_d` weights. Let :math:`d_{{ij}}` denote the number of failures at time :math:`t_i` in group :math:`j` and :math:`n_{{ij}}` denote the number of observations in the group :math:`j` that are known to have not failed prior to time :math:`t_i`, i.e., the size of the risk set for group :math:`j` at time :math:`t_i`. If a censored observation occurs at time :math:`t_i` then that observation is treated as if the censoring had occurred slightly after :math:`t_i` and, therefore, the observation is counted as being part of the risk set at time :math:`t_i`. Finally let .. math:: d_i = \sum_{1}^{g}{d_{{ij}}}\quad \text{ and }\quad n_i = \sum_{1}^{g}{n_{{ij}}}\text{.} The (weighted) number of observed failures in the :math:`j`\ th group, :math:`O_j`, is, therefore, given by .. math:: O_j = \sum_{1}^{n_d}{w_id_{{ij}}} and the (weighted) number of expected failures in the :math:`j`\ th group, :math:`E_j`, by .. math:: E_j = \sum_{1}^{n_d}{w_i\frac{{n_{{ij}}d_i}}{n_i}}\text{.} If :math:`x` denotes the vector of differences :math:`x = \left(O_1-E_1,O_2-E_2,\ldots,O_g-E_g\right)` and .. math:: V_{{jk}} = \sum_{1}^{n_d}{w_i^2\left(\frac{{d_i\left(n_i-d_i\right)\left(n_in_{{ik}}I_{{jk}}-n_{{ij}}n_{{ik}}\right)}}{{n_i^2\left(n_i-1\right)}}\right)} where :math:`I_{{jk}} = 1` if :math:`j = k` and :math:`0` otherwise, then the rank statistic, :math:`T`, is calculated as .. math:: T = xV^-x^T where :math:`V^-` denotes a generalized inverse of the matrix :math:`V`. Under the null hypothesis, :math:`T\sim \chi_{\nu }^2` where the degrees of freedom, :math:`\nu`, is taken as the rank of the matrix :math:`V`. .. _g12ab-py2-py-references: **References** Gross, A J and Clark, V A, 1975, `Survival Distributions: Reliability Applications in the Biomedical Sciences`, Wiley Kalbfleisch, J D and Prentice, R L, 1980, `The Statistical Analysis of Failure Time Data`, Wiley Rostomily, R C, Duong, D, McCormick, K, Bland, M and Berger, M S, 1994, `Multimodality management of recurrent adult malignant gliomas: results of a phase II multiagent chemotherapy study and analysis of cytoreductive surgery`, Neurosurgery (35), 378 """ raise NotImplementedError
[docs]def coxmodel(ns, z, isz, t, ic, isi, b, ndmax, omega=None, tol=100*machine.precision(), maxit=1000, iprint=0, io_manager=None): r""" ``coxmodel`` returns parameter estimates and other statistics that are associated with the Cox proportional hazards model for fixed covariates. .. _g12ba-py2-py-doc: For full information please refer to the NAG Library document for g12ba https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html .. _g12ba-py2-py-parameters: **Parameters** **ns** : int The number of strata. If :math:`\mathrm{ns} > 0` then the stratum for each observation must be supplied in :math:`\mathrm{isi}`. **z** : float, array-like, shape :math:`\left(n, m\right)` The :math:`i`\ th row must contain the covariates which are associated with the :math:`i`\ th failure time given in :math:`\mathrm{t}`. **isz** : int, array-like, shape :math:`\left(m\right)` Indicates which subset of covariates is to be included in the model. :math:`\mathrm{isz}[j-1]\geq 1` The :math:`j`\ th covariate is included in the model. :math:`\mathrm{isz}[j-1] = 0` The :math:`j`\ th covariate is excluded from the model and not referenced. **t** : float, array-like, shape :math:`\left(n\right)` The vector of :math:`n` failure censoring times. **ic** : int, array-like, shape :math:`\left(n\right)` The status of the individual at time :math:`t` given in :math:`\mathrm{t}`. :math:`\mathrm{ic}[i-1] = 0` The :math:`i`\ th individual has failed at time :math:`\mathrm{t}[i-1]`. :math:`\mathrm{ic}[i-1] = 1` The :math:`i`\ th individual has been censored at time :math:`\mathrm{t}[i-1]`. **isi** : int, array-like, shape :math:`\left(:\right)` Note: the required length for this argument is determined as follows: if :math:`\mathrm{ns} > 0`: :math:`n`; otherwise: :math:`1`. If :math:`\mathrm{ns} > 0`, the stratum indicators which also allow data points to be excluded from the analysis. If :math:`\mathrm{ns} = 0`, :math:`\mathrm{isi}` is not referenced. :math:`\mathrm{isi}[i-1] = k` The :math:`i`\ th data point is in the :math:`k`\ th stratum, where :math:`k = 1,2,\ldots,\mathrm{ns}`. :math:`\mathrm{isi}[i-1] = 0` The :math:`i`\ th data point is omitted from the analysis. **b** : float, array-like, shape :math:`\left(\textit{ip}\right)` Initial estimates of the covariate coefficient parameters :math:`\beta`. :math:`\mathrm{b}[j-1]` must contain the initial estimate of the coefficient of the covariate in :math:`\mathrm{z}` corresponding to the :math:`j`\ th nonzero value of :math:`\mathrm{isz}`. `Suggested value`: in many cases an initial value of zero for :math:`\mathrm{b}[j-1]` may be used. For other suggestions see `Further Comments <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#fcomments>`__. **ndmax** : int The dimension of the array :math:`\mathrm{tp}`. The first dimension of the array :math:`\mathrm{sur}`. **omega** : None or float, array-like, shape :math:`\left(n\right)`, optional If :math:`\mathrm{omega}\text{ is not }\mathbf{None}`, the offset, :math:`\omega_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`. Otherwise :math:`\mathrm{omega}` is not referenced. **tol** : float, optional Indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than :math:`\mathrm{tol}\times \left(1.0+\mathrm{CurrentDeviance}\right)`. This corresponds approximately to an absolute precision if the deviance is small and a relative precision if the deviance is large. **maxit** : int, optional The maximum number of iterations to be used for computing the estimates. If :math:`\mathrm{maxit}` is set to :math:`0` then the standard errors, score functions, variance-covariance matrix and the survival function are computed for the input value of :math:`\beta` in :math:`\mathrm{b}` but :math:`\beta` is not updated. **iprint** : int, optional Indicates if the printing of information on the iterations is required. :math:`\mathrm{iprint}\leq 0` No printing. :math:`\mathrm{iprint}\geq 1` The deviance and the current estimates are printed every :math:`\mathrm{iprint}` iterations. When printing occurs the output is directed to the file object associated with the advisory I/O unit (see :class:`~naginterfaces.base.utils.FileObjManager`). **io_manager** : FileObjManager, optional Manager for I/O in this routine. **Returns** **dev** : float The deviance, that is :math:`-2\times \text{}`\ (maximized log marginal likelihood). **b** : float, ndarray, shape :math:`\left(\textit{ip}\right)` :math:`\mathrm{b}[j-1]` contains the estimate :math:`\hat{\beta }_i`, the coefficient of the covariate stored in the :math:`i`\ th column of :math:`\mathrm{z}` where :math:`i` is the :math:`j`\ th nonzero value in the array :math:`\mathrm{isz}`. **se** : float, ndarray, shape :math:`\left(\textit{ip}\right)` :math:`\mathrm{se}[\textit{j}-1]` is the asymptotic standard error of the estimate contained in :math:`\mathrm{b}[\textit{j}-1]` and score function in :math:`\mathrm{sc}[\textit{j}-1]`, for :math:`\textit{j} = 1,2,\ldots,\textit{ip}`. **sc** : float, ndarray, shape :math:`\left(\textit{ip}\right)` :math:`\mathrm{sc}[j-1]` is the value of the score function, :math:`U_j\left(\beta \right)`, for the estimate contained in :math:`\mathrm{b}[j-1]`. **cov** : float, ndarray, shape :math:`\left(\textit{ip}\times \left(\textit{ip}+1\right)/2\right)` The variance-covariance matrix of the parameter estimates in :math:`\mathrm{b}` stored in packed form by column, i.e., the covariance between the parameter estimates given in :math:`\mathrm{b}[i-1]` and :math:`\mathrm{b}[j-1]`, :math:`j\geq i`, is stored in :math:`\mathrm{cov}[j\left(j-1\right)/2+i-1]`. **res** : float, ndarray, shape :math:`\left(n\right)` The residuals, :math:`r\left(t_{\textit{l}}\right)`, for :math:`\textit{l} = 1,2,\ldots,n`. **nd** : int The number of distinct failure times. **tp** : float, ndarray, shape :math:`\left(\mathrm{ndmax}\right)` :math:`\mathrm{tp}[\textit{i}-1]` contains the :math:`\textit{i}`\ th distinct failure time, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nd}`. **sur** : float, ndarray, shape :math:`\left(\mathrm{ndmax}, \max\left(\mathrm{ns},1\right)\right)` If :math:`\mathrm{ns} = 0`, :math:`\mathrm{sur}[i-1,0]` contains the estimated survival function for the :math:`i`\ th distinct failure time. If :math:`\mathrm{ns} > 0`, :math:`\mathrm{sur}[i-1,k-1]` contains the estimated survival function for the :math:`i`\ th distinct failure time in the :math:`k`\ th stratum. .. _g12ba-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\textit{ip} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\textit{ip}\geq 1`. (`errno` :math:`1`) On entry, :math:`\mathrm{tol} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{tol}\geq 10\times \text{machine precision}`. (`errno` :math:`1`) On entry, :math:`\mathrm{maxit} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{maxit} \geq 0`. (`errno` :math:`1`) On entry, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ns} \geq 0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m\geq 1`. (`errno` :math:`2`) On entry, there are not :math:`\textit{ip}` values of :math:`\mathrm{isz} > 0`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{isz}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isz}[i-1]\geq 0`. (`errno` :math:`2`) On entry too few observations included in model. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{isi}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{isi}[i-1]\leq \mathrm{ns}`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0` or :math:`1`. (`errno` :math:`2`) On entry, :math:`\mathrm{ndmax} = \langle\mathit{\boldsymbol{value}}\rangle` and minimum value for :math:`\mathrm{ndmax} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ndmax}\geq \text{number}` of distinct failure times. (`errno` :math:`2`) All observations are censored. (`errno` :math:`3`) The matrix of second partial derivative is singular. (`errno` :math:`4`) Overflow has been detected in the calculations. **Warns** **NagAlgorithmicWarning** (`errno` :math:`5`) Convergence not achieved in :math:`\langle\mathit{\boldsymbol{value}}\rangle` iterations. (`errno` :math:`6`) Too many step halvings required. .. _g12ba-py2-py-notes: **Notes** `In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.` The proportional hazard model relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right-censored, that is the exact time to failure is not known, only that it is greater than a known time. Let :math:`t_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, be the failure time or censored time for the :math:`i`\ th observation with the vector of :math:`p` covariates :math:`z_i`. It is assumed that censoring and failure mechanisms are independent. The hazard function, :math:`\lambda \left(t, z\right)`, is the probability that an individual with covariates :math:`z` fails at time :math:`t` given that the individual survived up to time :math:`t`. In the Cox proportional hazards model (see Cox (1972)) :math:`\lambda \left(t, z\right)` is of the form: .. math:: \lambda \left(t, z\right) = \lambda_0\left(t\right)\mathrm{exp}\left(z^\mathrm{T}\beta +\omega \right) where :math:`\lambda_0` is the base-line hazard function, an unspecified function of time, :math:`\beta` is a vector of unknown parameters and :math:`\omega` is a known offset. Assuming there are ties in the failure times giving :math:`n_d < n` distinct failure times, :math:`t_{\left(1\right)} < \cdots < t_{{\left(n_d\right)}}` such that :math:`d_i` individuals fail at :math:`t_{\left(i\right)}`, it follows that the marginal likelihood for :math:`\beta` is well approximated (see Kalbfleisch and Prentice (1980)) by: .. math:: L = \prod_{{i = 1}}^{n_d}\frac{{\mathrm{exp}\left(s_i^\mathrm{T}\beta +\omega_i\right)}}{{{\left[\sum_{{l \in R\left(t_{\left(i\right)}\right)}}{\mathrm{exp}\left(z_l^\mathrm{T}\beta +\omega_l\right)}\right]}^{d_i}}} where :math:`s_i` is the sum of the covariates of individuals observed to fail at :math:`t_{\left(i\right)}` and :math:`R\left(t_{\left(i\right)}\right)` is the set of individuals at risk just prior to :math:`t_{\left(i\right)}`, that is, it is all individuals that fail or are censored at time :math:`t_{\left(i\right)}` along with all individuals that survive beyond time :math:`t_{\left(i\right)}`. The maximum likelihood estimates (MLEs) of :math:`\beta`, given by :math:`\hat{\beta }`, are obtained by maximizing `(1) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn1>`__ using a Newton--Raphson iteration technique that includes step halving and utilizes the first and second partial derivatives of `(1) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn1>`__ which are given by equations `(2) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn2>`__ and `(3) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn3>`__ below: .. math:: U_j\left(\beta \right) = \frac{{\partial \mathrm{ln}\left(L\right)}}{{\partial \beta_j}} = \sum_{{i = 1}}^{n_d}\left[s_{{ji}}-d_i\alpha_{{ji}}\left(\beta \right)\right] = 0 for :math:`j = 1,2,\ldots,p`, where :math:`s_{{ji}}` is the :math:`j`\ th element in the vector :math:`s_i` and .. math:: \alpha_{{ji}}\left(\beta \right) = \frac{{\sum_{{l \in R\left(t_{\left(i\right)}\right)}}z_{{jl}}{\mathrm{exp}\left(z_l^\mathrm{T}\beta +\omega_l\right)}}}{{\sum_{{l \in R\left(t_{\left(i\right)}\right)}}{\mathrm{exp}\left(z_l^\mathrm{T}\beta +\omega_l\right)}}}\text{.} Similarly, .. math:: I_{{hj}}\left(\beta \right) = -\frac{{\partial^2\mathrm{ln}\left(L\right)}}{{\partial \beta_h\partial \beta_j}} = \sum_{{i = 1}}^{n_d}d_i\gamma_{{hji}} where .. math:: \gamma_{{hji}} = \frac{{\sum_{{l \in R\left(t_{\left(i\right)}\right)}}z_{{hl}}z_{{jl}}\mathrm{exp}\left(z_l^\mathrm{T}\beta +\omega_l\right)}}{{\sum_{{l \in R\left(t_{\left(i\right)}\right)}}\mathrm{exp}\left(z_l^\mathrm{T}\beta +\omega_l\right)}}-\alpha_{\mathrm{hi}}\left(\beta \right)\alpha_{{ji}}\left(\beta \right)\text{, }\quad h,j = 1,\ldots,p\text{.} :math:`U_j\left(\beta \right)` is the :math:`j`\ th component of a score vector and :math:`I_{{hj}}\left(\beta \right)` is the :math:`\left(h, j\right)` element of the observed information matrix :math:`I\left(\beta \right)` whose inverse :math:`I\left(\beta \right)^{-1} = {\left[I_{{hj}}\left(\beta \right)\right]}^{-1}` gives the variance-covariance matrix of :math:`\beta`. It should be noted that if a covariate or a linear combination of covariates is monotonically increasing or decreasing with time then one or more of the :math:`\beta_j`'s will be infinite. If :math:`\lambda_0\left(t\right)` varies across :math:`\nu` strata, where the number of individuals in the :math:`\textit{k}`\ th stratum is :math:`n_{\textit{k}}`, for :math:`\textit{k} = 1,2,\ldots,\nu` with :math:`n = \sum_{{k = 1}}^{\nu }n_k`, then rather than maximizing `(1) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn1>`__ to obtain :math:`\hat{\beta }`, the following marginal likelihood is maximized: .. math:: L = \prod_{{k = 1}}^{\nu }L_k\text{,} where :math:`L_k` is the contribution to likelihood for the :math:`n_k` observations in the :math:`k`\ th stratum treated as a single sample in `(1) <https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12baf.html#eqn1>`__. When strata are included the covariate coefficients are constant across strata but there is a different base-line hazard function :math:`\lambda_0`. The base-line survivor function associated with a failure time :math:`t_{\left(i\right)}`, is estimated as :math:`\mathrm{exp}\left(-\hat{H}\left(t_{\left(i\right)}\right)\right)`, where .. math:: \hat{H}\left(t_{\left(i\right)}\right) = \sum_{{t_{\left(j\right)}\leq t_{\left(i\right)}}}\left(\frac{d_i}{{\sum_{{l \in R\left(t_{\left(j\right)}\right)}}{\mathrm{exp}\left(z_l^\mathrm{T}\hat{\beta }+\omega_l\right)}}}\right)\text{,} where :math:`d_i` is the number of failures at time :math:`t_{\left(i\right)}`. The residual for the :math:`l`\ th observation is computed as: .. math:: r\left(t_l\right) = \hat{H}\left(t_l\right)\mathrm{exp}\left(z_l^\mathrm{T}\hat{\beta }+\omega_l\right) where :math:`\hat{H}\left(t_l\right) = \hat{H}\left(t_{\left(i\right)}\right),t_{\left(i\right)}\leq t_l < t_{\left(i+1\right)}`. The deviance is defined as :math:`-2\times \text{}`\ (logarithm of marginal likelihood). There are two ways to test whether individual covariates are significant: the differences between the deviances of nested models can be compared with the appropriate :math:`\chi^2`-distribution; or, the asymptotic normality of the parameter estimates can be used to form :math:`z` tests by dividing the estimates by their standard errors or the score function for the model under the null hypothesis can be used to form :math:`z` tests. .. _g12ba-py2-py-references: **References** Cox, D R, 1972, `Regression models in life tables (with discussion)`, J. Roy. Statist. Soc. Ser. B (34), 187--220 Gross, A J and Clark, V A, 1975, `Survival Distributions: Reliability Applications in the Biomedical Sciences`, Wiley Kalbfleisch, J D and Prentice, R L, 1980, `The Statistical Analysis of Failure Time Data`, Wiley See Also -------- :meth:`naginterfaces.library.examples.surviv.coxmodel_ex.main` """ raise NotImplementedError
[docs]def coxmodel_risksets(ns, z, isz, ip, t, ic, isi, mxn): r""" ``coxmodel_risksets`` creates the risk sets associated with the Cox proportional hazards model for fixed covariates. .. _g12za-py2-py-doc: For full information please refer to the NAG Library document for g12za https://www.nag.com/numeric/nl/nagdoc_28.7/flhtml/g12/g12zaf.html .. _g12za-py2-py-parameters: **Parameters** **ns** : int The number of strata. If :math:`\mathrm{ns} > 0` then the stratum for each observation must be supplied in :math:`\mathrm{isi}`. **z** : float, array-like, shape :math:`\left(n, m\right)` The :math:`i`\ th row must contain the covariates which are associated with the :math:`i`\ th failure time given in :math:`\mathrm{t}`. **isz** : int, array-like, shape :math:`\left(m\right)` Indicates which subset of covariates are to be included in the model. :math:`\mathrm{isz}[j-1]\geq 1` The :math:`j`\ th covariate is included in the model. :math:`\mathrm{isz}[j-1] = 0` The :math:`j`\ th covariate is excluded from the model and not referenced. **ip** : int :math:`p`, the number of covariates included in the model as indicated by :math:`\mathrm{isz}`. **t** : float, array-like, shape :math:`\left(n\right)` The vector of :math:`n` failure censoring times. **ic** : int, array-like, shape :math:`\left(n\right)` The status of the individual at time :math:`t` given in :math:`\mathrm{t}`. :math:`\mathrm{ic}[i-1] = 0` Indicates that the :math:`i`\ th individual has failed at time :math:`\mathrm{t}[i-1]`. :math:`\mathrm{ic}[i-1] = 1` Indicates that the :math:`i`\ th individual has been censored at time :math:`\mathrm{t}[i-1]`. **isi** : int, array-like, shape :math:`\left(:\right)` Note: the required length for this argument is determined as follows: if :math:`\mathrm{ns} > 0`: :math:`n`; otherwise: :math:`1`. If :math:`\mathrm{ns} > 0`, the stratum indicators which also allow data points to be excluded from the analysis. If :math:`\mathrm{ns} = 0`, :math:`\mathrm{isi}` is not referenced. :math:`\mathrm{isi}[i] = k` Indicates that the :math:`i`\ th data point is in the :math:`k`\ th stratum, where :math:`k = 1,2,\ldots,\mathrm{ns}`. :math:`\mathrm{isi}[i] = 0` Indicates that the :math:`i`\ th data point is omitted from the analysis. **mxn** : int The first dimension of the array :math:`\mathrm{x}`. The dimension of the arrays :math:`\mathrm{ixs}` and :math:`\mathrm{fid}`. **Returns** **num** : int The number of values in the combined risk sets. **ixs** : int, ndarray, shape :math:`\left(\mathrm{mxn}\right)` The factor giving the risk sets/strata for the data in :math:`\mathrm{x}` and :math:`\mathrm{fid}`. If :math:`\mathrm{ns} = 0` or :math:`1`, :math:`\mathrm{ixs}[i-1] = l` for members of the :math:`l`\ th risk set. If :math:`\mathrm{ns} > 1`, :math:`\mathrm{ixs}[i-1] = \left(j-1\right)\times \mathrm{nd}+l` for the observations in the :math:`l`\ th risk set for the :math:`j`\ th strata. **nxs** : int The number of levels for the risk sets/strata factor given in :math:`\mathrm{ixs}`. **x** : float, ndarray, shape :math:`\left(\mathrm{num}, \mathrm{ip}\right)` The first :math:`\mathrm{num}` rows contain the values of the covariates for the members of the risk sets. **fid** : int, ndarray, shape :math:`\left(\mathrm{mxn}\right)` Indicates if the member of the risk set given in :math:`\mathrm{x}` failed. :math:`\mathrm{fid}[i-1] = 1` if the member of the risk set failed at the time defining the risk set and :math:`\mathrm{fid}[i-1] = 0` otherwise. **nd** : int The number of distinct failure times, i.e., the number of risk sets. **tp** : float, ndarray, shape :math:`\left(n\right)` :math:`\mathrm{tp}[\textit{i}-1]` contains the :math:`\textit{i}`\ th distinct failure time, for :math:`\textit{i} = 1,2,\ldots,\mathrm{nd}`. **irs** : int, ndarray, shape :math:`\left(n\right)` Indicates rows in :math:`\mathrm{x}` and elements in :math:`\mathrm{ixs}` and :math:`\mathrm{fid}` corresponding to the risk sets. The first risk set corresponding to failure time :math:`\mathrm{tp}[0]` is given by rows :math:`1` to :math:`\mathrm{irs}[0]`. The :math:`\textit{l}`\ th risk set is given by rows :math:`\mathrm{irs}[\textit{l}-2]+1` to :math:`\mathrm{irs}[\textit{l}-1]`, for :math:`\textit{l} = 1,2,\ldots,\mathrm{nd}`. .. _g12za-py2-py-errors: **Raises** **NagValueError** (`errno` :math:`1`) On entry, :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ns} \geq 0`. (`errno` :math:`1`) On entry, :math:`n = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`n\geq 2`. (`errno` :math:`1`) On entry, :math:`m = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`m\geq 1`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle`, :math:`\mathrm{isi}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ns} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`0\leq \mathrm{isi}[i-1]\leq \mathrm{ns}`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{ic}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{ic}[i-1] = 0` or :math:`1`. (`errno` :math:`2`) On entry, there are not :math:`\mathrm{ip}` values of :math:`\mathrm{isz} > 0`. (`errno` :math:`2`) On entry, :math:`i = \langle\mathit{\boldsymbol{value}}\rangle` and :math:`\mathrm{isz}[i-1] = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{isz}[j-1] \geq 0`. (`errno` :math:`3`) On entry, :math:`\mathrm{mxn} = \langle\mathit{\boldsymbol{value}}\rangle` and minimum value for :math:`\mathrm{mxn} = \langle\mathit{\boldsymbol{value}}\rangle`. Constraint: :math:`\mathrm{mxn}` must be sufficiently large for the arrays to contain the expanded risk set. .. _g12za-py2-py-notes: **Notes** The Cox proportional hazards model (see Cox (1972)) relates the time to an event, usually death or failure, to a number of explanatory variables known as covariates. Some of the observations may be right-censored, that is, the exact time to failure is not known, only that it is greater than a known time. Let :math:`t_{\textit{i}}`, for :math:`\textit{i} = 1,2,\ldots,n`, be the failure time or censored time for the :math:`i`\ th observation with the vector of :math:`p` covariates :math:`z_i`. It is assumed that censoring and failure mechanisms are independent. The hazard function, :math:`\lambda \left(t, z\right)`, is the probability that an individual with covariates :math:`z` fails at time :math:`t` given that the individual survived up to time :math:`t`. In the Cox proportional hazards model, :math:`\lambda \left(t, z\right)` is of the form .. math:: \lambda \left(t, z\right) = \lambda_0\left(t\right)\mathrm{exp}\left(z^\mathrm{T}\beta \right)\text{,} where :math:`\lambda_0` is the base-line hazard function, an unspecified function of time, and :math:`\beta` is a vector of unknown parameters. As :math:`\lambda_0` is unknown, the parameters :math:`\beta` are estimated using the conditional or marginal likelihood. This involves considering the covariate values of all subjects that are at risk at the time when a failure occurs. The probability that the subject that failed had their observed set of covariate values is computed. The risk set at a failure time consists of those subjects that fail or are censored at that time and those who survive beyond that time. As risk sets are computed for every distinct failure time, it should be noted that the combined risk sets may be considerably larger than the original data. If the data can be considered as coming from different strata such that :math:`\lambda_0` varies from strata to strata but :math:`\beta` remains constant, then ``coxmodel_risksets`` will return a factor that indicates to which risk set/strata each member of the risk sets belongs rather than just to which risk set. Given the risk sets the Cox proportional hazards model can then be fitted using a Poisson generalized linear model (:meth:`correg.glm_poisson <naginterfaces.library.correg.glm_poisson>` with :meth:`anova.dummyvars <naginterfaces.library.anova.dummyvars>` to compute dummy variables) using Breslow's approximation for ties (see Breslow (1974)). This will give the same fit as :meth:`coxmodel`. If the exact treatment of ties in discrete time is required, as given by Cox (1972), then the model is fitted as a conditional logistic model using :meth:`contab.condl_logistic <naginterfaces.library.contab.condl_logistic>`. .. _g12za-py2-py-references: **References** Breslow, N E, 1974, `Covariate analysis of censored survival data`, Biometrics (30), 89--99 Cox, D R, 1972, `Regression models in life tables (with discussion)`, J. Roy. Statist. Soc. Ser. B (34), 187--220 Gross, A J and Clark, V A, 1975, `Survival Distributions: Reliability Applications in the Biomedical Sciences`, Wiley """ raise NotImplementedError