# NAG Toolbox: nag_nonpar_rank_regsn_censored (g08rb)

## Purpose

nag_nonpar_rank_regsn_censored (g08rb) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations when some of the observations may be right-censored.

## Syntax

[prvr, irank, zin, eta, vapvec, parest, ifail] = g08rb(nv, y, x, icen, gamma, nmax, tol, 'ns', ns, 'ip', ip)
[prvr, irank, zin, eta, vapvec, parest, ifail] = nag_nonpar_rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol, 'ns', ns, 'ip', ip)

## Description

Data can be analysed by replacing the observations by their ranks. The analysis produces inference for the regression model in which the location parameters of the observations, $\theta_i$, for $i=1,2,\dots,n$, are related by $\theta=X\beta$. Here $X$ is an $n$ by $p$ matrix of explanatory variables and $\beta$ is a vector of $p$ unknown regression parameters. The observations are replaced by their ranks, and an approximation to the rank marginal likelihood is made, based on a Taylor series expansion. For details of the approximation see Pettitt (1982).
An observation is said to be right-censored if we can only observe $Y_j^{*}$ with $Y_j^{*}\le Y_j$. We rank censored and uncensored observations as follows. Suppose we can observe $Y_j$, for $j=1,2,\dots,n$, directly, but $Y_j^{*}$, for $j=n+1,\dots,q$ and $n\le q$, are censored on the right. We define the rank $r_j$ of $Y_j$, for $j=1,2,\dots,n$, in the usual way: $r_j$ equals $i$ if and only if $Y_j$ is the $i$th smallest amongst $Y_1,Y_2,\dots,Y_n$. The right-censored $Y_j^{*}$, for $j=n+1,n+2,\dots,q$, has rank $r_j$ if and only if $Y_j^{*}$ lies in the interval $\left[Y_{(r_j)},Y_{(r_j+1)}\right]$, with $Y_{(0)}=-\infty$, $Y_{(n+1)}=+\infty$ and $Y_{(1)}<\cdots<Y_{(n)}$ the ordered $Y_j$, for $j=1,2,\dots,n$.
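The ranking rule can be illustrated with a minimal sketch in Python (not part of the NAG Toolbox). It assumes the uncensored values are distinct, so no tolerance-based tie handling is attempted, and a censored value that coincides with an uncensored one is given the higher candidate rank; that half-open convention is a choice made here only to remove the boundary ambiguity. The function name `censored_ranks` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def censored_ranks(uncensored, censored):
    """Rank uncensored values Y_1..Y_n and right-censored values Y*_{n+1}..Y*_q.

    Assumes the uncensored values are distinct (no tie handling).
    """
    order = sorted(uncensored)  # Y_(1) < ... < Y_(n)
    # r_j = i when Y_j is the i-th smallest uncensored value (ranks start at 1).
    unc_ranks = [bisect_left(order, v) + 1 for v in uncensored]
    # A censored Y*_j gets rank r when Y_(r) <= Y*_j < Y_(r+1), with
    # Y_(0) = -inf and Y_(n+1) = +inf: r counts uncensored values <= Y*_j.
    cen_ranks = [bisect_right(order, v) for v in censored]
    return unc_ranks, cen_ranks


print(censored_ranks([3.0, 1.0, 2.0], [0.5, 2.5, 10.0]))  # ([3, 1, 2], [0, 2, 3])
```

A censored value below every uncensored value gets rank 0, and one above all of them gets rank $n$, matching the conventions $Y_{(0)}=-\infty$ and $Y_{(n+1)}=+\infty$.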
The distribution of $Y$ is assumed to be of the following form. Let $F_L(y)=e^{y}/(1+e^{y})$, the logistic distribution function, and consider the distribution function $F_\gamma(y)$ defined by $1-F_\gamma(y)=\left[1-F_L(y)\right]^{1/\gamma}$. This distribution function can be thought of either as the distribution function of the minimum, $X_{1,\gamma}$, of a random sample of size $\gamma^{-1}$ from the logistic distribution, or with $F_\gamma(y-\log\gamma)$ being the distribution function of a random variable having the $F$-distribution with $2$ and $2\gamma^{-1}$ degrees of freedom. This family of generalized logistic distribution functions $\left[F_\gamma(\cdot);\,0\le\gamma<\infty\right]$ naturally links the symmetric logistic distribution ($\gamma=1$) with the skew extreme value distribution (the limit $\gamma\to0$) and with the limiting negative exponential distribution (the limit $\gamma\to\infty$). For this family, explicit results are available for right-censored data; see Pettitt (1983) for details.
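The defining relation $1-F_\gamma(y)=\left[1-F_L(y)\right]^{1/\gamma}$ can be evaluated directly. The Python sketch below (function names hypothetical, not NAG code) covers only $\gamma>0$; the routine's own switch to the extreme value limit for small $\gamma$ is not reproduced. It also checks the "minimum of $\gamma^{-1}$ logistic variables" reading for $\gamma=1/2$.

```python
import math


def logistic_cdf(y):
    """F_L(y) = e^y / (1 + e^y), written to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def gen_logistic_cdf(y, gamma):
    """F_gamma(y) defined by 1 - F_gamma(y) = [1 - F_L(y)]**(1/gamma), gamma > 0."""
    if gamma <= 0:
        raise ValueError("this sketch only covers gamma > 0")
    # log(1 + e^y), computed stably; [1 - F_L(y)]**(1/gamma) = exp(-log(1 + e^y) / gamma).
    softplus = y + math.log1p(math.exp(-y)) if y > 0 else math.log1p(math.exp(y))
    return 1.0 - math.exp(-softplus / gamma)


# gamma = 1 recovers the logistic distribution function (value close to 0.5 here).
print(gen_logistic_cdf(0.0, 1.0))
# gamma = 1/2: the distribution of the minimum of two independent logistic variables.
y = 1.3
print(gen_logistic_cdf(y, 0.5), 1.0 - (1.0 - logistic_cdf(y)) ** 2)
```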
Let $l_R$ denote the logarithm of the rank marginal likelihood of the observations, define the $q\times1$ vector $a$ by $a=l_R^{\prime}(\theta=0)$, and let the $q$ by $q$ diagonal matrix $B$ and the $q$ by $q$ symmetric matrix $A$ be given by $B-A=-l_R^{\prime\prime}(\theta=0)$. The following statistics can then be found from the analysis.
- (a) The score statistic $X^{\mathrm{T}}a$. This statistic is used to test the hypothesis $H_0:\beta=0$ (see (e)).
- (b) The estimated variance-covariance matrix of the score statistic in (a).
- (c) The estimate $\hat{\beta}_R=MX^{\mathrm{T}}a$.
- (d) The estimated variance-covariance matrix $M=\left(X^{\mathrm{T}}(B-A)X\right)^{-1}$ of the estimate $\hat{\beta}_R$.
- (e) The $\chi^2$ statistic $Q=\hat{\beta}_R^{\mathrm{T}}M^{-1}\hat{\beta}_R=a^{\mathrm{T}}X\left(X^{\mathrm{T}}(B-A)X\right)^{-1}X^{\mathrm{T}}a$, used to test $H_0:\beta=0$. Under $H_0$, $Q$ has an approximate $\chi^2$-distribution with $p$ degrees of freedom.
- (f) The standard errors $M_{ii}^{1/2}$ of the estimates given in (c).
- (g) Approximate $z$-statistics, i.e., $Z_i=\hat{\beta}_{R_i}/\mathrm{se}(\hat{\beta}_{R_i})$ for testing $H_0:\beta_i=0$. Under this hypothesis, for $i=1,2,\dots,p$, $Z_i$ has an approximate $N(0,1)$ distribution.
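Once the score vector $X^{\mathrm{T}}a$ and the matrix $X^{\mathrm{T}}(B-A)X$ are known, items (c) to (g) are plain linear algebra. The Python sketch below (standard library only; it is not the NAG implementation, and the function names are hypothetical) computes them with a small Gauss-Jordan inverse. Forming the inverse explicitly is reasonable here only because $p$ is small and $M$ is itself one of the outputs. As a check, it is fed the score 4.5840 and score variance 7.6526 printed by the example on this page, on the assumption that this variance equals $X^{\mathrm{T}}(B-A)X$ (consistent with the printed estimate variance $0.1307\approx1/7.6526$); it then reproduces the printed estimate, $\chi^2$ statistic, standard error and $z$-statistic to about four decimal places.

```python
import math


def invert(mat):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rank_statistics(score, info):
    """score = X^T a (length p); info = X^T (B - A) X (p x p, positive definite)."""
    p = len(score)
    M = invert(info)                                   # (d) variance of the estimate
    beta = [sum(M[i][j] * score[j] for j in range(p)) for i in range(p)]  # (c)
    Q = sum(s * b for s, b in zip(score, beta))        # (e) a^T X M X^T a
    se = [math.sqrt(M[i][i]) for i in range(p)]        # (f)
    z = [b / s for b, s in zip(beta, se)]              # (g)
    return beta, M, Q, se, z


beta, M, Q, se, z = rank_statistics([4.5840], [[7.6526]])
print(round(beta[0], 4), round(Q, 4), round(se[0], 4), round(z[0], 4))
```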
In many situations, more than one sample of observations will be available. In this case we assume the model,
$$h_k(Y_k)=X_k\beta+e_k,\quad k=1,2,\dots,n_s,$$
where $n_s$ (the parameter ns) is the number of samples, and $Y_k$ and $X_k$ are, respectively, the vector of observations and the design matrix for the $k$th sample. Note that the arbitrary transformation $h_k$ can be assumed to be different for each sample, since observations are ranked within each sample.
The earlier analysis can be extended to give a combined estimate of $\beta$ as $\hat{\beta}=Dd$, where
$$D^{-1}=\sum_{k=1}^{n_s}X_k^{\mathrm{T}}(B_k-A_k)X_k$$
and
$$d=\sum_{k=1}^{n_s}X_k^{\mathrm{T}}a_k,$$
with $a_k$, $B_k$ and $A_k$ defined as $a$, $B$ and $A$ above but for the $k$th sample.
The remaining statistics are calculated as for the one sample case.
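The pooled quantities are simple sums over samples. In this minimal Python sketch (hypothetical function name, not NAG code), each sample is assumed to be supplied as its design matrix $X_k$ (a list of rows), its vector $a_k$ and the matrix $W_k=B_k-A_k$; how $a_k$, $A_k$ and $B_k$ are obtained from the ranks is not shown. The returned $d$ and $D^{-1}$ take the places of $X^{\mathrm{T}}a$ and $X^{\mathrm{T}}(B-A)X$ in the one-sample formulas, so that $\hat{\beta}=Dd$.

```python
def pooled_score_and_information(samples):
    """Return d = sum_k X_k^T a_k and D^{-1} = sum_k X_k^T (B_k - A_k) X_k.

    samples: list of (X_k, a_k, W_k), with X_k an n_k x p list of rows,
    a_k a length-n_k list and W_k = B_k - A_k an n_k x n_k list of rows.
    """
    p = len(samples[0][0][0])
    d = [0.0] * p
    D_inv = [[0.0] * p for _ in range(p)]
    for X, a, W in samples:
        n = len(X)
        for i in range(p):
            d[i] += sum(X[r][i] * a[r] for r in range(n))
            for j in range(p):
                D_inv[i][j] += sum(X[r][i] * W[r][s] * X[s][j]
                                   for r in range(n) for s in range(n))
    return d, D_inv


sample1 = ([[1.0], [0.0]], [0.5, -0.5], [[1.0, 0.0], [0.0, 1.0]])
sample2 = ([[1.0], [1.0]], [1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
d, D_inv = pooled_score_and_information([sample1, sample2])
print(d, D_inv)  # [2.5] [[7.0]]
```

Because ranks are computed within each sample, each $W_k$ is only $n_k$ by $n_k$; the samples interact only through these sums.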

## References

Kalbfleisch J D and Prentice R L (1980) The Statistical Analysis of Failure Time Data Wiley
Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243
Pettitt A N (1983) Approximate methods using ranks for regression with censored data Biometrika 70 121–132

## Parameters

### Compulsory Input Parameters

1:     nv(ns) – int64, int32 or nag_int array
ns, the dimension of the array, must satisfy the constraint $\mathbf{ns}\ge1$.
The number of observations in the $i$th sample, for $i=1,2,\dots,\mathbf{ns}$.
Constraint: $\mathbf{nv}(i)\ge1$, for $i=1,2,\dots,\mathbf{ns}$.
2:     y(nsum) – double array
nsum, the dimension of the array, must satisfy the constraint $\mathit{nsum}=\sum_{i=1}^{\mathbf{ns}}\mathbf{nv}(i)$.
The observations in each sample. Specifically, $\mathbf{y}\left(\sum_{k=1}^{i-1}\mathbf{nv}(k)+j\right)$ must contain the $j$th observation in the $i$th sample.
3:     x(ldx,ip) – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge\mathit{nsum}$.
The design matrices for each sample. Specifically, $\mathbf{x}\left(\sum_{k=1}^{i-1}\mathbf{nv}(k)+j,\,l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: x must not contain a column with all elements equal.
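The row index used for both y and x skips past the observations of all earlier samples. A small hypothetical Python helper, kept 1-based to match the documentation:

```python
def row_index(nv, i, j):
    """1-based row of the j-th observation of the i-th sample: sum(nv(1..i-1)) + j."""
    if not 1 <= i <= len(nv) or not 1 <= j <= nv[i - 1]:
        raise IndexError("sample or observation index out of range")
    return sum(nv[:i - 1]) + j


nv = [3, 2]  # two samples with 3 and 2 observations
print([row_index(nv, 1, j) for j in (1, 2, 3)], [row_index(nv, 2, j) for j in (1, 2)])
# [1, 2, 3] [4, 5]
```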
4:     icen(nsum) – int64, int32 or nag_int array
nsum, the dimension of the array, must satisfy the constraint $\mathit{nsum}=\sum_{i=1}^{\mathbf{ns}}\mathbf{nv}(i)$.
Defines the censoring variable for the observations in y.
$\mathbf{icen}(i)=0$: $\mathbf{y}(i)$ is uncensored.
$\mathbf{icen}(i)=1$: $\mathbf{y}(i)$ is censored.
Constraint: $\mathbf{icen}(i)=0$ or $1$, for $i=1,2,\dots,\mathit{nsum}$.
5:     gamma – double scalar
The value of the parameter defining the generalized logistic distribution. For $\mathbf{gamma}\le0.0001$, the limiting extreme value distribution is assumed.
Constraint: $\mathbf{gamma}\ge0.0$.
6:     nmax – int64, int32 or nag_int scalar
The value of the largest sample size.
Constraint: $\mathbf{nmax}=\max_{1\le i\le\mathbf{ns}}\mathbf{nv}(i)$ and $\mathbf{nmax}>\mathbf{ip}$.
7:     tol – double scalar
The tolerance for judging whether two observations are tied. Thus, observations $Y_i$ and $Y_j$ are adjudged to be tied if $|Y_i-Y_j|<\mathbf{tol}$.
Constraint: $\mathbf{tol}>0.0$.

### Optional Input Parameters

1:     ns – int64, int32 or nag_int scalar
Default: the dimension of the array nv.
The number of samples.
Constraint: $\mathbf{ns}\ge1$.
2:     ip – int64, int32 or nag_int scalar
Default: the second dimension of the array x.
The number of parameters to be fitted.
Constraint: $\mathbf{ip}\ge1$.

### Input Parameters Omitted from the MATLAB Interface

nsum, ldx, ldprvr, work, lwork, iwa

### Output Parameters

1:     prvr(ldprvr,ip) – double array
The first dimension, ldprvr, satisfies $\mathit{ldprvr}\ge\mathbf{ip}+1$.
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1\le i\le j\le\mathbf{ip}$, $\mathbf{prvr}(i,j)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1\le j\le i\le\mathbf{ip}$, $\mathbf{prvr}(i+1,j)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
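Unpacking prvr into two full symmetric matrices can be sketched in Python as follows (hypothetical helper; prvr is taken as a list of rows). The lower part is read for $1\le j\le i\le\mathbf{ip}$, which is what the example's $\mathbf{ip}=1$ output requires: its second row, $\mathbf{prvr}(2,1)=0.1307$, is the variance of the single parameter estimate.

```python
def unpack_prvr(prvr, ip):
    """Split prvr (at least ip + 1 rows by ip columns) into two ip x ip matrices.

    Upper triangle of rows 1..ip: covariances of the score statistics.
    prvr(i+1, j), j <= i: covariances of the parameter estimates.
    Indices in comments are 1-based as documented; Python lists are 0-based.
    """
    score_cov = [[0.0] * ip for _ in range(ip)]
    est_cov = [[0.0] * ip for _ in range(ip)]
    for i in range(1, ip + 1):
        for j in range(1, ip + 1):
            if i <= j:  # prvr(i, j)
                score_cov[i - 1][j - 1] = score_cov[j - 1][i - 1] = prvr[i - 1][j - 1]
            if j <= i:  # prvr(i + 1, j)
                est_cov[i - 1][j - 1] = est_cov[j - 1][i - 1] = prvr[i][j - 1]
    return score_cov, est_cov


print(unpack_prvr([[7.6526], [0.1307]], 1))  # ([[7.6526]], [[0.1307]])
```

The two triangles never overlap: the score part uses the diagonal and above of rows 1 to ip, and the estimate part uses the diagonal and below shifted down by one row, hence the extra row in ldprvr.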
2:     irank(nmax) – int64, int32 or nag_int array
For the one sample case, irank contains the ranks of the observations.
3:     zin(nmax) – double array
For the one sample case, zin contains the expected values of the function $g(\cdot)$ of the order statistics.
4:     eta(nmax) – double array
For the one sample case, eta contains the expected values of the function $g^{\prime}(\cdot)$ of the order statistics.
5:     vapvec(nmax×(nmax+1)/2) – double array
For the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g(\cdot)$ of the order statistics, stored column-wise.
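Assuming "stored column-wise" means the usual packed layout in which column $j$ contributes elements $(1,j),\dots,(j,j)$ in turn (consistent with the array length $\mathbf{nmax}(\mathbf{nmax}+1)/2$), the position of element $(i,j)$ can be computed with this hypothetical Python helper:

```python
def vapvec_index(i, j):
    """1-based position of element (i, j) in column-wise packed upper-triangular storage.

    Column j holds elements (1, j), ..., (j, j), so it starts after
    1 + 2 + ... + (j - 1) = j(j - 1)/2 earlier entries.
    """
    if i > j:
        i, j = j, i  # symmetric matrix: read (i, j) from the upper triangle
    return i + j * (j - 1) // 2


print([vapvec_index(i, j) for j in (1, 2, 3) for i in range(1, j + 1)])  # [1, 2, 3, 4, 5, 6]
```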
6:     parest(4×ip+1) – double array
The statistics calculated by the function.
The first ip components of parest contain the score statistics.
The next ip elements contain the parameter estimates.
$\mathbf{parest}(2\times\mathbf{ip}+1)$ contains the value of the $\chi^2$ statistic.
The next ip elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining ip elements of parest contain the $z$-statistics.
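The layout of parest can be unpacked with 0-based slices, as in this hypothetical Python helper. It is applied to the parest values printed by the example on this page.

```python
def unpack_parest(parest, ip):
    """Split parest (length 4*ip + 1) into the documented groups (0-based slices)."""
    if len(parest) != 4 * ip + 1:
        raise ValueError("parest must have 4*ip + 1 elements")
    return {
        "score": parest[:ip],
        "estimate": parest[ip:2 * ip],
        "chi2": parest[2 * ip],
        "std_error": parest[2 * ip + 1:3 * ip + 1],
        "z": parest[3 * ip + 1:],
    }


stats = unpack_parest([4.5840, 0.5990, 2.7459, 0.3615, 1.6571], 1)
print(stats["estimate"], stats["chi2"], stats["z"])  # [0.599] 2.7459 [1.6571]
```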
7:     ifail – int64, int32 or nag_int scalar
$\mathbf{ifail}=0$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:
$\mathbf{ifail}=1$
On entry, $\mathbf{ns}<1$, or $\mathbf{tol}\le0.0$, or $\mathbf{nmax}\le\mathbf{ip}$, or $\mathit{ldprvr}<\mathbf{ip}+1$, or $\mathit{ldx}<\mathit{nsum}$, or $\mathbf{nmax}\ne\max_{1\le i\le\mathbf{ns}}\mathbf{nv}(i)$, or $\mathbf{nv}(i)\le0$ for some $i$, $i=1,2,\dots,\mathbf{ns}$, or $\mathit{nsum}\ne\sum_{i=1}^{\mathbf{ns}}\mathbf{nv}(i)$, or $\mathbf{ip}<1$, or $\mathbf{gamma}<0.0$, or $\mathit{lwork}<\mathbf{nmax}\times(\mathbf{ip}+1)$.
$\mathbf{ifail}=2$
On entry, $\mathbf{icen}(i)$ is neither $0$ nor $1$ for some $1\le i\le\mathit{nsum}$.
$\mathbf{ifail}=3$
On entry, all the observations are adjudged to be tied. You are advised to check the value supplied for tol.
$\mathbf{ifail}=4$
The matrix $X^{\mathrm{T}}(B-A)X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
$\mathbf{ifail}=5$
On entry, at least one column of the matrix $X$ has all its elements equal.
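A hypothetical pre-flight check, sketched in Python, can mirror the documented argument conditions for ifail = 1, 2 and 5 before calling the routine. The ldprvr and lwork conditions are omitted because those arguments are not exposed in the MATLAB interface, and ifail = 3 and 4 depend on the computation, so they are not checked. The order in which the real routine tests these conditions is not documented, so the returned code is indicative only.

```python
def check_arguments(nv, y, x, icen, gamma, nmax, tol):
    """Return 0, or the ifail value (1, 2 or 5) of the first violated documented condition.

    Hypothetical helper: x is a list of rows (one per observation), ip = len(x[0]).
    """
    ns = len(nv)
    ip = len(x[0]) if x else 0
    nsum = sum(nv)
    # ifail = 1 conditions; len(x) < nsum plays the role of ldx < nsum.
    if (ns < 1 or tol <= 0.0 or ip < 1 or gamma < 0.0
            or any(n <= 0 for n in nv) or nmax != max(nv) or nmax <= ip
            or len(y) != nsum or len(x) < nsum or len(icen) != nsum):
        return 1
    # ifail = 2: every censoring indicator must be 0 or 1.
    if any(c not in (0, 1) for c in icen):
        return 2
    # ifail = 5: no column of x may have all its elements equal.
    if any(len({row[l] for row in x[:nsum]}) == 1 for l in range(ip)):
        return 5
    return 0


print(check_arguments([3], [1.0, 2.0, 3.0], [[0.0], [1.0], [1.0]], [0, 0, 1], 0.5, 3, 1e-5))  # 0
```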

## Accuracy

The computations are believed to be stable.

The time taken by nag_nonpar_rank_regsn_censored (g08rb) depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

## Example

```matlab
function nag_nonpar_rank_regsn_censored_example
% One sample of 40 observations; x is a 0/1 group indicator.
nv = [int64(40)];
% Observations 1-19 belong to group 0 (x = 0), observations 20-40 to group 1 (x = 1).
y = [143; 164; 188; 188; 190; 192; 206; 209; 213; 216; ...
     220; 227; 230; 234; 246; 265; 304; 216; 244; ...
     142; 156; 163; 198; 205; 232; 232; 233; 233; 233; ...
     233; 239; 240; 261; 280; 280; 296; 296; 323; 204; 344];
x = [zeros(19, 1); ones(21, 1)];
% icen = 1 marks right-censored observations (the last two in each group).
icen = int64([zeros(17, 1); 1; 1; zeros(19, 1); 1; 1]);
gamma = 1e-05;     % gamma <= 0.0001: the extreme value distribution is assumed
nmax = int64(40);  % largest sample size
tol = 1e-05;       % tolerance for judging ties
[parvar, irank, zin, eta, vapvec, parest, ifail] = ...
  nag_nonpar_rank_regsn_censored(nv, y, x, icen, gamma, nmax, tol);
parvar, parest, ifail
```
```

parvar =

7.6526
0.1307

parest =

4.5840
0.5990
2.7459
0.3615
1.6571

ifail =

0

```