
# NAG Toolbox: nag_nonpar_rank_regsn (g08ra)

## Purpose

nag_nonpar_rank_regsn (g08ra) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.

## Syntax

```matlab
[prvr, irank, zin, eta, vapvec, parest, ifail] = g08ra(nv, y, x, idist, nmax, tol, 'ns', ns, 'ip', ip)
[prvr, irank, zin, eta, vapvec, parest, ifail] = nag_nonpar_rank_regsn(nv, y, x, idist, nmax, tol, 'ns', ns, 'ip', ip)
```

## Description

Data can be analysed by replacing the observations with their ranks. The analysis produces inference for the regression parameters arising from the following model.
For random variables ${Y}_{1},{Y}_{2},\dots ,{Y}_{n}$ we assume that, after an arbitrary monotone increasing differentiable transformation, $h\left(.\right)$, the model
 $h\left({Y}_{i}\right)={x}_{i}^{\mathrm{T}}\beta +{\epsilon }_{i}$ (1)
holds, where ${x}_{i}$ is a known vector of explanatory variables and $\beta$ is a vector of $p$ unknown regression coefficients. The ${\epsilon }_{i}$ are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential. In Pettitt (1982) an estimate for $\beta$ is proposed as $\stackrel{^}{\beta }=M{X}^{\mathrm{T}}a$ with estimated variance-covariance matrix $M$. The statistics $a$ and $M$ depend on the ranks ${r}_{i}$ of the observations ${Y}_{i}$ and the density chosen for ${\epsilon }_{i}$.
The matrix $X$ is the $n$ by $p$ matrix of explanatory variables. It is assumed that $X$ is of rank $p$ and that no column, or linear combination of columns, of $X$ is equal to the column vector of $1$s or a multiple of it. This means that a constant term cannot be included in the model (1).
The statistics $a$ and $M$ are found as follows. Let ${\epsilon }_{i}$ have pdf $f\left(\epsilon \right)$ and let $g=-{f}^{\prime }/f$. Let ${W}_{1},{W}_{2},\dots ,{W}_{n}$ be the order statistics of a random sample of size $n$ with density $f\left(.\right)$. Define ${Z}_{i}=g\left({W}_{i}\right)$; then ${a}_{i}=E\left({Z}_{{r}_{i}}\right)$. To define $M$ we need ${M}^{-1}={X}^{\mathrm{T}}\left(B-A\right)X$, where $B$ is an $n$ by $n$ diagonal matrix with ${B}_{ii}=E\left({g}^{\prime }\left({W}_{{r}_{i}}\right)\right)$ and $A$ is a symmetric matrix with ${A}_{ij}=\mathrm{cov}\left({Z}_{{r}_{i}},{Z}_{{r}_{j}}\right)$. In the case of the Normal distribution, the ${Z}_{1}<\cdots <{Z}_{n}$ are standard Normal order statistics and $E\left({g}^{\prime }\left({W}_{i}\right)\right)=1$, for $i=1,2,\dots ,n$.
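For the Normal case the ingredients of $a$ and ${M}^{-1}$ can be approximated by simulation, which may help in understanding the definitions above. The sketch below (Python/NumPy for illustration; the routine itself evaluates these quantities exactly, and all variable names here are ours) estimates the expected Normal order-statistic scores and forms ${M}^{-1}={X}^{\mathrm{T}}\left(B-A\right)X$ for a small random design:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 2, 20000

# For the Normal distribution g = -f'/f is the identity, so
# Z_i = g(W_i) = W_i, the i-th standard Normal order statistic.
w = np.sort(rng.standard_normal((reps, n)), axis=1)
ez = w.mean(axis=0)            # Monte Carlo estimate of E(Z_i), increasing in i
A_full = np.cov(w, rowvar=False)   # estimate of cov(Z_i, Z_j)
B = np.eye(n)                  # E(g'(W_i)) = 1 for the Normal case

# The ranks r_i of the observations select the entries used in a and M^{-1}.
y = rng.standard_normal(n)
r = y.argsort().argsort()      # zero-based ranks of the observations
a = ez[r]                      # a_i = E(Z_{r_i})

X = rng.standard_normal((n, p))
M_inv = X.T @ (B - A_full[np.ix_(r, r)]) @ X
```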
The analysis can also deal with ties in the data. Two observations are adjudged to be tied if $\left|{Y}_{i}-{Y}_{j}\right|<{\mathbf{tol}}$, where tol is a user-supplied tolerance level.
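One reasonable reading of the tie rule is that, after sorting, a new group of tied observations starts wherever consecutive values differ by at least tol (so ties are chained transitively through intermediate values). A sketch of this reading (Python/NumPy; `tie_groups` is a hypothetical helper, not part of the routine):

```python
import numpy as np

def tie_groups(y, tol):
    """Label observations so that sorted values closer than tol
    share a group label (ties chained transitively)."""
    order = np.argsort(y)
    ys = y[order]
    # a new group starts at each gap of at least tol in the sorted data
    labels = np.concatenate(([0], np.cumsum(np.diff(ys) >= tol)))
    groups = np.empty_like(labels)
    groups[order] = labels
    return groups

y = np.array([1.00, 1.000004, 3.0, 1.00001, 3.0])
print(tie_groups(y, tol=1e-5))   # -> [0 0 1 0 1]
```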
Various statistics can be found from the analysis:
 (a) The score statistic ${X}^{\mathrm{T}}a$. This statistic is used to test the hypothesis ${H}_{0}:\beta =0$, see (e).
 (b) The estimated variance-covariance matrix ${X}^{\mathrm{T}}\left(B-A\right)X$ of the score statistic in (a).
 (c) The estimate $\stackrel{^}{\beta }=M{X}^{\mathrm{T}}a$.
 (d) The estimated variance-covariance matrix $M={\left({X}^{\mathrm{T}}\left(B-A\right)X\right)}^{-1}$ of the estimate $\stackrel{^}{\beta }$.
 (e) The ${\chi }^{2}$ statistic $Q={\stackrel{^}{\beta }}^{\mathrm{T}}{M}^{-1}\stackrel{^}{\beta }={a}^{\mathrm{T}}X{\left({X}^{\mathrm{T}}\left(B-A\right)X\right)}^{-1}{X}^{\mathrm{T}}a$ used to test ${H}_{0}:\beta =0$. Under ${H}_{0}$, $Q$ has an approximate ${\chi }^{2}$-distribution with $p$ degrees of freedom.
 (f) The standard errors ${M}_{ii}^{1/2}$ of the estimates given in (c).
 (g) Approximate $z$-statistics, i.e., ${Z}_{i}={\stackrel{^}{\beta }}_{i}/se\left({\stackrel{^}{\beta }}_{i}\right)$ for testing ${H}_{0}:{\beta }_{i}=0$. For $i=1,2,\dots ,p$, ${Z}_{i}$ has an approximate $N\left(0,1\right)$ distribution.
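Given the estimate $\stackrel{^}{\beta }$ and its covariance matrix $M$, the derived statistics in (e)–(g) are straightforward to reproduce. The sketch below (Python/NumPy) uses the rounded values printed in the Example section of this document, so the results match the routine's output only approximately:

```python
import numpy as np

# Rounded values taken from the example output (illustrative only)
beta_hat = np.array([-0.852, 0.114])        # parameter estimates, (c)
M = np.array([[1.560, 0.012],
              [0.012, 0.002]])              # their covariance matrix, (d)

# (e) chi-squared statistic Q = beta' M^{-1} beta with p = 2 d.f.
Q = float(beta_hat @ np.linalg.solve(M, beta_hat))

# (f) standard errors and (g) approximate z-statistics
se = np.sqrt(np.diag(M))
z = beta_hat / se
```

Because the inputs above are rounded to three decimals, `Q` comes out near 8.08 rather than the 8.221 the routine reports from full-precision intermediates.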
In many situations, more than one sample of observations will be available. In this case we assume the model
 ${h}_{k}\left({Y}_{k}\right)={X}_{k}^{\mathrm{T}}\beta +{e}_{k},\quad k=1,2,\dots ,{\mathbf{ns}},$
where ns is the number of samples. In an obvious manner, ${Y}_{k}$ and ${X}_{k}$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation ${h}_{k}$ can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of $\beta$ as $\stackrel{^}{\beta }=Dd$, where
 ${D}^{-1}=\sum _{k=1}^{{\mathbf{ns}}}{X}_{k}^{\mathrm{T}}\left({B}_{k}-{A}_{k}\right){X}_{k}$
and
 $d=\sum _{k=1}^{{\mathbf{ns}}}{X}_{k}^{\mathrm{T}}{a}_{k},$
with ${a}_{k}$, ${B}_{k}$ and ${A}_{k}$ defined as $a$, $B$ and $A$ above but for the $k$th sample.
The remaining statistics are calculated as for the one sample case.
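The pooling across samples can be sketched directly (Python/NumPy; `combined_estimate` and the random inputs are illustrative only — in practice ${a}_{k}$, ${B}_{k}$ and ${A}_{k}$ come from the order-statistic calculations described earlier):

```python
import numpy as np

def combined_estimate(samples):
    """beta_hat = D d, with D^{-1} = sum_k X_k'(B_k - A_k)X_k
    and d = sum_k X_k' a_k (illustrative helper)."""
    p = samples[0][0].shape[1]
    D_inv = np.zeros((p, p))
    d = np.zeros(p)
    for X, a, B, A in samples:
        D_inv += X.T @ (B - A) @ X
        d += X.T @ a
    beta_hat = np.linalg.solve(D_inv, d)
    return beta_hat, np.linalg.inv(D_inv)   # estimate and its covariance D

# Two hypothetical samples; B_k = I and A_k = 0 just to keep the sketch short
rng = np.random.default_rng(1)
samples = []
for n_k in (5, 8):
    X = rng.standard_normal((n_k, 2))
    a = rng.standard_normal(n_k)
    samples.append((X, a, np.eye(n_k), np.zeros((n_k, n_k))))
beta_hat, D = combined_estimate(samples)
```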

## References

Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks *J. Roy. Statist. Soc. Ser. B* **44** 234–243

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{nv}\left({\mathbf{ns}}\right)$ – int64/int32/nag_int array
The number of observations in the $\mathit{i}$th sample, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
Constraint: ${\mathbf{nv}}\left(\mathit{i}\right)\ge 1$, for $\mathit{i}=1,2,\dots ,{\mathbf{ns}}$.
2:     $\mathrm{y}\left(\mathit{nsum}\right)$ – double array
nsum, the dimension of the array, must satisfy the constraint $\mathit{nsum}=\sum _{\mathit{i}=1}^{{\mathbf{ns}}}{\mathbf{nv}}\left(\mathit{i}\right)$.
The observations in each sample. Specifically, ${\mathbf{y}}\left(\sum _{k=1}^{i-1}{\mathbf{nv}}\left(k\right)+j\right)$ must contain the $j$th observation in the $i$th sample.
3:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{ip}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge \mathit{nsum}$.
The design matrices for each sample. Specifically, ${\mathbf{x}}\left(\sum _{k=1}^{i-1}{\mathbf{nv}}\left(k\right)+j,l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: ${\mathbf{x}}$ must not contain a column with all elements equal.
4:     $\mathrm{idist}$ – int64/int32/nag_int scalar
The error distribution to be used in the analysis.
${\mathbf{idist}}=1$
Normal.
${\mathbf{idist}}=2$
Logistic.
${\mathbf{idist}}=3$
Extreme value.
${\mathbf{idist}}=4$
Double-exponential.
Constraint: $1\le {\mathbf{idist}}\le 4$.
5:     $\mathrm{nmax}$ – int64/int32/nag_int scalar
The value of the largest sample size.
Constraint: ${\mathbf{nmax}}=\underset{1\le i\le {\mathbf{ns}}}{\mathrm{max}}\phantom{\rule{0.25em}{0ex}}\left({\mathbf{nv}}\left(i\right)\right)$ and ${\mathbf{nmax}}>{\mathbf{ip}}$.
6:     $\mathrm{tol}$ – double scalar
The tolerance for judging whether two observations are tied. Thus, observations ${Y}_{i}$ and ${Y}_{j}$ are adjudged to be tied if $\left|{Y}_{i}-{Y}_{j}\right|<{\mathbf{tol}}$.
Constraint: ${\mathbf{tol}}>0.0$.

### Optional Input Parameters

1:     $\mathrm{ns}$ – int64/int32/nag_int scalar
Default: the dimension of the array nv.
The number of samples.
Constraint: ${\mathbf{ns}}\ge 1$.
2:     $\mathrm{ip}$ – int64/int32/nag_int scalar
Default: the second dimension of the array x.
The number of parameters to be fitted.
Constraint: ${\mathbf{ip}}\ge 1$.

### Output Parameters

1:     $\mathrm{prvr}\left(\mathit{ldprvr},{\mathbf{ip}}\right)$ – double array
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1\le i\le j\le {\mathbf{ip}}$, ${\mathbf{prvr}}\left(i,j\right)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1\le j\le i\le {\mathbf{ip}}$, ${\mathbf{prvr}}\left(i+1,j\right)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
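Since prvr packs two symmetric matrices into one array, a small helper can recover them both. A sketch (Python/NumPy, zero-based indexing; `split_prvr` is a hypothetical name, not part of the toolbox):

```python
import numpy as np

def split_prvr(prvr, ip):
    """Recover the two symmetric ip-by-ip matrices packed into prvr:
    score-statistic covariances from the upper triangle of rows 1..ip,
    parameter-estimate covariances from the lower triangle of rows 2..ip+1."""
    score_cov = np.triu(prvr[:ip, :ip])
    score_cov = score_cov + np.triu(score_cov, 1).T   # mirror to full matrix
    est_cov = np.tril(prvr[1:ip + 1, :ip])
    est_cov = est_cov + np.tril(est_cov, -1).T
    return score_cov, est_cov
```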
2:     $\mathrm{irank}\left({\mathbf{nmax}}\right)$ – int64/int32/nag_int array
For the one sample case, irank contains the ranks of the observations.
3:     $\mathrm{zin}\left({\mathbf{nmax}}\right)$ – double array
For the one sample case, zin contains the expected values of the function $g\left(.\right)$ of the order statistics.
4:     $\mathrm{eta}\left({\mathbf{nmax}}\right)$ – double array
For the one sample case, eta contains the expected values of the function $g\prime \left(.\right)$ of the order statistics.
5:     $\mathrm{vapvec}\left({\mathbf{nmax}}×\left({\mathbf{nmax}}+1\right)/2\right)$ – double array
For the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g\left(.\right)$ of the order statistics stored column-wise.
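Column-wise packing means that element $\left(i,j\right)$, $i\le j$, of the covariance matrix sits at position $j\left(j-1\right)/2+i$ (1-based) of vapvec. Unpacking can be sketched as (Python/NumPy, zero-based; `unpack_vapvec` is a hypothetical helper):

```python
import numpy as np

def unpack_vapvec(v, nmax):
    """Expand the packed upper triangle (stored column-wise)
    back into a full symmetric nmax-by-nmax matrix."""
    V = np.zeros((nmax, nmax))
    k = 0
    for j in range(nmax):            # columns, zero-based
        for i in range(j + 1):       # rows 0..j of the upper triangle
            V[i, j] = V[j, i] = v[k]
            k += 1
    return V
```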
6:     $\mathrm{parest}\left(4×{\mathbf{ip}}+1\right)$ – double array
The statistics calculated by the function.
The first ip components of parest contain the score statistics.
The next ip elements contain the parameter estimates.
${\mathbf{parest}}\left(2×{\mathbf{ip}}+1\right)$ contains the value of the ${\chi }^{2}$ statistic.
The next ip elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining ip elements of parest contain the $z$-statistics.
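Unpacking parest by this layout can be sketched as (Python/NumPy, zero-based indexing; `split_parest` is a hypothetical helper mirroring the description above):

```python
import numpy as np

def split_parest(parest, ip):
    """Split the length-(4*ip+1) parest vector into its five pieces."""
    score = parest[:ip]                  # score statistics
    beta = parest[ip:2 * ip]             # parameter estimates
    chisq = parest[2 * ip]               # chi-squared statistic
    se = parest[2 * ip + 1:3 * ip + 1]   # standard errors
    z = parest[3 * ip + 1:4 * ip + 1]    # z-statistics
    return score, beta, chisq, se, z
```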
7:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{ns}}<1$, or ${\mathbf{tol}}\le 0.0$, or ${\mathbf{nmax}}\le {\mathbf{ip}}$, or $\mathit{ldprvr}<{\mathbf{ip}}+1$, or $\mathit{ldx}<\mathit{nsum}$, or ${\mathbf{nmax}}\ne {\mathrm{max}}_{1\le i\le {\mathbf{ns}}}\left({\mathbf{nv}}\left(i\right)\right)$, or ${\mathbf{nv}}\left(i\right)\le 0$ for some $i$, or $\mathit{nsum}\ne \sum _{i=1}^{{\mathbf{ns}}}{\mathbf{nv}}\left(i\right)$, or ${\mathbf{ip}}<1$, or $\mathit{lwork}<{\mathbf{nmax}}×\left({\mathbf{ip}}+1\right)$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{idist}}<1$, or ${\mathbf{idist}}>4$.
${\mathbf{ifail}}=3$
On entry, all the observations are adjudged to be tied. You are advised to check the value supplied for tol.
${\mathbf{ifail}}=4$
The matrix ${X}^{\mathrm{T}}\left(B-A\right)X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
${\mathbf{ifail}}=5$
The matrix $X$ has at least one of its columns with all elements equal.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this function.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

The computations are believed to be stable.

## Further Comments

The time taken by nag_nonpar_rank_regsn (g08ra) depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

## Example

A program to fit a regression model to a single sample of $20$ observations using two explanatory variables. The error distribution will be taken to be logistic.
```matlab
function g08ra_example

fprintf('g08ra example results\n\n');

% Single sample observations and design matrix
y = [1;  1;  3;  4;  2;  4;  1;  5;  4;  4;
     4;  4;  4;  1;  4;  5;  5;  4;  4;  3];
x = [1, 23;  1, 32;  1, 37;  1, 41;  1, 41;
     1, 48;  1, 48;  1, 55;  1, 55;  0, 56;
     1, 57;  1, 57;  1, 57;  0, 58;  1, 59;
     0, 59;  0, 60;  1, 61;  1, 62;  1, 62];

ns    = size(y,2);              % one sample: y is a single column
ip    = size(x,2);
idist = int64(2);               % logistic error distribution
nv    = int64(numel(y));
nmax  = nv;
tol   = 1e-05;

fprintf('Number of samples           = %3d\n', ns);
fprintf('Number of parameters fitted = %3d\n', ip);
fprintf('Distribution                = %3d\n', idist);
fprintf('Tolerance for ties          = %8.1e\n', tol);

[parvar, irank, zin, eta, vapvec, parest, ifail] = ...
  g08ra( ...
    nv, y, x, idist, nmax, tol);

% Display results
fprintf('\nScore statistic\n');
fprintf('%9.3f%9.3f\n', parest(1:ip));
fprintf('\nCovariance matrix of score statistic\n');
for j = 1:ip
  fprintf('%9.3f', parvar(1:j,j));
  fprintf('\n');
end
fprintf('\nParameter estimates\n');
fprintf('%9.3f', parest(ip+1:ip+ip));
fprintf('\n\nCovariance matrix of parameter estimates\n');
for j = 1:ip
  fprintf('%9.3f', parvar(j+1,1:j));
  fprintf('\n');
end

chisq = parest(2*ip+1);
fprintf('\nChi-squared statistic = %8.3f with %2d d.f.\n\n', chisq, ip);

sterr = reshape(parest(2*ip+2:end),[ip,2]);
fprintf('Standard errors of estimates and approximate z-statistics\n');
disp(sterr);
```

```
g08ra example results

Number of samples           =   1
Number of parameters fitted =   2
Distribution                =   2
Tolerance for ties          =  1.0e-05

Score statistic
   -1.048   64.333

Covariance matrix of score statistic
    0.673
   -4.159  533.670

Parameter estimates
   -0.852    0.114

Covariance matrix of parameter estimates
    1.560
    0.012    0.002

Chi-squared statistic =    8.221 with  2 d.f.

Standard errors of estimates and approximate z-statistics
    1.2492   -0.6824
    0.0444    2.5673
```