NAG Toolbox: nag_nonpar_rank_regsn (g08ra)

Purpose

nag_nonpar_rank_regsn (g08ra) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.

Syntax

[prvr, irank, zin, eta, vapvec, parest, ifail] = g08ra(nv, y, x, idist, nmax, tol, 'ns', ns, 'ip', ip)
[prvr, irank, zin, eta, vapvec, parest, ifail] = nag_nonpar_rank_regsn(nv, y, x, idist, nmax, tol, 'ns', ns, 'ip', ip)

Description

Analysis of data can be made by replacing observations by their ranks. The analysis produces inference for regression parameters arising from the following model.
For random variables $Y_1, Y_2, \ldots, Y_n$ we assume that, after an arbitrary monotone increasing differentiable transformation, $h(\cdot)$, the model
$$h(Y_i) = x_i^T \beta + \epsilon_i \qquad (1)$$
holds, where $x_i$ is a known vector of explanatory variables and $\beta$ is a vector of $p$ unknown regression coefficients. The $\epsilon_i$ are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential. In Pettitt (1982) an estimate for $\beta$ is proposed as $\hat{\beta} = M X^T a$ with estimated variance-covariance matrix $M$. The statistics $a$ and $M$ depend on the ranks $r_i$ of the observations $Y_i$ and the density chosen for $\epsilon_i$.
The matrix $X$ is the $n$ by $p$ matrix of explanatory variables. It is assumed that $X$ is of rank $p$ and that no column or linear combination of columns of $X$ is equal to the column vector of ones or a multiple of it. This means that a constant term cannot be included in the model (1). The statistics $a$ and $M$ are found as follows. Let $\epsilon_i$ have pdf $f(\epsilon)$ and let $g = -f'/f$. Let $W_1, W_2, \ldots, W_n$ be order statistics for a random sample of size $n$ with the density $f(\cdot)$. Define $Z_i = g(W_i)$, then $a_i = E(Z_{r_i})$. To define $M$ we need $M^{-1} = X^T (B - A) X$, where $B$ is an $n$ by $n$ diagonal matrix with $B_{ii} = E(g'(W_{r_i}))$ and $A$ is a symmetric matrix with $A_{ij} = \mathrm{cov}(Z_{r_i}, Z_{r_j})$. In the case of the Normal distribution, the $Z_1 < \cdots < Z_n$ are standard Normal order statistics and $E(g'(W_i)) = 1$, for $i = 1, 2, \ldots, n$.
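As a concrete illustration of these definitions, the logistic density admits closed forms for both expectations. The sketch below is not part of the NAG interface. It assumes the standard logistic density $f(w) = e^{-w}/(1+e^{-w})^2$ with distribution function $F$, for which $g = -f'/f = 2F - 1$ and $g' = 2f = 2F(1-F)$, and it uses the fact that $F(W_{(r)})$ is a uniform order statistic with a Beta$(r, n+1-r)$ distribution. The variable names are illustrative, and ties are ignored.

```matlab
% Logistic scores for untied ranks 1..n (illustrative sketch, not NAG code).
n = 20;
r = (1:n)';

% a_r = E[g(W_(r))] = 2*E[U_(r)] - 1, where E[U_(r)] = r/(n+1)
a = 2*r/(n+1) - 1;

% b_r = E[g'(W_(r))] = 2*E[U_(r)*(1-U_(r))] = 2*r*(n+1-r)/((n+1)*(n+2))
b = 2*r.*(n+1-r) / ((n+1)*(n+2));

disp([r a b]);
```

Averaging a over ranks 1 to 4 gives -0.7619 and averaging b over the same ranks gives 0.1948. These match the leading entries of zin and eta in the example on this page, where the four smallest observations are tied.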
The analysis can also deal with ties in the data. Two observations are adjudged to be tied if $|Y_i - Y_j| < tol$, where tol is a user-supplied tolerance level.
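One way to picture the tie rule is sketched below. It sorts the observations, starts a new tie group whenever the gap between consecutive sorted values is at least tol, and averages the logistic scores $2r/(n+1) - 1$ within each group. Both the grouping of consecutive values and the averaging are assumptions rather than documented behaviour of the function, although with the example data from this page they reproduce the printed irank and zin values to four decimal places.

```matlab
% Example data from this page; tol as in the example.
y = [1 1 3 4 2 4 1 5 4 4 4 4 4 1 4 5 5 4 4 3]';
tol = 1e-5;
n = numel(y);

[ys, order] = sort(y);            % stable sort: ties keep their input order
irank = zeros(n, 1);
irank(order) = 1:n;               % rank of each observation

% Assumed tie rule: a new group starts when the gap to the previous
% sorted value is not smaller than tol.
grp = cumsum([1; diff(ys) >= tol]);

% Logistic scores E[g(W_(r))] = 2r/(n+1) - 1, averaged within tie groups.
r = (1:n)';
scores = 2*r/(n+1) - 1;
groupMean = accumarray(grp, scores, [], @mean);
zinSketch = groupMean(grp);       % averaged scores, in sorted order
```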
Various statistics can be found from the analysis:
(a) The score statistic $X^T a$. This statistic is used to test the hypothesis $H_0 : \beta = 0$, see (e).
(b) The estimated variance-covariance matrix $X^T (B - A) X$ of the score statistic in (a).
(c) The estimate $\hat{\beta} = M X^T a$.
(d) The estimated variance-covariance matrix $M = (X^T (B - A) X)^{-1}$ of the estimate $\hat{\beta}$.
(e) The $\chi^2$ statistic $Q = \hat{\beta}^T M^{-1} \hat{\beta} = a^T X (X^T (B - A) X)^{-1} X^T a$ used to test $H_0 : \beta = 0$. Under $H_0$, $Q$ has an approximate $\chi^2$-distribution with $p$ degrees of freedom.
(f) The standard errors $M_{ii}^{1/2}$ of the estimates given in (c).
(g) Approximate $z$-statistics, i.e., $Z_i = \hat{\beta}_i / se(\hat{\beta}_i)$ for testing $H_0 : \beta_i = 0$. For $i = 1, 2, \ldots, p$, $Z_i$ has an approximate $N(0,1)$ distribution.
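The relationships between statistics (a) to (g) can be checked numerically. The sketch below takes the score statistic and its covariance matrix as given, using the rounded values printed in the example on this page, and derives the remaining quantities from them. Because the inputs are rounded to four decimal places, the results agree with the printed parest values only to about three significant figures. The variable names are illustrative.

```matlab
s = [-1.0476; 64.3333];                  % (a) score statistic X'*a
V = [0.6733 -4.1587; -4.1587 533.6696];  % (b) X'*(B-A)*X

M = inv(V);                              % (d) covariance matrix of the estimate
betaHat = V \ s;                         % (c) M*X'*a, computed by a linear solve
Q = s' * betaHat;                        % (e) chi-square statistic, here p = 2
se = sqrt(diag(M));                      % (f) standard errors
z = betaHat ./ se;                       % (g) approximate z-statistics

fprintf('beta = %.4f %.4f, Q = %.4f\n', betaHat, Q);
```

The linear solve `V \ s` avoids forming the inverse for the estimate itself; the explicit inverse is kept only because its diagonal is needed for the standard errors.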
In many situations, more than one sample of observations will be available. In this case we assume the model
$$h_k(Y_k) = X_k^T \beta + e_k, \qquad k = 1, 2, \ldots, ns,$$
where ns is the number of samples. In an obvious manner, $Y_k$ and $X_k$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_k$ can be assumed different for each sample since observations are ranked within the sample.
The earlier analysis can be extended to give a combined estimate of $\beta$ as $\hat{\beta} = D d$, where
$$D^{-1} = \sum_{k=1}^{ns} X_k^T (B_k - A_k) X_k$$
and
$$d = \sum_{k=1}^{ns} X_k^T a_k,$$
with $a_k$, $B_k$ and $A_k$ defined as $a$, $B$ and $A$ above but for the $k$th sample.
The remaining statistics are calculated as for the one sample case.
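The pooling across samples amounts to summing per-sample contributions before solving. The sketch below uses made-up toy matrices purely to show the bookkeeping: the values in X, C and a are hypothetical, and C{k} stands in for $B_k - A_k$, which the function would derive from the ranks and the chosen density.

```matlab
% Hypothetical toy inputs for ns = 2 samples and p = 2 parameters.
X = {[1 0; 0 1; 1 1], [2 0; 0 1]};   % design matrices X_k
C = {eye(3), 2*eye(2)};              % stand-ins for B_k - A_k
a = {[1; -1; 0], [0.5; -0.5]};       % score vectors a_k

p = size(X{1}, 2);
Dinv = zeros(p);                     % accumulates sum_k X_k'*(B_k - A_k)*X_k
d = zeros(p, 1);                     % accumulates sum_k X_k'*a_k
for k = 1:numel(X)
    Dinv = Dinv + X{k}' * C{k} * X{k};
    d = d + X{k}' * a{k};
end

betaHat = Dinv \ d;                  % combined estimate D*d
```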

References

Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243

Parameters

Compulsory Input Parameters

1:     nv(ns) – int64 array
ns, the dimension of the array, must satisfy the constraint $ns \geq 1$.
The number of observations in the $i$th sample, for $i = 1, 2, \ldots, ns$.
Constraint: $nv(i) \geq 1$, for $i = 1, 2, \ldots, ns$.
2:     y(nsum) – double array
nsum, the dimension of the array, must satisfy the constraint $nsum = \sum_{i=1}^{ns} nv(i)$.
The observations in each sample. Specifically, $y\left(\sum_{k=1}^{i-1} nv(k) + j\right)$ must contain the $j$th observation in the $i$th sample.
3:     x(ldx,ip) – double array
ldx, the first dimension of the array, must satisfy the constraint $ldx \geq nsum$.
The design matrices for each sample. Specifically, $x\left(\sum_{k=1}^{i-1} nv(k) + j, l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: x must not contain a column with all elements equal.
4:     idist – int64 scalar
The error distribution to be used in the analysis.
idist = 1
Normal.
idist = 2
Logistic.
idist = 3
Extreme value.
idist = 4
Double-exponential.
Constraint: $1 \leq idist \leq 4$.
5:     nmax – int64 scalar
The value of the largest sample size.
Constraint: $nmax = \max_{1 \leq i \leq ns} nv(i)$ and $nmax > ip$.
6:     tol – double scalar
The tolerance for judging whether two observations are tied. Thus, observations $Y_i$ and $Y_j$ are adjudged to be tied if $|Y_i - Y_j| < tol$.
Constraint: $tol > 0.0$.
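The stacked layout of nv, y and x can be assembled from per-sample data along the following lines. The cell arrays ySamples and xSamples and their contents are hypothetical; only the resulting nv, y, x and nmax follow the layout described above.

```matlab
% Hypothetical per-sample data: two samples with 3 and 2 observations, ip = 2.
ySamples = {[2.1; 3.4; 1.7], [5.0; 4.2]};
xSamples = {[1 10; 0 12; 1 15], [0 11; 1 14]};

counts = cellfun(@numel, ySamples);      % observations per sample
nv = int64(counts);                      % integer type used in the example call
nmax = int64(max(counts));               % largest sample size
y = vertcat(ySamples{:});                % samples stored one after another
x = vertcat(xSamples{:});                % rows of x follow the same order

% Observation j of sample i sits at offset(i) + j in y and in the rows of x.
offset = cumsum([0, counts(1:end-1)]);
i = 2; j = 1;
row = offset(i) + j;

% A column of x with all elements equal violates the constraint on x.
hasConstantColumn = any(max(x, [], 1) == min(x, [], 1));
```

Here row is 4, so y(4) and x(4,:) hold the first observation of the second sample.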

Optional Input Parameters

1:     ns – int64 scalar
Default: The dimension of the array nv.
The number of samples.
Constraint: $ns \geq 1$.
2:     ip – int64 scalar
Default: The second dimension of the array x.
The number of parameters to be fitted.
Constraint: $ip \geq 1$.

Input Parameters Omitted from the MATLAB Interface

nsum ldx ldprvr work lwork iwa

Output Parameters

1:     prvr(ldprvr,ip) – double array
$ldprvr \geq ip + 1$.
The variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1 \leq i \leq j \leq ip$, prvr(i,j) contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \leq j \leq i \leq ip$, prvr(i+1,j) contains an estimate of the covariance between the $i$th and $j$th parameter estimates.
2:     irank(nmax) – int64 array
For the one sample case, irank contains the ranks of the observations.
3:     zin(nmax) – double array
For the one sample case, zin contains the expected values of the function $g(\cdot)$ of the order statistics.
4:     eta(nmax) – double array
For the one sample case, eta contains the expected values of the function $g'(\cdot)$ of the order statistics.
5:     vapvec(nmax × (nmax + 1) / 2) – double array
For the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g(\cdot)$ of the order statistics stored column-wise.
6:     parest(4 × ip + 1) – double array
The statistics calculated by the function.
The first ip components of parest contain the score statistics.
The next ip elements contain the parameter estimates.
parest(2 × ip + 1) contains the value of the $\chi^2$ statistic.
The next ip elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining ip elements of parest contain the $z$-statistics.
7:     ifail – int64 scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).
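The packed outputs can be unpacked as sketched below. The arrays are the rounded values printed in the example on this page (with ip = 2), and the variable names on the left-hand sides are illustrative rather than part of the interface.

```matlab
ip = 2;
prvr = [0.6733  -4.1587;
        1.5604 533.6696;
        0.0122   0.0020];
parest = [-1.0476 64.3333 -0.8524 0.1139 8.2210 1.2492 0.0444 -0.6824 2.5673]';

% Upper triangle of rows 1..ip: covariance matrix of the score statistics.
U = triu(prvr(1:ip, :));
scoreCov = U + triu(U, 1)';

% Lower triangle of rows 2..ip+1: covariance matrix of the parameter estimates.
L = tril(prvr(2:ip+1, :));
parCov = L + tril(L, -1)';

% parest is laid out as [scores; estimates; chi-square; std errors; z-statistics].
score = parest(1:ip);
est   = parest(ip+1:2*ip);
chi2  = parest(2*ip+1);
se    = parest(2*ip+2:3*ip+1);
z     = parest(3*ip+2:4*ip+1);
```

As a consistency check, the square roots of the diagonal of parCov agree with se up to the rounding of the printed values.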

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, $ns < 1$,
or $tol \leq 0.0$,
or $nmax \leq ip$,
or $ldprvr < ip + 1$,
or $ldx < nsum$,
or $nmax \neq \max_{1 \leq i \leq ns} nv(i)$,
or $nv(i) \leq 0$ for some $i$,
or $nsum \neq \sum_{i=1}^{ns} nv(i)$,
or $ip < 1$,
or $lwork < nmax \times (ip + 1)$.
  ifail = 2
On entry, $idist < 1$,
or $idist > 4$.
  ifail = 3
On entry, all the observations are adjudged to be tied. You are advised to check the value supplied for tol.
  ifail = 4
The matrix $X^T (B - A) X$ is either ill-conditioned or not positive definite. This error should only occur with extreme rankings of the data.
  ifail = 5
The matrix $X$ has at least one of its columns with all elements equal.
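A caller might translate the returned ifail into a readable message along these lines. This is only a sketch: the value assigned to ifail is hypothetical, and the message strings are paraphrases of the list above rather than text produced by the function.

```matlab
ifail = 3;   % hypothetical value, as if returned by the call

switch ifail
    case 0
        msg = 'no error detected';
    case 1
        msg = 'invalid dimension, sample size or tolerance argument';
    case 2
        msg = 'idist outside the range 1 to 4';
    case 3
        msg = 'all observations tied; check tol';
    case 4
        msg = 'X''*(B-A)*X ill-conditioned or not positive definite';
    case 5
        msg = 'x has a column with all elements equal';
    otherwise
        msg = sprintf('unexpected ifail = %d', ifail);
end

disp(msg);
```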

Accuracy

The computations are believed to be stable.

Further Comments

The time taken by nag_nonpar_rank_regsn (g08ra) depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.

Example

function nag_nonpar_rank_regsn_example
nv = [int64(20)];
y = [1;
     1;
     3;
     4;
     2;
     4;
     1;
     5;
     4;
     4;
     4;
     4;
     4;
     1;
     4;
     5;
     5;
     4;
     4;
     3];
x = [1, 23;
     1, 32;
     1, 37;
     1, 41;
     1, 41;
     1, 48;
     1, 48;
     1, 55;
     1, 55;
     0, 56;
     1, 57;
     1, 57;
     1, 57;
     0, 58;
     1, 59;
     0, 59;
     0, 60;
     1, 61;
     1, 62;
     1, 62];
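idist = int64(2);   % logistic error distribution
nmax = int64(20);   % largest sample size
tol = 1e-05;        % tolerance for judging ties
% ns and ip are omitted, so they take their defaults from nv and x.
[parvar, irank, zin, eta, vapvec, parest, ifail] = ...
    nag_nonpar_rank_regsn(nv, y, x, idist, nmax, tol);
% Display all results except vapvec.
parvar, irank, zin, eta, parest, ifail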
 

parvar =

    0.6733   -4.1587
    1.5604  533.6696
    0.0122    0.0020


irank =

                    1
                    2
                    6
                    8
                    5
                    9
                    3
                   18
                   10
                   11
                   12
                   13
                   14
                    4
                   15
                   19
                   20
                   16
                   17
                    7


zin =

   -0.7619
   -0.7619
   -0.7619
   -0.7619
   -0.5238
   -0.3810
   -0.3810
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.1905
    0.8095
    0.8095
    0.8095


eta =

    0.1948
    0.1948
    0.1948
    0.1948
    0.3463
    0.4069
    0.4069
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.4242
    0.1616
    0.1616
    0.1616


parest =

   -1.0476
   64.3333
   -0.8524
    0.1139
    8.2210
    1.2492
    0.0444
   -0.6824
    2.5673


ifail =

                    0




© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013