NAG Library Function Document
nag_rank_regsn (g08rac)
1 Purpose
nag_rank_regsn (g08rac) calculates the parameter estimates, score statistics and their variance-covariance matrices for the linear model using a likelihood based on the ranks of the observations.
2 Specification
#include <nag.h>
#include <nagg08.h>

void nag_rank_regsn (Nag_OrderType order, Integer ns, const Integer nv[],
                     const double y[], Integer p, const double x[],
                     Integer pdx, Integer idist, Integer nmax, double tol,
                     double prvr[], Integer pdparvar, Integer irank[],
                     double zin[], double eta[], double vapvec[],
                     double parest[], NagError *fail)
3 Description
Data can be analysed by replacing the observations with their ranks. The analysis produces inference for the regression parameters arising from the following model.
For random variables $Y_1, Y_2, \ldots, Y_n$ we assume that, after an arbitrary monotone increasing differentiable transformation, $h(\cdot)$, the model

$$h(Y_i) = x_i^{\mathrm{T}} \beta + \epsilon_i \qquad (1)$$

holds, where $x_i$ is a known vector of explanatory variables and $\beta$ is a vector of $p$ unknown regression coefficients. The $\epsilon_i$ are random variables assumed to be independent and identically distributed with a completely known distribution which can be one of the following: Normal, logistic, extreme value or double-exponential. In Pettitt (1982) an estimate for $\beta$ is proposed as $\hat{\beta} = M X^{\mathrm{T}} a$ with estimated variance-covariance matrix $M$. The statistics $a$ and $M$ depend on the ranks $r_i$ of the observations $Y_i$ and the density chosen for $\epsilon_i$.

The matrix $X$ is the $n$ by $p$ matrix of explanatory variables. It is assumed that $X$ is of rank $p$ and that a column or a linear combination of columns of $X$ is not equal to the column vector of $1$s or a multiple of it. This means that a constant term cannot be included in the model (1). The statistics $a$ and $M$ are found as follows. Let $\epsilon_i$ have pdf $f(\epsilon)$ and let $g = -f'/f$. Let $W_1, W_2, \ldots, W_n$ be order statistics for a random sample of size $n$ with the density $f(\cdot)$. Define $Z_i = g(W_i)$, then $a_i = E(Z_{r_i})$. To define $M$ we need $M^{-1} = X^{\mathrm{T}} (B - A) X$, where $B$ is an $n$ by $n$ diagonal matrix with $B_{ii} = E(g'(W_{r_i}))$ and $A$ is a symmetric matrix with $A_{ij} = \mathrm{cov}(Z_{r_i}, Z_{r_j})$. In the case of the Normal distribution, the $Z_1 < \cdots < Z_n$ are standard Normal order statistics and $E(g'(W_i)) = 1$, for $i = 1, 2, \ldots, n$.
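For the logistic case the scores $a_i$ have a simple closed form, which makes the definition above concrete. Assuming the standard logistic density $f(\epsilon) = e^{-\epsilon}/(1+e^{-\epsilon})^2$ with distribution function $F$, we get $g(\epsilon) = 2F(\epsilon) - 1$, and since $E(F(W_i)) = i/(n+1)$ the scores are $a_i = 2 r_i/(n+1) - 1$ when there are no ties. The sketch below computes them; `logistic_scores` is a hypothetical helper, not a NAG routine, and `int` stands in for the NAG `Integer` type. Whether the library uses exactly this scaling internally is not stated in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Fills a[i] with E(g(W_r)) for r = rank[i] under a standard logistic
 * density, where g(w) = 2F(w) - 1 and E(F(W_r)) = r / (n + 1).
 * Assumes the ranks are 1..n with no ties. */
void logistic_scores(const int rank[], size_t n, double a[])
{
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        assert(rank[i] >= 1 && (size_t)rank[i] <= n);
        a[i] = 2.0 * rank[i] / (double)(n + 1) - 1.0;
    }
}
```

The scores are symmetric about zero, so they sum to zero over a complete set of ranks.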
The analysis can also deal with ties in the data. Two observations are adjudged to be tied if $|Y_i - Y_j| < \mathrm{tol}$, where tol is a user-supplied tolerance level.
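To see how a given tolerance affects a data set before calling the function, the tie criterion can be applied directly. The following is a minimal sketch, assuming the criterion is the strict inequality $|Y_i - Y_j| < \mathrm{tol}$; `count_tied_pairs` is a hypothetical helper and says nothing about how the library itself handles tied groups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Counts the pairs (i, j), i < j, that the criterion
 * |y[i] - y[j]| < tol would treat as tied. */
size_t count_tied_pairs(const double y[], size_t n, double tol)
{
    assert(tol > 0.0);
    size_t tied = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            double d = y[i] - y[j];
            if (d < 0.0)
                d = -d;             /* absolute difference */
            if (d < tol)
                ++tied;
        }
    }
    return tied;
}
```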
Various statistics can be found from the analysis:
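(a) The score statistic $X^{\mathrm{T}} a$. This statistic is used to test the hypothesis $H_0 : \beta = 0$, see (e).
(b) The estimated variance-covariance matrix $X^{\mathrm{T}} (B - A) X$ of the score statistic in (a).
(c) The estimate $\hat{\beta} = M X^{\mathrm{T}} a$.
(d) The estimated variance-covariance matrix $M = (X^{\mathrm{T}} (B - A) X)^{-1}$ of the estimate $\hat{\beta}$.
(e) The $\chi^2$ statistic $Q = \hat{\beta}^{\mathrm{T}} M^{-1} \hat{\beta} = a^{\mathrm{T}} X (X^{\mathrm{T}} (B - A) X)^{-1} X^{\mathrm{T}} a$ used to test $H_0 : \beta = 0$. Under $H_0$, $Q$ has an approximate $\chi^2$-distribution with $p$ degrees of freedom.
(f) The standard errors $M_{ii}^{1/2}$ of the estimates given in (c).
(g) Approximate $z$-statistics, i.e., $Z_i = \hat{\beta}_i / \mathrm{se}(\hat{\beta}_i)$ for testing $H_0 : \beta_i = 0$. For $i = 1, 2, \ldots, p$, $Z_i$ has an approximate $N(0,1)$ distribution.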
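The relationship between the score statistic, the estimate and the $\chi^2$ statistic can be checked numerically. Because $\hat{\beta} = M s$ with $s = X^{\mathrm{T}} a$, the statistic $Q = \hat{\beta}^{\mathrm{T}} M^{-1} \hat{\beta}$ reduces to $s^{\mathrm{T}} M s = s^{\mathrm{T}} \hat{\beta}$, so no matrix inverse is needed once $M$ is known. The sketch below assumes $M$ is supplied as a full $p$ by $p$ row-major array; `rank_regression_summary` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Given the score vector s (length p) and the p-by-p matrix m in
 * row-major order, stores beta = M s and returns Q = s^T M s. */
double rank_regression_summary(size_t p, const double s[], const double m[],
                               double beta[])
{
    assert(p > 0);
    double q = 0.0;
    for (size_t i = 0; i < p; ++i) {
        beta[i] = 0.0;
        for (size_t j = 0; j < p; ++j)
            beta[i] += m[i * p + j] * s[j];
        q += s[i] * beta[i];        /* accumulates s^T (M s) */
    }
    return q;
}
```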
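In many situations, more than one sample of observations will be available. In this case we assume the model

$$h_k(Y_k) = X_k \beta + e_k, \qquad k = 1, 2, \ldots, \mathrm{ns},$$

where ns is the number of samples. In an obvious manner, $Y_k$ and $X_k$ are the vector of observations and the design matrix for the $k$th sample respectively. Note that the arbitrary transformation $h_k$ can be assumed different for each sample since observations are ranked within the sample.

The earlier analysis can be extended to give a combined estimate of $\beta$ as $\hat{\beta} = D d$, where

$$D^{-1} = \sum_{k=1}^{\mathrm{ns}} X_k^{\mathrm{T}} (B_k - A_k) X_k$$

and

$$d = \sum_{k=1}^{\mathrm{ns}} X_k^{\mathrm{T}} a_k,$$

with $a_k$, $B_k$ and $A_k$ defined as $a$, $B$ and $A$ above but for the $k$th sample.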
The remaining statistics are calculated as for the one sample case.
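The combination across samples amounts to summing the per-sample information matrices and score vectors and then solving one small linear system. A minimal sketch for the special case $p = 2$ follows, assuming the per-sample quantities $X_k^{\mathrm{T}} (B_k - A_k) X_k$ and $X_k^{\mathrm{T}} a_k$ have already been computed; the function name and the flat array layout are illustrative choices, not the library's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for p = 2 (not part of the NAG Library).
 * info[3*k .. 3*k+2] holds the symmetric matrix X_k^T (B_k - A_k) X_k
 * as {i11, i12, i22}; score[2*k .. 2*k+1] holds X_k^T a_k.
 * Sums both over the ns samples, then solves D^{-1} beta = d. */
void combine_samples_p2(size_t ns, const double info[], const double score[],
                        double beta[2])
{
    double i11 = 0.0, i12 = 0.0, i22 = 0.0, d1 = 0.0, d2 = 0.0;
    for (size_t k = 0; k < ns; ++k) {
        i11 += info[3 * k];
        i12 += info[3 * k + 1];
        i22 += info[3 * k + 2];
        d1 += score[2 * k];
        d2 += score[2 * k + 1];
    }
    double det = i11 * i22 - i12 * i12;
    assert(det > 0.0);              /* combined information must be invertible */
    beta[0] = (i22 * d1 - i12 * d2) / det;
    beta[1] = (i11 * d2 - i12 * d1) / det;
}
```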
4 References
Pettitt A N (1982) Inference for the linear model using a likelihood based on ranks J. Roy. Statist. Soc. Ser. B 44 234–243
5 Arguments
- 1: order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order = Nag_RowMajor or Nag_ColMajor.
- 2: ns – Integer Input
On entry: the number of samples.
Constraint: ns ≥ 1.
- 3: nv[ns] – const Integer Input
On entry: the number of observations in the $i$th sample, for $i = 1, 2, \ldots, \mathrm{ns}$.
Constraint: nv[i − 1] ≥ 1, for $i = 1, 2, \ldots, \mathrm{ns}$.
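- 4: y[dim] – const double Input
Note: the dimension, dim, of the array y must be at least $\sum_{i=1}^{\mathrm{ns}} \mathrm{nv}[i-1]$.
On entry: the observations in each sample. Specifically, $\mathrm{y}\left[\sum_{k=1}^{i-1} \mathrm{nv}[k-1] + j - 1\right]$ must contain the $j$th observation in the $i$th sample.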
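Since the samples are stored one after another in y, the position of a given observation is the running total of the earlier sample sizes plus its position within its own sample. A small sketch of that index calculation is given below; `y_index` is a hypothetical helper and `int` stands in for the NAG `Integer` type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Returns the zero-based index into y of observation j (1-based)
 * of sample i (1-based), with samples stored consecutively. */
size_t y_index(const int nv[], size_t ns, size_t i, size_t j)
{
    assert(i >= 1 && i <= ns);
    assert(j >= 1 && j <= (size_t)nv[i - 1]);
    size_t offset = 0;
    for (size_t k = 0; k + 1 < i; ++k)
        offset += (size_t)nv[k];    /* sizes of samples 1 .. i-1 */
    return offset + j - 1;
}
```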
- 5: p – Integer Input
On entry: the number of parameters to be fitted.
Constraint: p ≥ 1.
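- 6: x[dim] – const double Input
Note: the dimension, dim, of the array x must be at least
- $\max(1, \mathrm{pdx} \times \mathrm{p})$ when order = Nag_ColMajor;
- $\max\left(1, \sum_{i=1}^{\mathrm{ns}} \mathrm{nv}[i-1] \times \mathrm{pdx}\right)$ when order = Nag_RowMajor.
Where $X(i,j)$ appears in this document, it refers to the array element
- x[(j − 1) × pdx + i − 1] when order = Nag_ColMajor;
- x[(i − 1) × pdx + j − 1] when order = Nag_RowMajor.
On entry: the design matrices for each sample. Specifically, $X\left(\sum_{k=1}^{i-1} \mathrm{nv}[k-1] + j, l\right)$ must contain the value of the $l$th explanatory variable for the $j$th observation in the $i$th sample.
Constraint: x must not contain a column with all elements equal.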
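The two addressing rules for $X(i,j)$ can be wrapped in one function so that the code filling x does not depend on the storage order. The sketch below uses a local enum as a stand-in for `Nag_OrderType`; both the enum and `x_index` are illustrative names, not library definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Nag_OrderType, used only in this sketch. */
typedef enum { ROW_MAJOR, COL_MAJOR } storage_order;

/* Hypothetical helper (not part of the NAG Library).
 * Returns the zero-based position in x of the element X(i, j),
 * with i and j 1-based and pdx the stride between rows or columns. */
size_t x_index(storage_order order, size_t pdx, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1);
    return (order == COL_MAJOR) ? (j - 1) * pdx + (i - 1)
                                : (i - 1) * pdx + (j - 1);
}
```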
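- 7: pdx – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
- if order = Nag_ColMajor, $\mathrm{pdx} \ge \sum_{i=1}^{\mathrm{ns}} \mathrm{nv}[i-1]$;
- if order = Nag_RowMajor, pdx ≥ p.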
- 8: idist – Integer Input
On entry: the error distribution to be used in the analysis.
- idist = 1: Normal.
- idist = 2: Logistic.
- idist = 3: Extreme value.
- idist = 4: Double-exponential.
Constraint: 1 ≤ idist ≤ 4.
- 9: nmax – Integer Input
On entry: the value of the largest sample size.
Constraint: $\mathrm{nmax} = \max_{1 \le i \le \mathrm{ns}} (\mathrm{nv}[i-1])$ and nmax > p.
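Because nmax must equal the largest entry of nv exactly, it is safer to compute it from nv than to type it in by hand. A minimal sketch, with `largest_sample_size` a hypothetical helper and `int` standing in for `Integer`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Returns the largest of the ns sample sizes in nv. */
int largest_sample_size(const int nv[], size_t ns)
{
    assert(ns >= 1);
    int nmax = nv[0];
    for (size_t i = 1; i < ns; ++i)
        if (nv[i] > nmax)
            nmax = nv[i];
    return nmax;
}
```

The caller still has to check that the result exceeds p before passing it as nmax.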
- 10: tol – double Input
On entry: the tolerance for judging whether two observations are tied. Thus, observations $Y_i$ and $Y_j$ are adjudged to be tied if $|Y_i - Y_j| < \mathrm{tol}$.
Constraint: tol > 0.0.
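- 11: prvr[dim] – double Output
Note: the dimension, dim, of the array prvr must be at least
- $\max(1, \mathrm{pdparvar} \times \mathrm{p})$ when order = Nag_ColMajor;
- $\max(1, (\mathrm{p}+1) \times \mathrm{pdparvar})$ when order = Nag_RowMajor.
Where $\mathrm{PRVR}(i,j)$ appears in this document, it refers to the array element
- prvr[(j − 1) × pdparvar + i − 1] when order = Nag_ColMajor;
- prvr[(i − 1) × pdparvar + j − 1] when order = Nag_RowMajor.
On exit: the variance-covariance matrices of the score statistics and the parameter estimates, the former being stored in the upper triangle and the latter in the lower triangle. Thus for $1 \le i \le j \le \mathrm{p}$, $\mathrm{PRVR}(i,j)$ contains an estimate of the covariance between the $i$th and $j$th score statistics. For $1 \le j \le i \le \mathrm{p}$, $\mathrm{PRVR}(i+1,j)$ contains an estimate of the covariance between the $i$th and $j$th parameter estimates.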
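The two triangles share one $(p+1)$ by $p$ array, with the parameter-estimate triangle shifted down by one row, so it is often convenient to unpack them into two full symmetric matrices. The sketch below does this under the assumption of row-major storage, where $\mathrm{PRVR}(i,j)$ is prvr[(i − 1) × pdparvar + j − 1]; `unpack_prvr_rowmajor` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library), row-major case only.
 * Copies the upper triangle of prvr into the full symmetric p-by-p
 * matrix score_cov, and the lower triangle (offset by one row) into
 * the full symmetric p-by-p matrix par_cov. Outputs are row-major. */
void unpack_prvr_rowmajor(const double prvr[], size_t pdparvar, size_t p,
                          double score_cov[], double par_cov[])
{
    assert(p >= 1 && pdparvar >= p);
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = i; j < p; ++j) {
            /* PRVR(i+1, j+1) in 1-based terms, with i <= j */
            double s = prvr[i * pdparvar + j];
            score_cov[i * p + j] = s;
            score_cov[j * p + i] = s;
            /* PRVR(j+2, i+1) in 1-based terms: row shifted down by one */
            double v = prvr[(j + 1) * pdparvar + i];
            par_cov[i * p + j] = v;
            par_cov[j * p + i] = v;
        }
    }
}
```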
- 12: pdparvar – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array prvr.
Constraints:
- if order = Nag_ColMajor, pdparvar ≥ p + 1;
- if order = Nag_RowMajor, pdparvar ≥ p.
- 13: irank[nmax] – Integer Output
On exit: for the one sample case, irank contains the ranks of the observations.
- 14: zin[nmax] – double Output
On exit: for the one sample case, zin contains the expected values of the function $g(\cdot)$ of the order statistics.
- 15: eta[nmax] – double Output
On exit: for the one sample case, eta contains the expected values of the function $g'(\cdot)$ of the order statistics.
- 16: vapvec[nmax × (nmax + 1)/2] – double Output
On exit: for the one sample case, vapvec contains the upper triangle of the variance-covariance matrix of the function $g(\cdot)$ of the order statistics stored column-wise.
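Reading an individual covariance out of vapvec requires mapping a pair $(i,j)$ to a position in the packed triangle. Assuming the conventional column-wise packing of an upper triangle (column 1, then column 2, and so on, each down to the diagonal), element $(i,j)$ with $i \le j$ sits at zero-based position $(j-1)j/2 + i - 1$. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
 * Returns the zero-based position of element (i, j), 1-based, in an
 * upper triangle packed column by column. Uses symmetry when i > j. */
size_t packed_upper_index(size_t i, size_t j)
{
    if (i > j) {                    /* swap so that i <= j */
        size_t t = i;
        i = j;
        j = t;
    }
    assert(i >= 1);
    return (j - 1) * j / 2 + (i - 1);
}
```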
- 17: parest[4 × p + 1] – double Output
On exit: the statistics calculated by the function.
The first p components of parest contain the score statistics.
The next p elements contain the parameter estimates.
parest[2 × p] contains the value of the $\chi^2$ statistic.
The next p elements of parest contain the standard errors of the parameter estimates.
Finally, the remaining p elements of parest contain the $z$-statistics.
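The layout of parest described above (p scores, p estimates, one $\chi^2$ value, p standard errors, p $z$-statistics, hence $4p + 1$ elements in total) can be exposed through a small view structure so that results are read by name. This is a sketch only; `parest_view` and `view_parest` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view over the parest array (not part of the NAG Library).
 * The pointers refer into the caller's array; nothing is copied. */
typedef struct {
    const double *score;     /* parest[0 .. p-1]      */
    const double *estimate;  /* parest[p .. 2p-1]     */
    double chi2;             /* parest[2p]            */
    const double *std_err;   /* parest[2p+1 .. 3p]    */
    const double *z;         /* parest[3p+1 .. 4p]    */
} parest_view;

parest_view view_parest(const double parest[], size_t p)
{
    assert(p >= 1);
    parest_view v;
    v.score = parest;
    v.estimate = parest + p;
    v.chi2 = parest[2 * p];
    v.std_err = parest + 2 * p + 1;
    v.z = parest + 3 * p + 1;
    return v;
}
```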
- 18: fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
6 Error Indicators and Warnings
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
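- NE_INT
On entry, idist is outside the range 1 to 4: idist = ⟨value⟩.
On entry, ns = ⟨value⟩.
Constraint: ns ≥ 1.
On entry, p = ⟨value⟩.
Constraint: p ≥ 1.
On entry, pdparvar = ⟨value⟩.
Constraint: pdparvar > 0.
On entry, pdx = ⟨value⟩.
Constraint: pdx > 0.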
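- NE_INT_2
On entry, nmax = ⟨value⟩ and p = ⟨value⟩.
Constraint: nmax > p.
On entry, pdparvar = ⟨value⟩ and p = ⟨value⟩.
Constraint: pdparvar ≥ p.
On entry, pdparvar = ⟨value⟩ and p = ⟨value⟩.
Constraint: pdparvar ≥ p + 1.
On entry, pdx = ⟨value⟩ and p = ⟨value⟩.
Constraint: pdx ≥ p.
On entry, pdx = ⟨value⟩ and sum nv[i − 1] = ⟨value⟩.
Constraint: pdx ≥ the sum of nv[i − 1].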
- NE_INT_ARRAY
On entry, ns = ⟨value⟩ and nv[⟨value⟩] = ⟨value⟩.
Constraint: nv[i] ≥ 1, for i = 0, 1, …, ns − 1.
- NE_INT_ARRAY_ELEM_CONS
On entry, ns = ⟨value⟩.
Constraint: ns elements of array nv > 0.
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
- NE_MAT_ILL_DEFINED
The matrix $X^{\mathrm{T}} (B - A) X$ is either singular or non-positive definite.
- NE_OBSERVATIONS
All the observations were adjudged to be tied.
- NE_REAL
On entry, tol = ⟨value⟩.
Constraint: tol > 0.0.
- NE_REAL_ARRAY_ELEM_CONS
On entry, all elements in column ⟨value⟩ of x are equal to ⟨value⟩.
- NE_SAMPLE
The largest sample size is ⟨value⟩ which is not equal to nmax, nmax = ⟨value⟩.
7 Accuracy
The computations are believed to be stable.
8 Further Comments
The time taken by nag_rank_regsn (g08rac) depends on the number of samples, the total number of observations and the number of parameters fitted.
In extreme cases the parameter estimates for certain models can be infinite, although this is unlikely to occur in practice. See Pettitt (1982) for further details.
9 Example
This example fits a regression model to a single sample of observations using two explanatory variables. The error distribution is taken to be logistic.
9.1 Program Text
Program Text (g08race.c)
9.2 Program Data
Program Data (g08race.d)
9.3 Program Results
Program Results (g08race.r)