NAG Library Function Document
nag_regsn_ridge (g02kbc)
1 Purpose
nag_regsn_ridge (g02kbc) calculates a ridge regression, with ridge parameters supplied by you.
2 Specification
#include <nag.h>
#include <nagg02.h>
void nag_regsn_ridge (Nag_OrderType order, Integer n, Integer m,
                      const double x[], Integer pdx, const Integer isx[],
                      Integer ip, const double y[], Integer lh,
                      const double h[], double nep[], Nag_ParaOption wantb,
                      double b[], Integer pdb, Nag_VIFOption wantvf,
                      double vf[], Integer pdvf, Integer lpec,
                      const Nag_PredictError pec[], double pe[],
                      Integer pdpe, NagError *fail)
3 Description
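A linear model has the form:
$$y = c + X\beta + \epsilon,$$
where
- $y$ is an $n$ by $1$ matrix of values of a dependent variable;
- $c$ is a scalar intercept term;
- $X$ is an $n$ by $m$ matrix of values of independent variables;
- $\beta$ is an $m$ by $1$ matrix of unknown values of parameters;
- $\epsilon$ is an $n$ by $1$ matrix of unknown random errors such that variance of $\epsilon = \sigma^2 I$.
Let $\tilde{X}$ be the mean-centred $X$ and $\tilde{y}$ the mean-centred $y$. Furthermore, $\tilde{X}$ is scaled such that the diagonal elements of the cross product matrix $\tilde{X}^T\tilde{X}$ are one. The linear model now takes the form:
$$\tilde{y} = \tilde{X}\tilde{\beta} + \epsilon.$$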
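Ridge regression estimates the parameters $\tilde{\beta}$ in a penalised least squares sense by finding the $\tilde{b}$ that minimizes
$$\|\tilde{X}\tilde{b} - \tilde{y}\|^2 + h\|\tilde{b}\|^2,$$
where $\|\cdot\|$ denotes the $\ell_2$-norm and $h$ is a scalar regularization or ridge parameter. For a given value of $h$, the parameter estimates $\tilde{b}$ are found by evaluating
$$\tilde{b} = (\tilde{X}^T\tilde{X} + hI)^{-1}\tilde{X}^T\tilde{y}.$$
Note that if $h = 0$ the ridge regression solution is equivalent to the ordinary least squares solution.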
Rather than calculate the inverse of ($\tilde{X}^T\tilde{X} + hI$) directly, nag_regsn_ridge (g02kbc) uses the singular value decomposition (SVD) of $\tilde{X}$. After decomposing $\tilde{X}$ into $UDV^T$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix, the parameter estimates become
$$\tilde{b} = V(D^TD + hI)^{-1}D^TU^T\tilde{y}.$$
A consequence of introducing the ridge parameter is that the effective number of parameters, $\gamma$, in the model is given by the sum of diagonal elements of
$$D^TD(D^TD + hI)^{-1},$$
see Moody (1992) for details.
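Once the singular values are known, both quantities reduce to simple sums. The sketch below assumes the SVD has already been computed elsewhere: d holds the singular values $d_k$, z holds the first $m$ entries of $U^T\tilde{y}$, and V is an $m$ by $m$ matrix stored row-major. All function and argument names are hypothetical and are not part of the NAG interface.

```c
#include <assert.h>
#include <stddef.h>

/* Effective number of parameters: sum over k of d_k^2 / (d_k^2 + h). */
double ridge_effective_params(const double *d, size_t m, double h)
{
    assert(d != NULL && h >= 0.0);
    double gamma = 0.0;
    for (size_t k = 0; k < m; ++k) {
        const double d2 = d[k] * d[k];
        assert(d2 + h > 0.0); /* rules out d_k = 0 together with h = 0 */
        gamma += d2 / (d2 + h);
    }
    return gamma;
}

/* Ridge estimates b = V (D^T D + hI)^{-1} D^T U^T y, with z = U^T y
 * supplied by the caller and V stored row-major (m by m). */
void ridge_coefficients(const double *V, const double *d, const double *z,
                        size_t m, double h, double *b)
{
    assert(V != NULL && d != NULL && z != NULL && b != NULL && h >= 0.0);
    for (size_t j = 0; j < m; ++j)
        b[j] = 0.0;
    for (size_t k = 0; k < m; ++k) {
        const double d2 = d[k] * d[k];
        assert(d2 + h > 0.0);
        /* Shrunken weight for the k-th singular direction. */
        const double w = d[k] / (d2 + h) * z[k];
        for (size_t j = 0; j < m; ++j)
            b[j] += V[j * m + k] * w;
    }
}
```

Because only the weights depend on $h$, one decomposition can be reused for every ridge parameter supplied, which is the practical appeal of the SVD formulation.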
Any multi-collinearity in the design matrix $X$ may be highlighted by calculating the variance inflation factors for the fitted model. The $j$th variance inflation factor, $v_j$, is a scaled version of the multiple correlation coefficient between independent variable $j$ and the other independent variables, $R_j$, and is given by
$$v_j = \frac{1}{1 - R_j^2}, \quad j = 1,2,\dots,m.$$
The $m$ variance inflation factors are calculated as the diagonal elements of the matrix:
$$(\tilde{X}^T\tilde{X} + hI)^{-1}\tilde{X}^T\tilde{X}(\tilde{X}^T\tilde{X} + hI)^{-1},$$
which, using the SVD of $\tilde{X}$, is equivalent to the diagonal elements of the matrix:
$$V(D^TD + hI)^{-1}D^TD(D^TD + hI)^{-1}V^T.$$
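In the SVD form only the diagonal is needed, so the $j$th factor is a weighted sum of squared entries in row $j$ of $V$, with weights $d_k^2/(d_k^2+h)^2$. The following is a minimal sketch under the assumption that V (row-major, $m$ by $m$) and the singular values d are already available; the function name ridge_vif is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Diagonal of V (D^T D + hI)^{-1} D^T D (D^T D + hI)^{-1} V^T:
 * vf[j] = sum over k of V[j][k]^2 * d_k^2 / (d_k^2 + h)^2. */
void ridge_vif(const double *V, const double *d, size_t m, double h,
               double *vf)
{
    assert(V != NULL && d != NULL && vf != NULL && h >= 0.0);
    for (size_t j = 0; j < m; ++j) {
        double acc = 0.0;
        for (size_t k = 0; k < m; ++k) {
            const double d2 = d[k] * d[k];
            assert(d2 + h > 0.0); /* d_k = 0 with h = 0 is not handled */
            const double w = d2 / ((d2 + h) * (d2 + h));
            acc += V[j * m + k] * V[j * m + k] * w;
        }
        vf[j] = acc;
    }
}
```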
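Given a value of $h$, any or all of the following prediction criteria are available:
(a) Generalized cross-validation (GCV): $\dfrac{ns}{(n-\gamma)^2}$;
(b) Unbiased estimate of variance (UEV): $\dfrac{s}{n-\gamma}$;
(c) Future prediction error (FPE): $\dfrac{1}{n}\left(s + \dfrac{2\gamma s}{n-\gamma}\right)$;
(d) Bayesian information criterion (BIC): $\dfrac{1}{n}\left(s + \dfrac{\log(n)\gamma s}{n-\gamma}\right)$;
(e) Leave-one-out cross-validation (LOOCV),
where $s$ is the sum of squares of residuals and $\gamma$ is the effective number of parameters.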
Although parameter estimates $\tilde{b}$ are calculated by using $\tilde{X}$, it is usual to report the parameter estimates $b$ associated with $X$. These are calculated from $\tilde{b}$, and the means and scalings of $X$. Optionally, either $\tilde{b}$ or $b$ may be calculated.
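The back-transformation follows from the standardization itself. If column $j$ was centred by its mean $\bar{x}_j$ and divided by a scale $s_j$, and the dependent variable was centred by $\bar{y}$, then each coefficient on the original scale is the standardized coefficient divided by $s_j$, and the intercept is $\bar{y}$ minus the sum of those coefficients times $\bar{x}_j$. The sketch below illustrates this. The function and argument names are hypothetical, and the library's internal procedure may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Map estimates for standardized data back to the original data.
 * b_std[j]   : estimate for the centred and scaled j-th variable
 * x_mean[j]  : mean removed from the j-th variable
 * x_scale[j] : scale the centred j-th variable was divided by
 * y_mean     : mean removed from the dependent variable
 * On return b_orig[j] holds the original-scale estimates and
 * *intercept the intercept term. */
void ridge_unstandardize(const double *b_std, const double *x_mean,
                         const double *x_scale, double y_mean, size_t ip,
                         double *b_orig, double *intercept)
{
    assert(b_std != NULL && x_mean != NULL && x_scale != NULL);
    assert(b_orig != NULL && intercept != NULL);

    double c = y_mean;
    for (size_t j = 0; j < ip; ++j) {
        assert(x_scale[j] > 0.0);
        b_orig[j] = b_std[j] / x_scale[j];
        c -= b_orig[j] * x_mean[j];
    }
    *intercept = c;
}
```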
4 References
Hastie T, Tibshirani R and Friedman J (2003) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer Series in Statistics
Moody J E (1992) The effective number of parameters: an analysis of generalisation and regularisation in nonlinear learning systems In: Neural Information Processing Systems (eds J E Moody, S J Hanson and R P Lippmann) 4 847–854 Morgan Kaufmann, San Mateo, CA
5 Arguments
- 1: order – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order = Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order = Nag_RowMajor or Nag_ColMajor.
- 2: n – Integer Input
On entry: $n$, the number of observations.
Constraint:
.
- 3: m – Integer Input
On entry: the number of independent variables available in the data matrix $X$.
Constraint:
.
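- 4: x[] – const double Input
Note: the dimension, dim, of the array x must be at least
- $\mathbf{pdx} \times \mathbf{m}$ when order = Nag_ColMajor;
- $\mathbf{n} \times \mathbf{pdx}$ when order = Nag_RowMajor.
The $(i,j)$th element of the matrix $X$ is stored in
- $\mathbf{x}[(j-1) \times \mathbf{pdx} + i - 1]$ when order = Nag_ColMajor;
- $\mathbf{x}[(i-1) \times \mathbf{pdx} + j - 1]$ when order = Nag_RowMajor.
On entry: the values of independent variables in the data matrix $X$.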
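The two storage schemes differ only in which index is multiplied by the stride. The helper below is a minimal sketch of that mapping for 1-based $(i,j)$. The StorageOrder enum is a hypothetical stand-in for Nag_OrderType so that the example compiles without the NAG headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType (not the NAG definition). */
typedef enum { STORE_ROW_MAJOR, STORE_COL_MAJOR } StorageOrder;

/* Zero-based array offset of the 1-based matrix element (i, j), where
 * pd is the stride between consecutive columns (column-major) or
 * consecutive rows (row-major). */
size_t matrix_index(StorageOrder order, size_t i, size_t j, size_t pd)
{
    assert(i >= 1 && j >= 1 && pd >= 1);
    return order == STORE_COL_MAJOR ? (j - 1) * pd + (i - 1)
                                    : (i - 1) * pd + (j - 1);
}
```

With a stride of 3, element (2, 3) of a column-major matrix sits at offset 7. With a stride of 4, the same element of a row-major matrix sits at offset 6.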
- 5: pdx – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
- if order = Nag_ColMajor, $\mathbf{pdx} \ge \mathbf{n}$;
- if order = Nag_RowMajor, $\mathbf{pdx} \ge \mathbf{m}$.
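- 6: isx[m] – const Integer Input
On entry: indicates which independent variables are included in the model.
- $\mathbf{isx}[j-1] = 1$: the $j$th variable in x will be included in the model.
- $\mathbf{isx}[j-1] = 0$: variable $j$ is excluded.
Constraint: $\mathbf{isx}[j-1] = 0$ or $1$, for $j = 1,2,\dots,\mathbf{m}$.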
- 7: ip – Integer Input
On entry: the number of independent variables in the model.
Constraints:
- $1 \le \mathbf{ip} \le \mathbf{m}$;
- exactly ip elements of isx must be equal to $1$.
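The consistency requirement between isx and ip can be checked before calling the function. The sketch below assumes the conventional encoding of 1 for an included variable and 0 for an excluded one, and uses plain int as a stand-in for the NAG Integer type. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of included variables, or -1 if any flag is
 * neither 0 nor 1. */
int count_included(const int *isx, int m)
{
    assert(isx != NULL && m >= 0);
    int count = 0;
    for (int j = 0; j < m; ++j) {
        if (isx[j] != 0 && isx[j] != 1)
            return -1;
        count += isx[j];
    }
    return count;
}

/* Non-zero when 1 <= ip <= m and exactly ip flags in isx are set. */
int selection_is_consistent(const int *isx, int m, int ip)
{
    return ip >= 1 && ip <= m && count_included(isx, m) == ip;
}
```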
- 8: y[n] – const double Input
On entry: the values of the dependent variable $y$.
- 9: lh – Integer Input
On entry: the number of supplied ridge parameters.
Constraint: $\mathbf{lh} > 0$.
- 10: h[lh] – const double Input
On entry: $\mathbf{h}[j-1]$ is the value of the $j$th ridge parameter $h$.
Constraint: $\mathbf{h}[j-1] \ge 0.0$, for $j = 1,2,\dots,\mathbf{lh}$.
- 11: nep[lh] – double Output
On exit: $\mathbf{nep}[i-1]$ is the number of effective parameters, $\gamma$, in the $i$th model, for $i = 1,2,\dots,\mathbf{lh}$.
- 12: wantb – Nag_ParaOption Input
On entry: defines the options for parameter estimates.
- Parameter estimates are not calculated and b is not referenced.
- Parameter estimates are calculated for the original data.
- Parameter estimates are calculated for the standardized data.
Constraint:
, or .
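- 13: b[] – double Output
Note: the dimension, dim, of the array b must be at least
- $\mathbf{pdb} \times \mathbf{lh}$ when wantb requests parameter estimates and order = Nag_ColMajor;
- $(\mathbf{ip}+1) \times \mathbf{pdb}$ when wantb requests parameter estimates and order = Nag_RowMajor;
- $1$ otherwise.
Where $B(i,j)$ appears in this document, it refers to the array element
- $\mathbf{b}[(j-1) \times \mathbf{pdb} + i - 1]$ when order = Nag_ColMajor;
- $\mathbf{b}[(i-1) \times \mathbf{pdb} + j - 1]$ when order = Nag_RowMajor.
On exit: if wantb requests parameter estimates, b contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by isx. $B(1,j)$, for $j = 1,2,\dots,\mathbf{lh}$, contains the estimate for the intercept; $B(i+1,j)$ contains the parameter estimate for the $i$th independent variable in the model fitted with ridge parameter $\mathbf{h}[j-1]$, for $i = 1,2,\dots,\mathbf{ip}$.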
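- 14: pdb – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array b.
Constraints:
- if order = Nag_ColMajor,
  - if wantb requests parameter estimates, $\mathbf{pdb} \ge \mathbf{ip}+1$;
  - otherwise $\mathbf{pdb} \ge 1$;
- if order = Nag_RowMajor,
  - if wantb requests parameter estimates, $\mathbf{pdb} \ge \mathbf{lh}$;
  - otherwise $\mathbf{pdb} \ge 1$.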
- 15: wantvf – Nag_VIFOption Input
On entry: defines the options for variance inflation factors.
- Variance inflation factors are not calculated and the array vf is not referenced.
- Variance inflation factors are calculated.
Constraints:
- or ;
- if , .
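- 16: vf[] – double Output
Note: the dimension, dim, of the array vf must be at least
- $\mathbf{pdvf} \times \mathbf{lh}$ when wantvf requests variance inflation factors and order = Nag_ColMajor;
- $\mathbf{ip} \times \mathbf{pdvf}$ when wantvf requests variance inflation factors and order = Nag_RowMajor;
- $1$ otherwise.
Where $VF(i,j)$ appears in this document, it refers to the array element
- $\mathbf{vf}[(j-1) \times \mathbf{pdvf} + i - 1]$ when order = Nag_ColMajor;
- $\mathbf{vf}[(i-1) \times \mathbf{pdvf} + j - 1]$ when order = Nag_RowMajor.
On exit: if wantvf requests them, the variance inflation factors. For the $i$th independent variable in a model fitted with ridge parameter $\mathbf{h}[j-1]$, $VF(i,j)$ is the value of $v_i$, for $i = 1,2,\dots,\mathbf{ip}$.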
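- 17: pdvf – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array vf.
Constraints:
- if order = Nag_ColMajor,
  - if wantvf requests variance inflation factors, $\mathbf{pdvf} \ge \mathbf{ip}$;
  - otherwise $\mathbf{pdvf} \ge 1$;
- if order = Nag_RowMajor,
  - if wantvf requests variance inflation factors, $\mathbf{pdvf} \ge \mathbf{lh}$;
  - otherwise $\mathbf{pdvf} \ge 1$.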
- 18: lpec – Integer Input
On entry: the number of prediction error statistics to return; set $\mathbf{lpec} \le 0$ for no prediction error estimates.
- 19: pec[] – const Nag_PredictError Input
On entry: if $\mathbf{lpec} > 0$, $\mathbf{pec}[i-1]$ defines the $i$th prediction error, for $i = 1,2,\dots,\mathbf{lpec}$; otherwise pec is not referenced.
- Bayesian information criterion (BIC).
- Future prediction error (FPE).
- Generalized cross-validation (GCV).
- Leave-one-out cross-validation (LOOCV).
- Unbiased estimate of variance (UEV).
Constraint:
if , , , , or , for .
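- 20: pe[] – double Output
Note: the dimension, dim, of the array pe must be at least
- $\mathbf{pdpe} \times \mathbf{lh}$ when $\mathbf{lpec} > 0$ and order = Nag_ColMajor;
- $\mathbf{lpec} \times \mathbf{pdpe}$ when $\mathbf{lpec} > 0$ and order = Nag_RowMajor;
- $1$ otherwise.
Where $PE(i,j)$ appears in this document, it refers to the array element
- $\mathbf{pe}[(j-1) \times \mathbf{pdpe} + i - 1]$ when order = Nag_ColMajor;
- $\mathbf{pe}[(i-1) \times \mathbf{pdpe} + j - 1]$ when order = Nag_RowMajor.
On exit: if $\mathbf{lpec} \le 0$, pe is not referenced; otherwise $PE(i,j)$ contains the prediction error of criterion $\mathbf{pec}[i-1]$ for the model fitted with ridge parameter $\mathbf{h}[j-1]$, for $i = 1,2,\dots,\mathbf{lpec}$ and $j = 1,2,\dots,\mathbf{lh}$.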
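- 21: pdpe – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array pe.
Constraints:
- if order = Nag_ColMajor,
  - if $\mathbf{lpec} > 0$, $\mathbf{pdpe} \ge \mathbf{lpec}$;
  - otherwise $\mathbf{pdpe} \ge 1$;
- if order = Nag_RowMajor,
  - if $\mathbf{lpec} > 0$, $\mathbf{pdpe} \ge \mathbf{lh}$;
  - otherwise $\mathbf{pdpe} \ge 1$.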
- 22: fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
6 Error Indicators and Warnings
- NE_ALLOC_FAIL
Dynamic memory allocation failed.
- NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
- NE_CONSTRAINT
On entry, and .
- NE_ENUM_INT_2
On entry, and .
Constraint: if , .
On entry, and .
Constraint: if , .
On entry, , , .
Constraint: if ,
;
otherwise .
On entry, , , .
Constraint: if ,
;
otherwise .
- NE_INT
On entry, .
Constraint: .
On entry, .
Constraint: .
- NE_INT_2
On entry, and .
Constraint: .
On entry, and .
Constraint: .
On entry, and .
Constraint: .
On entry, and .
Constraint: .
- NE_INT_3
On entry, , and .
Constraint: if ,
;
otherwise .
- NE_INT_ARG_CONS
ip does not equal the sum of elements in
isx.
- NE_INT_ARRAY_VAL_1_OR_2
On entry, or for at least one .
- NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact
NAG for assistance.
- NE_REAL_ARRAY_CONS
On entry, $\mathbf{h}[j-1] < 0.0$ for at least one $j$.
7 Accuracy
The accuracy of nag_regsn_ridge (g02kbc) is closely related to that of the singular value decomposition.
8 Further Comments
nag_regsn_ridge (g02kbc) allocates internally elements of double precision storage.
9 Example
This example reads in data from an experiment to model body fat, and a selection of ridge regression models are calculated.
9.1 Program Text
Program Text (g02kbce.c)
9.2 Program Data
Program Data (g02kbce.d)
9.3 Program Results
Program Results (g02kbce.r)