NAG C Library Manual

NAG Library Function Document

nag_regsn_ridge (g02kbc)

1  Purpose

nag_regsn_ridge (g02kbc) calculates a ridge regression, with ridge parameters supplied by you.

2  Specification

#include <nag.h>
#include <nagg02.h>
void  nag_regsn_ridge (Nag_OrderType order, Integer n, Integer m, const double x[], Integer pdx, const Integer isx[], Integer ip, const double y[], Integer lh, const double h[], double nep[], Nag_ParaOption wantb, double b[], Integer pdb, Nag_VIFOption wantvf, double vf[], Integer pdvf, Integer lpec, const Nag_PredictError pec[], double pe[], Integer pdpe, NagError *fail)

3  Description

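A linear model has the form:
$$ y = c + X\beta + \epsilon, $$
where $y$ is the vector of $n$ observations of the dependent variable, $c$ is a scalar intercept, $X$ is the $n$ by $m$ matrix of values of the independent variables, $\beta$ is the vector of $m$ unknown parameters and $\epsilon$ is the vector of $n$ unknown random errors.
Let $\tilde{X}$ be the mean-centred $X$ and $\tilde{y}$ the mean-centred $y$. Furthermore, $\tilde{X}$ is scaled such that the diagonal elements of the cross product matrix $\tilde{X}^T\tilde{X}$ are one. The linear model now takes the form:
$$ \tilde{y} = \tilde{X}\tilde{\beta} + \epsilon. $$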
Ridge regression estimates the parameters $\tilde{\beta}$ in a penalised least squares sense by finding the $\tilde{b}$ that minimizes
$$ \|\tilde{X}\tilde{b} - \tilde{y}\|^2 + h\|\tilde{b}\|^2, \quad h > 0, $$
where $\|\cdot\|$ denotes the 2-norm and $h$ is a scalar regularization or ridge parameter. For a given value of $h$, the parameter estimates $\tilde{b}$ are found by evaluating
$$ \tilde{b} = (\tilde{X}^T\tilde{X} + hI)^{-1}\tilde{X}^T\tilde{y}. $$
Note that if $h=0$ the ridge regression solution is equivalent to the ordinary least squares solution.
Rather than calculate the inverse of $(\tilde{X}^T\tilde{X}+hI)$ directly, nag_regsn_ridge (g02kbc) uses the singular value decomposition (SVD) of $\tilde{X}$. After decomposing $\tilde{X}$ into $UDV^T$ where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix, the parameter estimates become
$$ \tilde{b} = V(D^TD+hI)^{-1}DU^T\tilde{y}. $$
A consequence of introducing the ridge parameter is that the effective number of parameters, $\gamma$, in the model is given by the sum of diagonal elements of
$$ D^TD(D^TD+hI)^{-1}, $$
see Moody (1992) for details.
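Because $D$ is diagonal, the sum of the diagonal elements of $D^TD(D^TD+hI)^{-1}$ is $\gamma = \sum_i d_i^2/(d_i^2+h)$, where $d_i$ are the singular values of $\tilde{X}$. A minimal sketch of that sum, assuming the singular values have already been obtained elsewhere (the function name is hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: effective number of parameters, gamma, from the
 * k singular values d[0..k-1] of the standardized design matrix and the
 * ridge parameter h >= 0.  A term with d[i] == 0 and h == 0 is skipped,
 * since that direction contributes nothing to the fit. */
double effective_parameters(const double *d, size_t k, double h)
{
    double gamma = 0.0;
    size_t i;

    for (i = 0; i < k; ++i) {
        double d2 = d[i] * d[i];
        if (d2 + h > 0.0)
            gamma += d2 / (d2 + h);
    }
    return gamma;
}
```

For h = 0 and non-zero singular values every term equals one, so $\gamma$ is simply the number of singular values; any h > 0 makes $\gamma$ smaller.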
Any multi-collinearity in the design matrix $X$ may be highlighted by calculating the variance inflation factors for the fitted model. The $j$th variance inflation factor, $v_j$, is a scaled version of the multiple correlation coefficient between independent variable $j$ and the other independent variables, $R_j$, and is given by
$$ v_j = \frac{1}{1-R_j}, \quad j=1,2,\ldots,m. $$
The $m$ variance inflation factors are calculated as the diagonal elements of the matrix:
$$ (\tilde{X}^T\tilde{X}+hI)^{-1}\tilde{X}^T\tilde{X}(\tilde{X}^T\tilde{X}+hI)^{-1}, $$
which, using the SVD of $\tilde{X}$, is equivalent to the diagonal elements of the matrix:
$$ V(D^TD+hI)^{-1}D^TD(D^TD+hI)^{-1}V^T. $$
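As an illustration of the first matrix expression, consider two standardized variables with correlation r, so that the cross product matrix is [[1, r], [r, 1]]. The sketch below forms the diagonal of (A+hI)^{-1} A (A+hI)^{-1} directly for this 2 by 2 case. It is a hand-rolled example under that assumption, with a hypothetical function name; it does not use the SVD and is not how the library routine is implemented.

```c
/* Hypothetical helper: variance inflation factors for TWO standardized
 * variables with correlation r and ridge parameter h >= 0.
 * Returns 0 on success, -1 if (A + hI) is singular. */
int ridge_vif_two_variables(double r, double h, double vif[2])
{
    double a[2][2];       /* cross product matrix A = X~'X~ */
    double inv[2][2];     /* (A + hI)^{-1}                  */
    double t[2][2];       /* (A + hI)^{-1} A                */
    double p = 1.0 + h;
    double det = p * p - r * r;
    int i, j, k;

    if (det == 0.0)
        return -1;

    a[0][0] = 1.0; a[0][1] = r;
    a[1][0] = r;   a[1][1] = 1.0;

    inv[0][0] = p / det;  inv[0][1] = -r / det;
    inv[1][0] = -r / det; inv[1][1] = p / det;

    for (i = 0; i < 2; ++i)
        for (j = 0; j < 2; ++j) {
            t[i][j] = 0.0;
            for (k = 0; k < 2; ++k)
                t[i][j] += inv[i][k] * a[k][j];
        }

    /* Only the diagonal of t * inv is needed. */
    for (i = 0; i < 2; ++i) {
        vif[i] = 0.0;
        for (k = 0; k < 2; ++k)
            vif[i] += t[i][k] * inv[k][i];
    }
    return 0;
}
```

For h = 0 the result is the diagonal of the inverse of A, namely 1/(1-r^2) for both variables; a positive h reduces both factors.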
Given a value of $h$, any or all of the following prediction criteria are available:
(a) Generalized cross-validation (GCV):
$$ \frac{ns}{(n-\gamma)^2}; $$
(b) Unbiased estimate of variance (UEV):
$$ \frac{s}{n-\gamma}; $$
(c) Future prediction error (FPE):
$$ \frac{1}{n}\left(s + \frac{2\gamma s}{n-\gamma}\right); $$
(d) Bayesian information criterion (BIC):
$$ \frac{1}{n}\left(s + \frac{\log(n)\gamma s}{n-\gamma}\right); $$
(e) Leave-one-out cross-validation (LOOCV),
where $s$ is the sum of squares of residuals.
Although parameter estimates $\tilde{b}$ are calculated by using $\tilde{X}$, it is usual to report the parameter estimates $b$ associated with $X$. These are calculated from $\tilde{b}$, and the means and scalings of $X$. Optionally, either $\tilde{b}$ or $b$ may be calculated.
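The back-transformation follows from simple algebra. If column j was standardized as (x - mean_j)/scale_j and y was centred by its mean, then the slope on the original scale is the standardized estimate divided by scale_j, and the intercept is the mean of y minus the sum of slope times mean_j. The sketch below shows this algebra under those assumptions; the function and argument names are hypothetical and the library's internal computation may differ in detail.

```c
#include <stddef.h>

/* Hypothetical helper: convert estimates for standardized data back to
 * the original scale.
 *   b_std[j] : estimate for standardized variable j, j = 0..m-1
 *   mean[j]  : mean subtracted from column j
 *   scale[j] : non-zero factor column j was divided by
 *   ybar     : mean of the dependent variable
 *   b_orig   : m+1 outputs, intercept first, then the m slopes */
void ridge_unstandardize(const double *b_std, const double *mean,
                         const double *scale, size_t m, double ybar,
                         double *b_orig)
{
    double intercept = ybar;
    size_t j;

    for (j = 0; j < m; ++j) {
        b_orig[j + 1] = b_std[j] / scale[j];
        intercept -= b_orig[j + 1] * mean[j];
    }
    b_orig[0] = intercept;
}
```

The intercept-first ordering mirrors the layout described for the output array b.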

4  References

Hastie T, Tibshirani R and Friedman J (2003) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer Series in Statistics
Moody J E (1992) The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems In: Neural Information Processing Systems (eds J E Moody, S J Hanson and R P Lippmann) 4 847–854 Morgan Kaufmann, San Mateo, CA

5  Arguments

1:     order    Nag_OrderType    Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
2:     n    Integer    Input
On entry: n, the number of observations.
Constraint: n≥1.
3:     m    Integer    Input
On entry: the number of independent variables available in the data matrix X.
Constraint: m≤n.
4:     x[dim]    const double    Input
Note: the dimension, dim, of the array x must be at least
  • max(1,pdx×m) when order=Nag_ColMajor;
  • max(1,n×pdx) when order=Nag_RowMajor.
The (i,j)th element of the matrix X is stored in
  • x[(j-1)×pdx+i-1] when order=Nag_ColMajor;
  • x[(i-1)×pdx+j-1] when order=Nag_RowMajor.
On entry: the values of independent variables in the data matrix X.
5:     pdx    Integer    Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
  • if order=Nag_ColMajor, pdx≥n;
  • if order=Nag_RowMajor, pdx≥m.
6:     isx[m]    const Integer    Input
On entry: indicates which of the m available independent variables are included in the model.
isx[j-1]=1
The jth variable in x will be included in the model.
isx[j-1]=0
Variable j is excluded.
Constraint: isx[j-1]=0 or 1, for j=1,2,…,m.
7:     ip    Integer    Input
On entry: the number of independent variables included in the model.
Constraints:
  • 1≤ip≤m;
  • exactly ip elements of isx must be equal to 1.
8:     y[n]    const double    Input
On entry: the n values of the dependent variable y.
9:     lh    Integer    Input
On entry: the number of supplied ridge parameters.
Constraint: lh>0.
10:   h[lh]    const double    Input
On entry: h[j-1] is the value of the jth ridge parameter h.
Constraint: h[j-1]≥0.0, for j=1,2,…,lh.
11:   nep[lh]    double    Output
On exit: nep[j-1] is the number of effective parameters, γ, in the jth model, for j=1,2,…,lh.
12:   wantb    Nag_ParaOption    Input
On entry: defines the options for parameter estimates.
wantb=Nag_NoPara
Parameter estimates are not calculated and b is not referenced.
wantb=Nag_OrigPara
Parameter estimates b are calculated for the original data.
wantb=Nag_StandPara
Parameter estimates b~ are calculated for the standardized data.
Constraint: wantb=Nag_NoPara, Nag_OrigPara or Nag_StandPara.
13:   b[dim]    double    Output
Note: the dimension, dim, of the array b must be at least
  • pdb×lh when wantb≠Nag_NoPara and order=Nag_ColMajor;
  • max(1,(ip+1)×pdb) when wantb≠Nag_NoPara and order=Nag_RowMajor;
  • 1 otherwise.
Where B(i,j) appears in this document, it refers to the array element
  • b[(j-1)×pdb+i-1] when order=Nag_ColMajor;
  • b[(i-1)×pdb+j-1] when order=Nag_RowMajor.
On exit: if wantb≠Nag_NoPara, b contains the intercept and parameter estimates for the fitted ridge regression model in the order indicated by isx. B(1,j), for j=1,2,…,lh, contains the estimate for the intercept; B(i+1,j) contains the parameter estimate for the ith independent variable in the model fitted with ridge parameter h[j-1], for i=1,2,…,ip.
14:   pdb    Integer    Input
On entry: the stride separating row or column elements (depending on the value of order) in the array b.
Constraints:
  • if order=Nag_ColMajor,
    • if wantb≠Nag_NoPara, pdb≥ip+1;
    • otherwise pdb≥1;
  • if order=Nag_RowMajor,
    • if wantb≠Nag_NoPara, pdb≥lh;
    • otherwise pdb≥1.
15:   wantvf    Nag_VIFOption    Input
On entry: defines the options for variance inflation factors.
wantvf=Nag_NoVIF
Variance inflation factors are not calculated and the array vf is not referenced.
wantvf=Nag_WantVIF
Variance inflation factors are calculated.
Constraints:
  • wantvf=Nag_NoVIF or Nag_WantVIF;
  • if wantb=Nag_NoPara, wantvf=Nag_WantVIF.
16:   vf[dim]    double    Output
Note: the dimension, dim, of the array vf must be at least
  • pdvf×lh when wantvf≠Nag_NoVIF and order=Nag_ColMajor;
  • max(1,ip×pdvf) when wantvf≠Nag_NoVIF and order=Nag_RowMajor;
  • 1 otherwise.
Where VF(i,j) appears in this document, it refers to the array element
  • vf[(j-1)×pdvf+i-1] when order=Nag_ColMajor;
  • vf[(i-1)×pdvf+j-1] when order=Nag_RowMajor.
On exit: if wantvf=Nag_WantVIF, the variance inflation factors. For the ith independent variable in a model fitted with ridge parameter h[j-1], VF(i,j) is the value of v_i, for i=1,2,…,ip.
17:   pdvf    Integer    Input
On entry: the stride separating row or column elements (depending on the value of order) in the array vf.
Constraints:
  • if order=Nag_ColMajor,
    • if wantvf≠Nag_NoVIF, pdvf≥ip;
    • otherwise pdvf≥1;
  • if order=Nag_RowMajor,
    • if wantvf≠Nag_NoVIF, pdvf≥lh;
    • otherwise pdvf≥1.
18:   lpec    Integer    Input
On entry: the number of prediction error statistics to return; set lpec≤0 for no prediction error estimates.
19:   pec[lpec]    const Nag_PredictError    Input
On entry: if lpec>0, pec[j-1] defines the jth prediction error, for j=1,2,…,lpec; otherwise pec is not referenced.
pec[j-1]=Nag_BIC
Bayesian information criterion (BIC).
pec[j-1]=Nag_FPE
Future prediction error (FPE).
pec[j-1]=Nag_GCV
Generalized cross-validation (GCV).
pec[j-1]=Nag_LOOCV
Leave-one-out cross-validation (LOOCV).
pec[j-1]=Nag_EUV
Unbiased estimate of variance (UEV).
Constraint: if lpec>0, pec[j-1]=Nag_BIC, Nag_FPE, Nag_GCV, Nag_LOOCV or Nag_EUV, for j=1,2,…,lpec.
20:   pe[dim]    double    Output
Note: the dimension, dim, of the array pe must be at least
  • pdpe×lh when lpec>0 and order=Nag_ColMajor;
  • max(1,lpec×pdpe) when lpec>0 and order=Nag_RowMajor;
  • 1 otherwise.
Where PE(i,j) appears in this document, it refers to the array element
  • pe[(j-1)×pdpe+i-1] when order=Nag_ColMajor;
  • pe[(i-1)×pdpe+j-1] when order=Nag_RowMajor.
On exit: if lpec≤0, pe is not referenced; otherwise PE(i,j) contains the prediction error of criterion pec[i-1] for the model fitted with ridge parameter h[j-1], for i=1,2,…,lpec and j=1,2,…,lh.
21:   pdpe    Integer    Input
On entry: the stride separating row or column elements (depending on the value of order) in the array pe.
Constraints:
  • if order=Nag_ColMajor,
    • if lpec>0, pdpe≥lpec;
    • otherwise pdpe≥1;
  • if order=Nag_RowMajor,
    • if lpec>0, pdpe≥lh;
    • otherwise pdpe≥1.
22:   fail    NagError *    Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
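The storage rules for the two-dimensional arrays x, b, vf and pe all follow the same pattern: a 1-based (i,j) element maps to a 0-based offset that depends on order and on the stride. The sketch below captures that mapping and the minimum length of b when parameter estimates are requested. The enum and function names are hypothetical stand-ins chosen so that the example compiles without the NAG headers; in real code Nag_OrderType would be used instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } SketchOrder;

/* 0-based offset of the (i,j)th element (i and j are 1-based) in an
 * array stored with stride pd. */
size_t element_offset(SketchOrder order, size_t i, size_t j, size_t pd)
{
    return (order == SKETCH_COL_MAJOR) ? (j - 1) * pd + (i - 1)
                                       : (i - 1) * pd + (j - 1);
}

/* Minimum length of b when parameter estimates are requested:
 * ip + 1 rows (intercept plus ip slopes) and lh columns. */
size_t min_dim_b(SketchOrder order, size_t ip, size_t lh, size_t pdb)
{
    return (order == SKETCH_COL_MAJOR) ? pdb * lh : (ip + 1) * pdb;
}
```

For instance, with row-major storage and pdb = lh, the intercept of the jth model is at b[element_offset(SKETCH_ROW_MAJOR, 1, j, pdb)].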

6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_CONSTRAINT
On entry, wantb=Nag_NoPara and wantvf=Nag_NoVIF.
NE_ENUM_INT_2
On entry, pdb=value and ip=value.
Constraint: if wantb≠Nag_NoPara, pdb≥ip+1.
On entry, pdvf=value and ip=value.
Constraint: if wantvf≠Nag_NoVIF, pdvf≥ip.
On entry, wantb=value, pdb=value, lh=value.
Constraint: if wantb≠Nag_NoPara, pdb≥lh;
otherwise pdb≥1.
On entry, wantvf=value, pdvf=value, lh=value.
Constraint: if wantvf≠Nag_NoVIF, pdvf≥lh;
otherwise pdvf≥1.
NE_INT
On entry, lh=value.
Constraint: lh>0.
On entry, n=value.
Constraint: n≥1.
NE_INT_2
On entry, m=value and n=value.
Constraint: m≤n.
On entry, pdpe=value and lpec=value.
Constraint: pdpe≥lpec.
On entry, pdx=value and m=value.
Constraint: pdx≥m.
On entry, pdx=value and n=value.
Constraint: pdx≥n.
NE_INT_3
On entry, pdpe=value, lpec=value and lh=value.
Constraint: if lpec>0, pdpe≥lh;
otherwise pdpe≥1.
NE_INT_ARG_CONS
ip does not equal the sum of elements in isx.
NE_INT_ARRAY_VAL_1_OR_2
On entry, isx[i-1]≠0 or 1 for at least one i.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARRAY_CONS
On entry, h[i-1]<0 for at least one i.
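Several of these errors can be caught before the call by checking the documented constraints on isx, ip, lh and h. The sketch below is a hypothetical pre-check, not part of the library: the names are invented for illustration and plain int stands in for the NAG Integer type.

```c
/* Hypothetical pre-check result codes (not NAG error codes). */
typedef enum {
    CHECK_OK,
    CHECK_BAD_ISX,   /* some isx[j] is neither 0 nor 1            */
    CHECK_BAD_IP,    /* ip outside 1..m, or ip != number of ones  */
    CHECK_BAD_LH,    /* lh <= 0                                   */
    CHECK_BAD_H      /* some h[j] < 0                             */
} CheckResult;

/* Mirrors the documented constraints on isx, ip, lh and h. */
CheckResult check_ridge_inputs(const int *isx, int m, int ip,
                               const double *h, int lh)
{
    int j, count = 0;

    for (j = 0; j < m; ++j) {
        if (isx[j] != 0 && isx[j] != 1)
            return CHECK_BAD_ISX;
        count += isx[j];
    }
    if (ip < 1 || ip > m || count != ip)
        return CHECK_BAD_IP;
    if (lh <= 0)
        return CHECK_BAD_LH;
    for (j = 0; j < lh; ++j)
        if (h[j] < 0.0)
            return CHECK_BAD_H;

    return CHECK_OK;
}
```

Such a check does not replace inspecting fail after the call; it only makes the most common input mistakes visible earlier.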

7  Accuracy

The accuracy of nag_regsn_ridge (g02kbc) is closely related to that of the singular value decomposition.

8  Further Comments

nag_regsn_ridge (g02kbc) allocates internally max(5×(n-1),2×ip×ip)+(n+3)×ip+n elements of double precision storage.

9  Example

This example reads in data from an experiment to model body fat, and a selection of ridge regression models are calculated.

9.1  Program Text

Program Text (g02kbce.c)

9.2  Program Data

Program Data (g02kbce.d)

9.3  Program Results

Program Results (g02kbce.r)



© The Numerical Algorithms Group Ltd, Oxford, UK. 2012