g02ka:: Correlation and Regression Analysis (NAG Toolbox)

Ridge regression estimates the parameters $\tilde{\beta}$ in a penalised least squares sense by finding the $\tilde{b}$ that minimizes

$$\|\tilde{X}\tilde{b} - \tilde{y}\|^{2} + h\|\tilde{b}\|^{2}, \quad h > 0,$$

where $\|\cdot\|$ denotes the $\ell_{2}$-norm and $h$ is a scalar regularization or ridge parameter. For a given value of $h$, the parameter estimates $\tilde{b}$ are found by evaluating

$$\tilde{b} = (\tilde{X}^{T}\tilde{X} + hI)^{-1}\tilde{X}^{T}\tilde{y}.$$
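A minimal base-MATLAB sketch of the closed-form estimate, assuming $\tilde{X}$ and $\tilde{y}$ are already available as a matrix and a column vector. The small `Xt` and `yt` below are hypothetical illustrative values, not the example data, and nothing here calls g02ka. The sketch solves the linear system with the backslash operator instead of forming the inverse explicitly, confirms the result against the equivalent augmented least squares problem, and checks that the gradient of the penalised objective vanishes at the solution.

```matlab
% Hypothetical illustrative data (not taken from the g02ka example)
Xt = [1.0  2.0; 2.0  0.5; 3.0 -1.0; 4.0  1.5];
yt = [3.1; 2.4; 1.9; 5.2];
h  = 0.5;                          % ridge parameter, h > 0
p  = size(Xt, 2);

% b = (Xt'*Xt + h*I)^(-1) * Xt'*yt, solved as a linear system (no explicit inverse)
b = (Xt' * Xt + h * eye(p)) \ (Xt' * yt);

% Same minimiser via ordinary least squares on the augmented system
% [Xt; sqrt(h)*I] * b ~ [yt; 0], whose squared residual equals the penalised objective
b_aug = [Xt; sqrt(h) * eye(p)] \ [yt; zeros(p, 1)];

% Gradient of ||Xt*b - yt||^2 + h*||b||^2; zero (up to rounding) at the minimiser
g = 2 * Xt' * (Xt * b - yt) + 2 * h * b;

fprintf('b     = [%s ]\n', sprintf(' %.4f', b));
fprintf('b_aug = [%s ]\n', sprintf(' %.4f', b_aug));
fprintf('max |gradient| = %.1e\n', max(abs(g)));
```

Solving with `\` is generally preferred over `inv(...) * ...` in MATLAB because it is cheaper and numerically more stable.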

Note that if $h = 0$, the ridge regression solution is equivalent to the ordinary least squares solution.
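The limiting case can be checked numerically with a short sketch, again on hypothetical illustrative data and without calling g02ka: with $h = 0$ the closed-form estimate coincides with MATLAB's least squares solution `Xt \ yt`, and increasing $h$ shrinks the norm of the estimates towards zero.

```matlab
% Hypothetical illustrative data (not taken from the g02ka example)
Xt = [1.0  2.0; 2.0  0.5; 3.0 -1.0; 4.0  1.5];
yt = [3.1; 2.4; 1.9; 5.2];
p  = size(Xt, 2);
ridge = @(h) (Xt' * Xt + h * eye(p)) \ (Xt' * yt);   % closed-form ridge estimate

b_ols = Xt \ yt;      % ordinary least squares (QR-based)
b0    = ridge(0);     % ridge estimate with h = 0
fprintf('max |b0 - b_ols| = %.1e\n', max(abs(b0 - b_ols)));

% Larger h shrinks the estimates towards zero
hs    = [0 0.1 1 10 100];
norms = arrayfun(@(h) norm(ridge(h)), hs);
fprintf('h = %6.1f   ||b|| = %.4f\n', [hs; norms]);
```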

When calculating a suitable value for $h$, the function minimizes one of four criteria:

(a) Generalized cross-validation (GCV): $\frac{ns}{(n - \gamma)^{2}}$;

(b) Unbiased estimate of variance (UEV): $\frac{s}{n - \gamma}$;

(c) Future prediction error (FPE): $\frac{1}{n}\left(s + \frac{2\gamma s}{n - \gamma}\right)$;

(d) Bayesian information criterion (BIC): $\frac{1}{n}\left(s + \frac{\log(n)\,\gamma s}{n - \gamma}\right)$;

where $s$ is the sum of squares of residuals, $n$ is the number of observations, $\gamma$ is the number of effective parameters, and $\log$ denotes the natural logarithm. Whichever criterion is selected for optimizing the ridge parameter $h$, the function returns all four of the above prediction errors. It can optionally also return the leave-one-out cross-validation (LOOCV) prediction error.
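The relationship between these prediction errors can be illustrated with a small base-MATLAB sketch on hypothetical data. It assumes the usual definitions for a ridge fit: the effective number of parameters $\gamma$ is the trace of the hat matrix $H = \tilde{X}(\tilde{X}^{T}\tilde{X} + hI)^{-1}\tilde{X}^{T}$, and the LOOCV error is the mean squared leave-one-out residual. These are assumptions for illustration; g02ka's internal scaling and normalization may differ. The sketch refits the model $n$ times to get the LOOCV error directly and compares it with the single-fit shortcut $e_i/(1 - H_{ii})$. Replacing each $H_{ii}$ in that shortcut by its average $\gamma/n$ gives exactly the GCV formula $ns/(n-\gamma)^{2}$.

```matlab
% Hypothetical illustrative data (not taken from the g02ka example)
Xt = [1.0  2.0; 2.0  0.5; 3.0 -1.0; 4.0  1.5; 0.5 -0.5; 2.5 2.0];
yt = [3.1; 2.4; 1.9; 5.2; 0.4; 4.6];
h  = 0.5;
[n, p] = size(Xt);

% Hat (influence) matrix of the ridge fit: fitted values = H*yt
H   = Xt * ((Xt' * Xt + h * eye(p)) \ Xt');
e   = yt - H * yt;          % residuals
s   = sum(e.^2);            % residual sum of squares
nep = trace(H);             % effective number of parameters (gamma), < p for h > 0

% LOOCV by brute force: refit without observation i, then predict observation i
loo = zeros(n, 1);
for i = 1:n
  keep   = [1:i-1, i+1:n];
  bi     = (Xt(keep,:)' * Xt(keep,:) + h * eye(p)) \ (Xt(keep,:)' * yt(keep));
  loo(i) = yt(i) - Xt(i,:) * bi;
end
loocv_brute = mean(loo.^2);

% Same quantity from the single full fit, via the hat-matrix diagonal
loocv_short = mean((e ./ (1 - diag(H))).^2);

% GCV: the shortcut with every H(i,i) replaced by the average nep/n
gcv = n * s / (n - nep)^2;

fprintf('gamma = %.4f, LOOCV = %.4f (shortcut %.4f), GCV = %.4f\n', ...
        nep, loocv_brute, loocv_short, gcv);
```

Computing LOOCV from the hat-matrix diagonal avoids $n$ refits, which is why it is cheap to return alongside the other criteria.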


Example

function g02ka_example


fprintf('g02ka example results\n\n');

% Data
x = [19.5, 43.1, 29.1;
     24.7, 49.8, 28.2;
     30.7, 51.9, 37.0;
     29.8, 54.3, 31.1;
     19.1, 42.2, 30.9;
     25.6, 53.9, 23.7;
     31.4, 58.5, 27.6;
     27.9, 52.1, 30.6;
     22.1, 49.9, 23.2;
     25.5, 53.5, 24.8;
     31.1, 56.6, 30.0;
     30.4, 56.7, 28.3;
     18.7, 46.5, 23.0;
     19.7, 44.2, 28.6;
     14.6, 42.7, 21.3;
     29.5, 54.4, 30.1;
     27.7, 55.3, 25.7;
     30.2, 58.6, 24.6;
     22.7, 48.2, 27.1;
     25.2, 51.0, 27.5];

[n,m] = size(x);
isx   = ones(m,1,'int64');
ip    = int64(m);
y = [11.9; 22.8; 18.7; 20.1; 12.9; 21.7; 27.1; 25.4; 21.3; 19.3;
     25.4; 27.2; 11.7; 17.8; 12.8; 23.9; 22.6; 25.4; 14.8; 21.1];

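% Parameters
h      = 0.5;          % initial value of the ridge parameter
opt    = int64(1);     % criterion to minimize: 1 = GCV, 2 = UEV, 3 = FPE, 4 = BIC
niter  = int64(25);    % maximum number of iterations
tol    = 0.0001;       % convergence tolerance
orig   = int64(2);
optloo = int64(2);     % 2 = also return the LOOCV error in perr(5)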

% Fit ridge regression model
[h, niter, nep, b, vif, res, rss, df, perr, ifail] = ...
  g02ka( ...
         x, isx, ip, y, h, opt, niter, tol, orig, optloo);

% Display results
fprintf('Value of ridge parameter      : %10.4f\n\n', h);
fprintf('Sum of squares of residuals   : %14.4e\n', rss);
fprintf('Degrees of freedom            : %5d\n', df);
fprintf('Number of effective parameters: %10.4f\n', nep);
fprintf('\nParameter estimates\n');
ivar = double([1:ip+1]);
fprintf('%4d%11.4f\n',[ivar; b(ivar)']);
fprintf('\nNumber of iterations: %15d\n\n', niter);
if opt==1
  fprintf('Ridge parameter minimises GCV\n');
elseif opt==2
  fprintf('Ridge parameter minimises UEV\n');
elseif opt==3
  fprintf('Ridge parameter minimises FPE\n');
elseif opt==4
  fprintf('Ridge parameter minimises BIC\n');
end
fprintf('\nEstimated prediction errors:\n');
fprintf('GCV    = %10.4f\n', perr(1));
fprintf('UEV    = %10.4f\n', perr(2));
fprintf('FPE    = %10.4f\n', perr(3));
fprintf('BIC    = %10.4f\n', perr(4));
if optloo==2
  fprintf('LOO CV = %10.4f\n', perr(5));
end
fprintf('\nResiduals\n');
ivar = [1:n];
fprintf('%4d%11.4f\n',[ivar; res(ivar)']);
fprintf('\nVariance inflation factors\n');
ivar = double([1:ip]);
fprintf('%4d%11.4f\n',[ivar; vif(ivar)']);

g02ka example results

Value of ridge parameter      :     0.0712

Sum of squares of residuals   :     1.0917e+02
Degrees of freedom            :    16
Number of effective parameters:     2.9059

Parameter estimates
   1    20.1950
   2     9.7934
   3     9.9576
   4    -2.0125

Number of iterations:               6

Ridge parameter minimises GCV

Estimated prediction errors:
GCV    =     7.4718
UEV    =     6.3862
FPE    =     7.3141
BIC    =     8.2380
LOO CV =     7.5495

Residuals
   1    -1.9894
   2     3.5469
   3    -3.0392
   4    -3.0309
   5    -0.1899
   6    -0.3146
   7     0.9775
   8     4.0157
   9     2.5332
  10    -2.3560
  11     0.5446
  12     2.3989
  13    -4.0876
  14     3.2778
  15     0.2894
  16     0.7330
  17    -0.7116
  18    -0.6092
  19    -2.9995
  20     1.0110

Variance inflation factors
   1     0.2928
   2     0.4162
   3     0.8089

NAG Toolbox: nag_correg_ridge_opt (g02ka)
