NAG C Library Function Document

nag_lars_param (g02mcc)

1
Purpose

nag_lars_param (g02mcc) calculates additional parameter estimates following Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) as performed by nag_lars (g02mac) and nag_lars_xtx (g02mbc).

2
Specification

#include <nag.h>
#include <nagg02.h>
void  nag_lars_param (Integer nstep, Integer ip, const double b[], Integer pdb, const double fitsum[], Nag_LARSTargetType ktype, const double nk[], Integer lnk, double nb[], Integer pdnb, NagError *fail)

3
Description

nag_lars (g02mac) and nag_lars_xtx (g02mbc) fit either a LARS, forward stagewise linear regression, LASSO or positive LASSO model to a vector of n observed values, y = {yi : i=1,2,…,n}, and an n×p design matrix X, where the jth column of X is given by the jth independent variable xj. The models are fit using the LARS algorithm of Efron et al. (2004).
[Figure 1: plot of the parameter estimates (βkj) against ‖βk‖1, with one curve for each of βk1 to βk6]
The full solution path for all four of these models follows a similar pattern, where the parameter estimate for a given variable is piecewise linear. One such path, for a LARS model with six variables (p=6), can be seen in Figure 1. Both nag_lars (g02mac) and nag_lars_xtx (g02mbc) return the vector of p parameter estimates, βk, at K points along this path (so k=1,2,…,K). Each point corresponds to a step of the LARS algorithm. The number of steps taken depends on the model being fitted. In the case of a LARS model, K=p and each step corresponds to a new variable being included in the model. In the case of the LASSO models, each step corresponds to either a new variable being included in the model or an existing variable being removed from the model; the value of K is therefore no longer bound by the number of parameters. For forward stagewise linear regression, each step no longer corresponds to the addition or removal of a variable; therefore the number of possible steps is often markedly greater than for a corresponding LASSO model.
nag_lars_param (g02mcc) uses the piecewise linear nature of the solution path to predict the parameter estimates, β~, at a different point on this path. The location of the solution can either be defined in terms of a (fractional) step number or a function of the L1 norm of the parameter estimates.
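The step-number case can be illustrated with a minimal sketch. It assumes that the estimate at a fractional step is obtained by linear interpolation between the two neighbouring stored steps, and that the path starts from an all-zero vector at step 0. The function name interp_step is hypothetical and is not part of the NAG Library; the storage convention follows the description of b in Section 5.

```c
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: estimate the ip parameter
 * values at fractional step t (0 <= t <= nstep) by linear interpolation.
 * Step k (k = 1..nstep) is stored in b[(k-1)*pdb + j-1]; step 0 is
 * ASSUMED to be the all-zero vector. Returns 0 on success, -1 on bad input. */
static int interp_step(int nstep, int ip, const double b[], int pdb,
                       double t, double out[])
{
    int j, k;
    double delta;

    if (nstep < 0 || ip < 1 || pdb < ip || t < 0.0 || t > (double)nstep)
        return -1;

    k = (int)t;                 /* whole step at or before t (t >= 0) */
    delta = t - (double)k;      /* fraction of the way to step k+1 */
    if (k == nstep && nstep > 0) {
        k = nstep - 1;          /* t == nstep: use the last segment's end */
        delta = 1.0;
    }

    for (j = 0; j < ip; ++j) {
        double lo = (k == 0) ? 0.0 : b[(size_t)(k - 1) * (size_t)pdb + (size_t)j];
        double hi = (nstep == 0) ? 0.0 : b[(size_t)k * (size_t)pdb + (size_t)j];
        out[j] = lo + delta * (hi - lo);
    }
    return 0;
}
```

Under these assumptions a target of 1.5 returns the midpoint of the estimates stored for steps 1 and 2.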

4
References

Efron B, Hastie T, Johnstone I and Tibshirani R (2004) Least Angle Regression The Annals of Statistics (Volume 32) 2 407–499
Hastie T, Tibshirani R and Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer (New York)
Tibshirani R (1996) Regression Shrinkage and Selection via the Lasso Journal of the Royal Statistical Society, Series B (Methodological) (Volume 58) 1 267–288
Weisberg S (1985) Applied Linear Regression Wiley

5
Arguments

1:     nstep Integer Input
On entry: K, the number of steps carried out in the model fitting process, as returned by nag_lars (g02mac) and nag_lars_xtx (g02mbc).
Constraint: nstep≥0.
2:     ip Integer Input
On entry: p, the number of parameter estimates, as returned by nag_lars (g02mac) and nag_lars_xtx (g02mbc).
Constraint: ip≥1.
3:     b[dim] const double Input
Note: the dimension, dim, of the array b must be at least pdb×(nstep+1).
On entry: β, the parameter estimates, as returned by nag_lars (g02mac) and nag_lars_xtx (g02mbc), with b[(k-1)×pdb+j-1]=βkj, the parameter estimate for the jth variable, for j=1,2,…,p, at the kth step of the model fitting process.
Constraint: b should be unchanged since the last call to nag_lars (g02mac) or nag_lars_xtx (g02mbc).
4:     pdb Integer Input
On entry: the stride separating row elements in the two-dimensional data stored in the array b.
Constraint: pdb≥ip.
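The storage rule for b and pdb can be sketched as a pair of small helpers. The names beta_at and b_min_len are hypothetical and only restate the indexing formula and the minimum dimension given above; they are not part of the NAG Library.

```c
#include <stddef.h>

/* Hypothetical accessor: returns beta_{kj}, the estimate for variable j
 * (1 <= j <= ip) at step k (1 <= k <= nstep), using the documented
 * layout b[(k-1)*pdb + j-1]. No bounds checking is performed. */
static double beta_at(const double b[], int pdb, int k, int j)
{
    return b[(size_t)(k - 1) * (size_t)pdb + (size_t)(j - 1)];
}

/* Hypothetical helper: minimum number of elements b must hold,
 * pdb * (nstep + 1), as stated in the note on the dimension of b. */
static size_t b_min_len(int nstep, int pdb)
{
    return (size_t)pdb * ((size_t)nstep + 1);
}
```

When pdb is larger than ip, the elements between position ip and position pdb of each step are padding and are skipped by the stride.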
5:     fitsum[6×(nstep+1)] const double Input
On entry: summaries of the model fitting process, as returned by nag_lars (g02mac) and nag_lars_xtx (g02mbc).
Constraint: fitsum should be unchanged since the last call to nag_lars (g02mac) or nag_lars_xtx (g02mbc).
6:     ktype Nag_LARSTargetType Input
On entry: indicates what target values are held in nk.
ktype=Nag_LARS_StepNumber
nk holds (fractional) LARS step numbers.
ktype=Nag_LARS_ScaledNorm
nk holds values for the L1 norm of the (scaled) parameters.
ktype=Nag_LARS_ProportionScaledNorm
nk holds ratios with respect to the largest (scaled) L1 norm.
ktype=Nag_LARS_UnscaledNorm
nk holds values for the L1 norm of the (unscaled) parameters.
ktype=Nag_LARS_ProportionUnscaledNorm
nk holds ratios with respect to the largest (unscaled) L1 norm.
If nag_lars (g02mac) was called with pred=Nag_LARS_None or Nag_LARS_Centered or nag_lars_xtx (g02mbc) was called with pred=Nag_LARS_None then the model fitting routine did not rescale the independent variables, X, prior to fitting the model and therefore there is no difference between ktype=Nag_LARS_ScaledNorm or Nag_LARS_ProportionScaledNorm and ktype=Nag_LARS_UnscaledNorm or Nag_LARS_ProportionUnscaledNorm.
Constraint: ktype=Nag_LARS_StepNumber, Nag_LARS_ScaledNorm, Nag_LARS_ProportionScaledNorm, Nag_LARS_UnscaledNorm or Nag_LARS_ProportionUnscaledNorm.
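A target expressed as an L1 norm can be related to a fractional step number, as in the following rough sketch. It assumes that the norm is non-decreasing along the path and varies linearly between consecutive steps, which need not hold for every model type. The array norm (holding ‖βk‖1 for k=0,1,…,nstep, with norm[0]=0) and the function norm_to_step are hypothetical names introduced for illustration only.

```c
/* Hypothetical helper, not a NAG routine: map an L1-norm target s onto
 * a fractional step number. norm[k] is ASSUMED to hold ||beta_k||_1 for
 * k = 0..nstep, with norm[0] = 0 and non-decreasing values.
 * Returns a negative value if s lies outside [0, norm[nstep]]. */
static double norm_to_step(const double norm[], int nstep, double s)
{
    int k;

    if (nstep < 0 || s < 0.0 || s > norm[nstep])
        return -1.0;

    for (k = 0; k < nstep; ++k) {
        if (s <= norm[k + 1]) {
            double width = norm[k + 1] - norm[k];
            /* Guard against a zero-length segment. */
            return (width > 0.0) ? (double)k + (s - norm[k]) / width
                                 : (double)k;
        }
    }
    return (double)nstep;   /* only reached when nstep == 0 and s == 0 */
}
```

A proportion r, as used by the two proportion target types, would correspond to s = r × norm[nstep] in this sketch.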
7:     nk[lnk] const double Input
On entry: target values used for predicting the new set of parameter estimates.
Constraints:
  • if ktype=Nag_LARS_StepNumber, 0≤nk[i-1]≤nstep, for i=1,2,…,lnk;
  • if ktype=Nag_LARS_ScaledNorm, 0≤nk[i-1]≤fitsum[(nstep-1)×6], for i=1,2,…,lnk;
  • if ktype=Nag_LARS_ProportionScaledNorm or Nag_LARS_ProportionUnscaledNorm, 0≤nk[i-1]≤1, for i=1,2,…,lnk;
  • if ktype=Nag_LARS_UnscaledNorm, 0≤nk[i-1]≤‖βK‖1, for i=1,2,…,lnk.
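The constraints above can be checked before the call with a sketch along the following lines. The enum TargetKind is a hypothetical stand-in for the ktype values (it is not the NAG enum), final_l1 is a caller-supplied value for ‖βK‖1, and the handling of nstep=0 in the scaled-norm case is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the ktype values; NOT the NAG enum. */
typedef enum { STEP_NUMBER, SCALED_NORM, PROPORTION, UNSCALED_NORM } TargetKind;

/* Upper bound on each nk[i] for the given kind of target.
 * final_l1 is the caller's value of ||beta_K||_1 (unscaled case).
 * For SCALED_NORM with nstep < 1 this sketch simply returns 0. */
static double nk_upper(TargetKind kind, int nstep, const double fitsum[],
                       double final_l1)
{
    switch (kind) {
    case STEP_NUMBER:   return (double)nstep;
    case SCALED_NORM:   return (nstep >= 1) ? fitsum[(size_t)(nstep - 1) * 6] : 0.0;
    case PROPORTION:    return 1.0;
    case UNSCALED_NORM: return final_l1;
    }
    return 0.0;
}

/* Index of the first target outside [0, upper], or -1 if all are valid. */
static int first_bad_target(const double nk[], int lnk, double upper)
{
    int i;
    for (i = 0; i < lnk; ++i) {
        if (nk[i] < 0.0 || nk[i] > upper)
            return i;
    }
    return -1;
}
```

A return value of -1 from first_bad_target indicates that every supplied target lies within the documented range for that kind.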
8:     lnk Integer Input
On entry: the number of values supplied in nk.
Constraint: lnk≥1.
9:     nb[dim] double Output
Note: the dimension, dim, of the array nb must be at least pdnb×lnk.
On exit: β~, the predicted parameter estimates, with nb[(i-1)×pdnb+j-1]=β~ij, the parameter estimate for variable j, for j=1,2,…,p, at the point in the fitting process associated with nk[i-1], for i=1,2,…,lnk.
10:   pdnb Integer Input
On entry: the stride separating row elements in the two-dimensional data stored in the array nb.
Constraint: pdnb≥ip.
11:   fail NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6
Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_ARRAY_SIZE
On entry, pdb=value and ip=value.
Constraint: pdb≥ip.
On entry, pdnb=value and ip=value.
Constraint: pdnb≥ip.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_INT
On entry, ip=value.
Constraint: ip≥1.
On entry, lnk=value.
Constraint: lnk≥1.
On entry, nstep=value.
Constraint: nstep≥0.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_OUT_OF_RANGE
On entry, ktype=Nag_LARS_ProportionScaledNorm or Nag_LARS_ProportionUnscaledNorm, nk[value]=value.
Constraint: 0≤nk[i]≤1, for all i.
On entry, ktype=Nag_LARS_ScaledNorm, nk[value]=value, nstep=value and fitsum[(nstep-1)×6]=value.
Constraint: 0≤nk[i]≤fitsum[(nstep-1)×6], for all i.
On entry, ktype=Nag_LARS_StepNumber, nk[value]=value and nstep=value.
Constraint: 0≤nk[i]≤nstep, for all i.
On entry, ktype=Nag_LARS_UnscaledNorm, nk[value]=value and ‖βK‖1=value.
Constraint: 0≤nk[i]≤‖βK‖1, for all i.
NE_REAL_ARRAY
b has been corrupted since the last call to nag_lars (g02mac) or nag_lars_xtx (g02mbc).
fitsum has been corrupted since the last call to nag_lars (g02mac) or nag_lars_xtx (g02mbc).

7
Accuracy

Not applicable.

8
Further Comments

None.

9
Example

This example performs a LARS on a simulated dataset with 20 observations and 6 independent variables.
Additional parameter estimates are obtained corresponding to LARS step numbers of 0.2, 1.2, 3.2, 4.5 and 5.2, where, for example, 4.5 corresponds to the solution halfway between that obtained at step 4 and that obtained at step 5.
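As a worked illustration of these targets, assume that estimates at fractional steps are linear interpolations between the neighbouring steps and that the path starts from β0=0 (assumptions made for this illustration, in keeping with the piecewise linear path described in Section 3). The five requested step numbers then correspond to:

```latex
\begin{aligned}
\tilde{\beta}(0.2) &= 0.2\,\beta_1, \\
\tilde{\beta}(1.2) &= \beta_1 + 0.2\,(\beta_2 - \beta_1), \\
\tilde{\beta}(3.2) &= \beta_3 + 0.2\,(\beta_4 - \beta_3), \\
\tilde{\beta}(4.5) &= \beta_4 + 0.5\,(\beta_5 - \beta_4) = 0.5\,\beta_4 + 0.5\,\beta_5, \\
\tilde{\beta}(5.2) &= \beta_5 + 0.2\,(\beta_6 - \beta_5).
\end{aligned}
```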

9.1
Program Text

Program Text (g02mcce.c)

9.2
Program Data

Program Data (g02mcce.d)

9.3
Program Results

Program Results (g02mcce.r)