NAG Toolbox: nag_correg_linregm_fit_onestep (g02ee)

 Contents

    1  Purpose
    2  Syntax
    3  Description
    4  References
    5  Parameters
    6  Error Indicators and Warnings
    7  Accuracy
    8  Further Comments
    9  Example

Purpose

nag_correg_linregm_fit_onestep (g02ee) carries out one step of a forward selection procedure in order to enable the ‘best’ linear regression model to be found.

Syntax

[istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, free, exss, q, p, ifail] = g02ee(istep, mean_p, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p, 'n', n, 'm', m, 'maxip', maxip, 'wt', wt, 'fin', fin)
[istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, free, exss, q, p, ifail] = nag_correg_linregm_fit_onestep(istep, mean_p, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p, 'n', n, 'm', m, 'maxip', maxip, 'wt', wt, 'fin', fin)

Description

One method of selecting a linear regression model from a given set of independent variables is by forward selection. The following procedure is used:
(i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the F statistic for this variable is greater than a chosen critical value, F_c, then include the variable in the model; otherwise stop.
(ii) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model.
(iii) If the F statistic for this variable is greater than the critical value, F_c, then include the variable in the model and go to (ii); otherwise stop.
At any step the variables not in the model are known as the free terms.
nag_correg_linregm_fit_onestep (g02ee) allows you to specify some independent variables that must be in the model; these are known as forced variables.
The computational procedure involves the use of QR decompositions, the R and the Q matrices being updated as each new variable is added to the model. In addition, the matrix Q^T X_free, where X_free is the matrix of variables not included in the model, is updated.
nag_correg_linregm_fit_onestep (g02ee) computes one step of the forward selection procedure at each call. The results produced at each step may be printed or used as inputs to nag_correg_linregm_update (g02dd), in order to compute the regression coefficients for the model fitted at that step. Repeated calls to nag_correg_linregm_fit_onestep (g02ee) should be made until F < F_c is indicated (addvar=false).
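
The procedure in steps (i) to (iii) can be sketched in plain MATLAB without the toolbox. This is only an illustration of the selection logic, not the NAG implementation: it refits each candidate model from scratch with the backslash operator instead of updating a QR decomposition, it assumes the usual F-to-enter form (change in residual sum of squares divided by the new residual mean square), and the data and the names A, selected, rss_try and idf_new are invented for the example.

```matlab
% Illustrative sketch only: synthetic data, not the NAG algorithm.
n = 12;
t = (1:n)';
X = [t, mod(t.^2, 7), cos(t)];              % three candidate variables
y = 2 + 0.5*t - 3*cos(t) + 0.05*sin(5*t);   % depends on columns 1 and 3
Fc = 2.0;                                   % critical value F_c

A = ones(n, 1);                 % current design matrix: mean term only
free = 1:size(X, 2);            % indices of the free variables
selected = [];                  % indices of the variables added so far
r = y - A*(A\y);                % residuals of the current model
rss = r'*r;

while ~isempty(free)
    % Residual sum of squares obtained by adding each free variable in turn
    rss_try = zeros(size(free));
    for k = 1:numel(free)
        B = [A, X(:, free(k))];
        rk = y - B*(B\y);       % least squares fit of the candidate model
        rss_try(k) = rk'*rk;
    end
    [rss_new, kbest] = min(rss_try);             % greatest reduction in RSS
    idf_new = n - (size(A, 2) + 1);              % residual degrees of freedom
    F = (rss - rss_new) / (rss_new / idf_new);   % F-to-enter statistic
    if F <= Fc
        break;                  % no free variable passes the test: stop
    end
    A = [A, X(:, free(kbest))]; % include the variable in the model
    selected(end+1) = free(kbest);
    free(kbest) = [];
    rss = rss_new;
end
disp(selected);
```

Each pass through the while loop corresponds to one call of nag_correg_linregm_fit_onestep (g02ee); the function avoids the repeated refits by carrying q and p from one call to the next.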

References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley

Parameters

Note: after the initial call to nag_correg_linregm_fit_onestep (g02ee) with istep=0, no argument except fin may be changed between calls.

Compulsory Input Parameters

1:     istep – int64/int32/nag_int scalar
Indicates which step in the forward selection process is to be carried out.
istep=0
The process is initialized.
Constraint: istep ≥ 0.
2:     mean_p – string (length ≥ 1)
Indicates if a mean term is to be included.
mean_p='M'
A mean term, intercept, will be included in the model.
mean_p='Z'
The model will pass through the origin, zero-point.
Constraint: mean_p='M' or 'Z'.
3:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the ith observation for the jth independent variable, for i=1,2,…,n and j=1,2,…,m.
4:     vname(m) – cell array of strings
vname(j) must contain the name of the independent variable in column j of x, for j=1,2,…,m.
5:     isx(m) – int64/int32/nag_int array
Indicates which independent variables could be considered for inclusion in the regression.
isx(j) ≥ 2
The variable contained in the jth column of x is automatically included in the regression model, for j=1,2,…,m.
isx(j)=1
The variable contained in the jth column of x is considered for inclusion in the regression model, for j=1,2,…,m.
isx(j)=0
The variable in the jth column is not considered for inclusion in the model, for j=1,2,…,m.
Constraint: isx(j) ≥ 0 and at least one value of isx(j)=1, for j=1,2,…,m.
6:     y(n) – double array
The dependent variable.
7:     model(maxip) – cell array of strings
If istep=0, model need not be set.
If istep ≠ 0, model must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: the declared size of model must be greater than or equal to the declared size of vname.
8:     nterm – int64/int32/nag_int scalar
If istep=0, nterm need not be set.
If istep ≠ 0, nterm must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: if istep ≠ 0, nterm>0.
9:     rss – double scalar
If istep=0, rss need not be set.
If istep ≠ 0, rss must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: if istep ≠ 0, rss>0.0.
10:   idf – int64/int32/nag_int scalar
If istep=0, idf need not be set.
If istep ≠ 0, idf must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
11:   ifr – int64/int32/nag_int scalar
If istep=0, ifr need not be set.
If istep ≠ 0, ifr must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
12:   free(maxip) – cell array of strings
If istep=0, free need not be set.
If istep ≠ 0, free must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: the declared size of free must be greater than or equal to the declared size of vname.
13:   q(ldq,maxip+2) – double array
ldq, the first dimension of the array, must satisfy the constraint ldq ≥ n.
If istep=0, q need not be set.
If istep ≠ 0, q must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
14:   p(maxip+1) – double array
If istep=0, p need not be set.
If istep ≠ 0, p must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: the dimension of the array y and the first dimension of the arrays x, q. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraint: n ≥ 2.
2:     m – int64/int32/nag_int scalar
Default: the second dimension of the array x and the dimension of the arrays vname, isx. (An error is raised if these dimensions are not equal.)
m, the total number of independent variables in the dataset.
Constraint: m ≥ 1.
3:     maxip – int64/int32/nag_int scalar
Default: the dimension of the arrays model, free. (An error is raised if these dimensions are not equal.)
The maximum number of independent variables to be included in the model.
Constraints:
  • if mean_p='M', maxip ≥ 1 + number of values of isx>0;
  • if mean_p='Z', maxip ≥ number of values of isx>0.
4:     wt(:) – double array
The dimension of the array wt must be at least n if weight='W'.
If weight='W', wt must contain the weights to be used in the weighted regression, W.
If wt(i)=0.0, the ith observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weight='U', wt is not referenced and the effective number of observations is n.
Constraint: if weight='W', wt(i) ≥ 0.0, for i=1,2,…,n.
5:     fin – double scalar
Default: 2.0 is a commonly used value in exploratory modelling.
The critical value of the F statistic for the term to be included in the model, F_c.
Constraint: fin ≥ 0.0.

Output Parameters

1:     istep – int64/int32/nag_int scalar
Is incremented by 1.
2:     addvar – logical scalar
Indicates if a variable has been added to the model.
addvar=true
A variable has been added to the model.
addvar=false
No variable had an F value greater than F_c, so none was added to the model.
3:     newvar – string
If addvar=true, newvar contains the name of the variable added to the model.
4:     chrss – double scalar
If addvar=true, chrss contains the change in the residual sum of squares due to adding variable newvar.
5:     f – double scalar
If addvar=true, f contains the F statistic for the inclusion of the variable in newvar.
6:     model(maxip) – cell array of strings
The names of the variables in the current model.
7:     nterm – int64/int32/nag_int scalar
The number of independent variables in the current model, not including the mean, if any.
8:     rss – double scalar
The residual sum of squares for the current model.
9:     idf – int64/int32/nag_int scalar
The degrees of freedom for the residual sum of squares for the current model.
10:   ifr – int64/int32/nag_int scalar
The number of free independent variables, i.e., the number of variables not in the model that are still being considered for selection.
11:   free(maxip) – cell array of strings
The first ifr values of free contain the names of the free variables.
12:   exss(maxip) – double array
The first ifr values of exss contain what would be the change in regression sum of squares if the free variables had been added to the model, i.e., the extra sum of squares for the free variables. exss(i) contains what would be the change in regression sum of squares if the variable free(i) had been added to the model.
13:   q(ldq,maxip+2) – double array
The results of the QR decomposition for the current model:
  • the first column of q contains c = Q^T y (or Q^T W^(1/2) y, where W is the vector of weights, if used);
  • the upper triangular part of columns 2 to p+1 contains the R matrix;
  • the strictly lower triangular part of columns 2 to p+1 contains details of the Q matrix;
  • the remaining p+1 to p+ifr columns of q contain Q^T X_free (or Q^T W^(1/2) X_free),
where p=nterm, or p=nterm+1 if mean_p='M'.
14:   p(maxip+1) – double array
The first p elements of p contain details of the QR decomposition, where p=nterm, or p=nterm+1 if mean_p='M'.
15:   ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
On entry, n<1,
or m<1,
or ldx<n,
or ldq<n,
or istep<0,
or istep ≠ 0 and nterm=0,
or istep ≠ 0 and rss ≤ 0.0,
or fin<0.0,
or mean_p ≠ 'M' or 'Z',
or weight ≠ 'U' or 'W'.
   ifail=2
On entry, weight='W' and a value of wt<0.0.
   ifail=3
On entry, the degrees of freedom will be zero if a variable is selected, i.e., the number of variables in the model plus 1 is equal to the effective number of observations.
   ifail=4
On entry, a value of isx<0,
or there are no forced or free variables, i.e., no element of isx>0,
or the value of maxip is too small for the number of variables indicated by isx.
   ifail=5
On entry, the variables forced into the model are not of full rank, i.e., some of these variables are linear combinations of others.
   ifail=6
On entry, there are no free variables, i.e., no element of isx=1.
   ifail=7
The value of the change in the sum of squares is greater than the input value of rss. This may occur due to rounding errors if the true residual sum of squares for the new model is small relative to the residual sum of squares for the previous model.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

As nag_correg_linregm_fit_onestep (g02ee) uses a QR transformation, the results will often be more accurate than those of traditional algorithms based on the cross-products of the dependent and independent variables.
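
The difference between the two approaches can be illustrated with a small plain MATLAB experiment. This is a sketch on invented, nearly collinear data and does not call the toolbox; it compares a least squares solution obtained from a QR decomposition with one obtained from the cross-product (normal equations) matrix, whose condition number is the square of that of the data matrix. MATLAB may warn that the cross-product matrix is close to singular, which is the point of the comparison.

```matlab
% Illustrative sketch only: synthetic, nearly collinear data (assumed).
n = 50;
t = linspace(0, 1, n)';
X = [ones(n,1), t, t + 1e-6*sin(40*t)];   % columns 2 and 3 almost identical
b_true = [1; 2; 3];
y = X*b_true;                             % exact response, no noise

[Q, R] = qr(X, 0);                        % economy-size QR decomposition
b_qr = R \ (Q'*y);                        % solve R b = Q^T y
b_ne = (X'*X) \ (X'*y);                   % cross-product (normal equations) solve

fprintf('cond(X)                = %.3e\n', cond(X));
fprintf('cond(X''*X)             = %.3e\n', cond(X'*X));
fprintf('QR error               = %.3e\n', norm(b_qr - b_true));
fprintf('normal-equations error = %.3e\n', norm(b_ne - b_true));
```

The exact figures depend on the platform, so none are quoted here; the quantity to watch is how much larger cond(X'*X) is than cond(X).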

Further Comments

None.

Example

The data, from an oxygen uptake experiment, is given by Weisberg (1985). The names of the variables are as given in Weisberg (1985). The independent and dependent variables are read and nag_correg_linregm_fit_onestep (g02ee) is repeatedly called until addvar=false. At each step the F statistic, the free variables and their extra sum of squares are printed; also, except for when addvar=false, the new variable, the change in the residual sum of squares and the terms in the model are printed.
function g02ee_example


fprintf('g02ee example results\n\n');

x = [  0, 1125, 232, 7160, 85.9, 8905;
       7,  920, 268, 8804, 86.5, 7388;
      15,  835, 271, 8108, 85.2, 5348;
      22, 1000, 237, 6370, 83.8, 8056;
      29, 1150, 192, 6441, 82.1, 6960;
      37,  990, 202, 5154, 79.2, 5690;
      44,  840, 184, 5896, 81.2, 6932;
      58,  650, 200, 5336, 80.6, 5400;
      65,  640, 180, 5041, 78.4, 3177;
      72,  583, 165, 5012, 79.3, 4461;
      80,  570, 151, 4825, 78.7, 3901;
      86,  570, 171, 4391, 78.0, 5002;
      93,  510, 243, 4320, 72.3, 4665;
     100,  555, 147, 3709, 74.9, 4642;
     107,  460, 286, 3969, 74.4, 4840;
     122,  275, 198, 3558, 72.5, 4479;
     129,  510, 196, 4361, 57.7, 4200;
     151,  165, 210, 3301, 71.8, 3410;
     171,  244, 327, 2964, 72.5, 3360;
     220,   79, 334, 2777, 71.9, 2599];
y = [ 1.5563;  0.8976;  0.7482;  0.7160;  0.3010;
      0.3617;  0.1139;  0.1139; -0.2218; -0.1549;
      0.0000;  0.0000; -0.0969; -0.2218; -0.3979;
     -0.1549; -0.2218; -0.3979; -0.5229; -0.0458];
[n,m] = size(x);

mean_p = 'M';
isx = ones(m,1,'int64');
isx(1) = 0;
isx(m) = 2;
vname = {'DAY'; 'BOD'; 'TKN'; 'TS '; 'TVS'; 'COD'};

nzero = int64(0);
model = {'   '; '   '; '   '; '   '; '   '; '   '};
nterm = nzero;
rss = 0;
idf = nzero;
ifr = nzero;
free = model;
q = zeros(n,m+2);
p = zeros(m+1,1);

% Call g02ee repeatedly, one forward selection step per call, until no variable is added
istep = nzero;
addvar = true;
while addvar
  [istep, addvar, newvar, chrss, f, model, nterm, ...
   rss, idf, ifr, free, exss, q, p, ifail] = ...
  g02ee( ...
         istep, mean_p, x, vname, isx, y, model, nterm, ...
         rss, idf, ifr, free, q, p);

  % Display the results at each step
  fprintf('Step %3d\n', istep);
  if ~addvar
    fprintf('No further variables added max  F = %7.2f\n', f);
  else
    fprintf('Added variable is %s\n', newvar);
    fprintf('Change in residual sum of squares = %13.4e\n', chrss);
    fprintf('F Statistic                       = %7.2f\n\n', f);
    fprintf('Variables in model              :');
    fprintf('     %s', model{1:nterm,1});
    fprintf('\n\nResidual sum of squares           = %13.4e\n', rss);
    fprintf('Degrees of freedom                = %2d\n\n', idf);
  end
  if ifr==0
    fprintf('No free variables remaining\n');
    addvar = false;
  else
    fprintf('Free variables                  :')
    fprintf('     %s', free{1:ifr,1});
    fprintf('\nChange in RSS for free variables:\n%33s',' ');
    fprintf('%8.4f', exss(1:ifr));
    fprintf('\n\n');
  end
end


g02ee example results

Step   1
Added variable is TS 
Change in residual sum of squares =    4.7126e-01
F Statistic                       =    7.38

Variables in model              :     COD     TS 

Residual sum of squares           =    1.0850e+00
Degrees of freedom                = 17

Free variables                  :     TKN     BOD     TVS
Change in RSS for free variables:
                                   0.1175  0.0600  0.2276

Step   2
No further variables added max  F =    1.59
Free variables                  :     TKN     BOD     TVS
Change in RSS for free variables:
                                   0.0979  0.0207  0.0217
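
The printed F statistics can be cross-checked from the other printed quantities. The short calculation below assumes the usual F-to-enter form, that is, the change in the residual sum of squares divided by the residual mean square of the enlarged model; this form is an assumption made for the check and is not stated in this document. The variable names are chosen for the illustration only.

```matlab
% Step 1: TS was added to the model that already contained the mean and COD.
chrss1 = 4.7126e-01;   % change in residual sum of squares (printed)
rss1   = 1.0850e+00;   % residual sum of squares after adding TS (printed)
idf1   = 17;           % residual degrees of freedom after adding TS (printed)
F1 = chrss1 / (rss1 / idf1);

% Step 2: the best remaining candidate is the free variable with the largest
% extra sum of squares; adding it would use one more degree of freedom.
exss2 = [0.0979, 0.0207, 0.0217];   % printed for TKN, BOD, TVS
[best, k] = max(exss2);             % k = 1, i.e., TKN
F2 = best / ((rss1 - best) / (idf1 - 1));

fprintf('F1 = %.2f, F2 = %.2f\n', F1, F2);
```

Under that assumption the two values agree with the printed 7.38 and 1.59 to the displayed precision, and the second one lies below the critical value, which is why no variable is added at step 2.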



© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015