
PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_correg_linregm_fit_onestep (g02ee)

Purpose

nag_correg_linregm_fit_onestep (g02ee) carries out one step of a forward selection procedure in order to enable the ‘best’ linear regression model to be found.

Syntax

[istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, free, exss, q, p, ifail] = g02ee(istep, mean, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p, 'n', n, 'm', m, 'maxip', maxip, 'wt', wt, 'fin', fin)
[istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, free, exss, q, p, ifail] = nag_correg_linregm_fit_onestep(istep, mean, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p, 'n', n, 'm', m, 'maxip', maxip, 'wt', wt, 'fin', fin)

Description

One method of selecting a linear regression model from a given set of independent variables is by forward selection. The following procedure is used:
(i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the $F$-test for this variable is greater than a chosen critical value, $F_c$, then include the variable in the model, else stop.
(ii) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model.
(iii) If the $F$-test for this variable is greater than a chosen critical value, $F_c$, then include the variable in the model and go to (ii), otherwise stop.
At any step the variables not in the model are known as the free terms.
nag_correg_linregm_fit_onestep (g02ee) allows you to specify some independent variables that must be in the model; these are known as forced variables.
The computational procedure involves the use of $QR$ decompositions, the $R$ and the $Q$ matrices being updated as each new variable is added to the model. In addition the matrix $Q^T X_{\mathrm{free}}$, where $X_{\mathrm{free}}$ is the matrix of variables not included in the model, is updated.
nag_correg_linregm_fit_onestep (g02ee) computes one step of the forward selection procedure at a call. The results produced at each step may be printed or used as inputs to nag_correg_linregm_update (g02dd), in order to compute the regression coefficients for the model fitted at that step. Repeated calls to nag_correg_linregm_fit_onestep (g02ee) should be made until $F < F_c$ is indicated.
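The procedure above can be sketched in plain MATLAB without the NAG Toolbox. This is a minimal illustration under stated assumptions, not the routine's implementation: each candidate model is refitted by least squares with the backslash operator instead of updating a $QR$ decomposition, the $F$ statistic is taken to be the usual partial $F$ statistic, and the data and variable names (Xcur, freeIdx, Fc and so on) are made up for the example.

```matlab
% Made-up data: 8 observations, 3 candidate variables.
X = [1 2 1; 2 1 3; 3 4 2; 4 3 5; 5 6 4; 6 5 7; 7 8 6; 8 7 9];
y = [2.1; 3.9; 6.2; 7.8; 10.1; 12.2; 13.8; 16.1];
n = size(X, 1);
Fc = 2.0;                        % critical value of the F statistic

Xcur = ones(n, 1);               % current model: mean term only
freeIdx = 1:size(X, 2);          % columns of X that are still free

% Residual sum of squares of the current model.
r0 = y - Xcur * (Xcur \ y);
rss0 = r0' * r0;

% Extra sum of squares obtained by adding each free variable in turn.
exss = zeros(size(freeIdx));
for k = 1:numel(freeIdx)
    Xtry = [Xcur, X(:, freeIdx(k))];
    r = y - Xtry * (Xtry \ y);
    exss(k) = rss0 - r' * r;
end

% Candidate with the greatest reduction, and its F statistic.
[chrss, kbest] = max(exss);
best = freeIdx(kbest);
rssNew = rss0 - chrss;
idfNew = n - (size(Xcur, 2) + 1);    % residual degrees of freedom if added
F = chrss / (rssNew / idfNew);

% Add the variable only if F exceeds the critical value.
addvar = F > Fc;
if addvar
    Xcur = [Xcur, X(:, best)];
    freeIdx(kbest) = [];
end
```

Repeating the block from the computation of rss0 onwards, until addvar is false or freeIdx is empty, gives the full forward selection loop.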

References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley

Parameters

Note: after the initial call to nag_correg_linregm_fit_onestep (g02ee) with istep = 0, you must not change any parameter except fin between calls.

Compulsory Input Parameters

1:     istep – int64, int32 or nag_int scalar
Indicates which step in the forward selection process is to be carried out.
istep = 0
The process is initialized.
Constraint: istep ≥ 0.
2:     mean – string (length ≥ 1)
Indicates if a mean term is to be included.
mean = 'M'
A mean term, intercept, will be included in the model.
mean = 'Z'
The model will pass through the origin, zero-point.
Constraint: mean = 'M' or 'Z'.
3:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the $i$th observation for the $j$th independent variable, for $i = 1,2,\ldots,n$ and $j = 1,2,\ldots,m$.
4:     vname(m) – cell array of strings
m, the dimension of the array, must satisfy the constraint m ≥ 1.
vname(j) must contain the name of the independent variable in column $j$ of x, for $j = 1,2,\ldots,m$.
5:     isx(m) – int64, int32 or nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ 1.
Indicates which independent variables could be considered for inclusion in the regression.
isx(j) ≥ 2
The variable contained in the $j$th column of x is automatically included in the regression model, for $j = 1,2,\ldots,m$.
isx(j) = 1
The variable contained in the $j$th column of x is considered for inclusion in the regression model, for $j = 1,2,\ldots,m$.
isx(j) = 0
The variable in the $j$th column is not considered for inclusion in the model, for $j = 1,2,\ldots,m$.
Constraint: isx(j) ≥ 0 and at least one value of isx(j) = 1, for $j = 1,2,\ldots,m$.
6:     y(n) – double array
n, the dimension of the array, must satisfy the constraint n ≥ 2.
The dependent variable.
7:     model(maxip) – cell array of strings
maxip, the dimension of the array, must satisfy the constraint
  • if mean = 'M', maxip ≥ 1 + number of values of isx > 0;
  • if mean = 'Z', maxip ≥ number of values of isx > 0.
If istep = 0, model need not be set.
If istep ≠ 0, model must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: the declared size of model must be greater than or equal to the declared size of vname.
8:     nterm – int64, int32 or nag_int scalar
If istep = 0, nterm need not be set.
If istep ≠ 0, nterm must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: if istep ≠ 0, nterm > 0.
9:     rss – double scalar
If istep = 0, rss need not be set.
If istep ≠ 0, rss must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: if istep ≠ 0, rss > 0.0.
10:   idf – int64, int32 or nag_int scalar
If istep = 0, idf need not be set.
If istep ≠ 0, idf must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
11:   ifr – int64, int32 or nag_int scalar
If istep = 0, ifr need not be set.
If istep ≠ 0, ifr must contain the value returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
12:   free(maxip) – cell array of strings
maxip, the dimension of the array, must satisfy the constraint
  • if mean = 'M', maxip ≥ 1 + number of values of isx > 0;
  • if mean = 'Z', maxip ≥ number of values of isx > 0.
If istep = 0, free need not be set.
If istep ≠ 0, free must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
Constraint: the declared size of free must be greater than or equal to the declared size of vname.
13:   q(ldq,maxip + 2) – double array
ldq, the first dimension of the array, must satisfy the constraint ldq ≥ n.
If istep = 0, q need not be set.
If istep ≠ 0, q must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
14:   p(maxip + 1) – double array
If istep = 0, p need not be set.
If istep ≠ 0, p must contain the values returned by the previous call to nag_correg_linregm_fit_onestep (g02ee).
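The size constraints above can be checked before the first call. The sketch below is plain MATLAB and only restates the constraints listed for maxip, model, free, q and p. The variable names follow the example later on this page, minMaxip is a name introduced here, and int64 is assumed as the integer type.

```matlab
mean_p = 'M';
isx = int64([0; 1; 1; 1; 1; 2]);    % 0 = excluded, 1 = free, >= 2 = forced
n = 20;                             % number of observations

% Smallest permitted maxip: number of isx > 0, plus one if a mean is fitted.
minMaxip = nnz(isx > 0) + double(strcmp(mean_p, 'M'));
maxip = minMaxip;

% Arrays whose contents need not be set when istep = 0.
model = repmat({'   '}, maxip, 1);
free  = repmat({'   '}, maxip, 1);
q = zeros(n, maxip + 2);
p = zeros(maxip + 1, 1);
```

With these values maxip is 6, q is 20 by 8 and p has 7 elements, which matches the sizes used in the example.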

Optional Input Parameters

1:     n – int64, int32 or nag_int scalar
Default: The dimension of the array y and the first dimension of the arrays x, q. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: n ≥ 2.
2:     m – int64, int32 or nag_int scalar
Default: The second dimension of the array x and the dimension of the arrays vname, isx. (An error is raised if these dimensions are not equal.)
$m$, the total number of independent variables in the dataset.
Constraint: m ≥ 1.
3:     maxip – int64, int32 or nag_int scalar
Default: The dimension of the arrays model, free. (An error is raised if these dimensions are not equal.)
The maximum number of independent variables to be included in the model.
Constraints:
  • if mean = 'M', maxip ≥ 1 + number of values of isx > 0;
  • if mean = 'Z', maxip ≥ number of values of isx > 0.
4:     wt(:) – double array
Note: the dimension of the array wt must be at least n if weight = 'W'.
If weight = 'W', wt must contain the weights to be used in the weighted regression, $W$.
If wt(i) = 0.0, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If weight = 'U', wt is not referenced and the effective number of observations is n.
Constraint: if weight = 'W', wt(i) ≥ 0.0, for $i = 1,2,\ldots,n$.
5:     fin – double scalar
The critical value of the $F$ statistic for the term to be included in the model, $F_c$.
Default: 2.0 (a commonly used value in exploratory modelling).
Constraint: fin ≥ 0.0.
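As a hedged illustration of the optional parameters, the sketch below passes wt and fin as name-value pairs in the form given under Syntax. The data are made up, nEffective is a name introduced here, int64 is assumed as the integer type, and the call is guarded so that the script still runs where the NAG Toolbox is not installed.

```matlab
% Made-up data: 6 observations, 2 candidate variables.
x = [1 4; 2 1; 3 5; 4 2; 5 6; 6 3];
y = [1.2; 1.9; 3.2; 3.8; 5.1; 6.1];
wt = [1; 1; 0; 2; 1; 1];           % zero weight drops observation 3
fin = 4.0;                         % stricter than the default of 2.0

% Constraints stated for wt and fin.
assert(all(wt >= 0), 'weights must be non-negative');
assert(fin >= 0, 'fin must be non-negative');
nEffective = nnz(wt ~= 0);         % effective number of observations

if exist('g02ee', 'file')          % run only when the Toolbox is available
    vname = {'X1 '; 'X2 '};
    isx = int64([1; 1]);           % both variables are free
    maxip = 3;                     % mean term + two candidate variables
    blank = repmat({'   '}, maxip, 1);
    [istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, ...
        free, exss, q, p, ifail] = g02ee(int64(0), 'M', x, vname, isx, y, ...
        blank, int64(0), 0, int64(0), int64(0), blank, ...
        zeros(6, maxip + 2), zeros(maxip + 1, 1), 'wt', wt, 'fin', fin);
end
```

Because fin is the one parameter that may be changed between calls, a different critical value can be supplied at a later step without restarting the selection.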

Input Parameters Omitted from the MATLAB Interface

weight, ldx, ldq, wk

Output Parameters

1:     istep – int64, int32 or nag_int scalar
Is incremented by 1.
2:     addvar – logical scalar
Indicates if a variable has been added to the model.
addvar = true
A variable has been added to the model.
addvar = false
No variable had an $F$ value greater than $F_c$ and none were added to the model.
3:     newvar – string
If addvar = true, newvar contains the name of the variable added to the model.
4:     chrss – double scalar
If addvar = true, chrss contains the change in the residual sum of squares due to adding variable newvar.
5:     f – double scalar
If addvar = true, f contains the $F$ statistic for the inclusion of the variable in newvar.
6:     model(maxip) – cell array of strings
The names of the variables in the current model.
7:     nterm – int64, int32 or nag_int scalar
The number of independent variables in the current model, not including the mean, if any.
8:     rss – double scalar
The residual sum of squares for the current model.
9:     idf – int64, int32 or nag_int scalar
The degrees of freedom for the residual sum of squares for the current model.
10:   ifr – int64, int32 or nag_int scalar
The number of free independent variables, i.e., the number of variables not in the model that are still being considered for selection.
11:   free(maxip) – cell array of strings
The first ifr values of free contain the names of the free variables.
12:   exss(maxip) – double array
The first ifr values of exss contain what would be the change in regression sum of squares if the free variables had been added to the model, i.e., the extra sum of squares for the free variables. exss(i) contains what would be the change in regression sum of squares if the variable free(i) had been added to the model.
13:   q(ldq,maxip + 2) – double array
ldq ≥ n.
The results of the $QR$ decomposition for the current model:
  • the first column of q contains $c = Q^T y$ (or $Q^T W^{1/2} y$, where $W$ is the vector of weights, if used);
  • the upper triangular part of columns 2 to $p + 1$ contains the $R$ matrix;
  • the strictly lower triangular part of columns 2 to $p + 1$ contains details of the $Q$ matrix;
  • the remaining $p + 1$ to $p + \mathrm{ifr}$ columns of q contain $Q^T X_{\mathrm{free}}$ (or $Q^T W^{1/2} X_{\mathrm{free}}$),
where $p$ = nterm, or $p$ = nterm + 1 if mean = 'M'.
14:   p(maxip + 1) – double array
The first $p$ elements of p contain details of the $QR$ decomposition, where $p$ = nterm, or $p$ = nterm + 1 if mean = 'M'.
15:   ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
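The relation between the outputs chrss, rss, idf and f is not stated explicitly on this page. The values printed in the Example section are, to rounding, consistent with the usual partial $F$ statistic, with rss and idf taken as the output values for the model after the variable has been added. This is an inference from those printed values and standard regression theory, not a statement of the routine's internals:

```latex
F = \frac{\mathrm{chrss}}{\mathrm{rss} / \mathrm{idf}},
\qquad
\frac{0.4713}{1.0850 / 17} \approx 7.38
```

The example reports f = 7.3834, so the small difference is attributable to the four-decimal rounding of the displayed chrss and rss.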

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, n < 1,
or m < 1,
or ldx < n,
or ldq < n,
or istep < 0,
or istep ≠ 0 and nterm = 0,
or istep ≠ 0 and rss ≤ 0.0,
or fin < 0.0,
or mean ≠ 'M' or 'Z',
or weight ≠ 'U' or 'W'.
  ifail = 2
On entry, weight = 'W' and a value of wt < 0.0.
  ifail = 3
On entry, the degrees of freedom will be zero if a variable is selected, i.e., the number of variables in the model plus 1 is equal to the effective number of observations.
  ifail = 4
On entry, a value of isx < 0,
or there are no forced or free variables, i.e., no element of isx > 0,
or the value of maxip is too small for the number of variables indicated by isx.
  ifail = 5
On entry, the variables forced into the model are not of full rank, i.e., some of these variables are linear combinations of others.
  ifail = 6
On entry, there are no free variables, i.e., no element of isx = 1.
  ifail = 7
The value of the change in the sum of squares is greater than the input value of rss. This may occur due to rounding errors if the true residual sum of squares for the new model is small relative to the residual sum of squares for the previous model.
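A caller may want to turn the ifail value into a readable message. The following plain-MATLAB sketch abbreviates the descriptions listed above. The names ifailText and msg are introduced here, and the sketch assumes the call returns with a nonzero ifail rather than raising a MATLAB error, which may depend on the Toolbox version and the kind of failure.

```matlab
% Short summaries of the ifail values listed above (wording abbreviated).
ifailText = { ...
    'invalid scalar argument, array dimension or option string', ...  % 1
    'negative weight supplied', ...                                   % 2
    'adding a variable would leave zero degrees of freedom', ...      % 3
    'invalid isx, or maxip too small', ...                            % 4
    'forced variables are not of full rank', ...                      % 5
    'no free variables', ...                                          % 6
    'change in sum of squares exceeds input rss'};                    % 7

ifail = int64(5);      % stand-in for the value returned by g02ee
if ifail == 0
    msg = 'ok';
elseif ifail >= 1 && ifail <= numel(ifailText)
    msg = sprintf('g02ee failed (ifail = %d): %s', ifail, ifailText{ifail});
else
    msg = sprintf('g02ee returned unexpected ifail = %d', ifail);
end
disp(msg);
```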

Accuracy

As nag_correg_linregm_fit_onestep (g02ee) uses a $QR$ transformation, the results will often be more accurate than those of traditional algorithms based on the cross-products of the dependent and independent variables.
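The accuracy point can be illustrated in plain MATLAB with a small, deliberately ill-conditioned design matrix. This is a textbook-style demonstration using the built-in qr and rank functions, not the routine's own code, and the matrix and variable names are made up for the example.

```matlab
e = 1e-8;
X = [1 1; e 0; 0 e];            % two nearly collinear columns
betaTrue = [1; 2];
y = X * betaTrue;

% Cross-products: e^2 is lost next to 1 in double precision, so X'*X
% rounds to [1 1; 1 1] and is numerically singular.
A = X' * X;
rankCross = rank(A);
rankX = rank(X);

% QR-based least squares works on X directly and keeps both columns.
[Q, R] = qr(X, 0);              % economy-size decomposition
betaQR = R \ (Q' * y);
errQR = norm(betaQR - betaTrue);
```

Here the cross-product matrix has lost the information that distinguishes the two columns, while the QR route still recovers the coefficients to within a small error.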

Further Comments

None.

Example

function nag_correg_linregm_fit_onestep_example
% Carries out the first step (istep = 0) of forward selection.
istep = int64(0);
mean_p = 'M';    % include a mean (intercept) term
x = [0, 1125, 232, 7160, 85.9, 8905;
     7, 920, 268, 8804, 86.5, 7388;
     15, 835, 271, 8108, 85.2, 5348;
     22, 1000, 237, 6370, 83.8, 8056;
     29, 1150, 192, 6441, 82.1, 6960;
     37, 990, 202, 5154, 79.2, 5690;
     44, 840, 184, 5896, 81.2, 6932;
     58, 650, 200, 5336, 80.6, 5400;
     65, 640, 180, 5041, 78.4, 3177;
     72, 583, 165, 5012, 79.3, 4461;
     80, 570, 151, 4825, 78.7, 3901;
     86, 570, 171, 4391, 78, 5002;
     93, 510, 243, 4320, 72.3, 4665;
     100, 555, 147, 3709, 74.9, 4642;
     107, 460, 286, 3969, 74.4, 4840;
     122, 275, 198, 3558, 72.5, 4479;
     129, 510, 196, 4361, 57.7, 4200;
     151, 165, 210, 3301, 71.8, 3410;
     171, 244, 327, 2964, 72.5, 3360;
     220, 79, 334, 2777, 71.9, 2599];
% Names of the columns of x and their status: 0 = excluded, 1 = free, 2 = forced.
vname = {'DAY'; 'BOD'; 'TKN'; 'TS '; 'TVS'; 'COD'};
isx = [int64(0);1;1;1;1;2];
y = [1.5563;
     0.8976;
     0.7482;
     0.716;
     0.301;
     0.3617;
     0.1139;
     0.1139;
     -0.2218;
     -0.1549;
     0;
     0;
     -0.0969;
     -0.2218;
     -0.3979;
     -0.1549;
     -0.2218;
     -0.3979;
     -0.5229;
     -0.0458];
% State and workspace arguments; their contents need not be set when istep = 0.
model = {'   '; '   '; '   '; '   '; '   '; '   '};
nterm = int64(0);
rss = 0;
idf = int64(0);
ifr = int64(0);
free = {'   '; '   '; '   '; '   '; '   '; '   '};
q = zeros(20,8);
p = zeros(7,1);
[istepOut, addvar, newvar, chrss, f, modelOut, ntermOut, rssOut, idfOut, ...
    ifrOut, freeOut, exss, qOut, pOut, ifail] = ...
    nag_correg_linregm_fit_onestep(istep, mean_p, x, vname, isx, y, model, nterm, rss, idf, ifr, free, q, p)
 

istepOut =

                    1


addvar =

     1


newvar =

TS 


chrss =

    0.4713


f =

    7.3834


modelOut = 

    'COD'
    'TS '
    '   '
    '   '
    '   '
    '   '


ntermOut =

                    2


rssOut =

    1.0850


idfOut =

                   17


ifrOut =

                    3


freeOut = 

    'TKN'
    'BOD'
    'TVS'
    'TVS'
    '   '
    '   '


exss =

    0.1175
    0.0600
    0.2276
    0.2276
         0
         0


qOut =

   1.0e+04 *

   -0.0001   -0.0004   -2.3124   -2.2695   -0.0983   -0.2833   -0.0346         0
   -0.0002    0.0000   -0.7418   -0.5518    0.0017   -0.1165   -0.0019         0
   -0.0001    0.0000   -0.0000   -0.4602   -0.0010   -0.0363   -0.0011         0
   -0.0000    0.0000    0.0000   -0.0000    0.0018    0.0032    0.0001         0
   -0.0000    0.0000    0.0000   -0.0000   -0.0033    0.0266    0.0000         0
    0.0000    0.0000   -0.0000   -0.0000   -0.0018    0.0301    0.0001         0
   -0.0000    0.0000    0.0000   -0.0000   -0.0036   -0.0003    0.0001         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0023   -0.0028    0.0002         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0051    0.0165    0.0002         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0060    0.0005    0.0002         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0075    0.0051    0.0002         0
   -0.0000    0.0000   -0.0000   -0.0000   -0.0046   -0.0008    0.0002         0
   -0.0000    0.0000   -0.0000    0.0000    0.0025   -0.0036   -0.0003         0
   -0.0000    0.0000   -0.0000   -0.0000   -0.0067    0.0055    0.0001         0
   -0.0000    0.0000   -0.0000   -0.0000    0.0071   -0.0075   -0.0000         0
   -0.0000    0.0000   -0.0000   -0.0000   -0.0015   -0.0201   -0.0001         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0025   -0.0001   -0.0018         0
   -0.0000    0.0000   -0.0000    0.0000   -0.0006   -0.0206   -0.0000         0
   -0.0000    0.0000   -0.0000   -0.0000    0.0113   -0.0099    0.0001         0
    0.0000    0.0000   -0.0000    0.0000    0.0118   -0.0188    0.0001         0


pOut =

    1.1062
    1.0986
    1.2981
         0
         0
         0
         0


ifail =

                    0
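The call above performs only the first step. A sketch of the repeated calls described in the Description section follows. It uses a small made-up dataset so that it is self-contained, assumes int64 as the integer type, and assumes that a failed call returns a nonzero ifail rather than raising a MATLAB error. The variable added and the extra stopping test on ifr are additions made here for illustration, and the loop is skipped where the NAG Toolbox is not installed.

```matlab
% Made-up data: 8 observations, 3 candidate variables.
x = [1 2 1; 2 1 3; 3 4 2; 4 3 5; 5 6 4; 6 5 7; 7 8 6; 8 7 9];
y = [2.1; 3.9; 6.2; 7.8; 10.1; 12.2; 13.8; 16.1];
vname = {'X1 '; 'X2 '; 'X3 '};
isx = int64([1; 1; 1]);              % all three variables are free
mean_p = 'M';
maxip = 4;                           % mean term + three variables
fin = 2.0;                           % critical value of the F statistic

% Initial state: contents are not used when istep = 0.
istep = int64(0); nterm = int64(0); rss = 0;
idf = int64(0); ifr = int64(0);
model = repmat({'   '}, maxip, 1);
free = repmat({'   '}, maxip, 1);
q = zeros(size(x, 1), maxip + 2);
p = zeros(maxip + 1, 1);
added = {};                          % names of the variables added so far

addvar = exist('g02ee', 'file') ~= 0;    % skip the loop without the Toolbox
while addvar && (istep == 0 || ifr > 0)
    % Pass every returned value straight back in, unchanged.
    [istep, addvar, newvar, chrss, f, model, nterm, rss, idf, ifr, ...
        free, exss, q, p, ifail] = g02ee(istep, mean_p, x, vname, isx, ...
        y, model, nterm, rss, idf, ifr, free, q, p, 'fin', fin);
    if ifail ~= 0
        break;                       % see Error Indicators and Warnings
    end
    if addvar
        added{end + 1} = newvar;
        fprintf('step %d: added %s, F = %.4f, rss = %.4f\n', ...
            istep, newvar, f, rss);
    end
end
```

The loop stops when a call reports addvar = false (no remaining variable reaches the critical value) or when no free variables are left.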


The same example can be run under the short name (function g02ee_example) by replacing nag_correg_linregm_fit_onestep with g02ee in the call; the inputs and the printed output are identical to those shown above.



© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013