NAG CL Interface
g02eec (linregm_fit_onestep)

1 Purpose

g02eec carries out one step of a forward selection procedure in order to enable the ‘best’ linear regression model to be found.

2 Specification

#include <nag.h>
void  g02eec (Nag_OrderType order, Integer *istep, Nag_IncludeMean mean, Integer n, Integer m, const double x[], Integer pdx, const char *var_names[], const Integer sx[], Integer maxip, const double y[], const double wt[], double fin, Nag_Boolean *addvar, const char *newvar[], double *chrss, double *f, const char *model[], Integer *nterm, double *rss, Integer *idf, Integer *ifr, const char *free_vars[], double exss[], double q[], Integer pdq, double p[], NagError *fail)
The function may be called by the names: g02eec, nag_correg_linregm_fit_onestep or nag_step_regsn.

3 Description

One method of selecting a linear regression model from a given set of independent variables is by forward selection. The following procedure is used:
  (i) Select the best fitting independent variable, i.e., the independent variable which gives the smallest residual sum of squares. If the F statistic for this variable is greater than a chosen critical value, Fc, then include the variable in the model, else stop.
  (ii) Find the independent variable that leads to the greatest reduction in the residual sum of squares when added to the current model.
  (iii) If the F statistic for this variable is greater than a chosen critical value, Fc, then include the variable in the model and go to (ii), otherwise stop.
At any step the variables not in the model are known as the free terms.
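The procedure above can be sketched in plain C. This is an illustration only and not the g02eec implementation: it assumes a mean term, no weights and no forced variables, it sweeps columns out by Gram-Schmidt orthogonalization rather than updating a QR decomposition, and it assumes the usual partial F statistic (extra sum of squares divided by the new residual mean square). The names forward_select and dot are hypothetical, and plain int is used in place of the NAG Integer type so that the sketch compiles without nag.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Dot product of two length-n vectors. */
static double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Forward selection with a mean term, no weights and no forced variables.
   x is n-by-m in row-major order (x[i*m + j]); fc is the critical value Fc.
   selected[0..k-1] receives the 0-based column indices in order of entry.
   Returns k, or -1 if memory allocation fails. */
int forward_select(const double *x, const double *y, int n, int m,
                   double fc, int *selected)
{
    assert(n >= 2 && m >= 1);
    double *z = malloc((size_t)n * (size_t)m * sizeof *z); /* free columns, z[j*n + i] */
    double *r = malloc((size_t)n * sizeof *r);             /* current residuals of y */
    int *in_model = calloc((size_t)m, sizeof *in_model);
    if (!z || !r || !in_model) { free(z); free(r); free(in_model); return -1; }

    /* Centring y and every column accounts for the mean term. */
    double ybar = 0.0;
    for (int i = 0; i < n; ++i) ybar += y[i];
    ybar /= n;
    for (int i = 0; i < n; ++i) r[i] = y[i] - ybar;
    for (int j = 0; j < m; ++j) {
        double xbar = 0.0;
        for (int i = 0; i < n; ++i) xbar += x[i * m + j];
        xbar /= n;
        for (int i = 0; i < n; ++i) z[j * n + i] = x[i * m + j] - xbar;
    }

    double rss = dot(r, r, n);
    int df = n - 1, k = 0;
    while (k < m && df > 1) {
        /* Steps (i)/(ii): find the free variable with the largest extra sum of squares. */
        int best = -1;
        double best_ss = 0.0;
        for (int j = 0; j < m; ++j) {
            if (in_model[j]) continue;
            double zz = dot(z + j * n, z + j * n, n);
            if (zz <= 1e-12) continue; /* crude absolute test for a dependent column */
            double zr = dot(z + j * n, r, n);
            if (zr * zr / zz > best_ss) { best_ss = zr * zr / zz; best = j; }
        }
        if (best < 0) break;
        double rss_new = rss - best_ss;
        if (rss_new <= 0.0) break; /* degenerate exact fit: the sketch just stops */
        /* Step (iii): assumed partial F statistic on 1 and df-1 degrees of freedom. */
        double f = best_ss / (rss_new / (double)(df - 1));
        if (!(f > fc)) break;

        /* Accept: sweep the new column out of y and out of the remaining free columns. */
        const double *zk = z + best * n;
        double zkk = dot(zk, zk, n);
        double cy = dot(zk, r, n) / zkk;
        for (int i = 0; i < n; ++i) r[i] -= cy * zk[i];
        for (int j = 0; j < m; ++j) {
            if (in_model[j] || j == best) continue;
            double cj = dot(zk, z + j * n, n) / zkk;
            for (int i = 0; i < n; ++i) z[j * n + i] -= cj * zk[i];
        }
        in_model[best] = 1;
        selected[k++] = best;
        rss = rss_new;
        --df;
    }
    free(z); free(r); free(in_model);
    return k;
}
```

The caller supplies selected with room for m entries. Raising fc makes the procedure stop earlier; with fc=0.0 every variable that gives any reduction in the residual sum of squares is added.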
g02eec allows you to specify some independent variables that must be in the model; these are known as forced variables.
The computational procedure involves the use of QR decompositions, the R and the Q matrices being updated as each new variable is added to the model. In addition the matrix Q^T X_free, where X_free is the matrix of variables not included in the model, is updated.
g02eec computes one step of the forward selection procedure at a call. The results produced at each step may be printed or used as inputs to g02ddc, in order to compute the regression coefficients for the model fitted at that step. Repeated calls to g02eec should be made until F<Fc is indicated.

4 References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley

5 Arguments

Note:  after the initial call to g02eec with istep=0 all arguments except fin must not be changed by you between calls.
1: order Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.1.3 in the Introduction to the NAG Library CL Interface for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
2: istep Integer * Input/Output
On entry: indicates which step in the forward selection process is to be carried out.
istep=0
The process is initialized.
Constraint: istep≥0.
On exit: is incremented by 1.
3: mean Nag_IncludeMean Input
On entry: indicates if a mean term is to be included.
mean=Nag_MeanInclude
A mean term, intercept, will be included in the model.
mean=Nag_MeanZero
The model will pass through the origin, zero-point.
Constraint: mean=Nag_MeanInclude or Nag_MeanZero.
4: n Integer Input
On entry: n, the number of observations.
Constraint: n≥2.
5: m Integer Input
On entry: m, the total number of independent variables in the dataset.
Constraint: m≥1.
6: x[dim] const double Input
Note: the dimension, dim, of the array x must be at least
  • max(1,pdx×m) when order=Nag_ColMajor;
  • max(1,n×pdx) when order=Nag_RowMajor.
where X(i,j) appears in this document, it refers to the array element
  • x[(j-1)×pdx+i-1] when order=Nag_ColMajor;
  • x[(i-1)×pdx+j-1] when order=Nag_RowMajor.
On entry: X(i,j) must contain the ith observation for the jth independent variable, for i=1,2,…,n and j=1,2,…,m.
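The indexing rule above can be written as a small helper. This is a sketch only: sketch_order is a hypothetical stand-in for Nag_OrderType and plain int replaces the NAG Integer type, so that the fragment compiles without nag.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType. */
typedef enum { SKETCH_ROW_MAJOR, SKETCH_COL_MAJOR } sketch_order;

/* Offset of X(i,j) in the flat array x, with i and j 1-based as in the text. */
size_t x_offset(sketch_order order, int i, int j, int pdx)
{
    assert(i >= 1 && j >= 1 && pdx >= 1);
    if (order == SKETCH_COL_MAJOR)
        return (size_t)(j - 1) * (size_t)pdx + (size_t)(i - 1);
    return (size_t)(i - 1) * (size_t)pdx + (size_t)(j - 1);
}
```

For example, with row-major storage and pdx=m, the second observation of the third variable is x[x_offset(SKETCH_ROW_MAJOR, 2, 3, m)].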
7: pdx Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
  • if order=Nag_ColMajor, pdx≥n;
  • if order=Nag_RowMajor, pdx≥m.
8: var_names[m] const char * Input
On entry: var_names[i-1] must contain the name of the independent variable in column i of x, for i=1,2,…,m.
9: sx[m] const Integer Input
On entry: indicates which independent variables could be considered for inclusion in the regression.
sx[j-1]≥2
The variable contained in the jth column of x is automatically included in the regression model, for j=1,2,…,m.
sx[j-1]=1
The variable contained in the jth column of x is considered for inclusion in the regression model, for j=1,2,…,m.
sx[j-1]=0
The variable in the jth column is not considered for inclusion in the model, for j=1,2,…,m.
Constraint: sx[j-1]≥0 and at least one value of sx[j-1]=1, for j=1,2,…,m.
10: maxip Integer Input
On entry: the maximum number of independent variables to be included in the model.
Constraints:
  • if mean=Nag_MeanInclude, maxip≥1+ number of values of sx>0;
  • if mean=Nag_MeanZero, maxip≥ number of values of sx>0.
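The smallest value of maxip that satisfies these constraints can be computed directly from sx. A minimal sketch, in which min_maxip and the include_mean flag are hypothetical names and plain int stands in for the NAG Integer type:

```c
#include <assert.h>

/* Smallest maxip allowed by the constraints on maxip.
   include_mean is a hypothetical flag: nonzero stands for mean=Nag_MeanInclude. */
int min_maxip(const int sx[], int m, int include_mean)
{
    int count = include_mean ? 1 : 0;
    assert(m >= 1);
    for (int j = 0; j < m; ++j) {
        assert(sx[j] >= 0);      /* constraint on sx */
        if (sx[j] > 0) ++count;  /* forced or free variable */
    }
    return count;
}
```

A larger maxip is permitted; this value is only the lower bound.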
11: y[n] const double Input
On entry: the dependent variable.
12: wt[dim] const double Input
Note: the dimension, dim, of the array wt must be at least n.
On entry: if provided wt must contain the weights to be used with the model.
If wt[i-1]=0.0, the ith observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If wt is not provided the effective number of observations is n.
Constraint: if wt is not NULL, wt[i]≥0.0, for i=0,1,…,n-1.
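The effective number of observations described above can be counted as follows. This is a sketch: effective_n is a hypothetical helper, not part of the library, and plain int stands in for the NAG Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Effective number of observations: n when no weights are supplied,
   otherwise the number of observations with nonzero weight. */
int effective_n(const double wt[], int n)
{
    assert(n >= 0);
    if (wt == NULL) return n;
    int count = 0;
    for (int i = 0; i < n; ++i) {
        assert(wt[i] >= 0.0);    /* constraint on wt */
        if (wt[i] != 0.0) ++count;
    }
    return count;
}
```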
13: fin double Input
On entry: the critical value of the F statistic for the term to be included in the model, Fc.
Suggested value: 2.0 is a commonly used value in exploratory modelling.
Constraint: fin≥0.0.
14: addvar Nag_Boolean * Output
On exit: indicates if a variable has been added to the model.
addvar=Nag_TRUE
A variable has been added to the model.
addvar=Nag_FALSE
No variable had an F value greater than Fc and none were added to the model.
15: newvar[1] const char * Output
On exit: if addvar=Nag_TRUE, newvar contains the name of the variable added to the model.
16: chrss double * Output
On exit: if addvar=Nag_TRUE, chrss contains the change in the residual sum of squares due to adding variable newvar.
17: f double * Output
On exit: if addvar=Nag_TRUE, f contains the F statistic for the inclusion of the variable in newvar.
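This page does not state the formula for f. Assuming the usual partial F statistic for adding a single variable, it would be the change in the residual sum of squares divided by the residual mean square of the enlarged model, i.e., chrss/(rss/idf) using the values of rss and idf on exit. The helper below is a sketch under that assumption; partial_f is a hypothetical name.

```c
#include <assert.h>

/* Assumed partial F statistic: extra sum of squares over the residual
   mean square of the model that includes the new variable. */
double partial_f(double chrss, double rss_new, int idf_new)
{
    assert(idf_new > 0 && rss_new > 0.0);
    return chrss / (rss_new / (double)idf_new);
}
```

Under this reading the variable is added when the result exceeds fin.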
18: model[maxip] const char * Input/Output
On entry: if istep=0, model need not be set.
If istep≠0, model must contain the values returned by the previous call to g02eec.
On exit: the names of the variables in the current model.
19: nterm Integer * Input/Output
On entry: if istep=0, nterm need not be set.
If istep≠0, nterm must contain the value returned by the previous call to g02eec.
On exit: the number of independent variables in the current model, not including the mean, if any.
20: rss double * Input/Output
On entry: if istep=0, rss need not be set.
If istep≠0, rss must contain the value returned by the previous call to g02eec.
On exit: the residual sum of squares for the current model.
21: idf Integer * Input/Output
On entry: if istep=0, idf need not be set.
If istep≠0, idf must contain the value returned by the previous call to g02eec.
On exit: the degrees of freedom for the residual sum of squares for the current model.
22: ifr Integer * Input/Output
On entry: if istep=0, ifr need not be set.
If istep≠0, ifr must contain the value returned by the previous call to g02eec.
On exit: the number of free independent variables, i.e., the number of variables not in the model that are still being considered for selection.
23: free_vars[maxip] const char * Input/Output
On entry: if istep=0, free_vars need not be set.
If istep≠0, free_vars must contain the values returned by the previous call to g02eec.
On exit: the first ifr values of free_vars contain the names of the free variables.
24: exss[maxip] double Output
On exit: the first ifr values of exss contain what would be the change in regression sum of squares if the free variables had been added to the model, i.e., the extra sum of squares for the free variables. exss[i-1] contains what would be the change in regression sum of squares if the variable free_vars[i-1] had been added to the model.
25: q[dim] double Input/Output
Note: the dimension, dim, of the array q must be at least
  • max(1,pdq×(maxip+2)) when order=Nag_ColMajor;
  • max(1,n×pdq) when order=Nag_RowMajor.
the (i,j)th element of the matrix Q is stored in
  • q[(j-1)×pdq+i-1] when order=Nag_ColMajor;
  • q[(i-1)×pdq+j-1] when order=Nag_RowMajor.
On entry: if istep=0, q need not be set.
If istep≠0, q must contain the values returned by the previous call to g02eec.
On exit: the results of the QR decomposition for the current model:
  • the first column of q contains c=Q^T y (or Q^T W^(1/2) y where W is the vector of weights if used);
  • the upper triangular part of columns 2 to p+1 contains the R matrix;
  • the strictly lower triangular part of columns 2 to p+1 contains details of the Q matrix;
  • the remaining p+1 to p+ifr columns of q contain Q^T X_free (or Q^T W^(1/2) X_free),
where p=nterm, or p=nterm+1 if mean=Nag_MeanInclude.
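Reading c and R back out of q follows from the layout above together with the storage rule for q. The accessors below are a sketch of one reading of that layout, namely that R(i,j) sits in row i of column j+1 of Q for i≤j. The function names and the col_major flag are hypothetical, and plain int stands in for the NAG Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i,j) of Q in the flat array q; i and j are 1-based.
   col_major is a hypothetical flag: nonzero stands for order=Nag_ColMajor. */
size_t q_offset(int col_major, int i, int j, int pdq)
{
    assert(i >= 1 && j >= 1 && pdq >= 1);
    return col_major ? (size_t)(j - 1) * (size_t)pdq + (size_t)(i - 1)
                     : (size_t)(i - 1) * (size_t)pdq + (size_t)(j - 1);
}

/* i-th element of c, which occupies column 1 of Q. */
double q_get_c(const double q[], int col_major, int pdq, int i)
{
    return q[q_offset(col_major, i, 1, pdq)];
}

/* R(i,j) for 1 <= i <= j <= p: upper triangle of columns 2 to p+1 of Q. */
double q_get_r(const double q[], int col_major, int pdq, int i, int j)
{
    assert(i >= 1 && i <= j);
    return q[q_offset(col_major, i, j + 1, pdq)];
}
```

Elements below the diagonal of columns 2 to p+1 hold details of the Q matrix and should not be interpreted as entries of R.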
26: pdq Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array q.
Constraints:
  • if order=Nag_ColMajor, pdq≥n;
  • if order=Nag_RowMajor, pdq≥maxip+2.
27: p[maxip+1] double Input/Output
On entry: if istep=0, p need not be set.
If istep≠0, p must contain the values returned by the previous call to g02eec.
On exit: the first p elements of p contain details of the QR decomposition, where p=nterm, or p=nterm+1 if mean=Nag_MeanInclude.
28: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_DENOM_ZERO
The value of the change in the sum of squares is greater than the input value of rss. This may occur due to rounding errors if the true residual sum of squares for the new model is small relative to the residual sum of squares for the previous model.
NE_FREE_VARS
There are no free variables, i.e., no element of sx=1.
NE_FULL_RANK
On entry, the variables forced into the model are not of full rank, i.e., some of these variables are linear combinations of others.
NE_INT
On entry, istep=value.
Constraint: istep≥0.
On entry, m=value.
Constraint: m≥1.
On entry, n=value.
Constraint: n≥2.
On entry, pdq=value.
Constraint: pdq>0.
On entry, pdx=value.
Constraint: pdx>0.
NE_INT_2
On entry, istep=value and nterm=value.
Constraint: if istep≠0, nterm>0.
On entry, pdq=value and n=value.
Constraint: pdq≥n.
On entry, pdx=value and m=value.
Constraint: pdx≥m.
On entry, pdx=value and n=value.
Constraint: pdx≥n.
NE_INT_ARRAY
On entry, maxip=value.
Constraint: maxip must be large enough to accommodate the number of terms given by sx.
NE_INT_ARRAY_ELEM_CONS
On entry, sx[value]<0.
Constraint: sx[i-1]≥0, for i=1,2,…,m.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_REAL
On entry, fin=value.
Constraint: fin≥0.0.
On entry, rss=value.
Constraint: rss>0.0.
NE_REAL_ARRAY_ELEM_CONS
On entry, wt[value]<0.0.
Constraint: wt[i-1]≥0.0, for i=1,2,…,n.
NE_ZERO_DF
Degrees of freedom for error will equal 0 if a new variable is added, i.e., the number of variables in the model plus 1 is equal to the effective number of observations.
On entry, number of forced variables ≥n.
NE_ZERO_VARS
On entry, sx[i-1]=0, for all i=1,2,…,m.
Constraint: at least one value of sx must be nonzero.

7 Accuracy

As g02eec uses a QR transformation, the results will often be more accurate than those of traditional algorithms based on the cross-products of the dependent and independent variables.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.
g02eec is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02eec makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

None.

10 Example

The data, from an oxygen uptake experiment, is given by Weisberg (1985). The names of the variables are as given in Weisberg (1985). The independent and dependent variables are read and g02eec is repeatedly called until addvar=Nag_FALSE. At each step the F statistic, the free variables and their extra sum of squares are printed; also, except for when addvar=Nag_FALSE, the new variable, the change in the residual sum of squares and the terms in the model are printed.

10.1 Program Text

Program Text (g02eece.c)

10.2 Program Data

Program Data (g02eece.d)

10.3 Program Results

Program Results (g02eece.r)