NAG FL Interface
g02eaf (linregm_rssq)
1
Purpose
g02eaf calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.
2
Specification
Fortran Interface
Subroutine g02eaf (mean, weight, n, m, x, ldx, vname, isx, y, wt, nmod, modl, ldmodl, rss, nterms, mrank, wk, ifail)
Integer, Intent (In) :: n, m, ldx, isx(m), ldmodl
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nmod, nterms(ldmodl), mrank(ldmodl)
Real (Kind=nag_wp), Intent (In) :: x(ldx,m), y(n), wt(*)
Real (Kind=nag_wp), Intent (Out) :: rss(ldmodl), wk(n*(m+1))
Character (*), Intent (In) :: vname(m)
Character (*), Intent (Inout) :: modl(ldmodl,m)
Character (1), Intent (In) :: mean, weight

C Header Interface
#include <nag.h>
void 
g02eaf_ (const char *mean, const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const char vname[], const Integer isx[], const double y[], const double wt[], Integer *nmod, char modl[], const Integer *ldmodl, double rss[], Integer nterms[], Integer mrank[], double wk[], Integer *ifail, const Charlen length_mean, const Charlen length_weight, const Charlen length_vname, const Charlen length_modl) 

C++ Header Interface
#include <nag.h>
extern "C" {
void 
g02eaf_ (const char *mean, const char *weight, const Integer &n, const Integer &m, const double x[], const Integer &ldx, const char vname[], const Integer isx[], const double y[], const double wt[], Integer &nmod, char modl[], const Integer &ldmodl, double rss[], Integer nterms[], Integer mrank[], double wk[], Integer &ifail, const Charlen length_mean, const Charlen length_weight, const Charlen length_vname, const Charlen length_modl) 
}

The routine may be called by the names g02eaf or nagf_correg_linregm_rssq.
3
Description
For a set of
$\mathit{k}$ possible independent variables there are
${2}^{\mathit{k}}$ linear regression models with from zero to
$\mathit{k}$ independent variables in each model. For example, if
$\mathit{k}=3$ and the variables are
$A$,
$B$ and
$C$ then the possible models are:

(i) null model

(ii) $A$

(iii) $B$

(iv) $C$

(v) $A$ and $B$

(vi) $A$ and $C$

(vii) $B$ and $C$

(viii) $A$, $B$ and $C$.
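The list above can be generated mechanically. The following Python sketch is illustrative only and does not call the NAG Library; the helper name `all_models` is made up for this example. It lists every subset of a set of variable names, from the null model to the full model:

```python
from itertools import combinations

def all_models(names):
    """Return every subset of `names`, ordered by number of terms."""
    models = []
    for size in range(len(names) + 1):
        # combinations() yields each subset of the given size exactly once
        models.extend(combinations(names, size))
    return models

models = all_models(["A", "B", "C"])
for model in models:
    print(" and ".join(model) if model else "null model")
```

For $\mathit{k}$ names the list has ${2}^{\mathit{k}}$ entries, so it grows quickly with $\mathit{k}$.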
g02eaf calculates the residual sums of squares from each of the
${2}^{\mathit{k}}$ possible models. The method used involves a
$QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations and the residual sums of squares computed for each model; see
Clark (1981) and
Smith and Bremner (1989).
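For small $\mathit{k}$ the same residual sums of squares can be obtained by brute force, which may be useful as a cross-check. The Python sketch below is not the Givens-rotation scheme used by g02eaf: it refits each model from scratch by orthogonalizing the chosen columns (modified Gram-Schmidt) and projecting them out of the response. The function names and the small data set are invented for illustration.

```python
from itertools import combinations

def rss_for_columns(columns, y):
    """Residual sum of squares of y regressed on the given columns."""
    basis = []  # mutually orthogonal vectors spanning the columns
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(a * b for a, b in zip(v, q)) / sum(b * b for b in q)
            v = [a - coef * b for a, b in zip(v, q)]
        if sum(a * a for a in v) < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append(v)
    resid = list(y)
    for q in basis:
        # Remove the component of the response along each basis vector
        coef = sum(a * b for a, b in zip(resid, q)) / sum(b * b for b in q)
        resid = [a - coef * b for a, b in zip(resid, q)]
    return sum(r * r for r in resid)

def all_subset_rss(x_cols, y, mean=True):
    """Map each subset of variable names to its residual sum of squares."""
    intercept = [[1.0] * len(y)] if mean else []
    results = {}
    for size in range(len(x_cols) + 1):
        for subset in combinations(sorted(x_cols), size):
            cols = intercept + [x_cols[name] for name in subset]
            results[subset] = rss_for_columns(cols, y)
    return results

# Made-up data: two candidate variables, five observations
x_cols = {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [2.0, 1.0, 4.0, 3.0, 6.0]}
y = [1.1, 1.9, 3.2, 3.9, 5.1]
rss = all_subset_rss(x_cols, y)
for subset, value in rss.items():
    print(subset, value)
```

This refitting costs far more than updating a single decomposition, which is the motivation for the rotation-based approach described above.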
The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sums of squares. So the first model will always have the largest residual sum of squares and the ${2}^{\mathit{k}}$th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables.
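The ordering rule amounts to a two-part sort key. The Python sketch below applies it to a list of (number of terms, residual sum of squares, variable names) records; the numerical values are placeholders chosen for illustration, not output from g02eaf.

```python
# Hypothetical (nterms, rss, names) records in arbitrary order
records = [
    (2, 0.05, ("A", "B")),
    (1, 4.80, ("B",)),
    (0, 10.07, ()),
    (1, 0.07, ("A",)),
]

# Primary key: number of terms, ascending.
# Secondary key: residual sum of squares, descending (hence the minus sign).
ordered = sorted(records, key=lambda r: (r[0], -r[1]))

for nterms, rss_value, names in ordered:
    print(nterms, rss_value, names)
```

Because adding a variable to a model cannot increase its residual sum of squares, the model with the fewest terms comes first with the largest value and the full model comes last with the smallest.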
g02eaf allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables.
4
References
Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
Smith D M and Bremner J M (1989) All possible subset regressions using the $QR$ decomposition Comput. Statist. Data Anal. 7 217–236
Weisberg S (1985) Applied Linear Regression Wiley
5
Arguments

1:
$\mathbf{mean}$ – Character(1)
Input

On entry: indicates if a mean term is to be included.
 ${\mathbf{mean}}=\text{'M'}$
 A mean term, intercept, will be included in the model.
 ${\mathbf{mean}}=\text{'Z'}$
 The model will pass through the origin (zero-point).
Constraint:
${\mathbf{mean}}=\text{'M'}$ or $\text{'Z'}$.

2:
$\mathbf{weight}$ – Character(1)
Input

On entry: indicates if weights are to be used.
 ${\mathbf{weight}}=\text{'U'}$
 Least squares estimation is used.
 ${\mathbf{weight}}=\text{'W'}$
 Weighted least squares is used and weights must be supplied in array wt.
Constraint:
${\mathbf{weight}}=\text{'U'}$ or $\text{'W'}$.

3:
$\mathbf{n}$ – Integer
Input

On entry: $n$, the number of observations.
Constraints:
 ${\mathbf{n}}\ge 2$;
 ${\mathbf{n}}\ge m$, where $m$ here denotes the number of independent variables to be considered (forced plus free plus mean if included), as specified by mean and isx, and not the argument m.

4:
$\mathbf{m}$ – Integer
Input

On entry: the number of variables contained in
x.
Constraint:
${\mathbf{m}}\ge 2$.

5:
$\mathbf{x}\left({\mathbf{ldx}},{\mathbf{m}}\right)$ – Real (Kind=nag_wp) array
Input

On entry: ${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.

6:
$\mathbf{ldx}$ – Integer
Input

On entry: the first dimension of the array
x as declared in the (sub)program from which
g02eaf is called.
Constraint:
${\mathbf{ldx}}\ge {\mathbf{n}}$.
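When the routine is called through the C or C++ header interfaces, x is passed as a flat array in Fortran (column-major) order, so ldx is the stride between the starts of consecutive columns. A Python sketch of the index mapping, using a hypothetical helper name `x_offset`:

```python
def x_offset(i, j, ldx):
    """Zero-based offset in the flat array of element x(i, j), one-based i and j."""
    return (j - 1) * ldx + (i - 1)

n, m, ldx = 3, 2, 4  # ldx may exceed n; the extra rows are unused padding
flat = [0.0] * (ldx * m)
for j in range(1, m + 1):
    for i in range(1, n + 1):
        # Store a recognisable value: tens digit = column, units digit = row
        flat[x_offset(i, j, ldx)] = 10.0 * j + i

print(flat)
```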

7:
$\mathbf{vname}\left({\mathbf{m}}\right)$ – Character(*) array
Input

On entry:
${\mathbf{vname}}\left(\mathit{j}\right)$ must contain the name of the variable in column
$\mathit{j}$ of
x, for
$\mathit{j}=1,2,\dots ,{\mathbf{m}}$.

8:
$\mathbf{isx}\left({\mathbf{m}}\right)$ – Integer array
Input

On entry: indicates which independent variables are to be considered in the model.
 ${\mathbf{isx}}\left(j\right)\ge 2$
 The variable contained in the $j$th column of x is included in all regression models, i.e., is a forced variable.
 ${\mathbf{isx}}\left(j\right)=1$
 The variable contained in the $j$th column of x is included in the set from which the regression models are chosen, i.e., is a free variable.
 ${\mathbf{isx}}\left(j\right)=0$
 The variable contained in the $j$th column of x is not included in the models.
Constraints:
 ${\mathbf{isx}}\left(\mathit{j}\right)\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
 at least one value of ${\mathbf{isx}}=1$.
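The meaning of the isx codes and the two constraints can be summarized in a few lines of Python. This is a sketch of the rules stated above, not of the routine's internal checks; the function name and the variable names in the example are invented.

```python
def classify_isx(isx, vname):
    """Split variable names into forced, free and excluded lists."""
    if any(v < 0 for v in isx):
        raise ValueError("every element of isx must be >= 0")
    forced = [name for v, name in zip(isx, vname) if v >= 2]
    free = [name for v, name in zip(isx, vname) if v == 1]
    excluded = [name for v, name in zip(isx, vname) if v == 0]
    if not free:
        raise ValueError("at least one element of isx must equal 1")
    return forced, free, excluded

# Hypothetical variable names
forced, free, excluded = classify_isx([2, 1, 0, 1], ["AGE", "WT", "HT", "BMI"])
print(forced, free, excluded)
```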

9:
$\mathbf{y}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array
Input

On entry: ${\mathbf{y}}\left(\mathit{i}\right)$ must contain the $\mathit{i}$th observation on the dependent variable, ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.

10:
$\mathbf{wt}\left(*\right)$ – Real (Kind=nag_wp) array
Input

Note: the dimension of the array
wt
must be at least
${\mathbf{n}}$ if
${\mathbf{weight}}=\text{'W'}$.
On entry: if
${\mathbf{weight}}=\text{'W'}$, wt must contain the weights to be used with the model.
If ${\mathbf{wt}}\left(i\right)=0.0$, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If
${\mathbf{weight}}=\text{'U'}$,
wt is not referenced and the effective number of observations is
$n$.
Constraint:
if ${\mathbf{weight}}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
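The effective number of observations described above can be computed as follows. This Python sketch only restates the rule in the text; the function name `effective_n` is hypothetical.

```python
def effective_n(n, weight, wt=None):
    """Effective number of observations for the given weighting option."""
    if weight == "U":
        return n  # wt is not referenced
    if weight != "W":
        raise ValueError("weight must be 'U' or 'W'")
    if any(w < 0.0 for w in wt[:n]):
        raise ValueError("weights must be non-negative")
    # Observations with zero weight are dropped from the model
    return sum(1 for w in wt[:n] if w != 0.0)

print(effective_n(4, "W", [1.0, 0.0, 2.5, 0.0]))
```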

11:
$\mathbf{nmod}$ – Integer
Output

On exit: the total number of models for which residual sums of squares have been calculated.

12:
$\mathbf{modl}\left({\mathbf{ldmodl}},{\mathbf{m}}\right)$ – Character(*) array
Output

On exit: the first
${\mathbf{nterms}}\left(i\right)$ elements of the
$i$th row of
modl contain the names of the independent variables, as given in
vname, that are included in the
$i$th model.
Constraint:
the length of
modl should be greater than or equal to the length of
vname.

13:
$\mathbf{ldmodl}$ – Integer
Input

On entry: the first dimension of the array
modl and the dimension of the arrays
rss,
nterms and
mrank as declared in the (sub)program from which
g02eaf is called.
Constraint:
${\mathbf{ldmodl}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({2}^{\mathit{k}},{\mathbf{m}}\right)$,
where $\mathit{k}$ is the number of free variables in the model as specified in
isx, and hence
${2}^{\mathit{k}}$ is the total number of models to be generated.
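The smallest admissible value of ldmodl follows directly from isx. A minimal Python sketch of the constraint, with a hypothetical function name:

```python
def min_ldmodl(isx):
    """Smallest ldmodl satisfying ldmodl >= max(2**k, m)."""
    m = len(isx)
    k = sum(1 for v in isx if v == 1)  # number of free variables
    return max(2 ** k, m)

print(min_ldmodl([1, 1, 1]))        # three free variables
print(min_ldmodl([2, 1, 0, 0, 0]))  # one free variable, m = 5
```

The second call shows why the maximum is needed: with few free variables, m can exceed the number of models.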

14:
$\mathbf{rss}\left({\mathbf{ldmodl}}\right)$ – Real (Kind=nag_wp) array
Output

On exit: ${\mathbf{rss}}\left(\mathit{i}\right)$ contains the residual sum of squares for the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.

15:
$\mathbf{nterms}\left({\mathbf{ldmodl}}\right)$ – Integer array
Output

On exit: ${\mathbf{nterms}}\left(\mathit{i}\right)$ contains the number of independent variables in the $\mathit{i}$th model, not including the mean if one is fitted, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.

16:
$\mathbf{mrank}\left({\mathbf{ldmodl}}\right)$ – Integer array
Output

On exit: ${\mathbf{mrank}}\left(i\right)$ contains the rank of the residual sum of squares for the $i$th model.

17:
$\mathbf{wk}\left({\mathbf{n}}\times \left({\mathbf{m}}+1\right)\right)$ – Real (Kind=nag_wp) array
Workspace


18:
$\mathbf{ifail}$ – Integer
Input/Output

On entry:
ifail must be set to
$0$,
$-1$ or
$1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value
$-1$ or
$1$ is recommended. If message printing is undesirable, then the value
$1$ is recommended. Otherwise, the value
$0$ is recommended.
When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$-1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry, ${\mathbf{ldmodl}}=\langle\mathit{\text{value}}\rangle$ and ${\mathbf{m}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldmodl}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{ldx}}=\langle\mathit{\text{value}}\rangle$ and ${\mathbf{n}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{n}}$.
On entry, ${\mathbf{m}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{m}}\ge 2$.
On entry, ${\mathbf{mean}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{mean}}=\text{'M'}$ or $\text{'Z'}$.
On entry, ${\mathbf{n}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{weight}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{weight}}=\text{'W'}$ or $\text{'U'}$.
 ${\mathbf{ifail}}=2$

On entry, ${\mathbf{wt}}\left(\langle\mathit{\text{value}}\rangle\right)<0.0$.
Constraint: ${\mathbf{wt}}\left(i\right)\ge 0.0$, for $i=1,2,\dots ,n$.
 ${\mathbf{ifail}}=3$

On entry, ${\mathbf{isx}}\left(\langle\mathit{\text{value}}\rangle\right)<0$.
Constraint: ${\mathbf{isx}}\left(i\right)\ge 0$, for $i=1,2,\dots ,{\mathbf{m}}$.
There are no free variables, i.e., no element of ${\mathbf{isx}}=1$.
 ${\mathbf{ifail}}=4$

On entry, ${\mathbf{ldmodl}}=\langle\mathit{\text{value}}\rangle$ and number of possible models is $\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldmodl}}\ge $ the number of possible models.
 ${\mathbf{ifail}}=5$

On entry, the number of independent variables to be considered (forced plus free plus mean if included) is greater than or equal to the effective number of observations.
 ${\mathbf{ifail}}=6$

The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model in order to give full rank.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 7 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See
Section 8 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See
Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
For a discussion of the improved accuracy obtained by using a method based on the
$QR$ decomposition see
Smith and Bremner (1989).
8
Parallelism and Performance
g02eaf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02eaf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
g02ecf may be used to compute
${R}^{2}$ and
${C}_{p}$-values from the results of
g02eaf.
If a mean has been included in the model and no variables are forced in then ${\mathbf{rss}}\left(1\right)$ contains the total sum of squares and in many situations a reasonable estimate of the variance of the errors is given by ${\mathbf{rss}}\left({\mathbf{nmod}}\right)/\left({\mathbf{n}}-1-{\mathbf{nterms}}\left({\mathbf{nmod}}\right)\right)$.
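These post-processing steps can be sketched in Python for the case just described (mean included, no forced variables). The residual sums of squares below are placeholders, not results from g02eaf, and the function name is invented. The ${C}_{p}$ formula used is the usual Mallows definition, ${C}_{p}=\mathrm{RSS}_{p}/{\hat{\sigma}}^{2}-(n-2p)$ with $p$ counting the mean; whether g02ecf follows exactly this convention is an assumption to be checked against its own documentation.

```python
def summarize(rss, nterms, n):
    """Return the error-variance estimate and (nterms, R^2, C_p) per model.

    Assumes a mean is fitted and no variables are forced, so rss[0] is the
    total sum of squares and rss[-1] belongs to the full model.
    """
    total_ss = rss[0]
    sigma2 = rss[-1] / (n - 1 - nterms[-1])
    rows = []
    for r, t in zip(rss, nterms):
        p = t + 1  # number of parameters, including the mean
        r_squared = 1.0 - r / total_ss
        c_p = r / sigma2 - (n - 2 * p)  # Mallows' C_p (assumed convention)
        rows.append((t, r_squared, c_p))
    return sigma2, rows

# Placeholder values in the order returned by the routine
rss_values = [10.0, 6.0, 4.0, 2.0]
nterms_values = [0, 1, 1, 2]
sigma2, rows = summarize(rss_values, nterms_values, n=12)
for row in rows:
    print(row)
```

With this definition the full model always has ${C}_{p}=p$, which gives a quick sanity check on the arithmetic.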
10
Example
The data for this example is given in
Weisberg (1985). The independent variables and the dependent variable are read, as are the names of the variables. These names are as given in
Weisberg (1985). The residual sums of squares are computed and printed with the names of the variables in the model.
10.1
Program Text
10.2
Program Data
10.3
Program Results