## 1 Purpose

g02eac calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.

## 2 Specification

 #include <nag.h>
 void g02eac (Nag_OrderType order, Nag_IncludeMean mean, Integer n, Integer m, const double x[], Integer pdx, const char *var_names[], const Integer sx[], const double y[], const double wt[], Integer *nmod, const char *model[], double rss[], Integer nterms[], Integer mrank[], NagError *fail)
The function may be called by the names: g02eac, nag_correg_linregm_rssq or nag_all_regsn.

## 3 Description

For a set of $\mathit{k}$ possible independent variables there are ${2}^{\mathit{k}}$ linear regression models, each containing between zero and $\mathit{k}$ of the independent variables. For example, if $\mathit{k}=3$ and the variables are $A$, $B$ and $C$, then the possible models are:
1. null model
2. $A$
3. $B$
4. $C$
5. $A$ and $B$
6. $A$ and $C$
7. $B$ and $C$
8. $A$, $B$ and $C$.
g02eac calculates the residual sums of squares for each of the ${2}^{\mathit{k}}$ possible models. The method first computes a $QR$ decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations, and the residual sum of squares is computed for each model; see Clark (1981) and Smith and Bremner (1989).
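The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sum of squares. The first model therefore always has the largest residual sum of squares and the ${2}^{\mathit{k}}$th always has the smallest. This holds because every other model contains the variables of the first and is contained in the last, and adding variables to a least squares fit cannot increase its residual sum of squares. This ordering aids you in selecting the best possible model from the given set of independent variables.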
g02eac allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables.

## 4 References

Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
Smith D M and Bremner J M (1989) All possible subset regressions using the $QR$ decomposition Comput. Statist. Data Anal. 7 217–236
Weisberg S (1985) Applied Linear Regression Wiley

## 5 Arguments

1: $\mathbf{order}$ – Nag_OrderType Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by ${\mathbf{order}}=\mathrm{Nag_RowMajor}$. See Section 3.1.3 in the Introduction to the NAG Library CL Interface for a more detailed explanation of the use of this argument.
Constraint: ${\mathbf{order}}=\mathrm{Nag_RowMajor}$ or $\mathrm{Nag_ColMajor}$.
2: $\mathbf{mean}$ – Nag_IncludeMean Input
On entry: indicates whether a mean term is to be included.
${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$
A mean term (intercept) will be included in the model.
${\mathbf{mean}}=\mathrm{Nag_MeanZero}$
The model will pass through the origin (zero point).
Constraint: ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$ or $\mathrm{Nag_MeanZero}$.
3: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of observations.
Constraints:
• ${\mathbf{n}}\ge 2$;
• ${\mathbf{n}}\ge$ the number of independent variables to be considered (forced plus free plus mean if included), as specified by mean and sx.
4: $\mathbf{m}$ – Integer Input
On entry: the number of variables contained in x.
Constraint: ${\mathbf{m}}\ge 2$.
5: $\mathbf{x}\left[\mathit{dim}\right]$ – const double Input
Note: the dimension, dim, of the array x must be at least
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{pdx}}×{\mathbf{m}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{n}}×{\mathbf{pdx}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
Where ${\mathbf{X}}\left(i,j\right)$ appears in this document, it refers to the array element
• ${\mathbf{x}}\left[\left(j-1\right)×{\mathbf{pdx}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• ${\mathbf{x}}\left[\left(i-1\right)×{\mathbf{pdx}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
On entry: ${\mathbf{X}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
6: $\mathbf{pdx}$ – Integer Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
• if ${\mathbf{order}}=\mathrm{Nag_ColMajor}$, ${\mathbf{pdx}}\ge {\mathbf{n}}$;
• if ${\mathbf{order}}=\mathrm{Nag_RowMajor}$, ${\mathbf{pdx}}\ge {\mathbf{m}}$.
7: $\mathbf{var_names}\left[{\mathbf{m}}\right]$ – const char * Input
On entry: ${\mathbf{var_names}}\left[\mathit{i}-1\right]$ must contain the name of the independent variable in column $\mathit{i}$ of X, for $\mathit{i}=1,2,\dots ,{\mathbf{m}}$.
8: $\mathbf{sx}\left[{\mathbf{m}}\right]$ – const Integer Input
On entry: indicates which independent variables are to be considered in the model.
${\mathbf{sx}}\left[j-1\right]\ge 2$
The variable contained in the $j$th column of X is included in all regression models, i.e., is a forced variable.
${\mathbf{sx}}\left[j-1\right]=1$
The variable contained in the $j$th column of X is included in the set from which the regression models are chosen, i.e., is a free variable.
${\mathbf{sx}}\left[j-1\right]=0$
The variable contained in the $j$th column of X is not included in the models.
Constraints:
• ${\mathbf{sx}}\left[\mathit{j}-1\right]\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
• ${\mathbf{sx}}\left[\mathit{j}-1\right]=1$ for at least one $\mathit{j}$.
9: $\mathbf{y}\left[{\mathbf{n}}\right]$ – const double Input
On entry: ${\mathbf{y}}\left[\mathit{i}-1\right]$ must contain the $\mathit{i}$th observation on the dependent variable, ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
10: $\mathbf{wt}\left[\mathit{dim}\right]$ – const double Input
Note: the dimension, dim, of the array wt must be at least ${\mathbf{n}}$.
On entry: if provided, wt must contain the weights to be used with the model.
If ${\mathbf{wt}}\left[i-1\right]=0.0$, the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If wt is NULL (not provided), the effective number of observations is $n$.
Constraint: if ${\mathbf{wt}}\phantom{\rule{0.25em}{0ex}}\text{is not}\phantom{\rule{0.25em}{0ex}}\mathbf{NULL}$, ${\mathbf{wt}}\left[\mathit{i}\right]\ge 0.0$, for $\mathit{i}=0,1,\dots ,n-1$.
11: $\mathbf{nmod}$ – Integer * Output
On exit: the total number of models for which residual sums of squares have been calculated.
12: $\mathbf{model}\left[\mathit{dim}\right]$ – const char * Output
Note: the dimension, dim, of the array model must be large enough to hold the names of all the free independent variables which appear in all the models. This will never exceed ${2}^{\mathit{k}}×{\mathbf{m}}$, where $\mathit{k}$ is the number of free variables.
On exit: the names of the independent variables in each model, represented as pointers to the names provided by you in var_names. The model names are stored as follows:
• if the first model has three names, i.e., ${\mathbf{nterms}}\left[0\right]=3$, then ${\mathbf{model}}\left[0\right]$, ${\mathbf{model}}\left[1\right]$ and ${\mathbf{model}}\left[2\right]$ will contain these three names;
• if the second model has two names, i.e., ${\mathbf{nterms}}\left[1\right]=2$, then ${\mathbf{model}}\left[3\right]$ and ${\mathbf{model}}\left[4\right]$ will contain these two names.
13: $\mathbf{rss}\left[\mathit{dim}\right]$ – double Output
Note: the dimension, dim, of the array rss must be at least $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)$, where $\mathit{k}$ is the number of free variables.
On exit: ${\mathbf{rss}}\left[\mathit{i}-1\right]$ contains the residual sum of squares for the $\mathit{i}$th model, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
14: $\mathbf{nterms}\left[\mathit{dim}\right]$ – Integer Output
Note: the dimension, dim, of the array nterms must be at least $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)$.
On exit: ${\mathbf{nterms}}\left[\mathit{i}-1\right]$ contains the number of independent variables in the $\mathit{i}$th model, not including the mean if one is fitted, for $\mathit{i}=1,2,\dots ,{\mathbf{nmod}}$.
15: $\mathbf{mrank}\left[\mathit{dim}\right]$ – Integer Output
Note: the dimension, dim, of the array mrank must be at least $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)$.
On exit: ${\mathbf{mrank}}\left[i-1\right]$ contains the rank of the residual sum of squares for the $i$th model.
16: $\mathbf{fail}$ – NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

## 6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM
On entry, argument $⟨\mathit{\text{value}}⟩$ had an illegal value.
NE_FREE_VARS
There are no free variables, i.e., no element of ${\mathbf{sx}}$ is equal to 1.
NE_FULL_RANK
The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model in order to give full rank.
NE_INDEP_VARS_OBS
On entry, the number of independent variables to be considered (forced plus free plus mean if included) is greater than or equal to the effective number of observations.
NE_INT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 2$.
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{pdx}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{pdx}}>0$.
NE_INT_2
On entry, $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)=⟨\mathit{\text{value}}⟩$ and ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)\ge {\mathbf{m}}$.
On entry, $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)=⟨\mathit{\text{value}}⟩$ and the number of possible models is $⟨\mathit{\text{value}}⟩$.
Constraint: $\mathrm{max}\left({2}^{\mathit{k}},{\mathbf{m}}\right)\ge$ the number of possible models.
On entry, ${\mathbf{pdx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{pdx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{n}}$.
NE_INT_ARRAY_ELEM_CONS
On entry, ${\mathbf{sx}}\left[⟨\mathit{\text{value}}⟩\right]<0$.
Constraint: ${\mathbf{sx}}\left[i-1\right]\ge 0$, for $i=1,2,\dots ,{\mathbf{m}}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_REAL_ARRAY_ELEM_CONS
On entry, ${\mathbf{wt}}\left[⟨\mathit{\text{value}}⟩\right]<0.0$.
Constraint: ${\mathbf{wt}}\left[i-1\right]\ge 0.0$, for $i=1,2,\dots ,n$.

## 7 Accuracy

For a discussion of the improved accuracy obtained by using a method based on the $QR$ decomposition, see Smith and Bremner (1989).

## 8 Parallelism and Performance

g02eac is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02eac makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

g02ecc may be used to compute ${R}^{2}$ and ${C}_{p}$-values from the results of g02eac.
If a mean has been included in the model and no variables are forced in then ${\mathbf{rss}}\left[0\right]$ contains the total sum of squares and in many situations a reasonable estimate of the variance of the errors is given by ${\mathbf{rss}}\left[{\mathbf{nmod}}-1\right]/\left({\mathbf{n}}-1-{\mathbf{nterms}}\left[{\mathbf{nmod}}-1\right]\right)$.

## 10 Example

The data for this example are given in Weisberg (1985). The independent variables, the dependent variable and the names of the variables are read; these names are as given in Weisberg (1985). The residual sums of squares are computed and printed together with the names of the variables in each model.

### 10.1 Program Text

Program Text (g02eace.c)

### 10.2 Program Data

Program Data (g02eace.d)

### 10.3 Program Results

Program Results (g02eace.r)