NAG Library Routine Document
G02EAF calculates the residual sums of squares for all possible linear regressions for a given set of independent variables.
SUBROUTINE G02EAF (MEAN, WEIGHT, N, M, X, LDX, VNAME, ISX, Y, WT, NMOD, MODL, LDMODL, RSS, NTERMS, MRANK, WK, IFAIL)
INTEGER N, M, LDX, ISX(M), NMOD, LDMODL, NTERMS(LDMODL), MRANK(LDMODL), IFAIL
REAL (KIND=nag_wp) X(LDX,M), Y(N), WT(*), RSS(LDMODL), WK(N*(M+1))
CHARACTER(*) VNAME(M), MODL(LDMODL,M)
CHARACTER(1) MEAN, WEIGHT
For a set of k possible independent variables there are 2^k linear regression models with from zero to k independent variables in each model. For example, if k = 3 and the variables are A, B and C, then the possible models are: the null model; A; B; C; A and B; A and C; B and C; and A, B and C.
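The count of possible models can be checked with a short enumeration. The following is a minimal Python sketch, not part of the NAG Library; the function name `all_models` and the variable names A, B and C are chosen here purely for illustration.

```python
from itertools import combinations


def all_models(variables):
    """Return every subset of `variables`, smallest models first."""
    models = []
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            models.append(subset)
    return models


# Hypothetical variable names, used only for illustration.
models = all_models(["A", "B", "C"])
for model in models:
    print(" + ".join(model) or "(null model)")
print(len(models), "models in total")
```

Each additional variable doubles the number of subsets, which is why the total is a power of two.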
G02EAF calculates the residual sums of squares from each of the 2^k possible models. The method used involves a QR decomposition of the matrix of possible independent variables. Independent variables are then moved into and out of the model by a series of Givens rotations and the residual sums of squares computed for each model; see Clark (1981) and Smith and Bremner (1989).
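To make the quantity being computed concrete, the sketch below refits every subset from scratch and returns its residual sum of squares. It deliberately does not implement the Givens-rotation method of the references: it uses a plain modified Gram-Schmidt orthogonalisation, which is far less efficient but easy to follow. The function names and the small data set are hypothetical.

```python
from itertools import combinations


def rss_for_model(columns, y):
    """Residual sum of squares of y regressed on the given columns.

    Modified Gram-Schmidt: each column is orthogonalised against the
    columns already accepted, then its component is removed from the
    running residual.
    """
    residual = list(y)
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, v))
            v = [a - coeff * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        q = [a / norm for a in v]
        basis.append(q)
        coeff = sum(a * b for a, b in zip(q, residual))
        residual = [a - coeff * b for a, b in zip(residual, q)]
    return sum(r * r for r in residual)


def all_subset_rss(x_columns, y, mean=True):
    """Return (names, rss) for every subset of the named columns."""
    n = len(y)
    always_in = [[1.0] * n] if mean else []  # column of ones for the mean
    names = sorted(x_columns)
    results = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cols = always_in + [x_columns[name] for name in subset]
            results.append((subset, rss_for_model(cols, y)))
    return results


# Hypothetical data: y is an exact linear function of x1.
x = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [1.0, 0.0, 1.0, 0.0]}
y = [3.0, 5.0, 7.0, 9.0]
results = all_subset_rss(x, y)
for names, rss in results:
    print(names, round(rss, 6))
```

Refitting each subset independently costs far more than updating one decomposition, which is the motivation for the rotation-based approach described above.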
The computed residual sums of squares are then ordered first by increasing number of terms in the model, then by decreasing size of residual sums of squares. So the first model will always have the largest residual sum of squares and the 2^k-th will always have the smallest. This aids you in selecting the best possible model from the given set of independent variables.
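The ordering rule can be expressed as a two-part sort key. The sketch below is illustrative only; the residual sums of squares are made-up values chosen so that adding a variable never increases the residual sum of squares.

```python
def order_models(models):
    """Sort (names, rss) pairs: fewer terms first, then larger RSS first."""
    return sorted(models, key=lambda m: (len(m[0]), -m[1]))


# Hypothetical residual sums of squares, for illustration only.
fitted = [
    (("A", "B"), 4.0),
    ((), 20.0),
    (("B",), 16.0),
    (("A", "B", "C"), 3.0),
    (("A",), 6.0),
    (("B", "C"), 10.0),
    (("C",), 12.0),
    (("A", "C"), 5.0),
]

ordered = order_models(fitted)
for names, rss in ordered:
    print(len(names), names, rss)
```

Because the null model is the only model with zero terms and the full model the only one with all terms, they necessarily occupy the first and last positions.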
G02EAF allows you to specify some independent variables that must be in the model, the forced variables. The other independent variables from which the possible models are to be formed are the free variables.
Clark M R B (1981) A Givens algorithm for moving from one linear model to another without going back to the data Appl. Statist. 30 198–203
Smith D M and Bremner J M (1989) All possible subset regressions using the QR decomposition Comput. Statist. Data Anal. 7 217–236
Weisberg S (1985) Applied Linear Regression Wiley
- 1: MEAN – CHARACTER(1) Input
On entry: indicates if a mean term is to be included.
- MEAN = 'M': a mean term, intercept, will be included in the model.
- MEAN = 'Z': the model will pass through the origin, zero-point.
- 2: WEIGHT – CHARACTER(1) Input
On entry: indicates if weights are to be used.
- WEIGHT = 'U': least squares estimation is used.
- WEIGHT = 'W': weighted least squares is used and weights must be supplied in array WT.
- 3: N – INTEGER Input
On entry: n, the number of observations.
Constraints:
- N ≥ 2;
- N ≥ m, where m is the number of independent variables to be considered (forced plus free plus mean if included), as specified by MEAN and ISX.
- 4: M – INTEGER Input
On entry: the number of variables contained in X.
- 5: X(LDX,M) – REAL (KIND=nag_wp) array Input
On entry: X(i,j) must contain the i-th observation for the j-th independent variable, for i = 1, 2, ..., N and j = 1, 2, ..., M.
- 6: LDX – INTEGER Input
On entry: the first dimension of the array X as declared in the (sub)program from which G02EAF is called.
- 7: VNAME(M) – CHARACTER(*) array Input
On entry: VNAME(j) must contain the name of the variable in column j of X, for j = 1, 2, ..., M.
- 8: ISX(M) – INTEGER array Input
On entry: indicates which independent variables are to be considered in the model.
- ISX(j) ≥ 2: the variable contained in the j-th column of X is included in all regression models, i.e., is a forced variable.
- ISX(j) = 1: the variable contained in the j-th column of X is included in the set from which the regression models are chosen, i.e., is a free variable.
- ISX(j) = 0: the variable contained in the j-th column of X is not included in the models.
Constraints:
- ISX(j) ≥ 0, for j = 1, 2, ..., M;
- at least one value of ISX(j) = 1.
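The classification of columns into forced, free and excluded variables can be sketched as follows. This is an illustrative Python helper, not a NAG interface; it assumes the coding in which a value of 2 or more marks a forced variable, 1 a free variable and 0 an excluded variable, and the name `classify_variables` is hypothetical.

```python
def classify_variables(isx):
    """Split 1-based column indices into forced, free and excluded lists.

    Assumed coding: >= 2 forced, 1 free, 0 excluded; negative values and
    an absence of free variables are rejected.
    """
    if any(value < 0 for value in isx):
        raise ValueError("every ISX value must be non-negative")
    forced = [j for j, value in enumerate(isx, start=1) if value >= 2]
    free = [j for j, value in enumerate(isx, start=1) if value == 1]
    excluded = [j for j, value in enumerate(isx, start=1) if value == 0]
    if not free:
        raise ValueError("at least one free variable is required")
    return forced, free, excluded


forced, free, excluded = classify_variables([2, 1, 1, 0, 1])
print("forced:", forced, "free:", free, "excluded:", excluded)
```

Only the free variables contribute to the number of models; forced variables appear in every model and excluded variables in none.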
- 9: Y(N) – REAL (KIND=nag_wp) array Input
On entry: Y(i) must contain the i-th observation on the dependent variable, y_i, for i = 1, 2, ..., N.
- 10: WT(*) – REAL (KIND=nag_wp) array Input
Note: the dimension of the array WT must be at least N if WEIGHT = 'W', and at least 1 otherwise.
On entry: if WEIGHT = 'W', WT must contain the weights to be used in the weighted regression.
If WT(i) = 0.0, the i-th observation is not included in the model, in which case the effective number of observations is the number of observations with nonzero weights.
If WEIGHT = 'U', WT is not referenced and the effective number of observations is N.
Constraint: if WEIGHT = 'W', WT(i) ≥ 0.0, for i = 1, 2, ..., N.
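The effective number of observations described above can be computed directly from the weights. The following Python sketch is illustrative; the function name is hypothetical, and passing `None` stands in for the unweighted case in which the weights are not referenced.

```python
def effective_observations(n, weights=None):
    """Number of observations that actually enter the regression.

    With no weights every observation counts; otherwise only those with
    a nonzero weight do. Negative weights are rejected.
    """
    if weights is None:
        return n
    if len(weights) < n:
        raise ValueError("one weight is needed per observation")
    if any(w < 0.0 for w in weights[:n]):
        raise ValueError("weights must be non-negative")
    return sum(1 for w in weights[:n] if w != 0.0)


print(effective_observations(5))
print(effective_observations(5, [1.0, 0.0, 2.0, 0.0, 0.5]))
```

This count, rather than N, is what the number of independent variables is compared against when weights are supplied.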
- 11: NMOD – INTEGER Output
On exit: the total number of models for which residual sums of squares have been calculated.
- 12: MODL(LDMODL,M) – CHARACTER(*) array Output
On exit: the first NTERMS(i) elements of the i-th row of MODL contain the names of the independent variables, as given in VNAME, that are included in the i-th model.
Constraint: the length of MODL should be greater than or equal to the length of VNAME.
- 13: LDMODL – INTEGER Input
On entry: the first dimension of the array MODL and the dimension of the arrays RSS, NTERMS and MRANK as declared in the (sub)program from which G02EAF is called.
Constraint: LDMODL ≥ 2^k, where k is the number of free variables in the model as specified in ISX, and hence 2^k is the total number of models to be generated.
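Because the number of models doubles with each free variable, the storage needed for the output arrays grows quickly. The short Python sketch below, with a hypothetical function name, shows the number of models that must be accommodated for a few values of k.

```python
def models_to_store(n_free):
    """Number of models generated from `n_free` free variables (2**k)."""
    if n_free < 1:
        raise ValueError("at least one free variable is required")
    return 2 ** n_free


for k in (3, 10, 20):
    print(k, "free variables ->", models_to_store(k), "models")
```

In practice this growth limits the number of free variables for which an exhaustive search is feasible.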
- 14: RSS(LDMODL) – REAL (KIND=nag_wp) array Output
On exit: RSS(i) contains the residual sum of squares for the i-th model, for i = 1, 2, ..., NMOD.
- 15: NTERMS(LDMODL) – INTEGER array Output
On exit: NTERMS(i) contains the number of independent variables in the i-th model, not including the mean if one is fitted, for i = 1, 2, ..., NMOD.
- 16: MRANK(LDMODL) – INTEGER array Output
On exit: MRANK(i) contains the rank of the residual sum of squares for the i-th model.
- 17: WK(N*(M+1)) – REAL (KIND=nag_wp) array Workspace
- 18: IFAIL – INTEGER Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL = 0 unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry IFAIL = 0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL = 1
On entry, N < 2, or M < 2, or LDX < N, or MEAN ≠ 'M' or 'Z', or WEIGHT ≠ 'U' or 'W'.
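IFAIL = 2
On entry, WEIGHT = 'W' and a value of WT < 0.0.
IFAIL = 3
On entry, a value of ISX < 0, or there are no free variables, i.e., no element of ISX = 1.
IFAIL = 4
On entry, LDMODL < the number of possible models = 2^k, where k is the number of free independent variables from ISX.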
IFAIL = 5
On entry, the number of independent variables to be considered (forced plus free plus mean if included) is greater than or equal to the effective number of observations.
IFAIL = 6
The full model is not of full rank, i.e., some of the independent variables may be linear combinations of other independent variables. Variables must be excluded from the model in order to give full rank.
For a discussion of the improved accuracy obtained by using a method based on the QR decomposition see Smith and Bremner (1989).
G02ECF may be used to compute R^2 and C_p-values from the results of G02EAF.
If a mean has been included in the model and no variables are forced in then RSS(1) contains the total sum of squares and in many situations a reasonable estimate of the variance of the errors is given by RSS(NMOD)/(N - 1 - NTERMS(NMOD)).
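The summary quantities mentioned here can be derived from the output arrays with a few lines of arithmetic. The Python sketch below is illustrative and is not the G02ECF interface. It assumes a mean is fitted and no variables are forced, so that the first entry is the total sum of squares. It takes the variance estimate to be the residual mean square of the full model, RSS(NMOD)/(N - 1 - NTERMS(NMOD)). It uses the textbook definitions R^2 = 1 - RSS/TSS and C_p = RSS/s^2 - (n - 2p), with p counting the mean. The input values are made up.

```python
def summarise(rss, nterms, n):
    """Return (variance estimate, R^2 list, C_p list).

    `rss` and `nterms` are assumed to be in the routine's output order:
    null model first, full model last, mean fitted, nothing forced.
    """
    total_ss = rss[0]
    sigma2 = rss[-1] / (n - 1 - nterms[-1])  # full-model residual mean square
    r_squared = [1.0 - r / total_ss for r in rss]
    # p = number of fitted parameters, i.e. terms plus the mean.
    c_p = [r / sigma2 - (n - 2 * (t + 1)) for r, t in zip(rss, nterms)]
    return sigma2, r_squared, c_p


# Hypothetical output for 3 free variables and 10 observations.
rss = [20.0, 16.0, 12.0, 6.0, 10.0, 5.0, 4.0, 3.0]
nterms = [0, 1, 1, 1, 2, 2, 2, 3]
sigma2, r_squared, c_p = summarise(rss, nterms, n=10)
print("variance estimate:", sigma2)
for t, r2, cp in zip(nterms, r_squared, c_p):
    print(t, round(r2, 3), round(cp, 3))
```

With these definitions the full model always has C_p equal to its number of parameters, which gives a quick sanity check on the arithmetic.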
The data for this example is given in Weisberg (1985). The independent variables and the dependent variable are read, as are the names of the variables. These names are as given in Weisberg (1985). The residual sums of squares are computed and printed with the names of the variables in the model.
9.1 Program Text
Program Text (g02eafe.f90)
9.2 Program Data
Program Data (g02eafe.d)
9.3 Program Results
Program Results (g02eafe.r)