linregm_rssq_stat calculates $R^2$ and $C_p$-values from the residual sums of squares for a series of linear regression models.

For full information please refer to the NAG Library document for g02ec

https://www.nag.com/numeric/nl/nagdoc_28.5/flhtml/g02/g02ecf.html

Parameters
n : int

$n$, the number of observations used in the regression model.

sigsq : float

The best estimate of the true variance of the errors, $\sigma^2$.

tss : float

The total sum of squares for the regression model.

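nterms : int, array-like, shape (nmod)

$\mathrm{nterms}[i-1]$ must contain the number of independent variables (not counting the mean) fitted to the $i$th model, for $i = 1, 2, \ldots, \mathit{nmod}$, where $\mathit{nmod}$ is the number of regression models.

rss : float, array-like, shape (nmod)

$\mathrm{rss}[i-1]$ must contain the residual sum of squares for the $i$th model.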

mean : str, length 1, optional

Indicates if a mean term is to be included.

mean = 'M'

A mean term, intercept, will be included in the model.

mean = 'Z'

The model will pass through the origin, zero-point.

Returns
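rsq : float, ndarray, shape (nmod)

$\mathrm{rsq}[i-1]$ contains the $R^2$-value for the $i$th model, for $i = 1, 2, \ldots, \mathit{nmod}$.

cp : float, ndarray, shape (nmod)

$\mathrm{cp}[i-1]$ contains the $C_p$-value for the $i$th model, for $i = 1, 2, \ldots, \mathit{nmod}$.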

Raises
NagValueError
(errno )

On entry, mean = ⟨value⟩.

Constraint: mean = 'M' or 'Z'.

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: .

(errno )

On entry: the number of parameters, $p$, is ⟨value⟩ and n = ⟨value⟩.

Constraint: .

(errno )

On entry, rss[$i$] = ⟨value⟩ and tss = ⟨value⟩.

Constraint: rss[$i$] $\le$ tss, for all $i$.

(errno )

A value of $C_p$ is less than $0.0$. This may occur if sigsq is too large or if rss, n or IP are incorrect.

Notes

When selecting a linear regression model for a set of observations, a balance has to be found between the number of independent variables in the model and the fit as measured by the residual sum of squares. The more variables included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.

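1. $R^2$ represents the proportion of variation in the dependent variable that is explained by the independent variables.

$$R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}},$$

where Total Sum of Squares $= \mathrm{tss} = \sum (y - \bar{y})^2$ (if mean is fitted, otherwise $\mathrm{tss} = \sum y^2$) and Regression Sum of Squares $= \mathrm{RegSS} = \mathrm{tss} - \mathrm{rss}$, where $\mathrm{rss} = \text{residual sum of squares} = \sum (y - \hat{y})^2$.

The $R^2$-values can be examined to find a model with a high $R^2$-value but with a small number of independent variables.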

2. $C_p$ statistic.

$$C_p = \frac{\mathrm{rss}}{\hat{\sigma}^2} - (n - 2p),$$

where $p$ is the number of parameters (including the mean) in the model and $\hat{\sigma}^2$ is an estimate of the true variance of the errors. This can often be obtained from fitting the full model.

A well fitting model will have $C_p \simeq p$. $C_p$ is often plotted against $p$ to see which models are closest to the $C_p = p$ line.
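The two statistics can be sketched in a few lines of NumPy. This is only an illustration of the formulas, not the library's implementation: the helper name `rsq_and_cp` and the input numbers are hypothetical, and the sketch assumes the mean flag takes the values 'M' (mean fitted) or 'Z' (model through the origin).

```python
import numpy as np

def rsq_and_cp(n, sigsq, tss, nterms, rss, mean="M"):
    """Illustrative R^2 and C_p values (hypothetical helper, not the NAG routine)."""
    nterms = np.asarray(nterms, dtype=float)
    rss = np.asarray(rss, dtype=float)
    # p counts the mean as one extra parameter when it is fitted.
    p = nterms + (1.0 if mean == "M" else 0.0)
    rsq = (tss - rss) / tss            # RegSS / tss
    cp = rss / sigsq - (n - 2.0 * p)   # rss / sigma^2 - (n - 2p)
    return rsq, cp

# Made-up inputs: 20 observations and three candidate models.
# sigsq = 32 / (20 - 4) is the variance estimate from the largest model.
rsq, cp = rsq_and_cp(n=20, sigsq=2.0, tss=100.0,
                     nterms=[1, 2, 3], rss=[40.0, 34.0, 32.0])
```

When the variance estimate comes from the largest model, that model's $C_p$ equals its $p$ by construction, so the comparison is only informative for the smaller models.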

linregm_rssq_stat may be called after linregm_rssq(), which calculates the residual sums of squares for all possible linear regression models.
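To make this workflow concrete, the sketch below uses plain NumPy least squares as a stand-in for linregm_rssq() and does not call the NAG routines. It computes the residual sum of squares for every subset of a small synthetic data set, estimates the error variance from the full model, and then applies the $R^2$ and $C_p$ definitions. The data, the helper `subset_rss`, and all variable names are assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
x = rng.normal(size=(n, k))
# Synthetic response: only the first two variables matter.
y = 1.0 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + rng.normal(scale=0.5, size=n)

def subset_rss(cols):
    """RSS of a least squares fit with an intercept and the given columns."""
    design = np.column_stack([np.ones(n)] + [x[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

# All non-empty subsets of the k variables; the full model comes last.
subsets = [c for r in range(1, k + 1)
           for c in itertools.combinations(range(k), r)]
nterms = np.array([len(c) for c in subsets])
rss = np.array([subset_rss(c) for c in subsets])

tss = float(np.sum((y - y.mean()) ** 2))  # mean is fitted
p = nterms + 1                            # parameters including the mean
sigsq = rss[-1] / (n - p[-1])             # variance estimate from the full model

rsq = (tss - rss) / tss
cp = rss / sigsq - (n - 2 * p)

for cols, r2, c in zip(subsets, rsq, cp):
    print(cols, round(r2, 3), round(c, 2))
```

Models whose $C_p$ is close to their $p$ while using few variables are the natural candidates; in this synthetic setup one would expect the subset containing the first two variables to be among them.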

References

Draper, N R and Smith, H, 1985, Applied Regression Analysis, (2nd Edition), Wiley

Weisberg, S, 1985, Applied Linear Regression, Wiley