nag_regress_confid_interval (g02cbc) : NAG Library, Mark 24

1 Purpose

nag_regress_confid_interval (g02cbc) performs a simple linear regression with or without a constant term. The data is optionally weighted, and confidence intervals are calculated for the predicted and average values of $y$ at a given $x$.

2 Specification

3 Description

nag_regress_confid_interval (g02cbc) fits a straight line model of the form

E(y) = a + b x,

where $E(y)$ is the expected value of the variable $y$, to the data points

(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n}, y_{n}),

such that

y_{i} = a + b x_{i} + e_{i}, \quad i = 1, 2, \dots, n,

where the $e_{i}$ values are independent random errors. The $i$th data point may have an associated weight $w_{i}$. The values of $a$ and $b$ are estimated by minimizing $\sum w_{i} e_{i}^{2}$ (if the weights option is not selected, then $w_{i} = 1.0$). The fitted values $\hat{y}_{i}$ are calculated using

\hat{y}_{i} = \hat{a} + \hat{b} x_{i}

where

\hat{a} = \bar{y} - \hat{b} \bar{x} \quad \text{and} \quad \hat{b} = \frac{\sum w_{i} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sum w_{i} {(x_{i} - \bar{x})}^{2}},

and the weighted means $\bar{x}$ and $\bar{y}$ are given by

\bar{y} = \frac{\sum w_{i} y_{i}}{\sum w_{i}} \quad \text{and} \quad \bar{x} = \frac{\sum w_{i} x_{i}}{\sum w_{i}} .
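As an illustration only, the estimates above can be computed directly from their definitions. The sketch below is not the NAG implementation: the names `LineFit` and `fit_line` are hypothetical, it covers only the model with a constant term, and, following the convention for the wt argument in Section 5, a NULL weight array is treated as all weights equal to 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    double xbar, ybar; /* weighted means */
    double a, b;       /* estimated intercept and slope */
} LineFit;

/* Weighted least-squares fit of y = a + b x (constant term included).
   A NULL wt is treated as unit weights. */
static LineFit fit_line(size_t n, const double x[], const double y[],
                        const double wt[])
{
    LineFit f;
    double sw = 0.0, swx = 0.0, swy = 0.0;

    /* Weighted means: xbar = sum w x / sum w, ybar = sum w y / sum w. */
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    assert(sw > 0.0);
    f.xbar = swx / sw;
    f.ybar = swy / sw;

    /* b = sum w (x - xbar)(y - ybar) / sum w (x - xbar)^2 */
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        double dx = x[i] - f.xbar;
        sxy += w * dx * (y[i] - f.ybar);
        sxx += w * dx * dx;
    }
    assert(sxx > 0.0); /* all (weighted) x identical: slope undefined */
    f.b = sxy / sxx;
    f.a = f.ybar - f.b * f.xbar; /* a = ybar - b xbar */
    return f;
}
```

For example, with x = {1, 2, 3, 4} and y = {3, 5, 7, 9}, which lie exactly on y = 1 + 2x, the sketch returns a = 1 and b = 2; giving a point zero weight removes its influence on the fit entirely.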

The residuals of the regression are calculated using

{res}_{i} = y_{i} - \hat{y}_{i}

and the residual mean square about the regression, $rms$, is determined using

rms = \frac{\sum w_{i} {(y_{i} - \hat{y}_{i})}^{2}}{df}

where $df$ (the number of degrees of freedom) has the following values:

$df = \sum w_{i} - 2$ where $mean = Nag_AboutMean$
$df = \sum w_{i} - 1$ where $mean = Nag_AboutZero$.

Note: the weights should be scaled to give the required degrees of freedom.
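The residual and degrees-of-freedom computation can be sketched as follows. This is an illustrative C function, not the library's code: the name `residual_ms` and the `with_constant` flag (standing in for the mean argument) are hypothetical. It takes the estimates $a$ and $b$ as given, so for the model without a constant term the caller would pass $a = 0$.

```c
#include <assert.h>
#include <stddef.h>

/* Residuals res_i = y_i - (a + b x_i) and the residual mean square
   rms = sum w_i res_i^2 / df, where df = sum w_i - 2 if a constant term
   is fitted (with_constant != 0) and sum w_i - 1 otherwise.
   A NULL wt is treated as unit weights. */
static double residual_ms(size_t n, const double x[], const double y[],
                          const double wt[], double a, double b,
                          int with_constant, double res[])
{
    double sw = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        res[i] = y[i] - (a + b * x[i]);
        sw += w;
        ss += w * res[i] * res[i];
    }
    double df = with_constant ? sw - 2.0 : sw - 1.0;
    assert(df > 0.0); /* too few (weighted) observations */
    return ss / df;
}
```

With x = {0, 1, 2, 3}, y = {1, 3, 5, 8}, a = 1 and b = 2, only the last residual is nonzero (it equals 1), so with a constant term $df = 4 - 2$ and rms = 1/2.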

The function calculates the predicted $y$ estimate for a value of $x$, $x_{i}^{*}$, which is given by

y_{i}^{*} = \hat{a} + \hat{b} x_{i}^{*} ;

this prediction has a standard error

serr_pred = \sqrt{rms} \sqrt{1 + \frac{1}{\sum w_{i}} + \frac{{(x_{i}^{*} - \bar{x})}^{2}}{\sum w_{i} {(x_{i} - \bar{x})}^{2}}} .

The $(1 - \alpha)$ confidence interval for this estimate of $y$ is given by

y_{i}^{*} \pm t_{df} (1 - \alpha / 2) \cdot serr_pred

where $t_{df} (1 - \alpha / 2)$ refers to the $(1 - \alpha / 2)$ point of the $t$ distribution with $df$ degrees of freedom (e.g., when $df = 20$ and $\alpha = 0.1$, $t_{20} (0.95) = 1.725$). If you specify the probability $clp = 0.9$ ($\alpha = 0.1$), then the lower limit of this interval is

{yl}_{i} = y_{i}^{*} - t_{df} (0.95) \cdot serr_pred

and the upper limit is

{yu}_{i} = y_{i}^{*} + t_{df} (0.95) \cdot serr_pred .
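The mapping from a confidence level to the $t$ point, and the resulting limits, can be sketched in C. Standard C provides no $t$-distribution quantile function, so in this sketch (hypothetical names `t_point` and `interval`) the caller supplies the $t$ value, for instance from tables.

```c
#include <assert.h>

/* Point of the t distribution used for a two-sided interval at
   confidence level cl: alpha = 1 - cl, point = 1 - alpha / 2. */
static double t_point(double cl)
{
    assert(cl > 0.0 && cl < 1.0);
    double alpha = 1.0 - cl;
    return 1.0 - alpha / 2.0;
}

/* Limits centre -/+ t * serr. Standard C has no t quantile function,
   so t = t_df(1 - alpha/2) must be supplied by the caller. */
static void interval(double centre, double t, double serr,
                     double *lo, double *hi)
{
    *lo = centre - t * serr;
    *hi = centre + t * serr;
}
```

For clp = 0.9, `t_point` gives 0.95; taking $t_{20} (0.95) \approx 1.725$ with illustrative values $y_{i}^{*} = 10$ and serr_pred = 0.5 gives the interval [9.1375, 10.8625].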

The mean value of $y$ at $x_{i}$ is estimated by the fitted value $\hat{y}_{i}$. This has a standard error of

serr_arg = \sqrt{rms} \sqrt{\frac{1}{\sum w_{i}} + \frac{{(x_{i} - \bar{x})}^{2}}{\sum w_{i} {(x_{i} - \bar{x})}^{2}}}

and a $(1 - \alpha)$ confidence interval is given by

\hat{y}_{i} \pm t_{df} (1 - \alpha / 2) \cdot serr_arg .

For example, if you specify the probability $clm = 0.6$ ($\alpha = 0.4$), then the lower limit of this interval is

{yml}_{i} = \hat{y}_{i} - t_{df} (0.8) \cdot serr_arg

and the upper limit is

{ymu}_{i} = \hat{y}_{i} + t_{df} (0.8) \cdot serr_arg .

The leverage, $h_{i}$, is a measure of the influence a value $x_{i}$ has on the fitted line at that point, $\hat{y}_{i}$. The leverage is given by

h_{i} = \frac{w_{i}}{\sum w_{i}} + \frac{w_{i} {(x_{i} - \bar{x})}^{2}}{\sum w_{i} {(x_{i} - \bar{x})}^{2}}

so it can be seen that

serr_arg = \sqrt{rms} \sqrt{h_{i} / w_{i}} \quad \text{and} \quad serr_pred = \sqrt{rms} \sqrt{1 + h_{i} / w_{i}} .
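The leverage formula can be evaluated as in the sketch below, where `leverage` is a hypothetical name and NULL weights are treated as 1.0. Summing the formula over all points shows the leverages add up to 2, since $\sum_{i} w_{i} / \sum w_{i} = 1$ and the second term likewise sums to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Leverage h_i = w_i / sum_w + w_i (x_i - xbar)^2 / sum_j w_j (x_j - xbar)^2.
   A NULL wt is treated as unit weights. */
static void leverage(size_t n, const double x[], const double wt[], double h[])
{
    double sw = 0.0, swx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
    }
    assert(sw > 0.0);
    double xbar = swx / sw;

    double sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
    }
    assert(sxx > 0.0);

    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        double d = x[i] - xbar;
        h[i] = w / sw + w * d * d / sxx;
    }
}
```

For x = {1, 2, 3, 4} with unit weights the leverages are 0.7, 0.3, 0.3, 0.7: the end points, furthest from $\bar{x}$, have the most influence, and the values sum to 2.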

5 Arguments

1: mean – Nag_SumSquare Input

On entry: indicates whether nag_regress_confid_interval (g02cbc) is to include a constant term in the regression.

$mean = Nag_AboutMean$: The constant term, $a$, is included.
$mean = Nag_AboutZero$: The constant term, $a$, is not included, i.e., $a = 0$.

Constraint: $mean = Nag_AboutMean$ or $Nag_AboutZero$.

2: n – Integer Input

On entry: $n$, the number of observations.

Constraints:

if $mean = Nag_AboutMean$ , $n \geq 2$ ;
if $mean = Nag_AboutZero$ , $n \geq 1$ .

3: x[n] – const double Input

On entry: observations on the independent variable, $x$.

Constraint: the values of $x$ must not all be identical.

4: y[n] – const double Input

On entry: observations on the dependent variable, $y$.

5: wt[n] – const double Input

On entry: if weighted estimates are required, then wt must contain the weights to be used in the weighted regression. Usually $wt[i - 1]$ will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line.

If weights are not provided, then wt must be set to NULL and the effective number of observations is n.

Constraint: if wt is not NULL, $wt[i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.
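To illustrate why an integral weight can be read as a count of observations, the sketch below (hypothetical name `weighted_slope`, not part of the library) computes $\hat{b}$ from the formula in Section 3. Giving a point weight 2 yields the same slope as listing that point twice with unit weights, and in both cases the weights sum to the same effective number of observations.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted slope b = sum w (x - xbar)(y - ybar) / sum w (x - xbar)^2,
   using weighted means; a NULL wt means unit weights. */
static double weighted_slope(size_t n, const double x[], const double y[],
                             const double wt[])
{
    double sw = 0.0, swx = 0.0, swy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    assert(sw > 0.0);
    double xbar = swx / sw, ybar = swy / sw;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = wt ? wt[i] : 1.0;
        sxy += w * (x[i] - xbar) * (y[i] - ybar);
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
    }
    assert(sxx > 0.0); /* x values all identical */
    return sxy / sxx;
}
```

For example, x = {1, 2, 3}, y = {2, 3, 7} with wt = {2, 1, 1} gives the same slope (6.5 / 2.75, about 2.364) as the unweighted data x = {1, 1, 2, 3}, y = {2, 2, 3, 7}; the effective number of observations is 4 either way.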

6: clm – double Input

On entry: the confidence level for the confidence intervals for the mean.

Constraint: $0.0 < clm < 1.0$.

7: clp – double Input

On entry: the confidence level for the prediction intervals.

Constraint: $0.0 < clp < 1.0$.

8: yhat[n] – double Output

On exit: the fitted values, $\hat{y}_{i}$.

9: yml[n] – double Output

On exit: $yml[i - 1]$ contains the lower limit of the confidence interval for the regression line at $x[i - 1]$.

10: ymu[n] – double Output

On exit: $ymu[i - 1]$ contains the upper limit of the confidence interval for the regression line at $x[i - 1]$.

11: yl[n] – double Output

On exit: $yl[i - 1]$ contains the lower limit of the confidence interval for the individual $y$ value at $x[i - 1]$.

12: yu[n] – double Output

On exit: $yu[i - 1]$ contains the upper limit of the confidence interval for the individual $y$ value at $x[i - 1]$.

13: h[n] – double Output

On exit: the leverage of each observation on the regression.

14: res[n] – double Output

On exit: the residuals of the regression.

15: rms – double * Output

On exit: the residual mean square about the regression.

16: fail – NagError * Input/Output

The NAG error argument (see Section 3.6 in the Essential Introduction).

6 Error Indicators and Warnings

NE_BAD_PARAM

On entry, argument mean had an illegal value.

NE_INT_ARG_LT

On entry, n = ⟨value⟩.
Constraint: if $mean = Nag_AboutMean$, $n \geq 2$.

On entry, n = ⟨value⟩.
Constraint: if $mean = Nag_AboutZero$, $n \geq 1$.

NE_NEG_WEIGHT

On entry, at least one of the weights is negative.

NE_REAL_ARG_GE

On entry, clm must not be greater than or equal to 1.0: clm = ⟨value⟩.

On entry, clp must not be greater than or equal to 1.0: clp = ⟨value⟩.

NE_REAL_ARG_LE

On entry, clm must not be less than or equal to 0.0: clm = ⟨value⟩.

On entry, clp must not be less than or equal to 0.0: clp = ⟨value⟩.

NE_SW_LOW

On entry, the sum of elements of wt must be greater than 1.0 if $mean = Nag_AboutZero$ and greater than 2.0 if $mean = Nag_AboutMean$.

NE_WT_LOW

On entry, wt must contain at least 1 positive element if $mean = Nag_AboutZero$ or at least 2 positive elements if $mean = Nag_AboutMean$.

NE_X_IDEN

On entry, all elements of x are equal.

NW_RMS_EQ_ZERO

Residual mean sum of squares is zero, i.e., a perfect fit was obtained.

NAG Library Function Document

nag_regress_confid_interval (g02cbc)
