
# NAG Library Function Document: nag_regress_confid_interval (g02cbc)

## 1  Purpose

nag_regress_confid_interval (g02cbc) performs a simple linear regression with or without a constant term. The data is optionally weighted, and confidence intervals are calculated for the predicted and average values of y at a given x.

## 2  Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_regress_confid_interval (Nag_SumSquare mean, Integer n, const double x[], const double y[], const double wt[], double clm, double clp, double yhat[], double yml[], double ymu[], double yl[], double yu[], double h[], double res[], double *rms, NagError *fail)

## 3  Description

nag_regress_confid_interval (g02cbc) fits a straight line model of the form,
 $E(y) = a + bx$
where $E\left(y\right)$ is the expected value of the variable $y$, to the data points
 $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n),$
such that
 $y_i = a + b x_i + e_i, \quad i = 1, 2, \dots, n,$
where the ${e}_{i}$ values are independent random errors. The $i$th data point may have an associated weight ${w}_{i}$. The values of $a$ and $b$ are estimated by minimizing $\sum {w}_{i}{e}_{i}^{2}$ (if the weights option is not selected then ${w}_{i}=1.0$). The fitted values ${\stackrel{^}{y}}_{i}$ are calculated using
 $\hat{y}_i = \hat{a} + \hat{b} x_i$
where
 $\hat{a} = \bar{y} - \hat{b}\bar{x}, \qquad \hat{b} = \frac{\sum w_i (x_i - \bar{x})(y_i - \bar{y})}{\sum w_i (x_i - \bar{x})^2}$
and the weighted means $\stackrel{-}{x}$ and $\stackrel{-}{y}$ are given by
 $\bar{y} = \frac{\sum w_i y_i}{\sum w_i} \quad \text{and} \quad \bar{x} = \frac{\sum w_i x_i}{\sum w_i}.$
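
As an illustration of the estimates above, the following is a minimal sketch in plain C. It is not the library's implementation; the function name `fit_line` and its argument layout are assumptions made for this example only. Passing `NULL` for the weights treats every ${w}_{i}$ as 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library): weighted least-squares
   estimates a_hat and b_hat for the model E(y) = a + b*x.
   wt == NULL means unit weights. */
void fit_line(size_t n, const double x[], const double y[],
              const double wt[], double *a_hat, double *b_hat)
{
    double sw = 0.0, swx = 0.0, swy = 0.0;
    double sxx = 0.0, sxy = 0.0;
    double xbar, ybar;
    size_t i;

    assert(n >= 2);

    /* Weighted means of x and y. */
    for (i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
    }
    assert(sw > 0.0);
    xbar = swx / sw;
    ybar = swy / sw;

    /* Weighted sums of squares and cross-products about the means. */
    for (i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        sxx += w * (x[i] - xbar) * (x[i] - xbar);
        sxy += w * (x[i] - xbar) * (y[i] - ybar);
    }
    assert(sxx > 0.0); /* the x values must not all be identical */

    *b_hat = sxy / sxx;
    *a_hat = ybar - *b_hat * xbar;
}
```

For data lying exactly on a line, such as $x=1,2,3,4$ and $y=3,5,7,9$, the sketch recovers the intercept 1 and slope 2.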
The residuals of the regression are calculated using
 $res_i = y_i - \hat{y}_i$
and the residual mean square about the regression, $rms$, is determined using
 $rms = \frac{\sum w_i (y_i - \hat{y}_i)^2}{df}$
where $df$ (the number of degrees of freedom) has the following values
• $df=\sum {w}_{i}-2$ where ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$
• $df=\sum {w}_{i}-1$ where ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$.
Note: the weights should be scaled to give the required degrees of freedom.
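
The residual and degrees-of-freedom definitions above can be sketched as follows. This is an illustrative helper, not library code: the name `residual_mean_square` and the `with_constant` flag (nonzero for the Nag_AboutMean case, zero for Nag_AboutZero) are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills res[] with y[i] - yhat[i] and returns the
   residual mean square sum(w * res^2) / df, where
   df = sum(w) - 2 with a constant term and sum(w) - 1 without.
   wt == NULL means unit weights. */
double residual_mean_square(size_t n, const double x[], const double y[],
                            const double wt[], double a_hat, double b_hat,
                            int with_constant, double res[])
{
    double sw = 0.0, rss = 0.0, df;
    size_t i;

    for (i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        res[i] = y[i] - (a_hat + b_hat * x[i]);
        sw += w;
        rss += w * res[i] * res[i];
    }
    df = sw - (with_constant ? 2.0 : 1.0);
    assert(df > 0.0); /* the weights must leave positive degrees of freedom */
    return rss / df;
}
```

Because $df$ depends on the sum of the weights rather than on the number of points, doubling every weight changes $rms$; this is why the weights should be scaled to give the required degrees of freedom.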
The function calculates a predicted estimate of $y$ for a value of $x$, ${x}_{i}^{*}$, given by
 $y_i^* = \hat{a} + \hat{b} x_i^*$
This prediction has a standard error
 $serr\_pred = \sqrt{rms} \sqrt{1 + \frac{1}{\sum w_i} + \frac{(x_i^* - \bar{x})^2}{\sum w_i (x_i - \bar{x})^2}}.$
The $\left(1-\alpha \right)$ confidence interval for this estimation of $y$ is given by
 $y_i^* \pm t_{df}(1 - \alpha/2) \cdot serr\_pred$
where ${t}_{df}\left(1-\alpha /2\right)$ refers to the $\left(1-\alpha /2\right)$ point of the $t$ distribution with $df$ degrees of freedom (e.g., when $df=20$ and $\alpha =0.1$, ${t}_{20}\left(0.95\right)=1.725$). If you specify the probability $clp=0.9$ ($\alpha =0.1$) then the lower limit of this interval is
 $yl_i = y_i^* - t_{df}(0.95) \cdot serr\_pred$
and the upper limit is
 $yu_i = y_i^* + t_{df}(0.95) \cdot serr\_pred.$
The mean value of $y$ at ${x}_{i}$ is estimated by the fitted value ${\stackrel{^}{y}}_{i}$. This has a standard error of
 $serr\_arg = \sqrt{rms} \sqrt{\frac{1}{\sum w_i} + \frac{(x_i - \bar{x})^2}{\sum w_i (x_i - \bar{x})^2}}$
and a $\left(1-\alpha \right)$ confidence interval is given by
 $\hat{y}_i \pm t_{df}(1 - \alpha/2) \cdot serr\_arg.$
For example, if you specify the probability $clm=0.6$ ($\alpha =0.4$) then the lower limit of this interval is
 $yml_i = \hat{y}_i - t_{df}(0.8) \cdot serr\_arg$
and the upper limit is
 $ymu_i = \hat{y}_i + t_{df}(0.8) \cdot serr\_arg.$
The leverage, ${h}_{i}$, is a measure of the influence a value ${x}_{i}$ has on the fitted line at that point, ${\stackrel{^}{y}}_{i}$. The leverage is given by
 $h_i = \frac{w_i}{\sum w_i} + \frac{w_i (x_i - \bar{x})^2}{\sum w_i (x_i - \bar{x})^2}$
so it can be seen that
 $serr\_arg = \sqrt{rms} \sqrt{h_i / w_i} \quad \text{and} \quad serr\_pred = \sqrt{rms} \sqrt{1 + h_i / w_i}.$
Similar formulae can be derived for the case when the line goes through the origin, that is $a=0$.
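The source does not list the through-origin formulae. As a sketch, the standard weighted least-squares results for the model $E\left(y\right)=bx$, with the same weighting convention as above, would be
 $\hat{b} = \frac{\sum w_i x_i y_i}{\sum w_i x_i^2}, \qquad \hat{y}_i = \hat{b} x_i, \qquad h_i = \frac{w_i x_i^2}{\sum w_i x_i^2},$
with $df=\sum {w}_{i}-1$ as stated earlier, and standard errors
 $serr\_arg = \sqrt{rms} \sqrt{\frac{x_i^2}{\sum w_i x_i^2}} \quad \text{and} \quad serr\_pred = \sqrt{rms} \sqrt{1 + \frac{(x_i^*)^2}{\sum w_i x_i^2}}.$
These keep the relations $serr\_arg = \sqrt{rms}\sqrt{h_i/w_i}$ and $serr\_pred = \sqrt{rms}\sqrt{1+h_i/w_i}$ unchanged; whether the function uses exactly these expressions is an assumption.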

## 4  References

Snedecor G W and Cochran W G (1967) Statistical Methods Iowa State University Press

## 5  Arguments

1:    $\mathbf{mean}$    Nag_SumSquare    Input
On entry: indicates whether nag_regress_confid_interval (g02cbc) is to include a constant term in the regression.
${\mathbf{mean}}=\mathrm{Nag_AboutMean}$
The constant term, $a$, is included.
${\mathbf{mean}}=\mathrm{Nag_AboutZero}$
The constant term, $a$, is not included, i.e., $a=0$.
Constraint: ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$ or $\mathrm{Nag_AboutZero}$.
2:    $\mathbf{n}$    Integer    Input
On entry: $n$, the number of observations.
Constraints:
• if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$, ${\mathbf{n}}\ge 2$;
• if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$, ${\mathbf{n}}\ge 1$.
3:    $\mathbf{x}\left[{\mathbf{n}}\right]$    const double    Input
On entry: observations on the independent variable, $x$.
Constraint: the values of $x$ must not all be identical.
4:    $\mathbf{y}\left[{\mathbf{n}}\right]$    const double    Input
On entry: observations on the dependent variable, $y$.
5:    $\mathbf{wt}\left[{\mathbf{n}}\right]$    const double    Input
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Usually ${\mathbf{wt}}\left[i-1\right]$ will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line.
If weights are not provided then wt must be set to NULL and the effective number of observations is n.
Constraint: if ${\mathbf{wt}}\phantom{\rule{0.25em}{0ex}}\text{is not}\phantom{\rule{0.25em}{0ex}}\mathbf{NULL}$, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
6:    $\mathbf{clm}$    double    Input
On entry: the confidence level for the confidence intervals for the mean.
Constraint: $0.0<{\mathbf{clm}}<1.0$.
7:    $\mathbf{clp}$    double    Input
On entry: the confidence level for the prediction intervals.
Constraint: $0.0<{\mathbf{clp}}<1.0$.
8:    $\mathbf{yhat}\left[{\mathbf{n}}\right]$    double    Output
On exit: the fitted values, ${\stackrel{^}{{\mathbf{y}}}}_{i}$.
9:    $\mathbf{yml}\left[{\mathbf{n}}\right]$    double    Output
On exit: ${\mathbf{yml}}\left[i-1\right]$ contains the lower limit of the confidence interval for the regression line at ${\mathbf{x}}\left[i-1\right]$.
10:  $\mathbf{ymu}\left[{\mathbf{n}}\right]$    double    Output
On exit: ${\mathbf{ymu}}\left[i-1\right]$ contains the upper limit of the confidence interval for the regression line at ${\mathbf{x}}\left[i-1\right]$.
11:  $\mathbf{yl}\left[{\mathbf{n}}\right]$    double    Output
On exit: ${\mathbf{yl}}\left[i-1\right]$ contains the lower limit of the confidence interval for the individual y value at ${\mathbf{x}}\left[i-1\right]$.
12:  $\mathbf{yu}\left[{\mathbf{n}}\right]$    double    Output
On exit: ${\mathbf{yu}}\left[i-1\right]$ contains the upper limit of the confidence interval for the individual y value at ${\mathbf{x}}\left[i-1\right]$.
13:  $\mathbf{h}\left[{\mathbf{n}}\right]$    double    Output
On exit: the leverage of each observation on the regression.
14:  $\mathbf{res}\left[{\mathbf{n}}\right]$    double    Output
On exit: the residuals of the regression.
15:  $\mathbf{rms}$    double *    Output
On exit: the residual mean square about the regression.
16:  $\mathbf{fail}$    NagError *    Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_BAD_PARAM
On entry, argument mean had an illegal value.
NE_INT_ARG_LT
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$, ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$, ${\mathbf{n}}\ge 1$.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_REAL_ARG_GE
On entry, clm must not be greater than or equal to 1.0: ${\mathbf{clm}}=〈\mathit{\text{value}}〉$.
On entry, clp must not be greater than or equal to 1.0: ${\mathbf{clp}}=〈\mathit{\text{value}}〉$.
NE_REAL_ARG_LE
On entry, clm must not be less than or equal to 0.0: ${\mathbf{clm}}=〈\mathit{\text{value}}〉$.
On entry, clp must not be less than or equal to 0.0: ${\mathbf{clp}}=〈\mathit{\text{value}}〉$.
NE_SW_LOW
On entry, the sum of elements of wt must be greater than 1.0 if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$ and 2.0 if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$.
NE_WT_LOW
On entry, wt must contain at least 1 positive element if ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$ or at least 2 positive elements if ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$.
NE_X_IDEN
On entry, all elements of x are equal.
NW_RMS_EQ_ZERO
Residual mean sum of squares is zero, i.e., a perfect fit was obtained.
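
The input conditions behind these error indicators can be mirrored in a caller-side check before the function is invoked. The sketch below is an assumption-laden illustration: the `check_status` codes and the `check_inputs` function are hypothetical names, not NAG identifiers, and the order of the checks is arbitrary. The test on the x values is applied literally as stated above, to all elements regardless of weight.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not NAG error codes). */
typedef enum {
    CHK_OK = 0,
    CHK_N_TOO_SMALL,
    CHK_LEVEL_RANGE,
    CHK_NEG_WEIGHT,
    CHK_TOO_FEW_POSITIVE_WEIGHTS,
    CHK_WEIGHT_SUM_LOW,
    CHK_X_IDENTICAL
} check_status;

/* with_constant: nonzero for a fit with a constant term, zero for a fit
   through the origin. wt == NULL means no weights are supplied. */
check_status check_inputs(int with_constant, size_t n, const double x[],
                          const double wt[], double clm, double clp)
{
    size_t min_pts = with_constant ? 2 : 1;
    size_t i, npos = 0;
    double sw = 0.0;
    int identical = 1;

    if (n < min_pts)
        return CHK_N_TOO_SMALL;
    if (clm <= 0.0 || clm >= 1.0 || clp <= 0.0 || clp >= 1.0)
        return CHK_LEVEL_RANGE;

    if (wt != NULL) {
        for (i = 0; i < n; ++i) {
            if (wt[i] < 0.0)
                return CHK_NEG_WEIGHT;
            if (wt[i] > 0.0)
                ++npos;
            sw += wt[i];
        }
        if (npos < min_pts)
            return CHK_TOO_FEW_POSITIVE_WEIGHTS;
        if (sw <= (double)min_pts) /* sum must exceed 2.0 or 1.0 */
            return CHK_WEIGHT_SUM_LOW;
    }

    for (i = 1; i < n; ++i) {
        if (x[i] != x[0]) {
            identical = 0;
            break;
        }
    }
    if (identical)
        return CHK_X_IDENTICAL;
    return CHK_OK;
}
```

Checking up front like this does not replace inspecting fail after the call; in particular, the perfect-fit warning can only be detected once the regression has been computed.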

## 7  Accuracy

The computations are believed to be stable.

## 8  Parallelism and Performance

Not applicable.

## 9  Further Comments

None.

## 10  Example

This example program calculates the fitted values of $y$ together with the upper and lower limits of the confidence intervals for the regression line and for the individual $y$ values.

### 10.1  Program Text

Program Text (g02cbce.c)

### 10.2  Program Data

Program Data (g02cbce.d)

### 10.3  Program Results

Program Results (g02cbce.r)