
# NAG Library Function Document: nag_simple_linear_regression (g02cac)

## 1  Purpose

nag_simple_linear_regression (g02cac) performs a simple linear regression with or without a constant term. The data is optionally weighted.

## 2  Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_simple_linear_regression (Nag_SumSquare mean, Integer n,
      const double x[], const double y[], const double wt[],
      double *a, double *b, double *a_serr, double *b_serr,
      double *rsq, double *rss, double *df, NagError *fail)

## 3  Description

nag_simple_linear_regression (g02cac) fits a straight line model of the form,
 $E(y) = a + bx,$
where $E(y)$ is the expected value of the variable $y$, to the data points
 $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n),$
such that
 $y_i = a + b x_i + e_i, \quad i = 1, 2, \dots, n \quad (n > 2),$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$. Weights may be used when $\mathrm{var}(e_i) = \sigma^2 / w_i$, when observations are to be removed from the regression by giving them zero weight, or when the $i$th data point has been observed with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
 $\sum_{i=1}^{n} w_i e_i^2,$
where $w_i = 1.0$ if the weights option is not selected.
The following statistics are computed:
• the estimate of regression constant $\stackrel{^}{a}=\stackrel{-}{y}-\stackrel{^}{b}\stackrel{-}{x}$,
• the estimate of regression coefficient $\stackrel{^}{b}=\frac{\sum {w}_{i}\left({x}_{i}-\stackrel{-}{x}\right)\left({y}_{i}-\stackrel{-}{y}\right)}{\sum {w}_{i}{\left({x}_{i}-\stackrel{-}{x}\right)}^{2}}$,
• the residual sum of squares $rss=\sum {w}_{i}{\left({y}_{i}-{\stackrel{^}{y}}_{i}\right)}^{2}$,
where the weighted means $\stackrel{-}{x}$ and $\stackrel{-}{y}$ are
 $\bar{x} = \frac{\sum w_i x_i}{\sum w_i} \quad \text{and} \quad \bar{y} = \frac{\sum w_i y_i}{\sum w_i}.$
The number of degrees of freedom associated with $rss$ is
• $df = \sum w_i - 2$ when $\mathbf{mean} = \mathrm{Nag\_AboutMean}$
• $df = \sum w_i - 1$ when $\mathbf{mean} = \mathrm{Nag\_AboutZero}$
Note: the weights should be scaled to give the correct degrees of freedom in the case $\mathrm{var}(e_i) = \sigma^2 / w_i$.
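The scaling mentioned in the note can be sketched in C. One common convention, assumed here and not stated by this document, is to rescale variance weights so that they sum to the number of observations with non-zero weight. The helper names `scale_weights_to_count` and `residual_df` are hypothetical and are not part of the NAG Library.

```c
#include <stddef.h>

/* Hypothetical helper: rescale non-negative weights in place so that they
   sum to the number of strictly positive weights. Returns that target sum,
   or 0.0 if no weight is positive (the weights are then left unchanged). */
double scale_weights_to_count(size_t n, double wt[])
{
    double sum = 0.0;
    size_t npos = 0;
    for (size_t i = 0; i < n; i++) {
        sum += wt[i];
        if (wt[i] > 0.0) npos++;
    }
    if (npos == 0) return 0.0;

    double factor = (double)npos / sum;
    for (size_t i = 0; i < n; i++) wt[i] *= factor;
    return (double)npos;
}

/* Degrees of freedom as given above: the sum of the weights minus 2 when a
   constant term is fitted, minus 1 when the line goes through the origin. */
double residual_df(double sum_of_weights, int with_constant)
{
    return sum_of_weights - (with_constant ? 2.0 : 1.0);
}
```

With this convention, weights (2, 4, 0, 2) become (0.75, 1.5, 0, 0.75), which sum to 3 and leave 1 degree of freedom when a constant term is fitted.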
The $R^2$ value, or coefficient of determination, is
 $R^2 = \frac{\sum w_i (\hat{y}_i - \bar{y})^2}{\sum w_i (y_i - \bar{y})^2} = \frac{\sum w_i (y_i - \bar{y})^2 - rss}{\sum w_i (y_i - \bar{y})^2}.$
This measures the proportion of the total variation about the mean $\stackrel{-}{y}$ that can be explained by the regression.
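The standard error for the regression constant $\hat{a}$ is
 $a\_serr = \sqrt{\frac{rss}{df} \left( \frac{1}{\sum w_i} + \frac{\bar{x}^2}{\sum w_i (x_i - \bar{x})^2} \right)} = \sqrt{\frac{rss}{df} \, \frac{1}{\sum w_i} \, \frac{\sum w_i x_i^2}{\sum w_i (x_i - \bar{x})^2}}.$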
The standard error for the regression coefficient $\hat{b}$ is
 $b\_serr = \sqrt{\frac{rss}{df \sum w_i (x_i - \bar{x})^2}}.$
Similar formulae can be derived for the case when the line goes through the origin, that is $a=0$.
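As a sketch of that case, assume the same weighted least squares criterion with $a$ fixed at zero. This document does not list the resulting expressions, so their exact form inside the function is an assumption. Setting the derivative of $\sum w_i (y_i - b x_i)^2$ with respect to $b$ to zero gives
 $\sum w_i x_i (y_i - \hat{b} x_i) = 0 \quad \Longrightarrow \quad \hat{b} = \frac{\sum w_i x_i y_i}{\sum w_i x_i^2},$
with
 $rss = \sum w_i (y_i - \hat{b} x_i)^2, \qquad df = \sum w_i - 1, \qquad b\_serr = \sqrt{\frac{rss}{df \sum w_i x_i^2}}.$
The sums of squares are taken about zero rather than about the weighted means.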

## 4  References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

## 5  Arguments

1:     mean (Nag_SumSquare, Input)
On entry: indicates whether nag_simple_linear_regression (g02cac) is to include a constant term in the regression.
$\mathbf{mean} = \mathrm{Nag\_AboutMean}$
The regression constant $a$ is included.
$\mathbf{mean} = \mathrm{Nag\_AboutZero}$
The regression constant $a$ is not included, i.e., $a=0$.
Constraint: $\mathbf{mean} = \mathrm{Nag\_AboutMean}$ or $\mathrm{Nag\_AboutZero}$.
2:     n (Integer, Input)
On entry: the number of observations, $n$.
Constraints:
• if $\mathbf{mean} = \mathrm{Nag\_AboutMean}$, $\mathbf{n} \ge 2$;
• if $\mathbf{mean} = \mathrm{Nag\_AboutZero}$, $\mathbf{n} \ge 1$.
3:     x[n] (const double, Input)
On entry: the values of the independent variable with the $i$th value stored in $x[i-1]$, for $i = 1, 2, \dots, n$.
Constraint: the values of $x$ must not all be identical.
4:     y[n] (const double, Input)
On entry: the values of the dependent variable with the $i$th value stored in $y[i-1]$, for $i = 1, 2, \dots, n$.
Constraint: the values of $y$ must not all be identical.
5:     wt[n] (const double, Input)
On entry: if weighted estimates are required then wt must contain the weights to be used in the weighted regression. Otherwise wt need not be defined and may be set to the null pointer NULL, i.e., (double *)0. Usually ${\mathbf{wt}}\left[i-1\right]$ will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line. If ${\mathbf{wt}}=\mathbf{NULL}$, then the effective number of observations is $n$.
Constraint: $\mathbf{wt} = \mathbf{NULL}$ or $\mathbf{wt}[i-1] \ge 0.0$, for $i = 1, 2, \dots, n$.
6:     a (double *, Output)
On exit: if $\mathbf{mean} = \mathrm{Nag\_AboutMean}$ then a is the regression constant $\hat{a}$; otherwise a is set to zero.
7:     b (double *, Output)
On exit: the regression coefficient $\hat{b}$.
8:     a_serr (double *, Output)
On exit: the standard error of the regression constant $\hat{a}$.
9:     b_serr (double *, Output)
On exit: the standard error of the regression coefficient $\hat{b}$.
10:   rsq (double *, Output)
On exit: the coefficient of determination, $R^2$.
11:   rss (double *, Output)
On exit: the sum of squares of the residuals about the regression.
12:   df (double *, Output)
On exit: the degrees of freedom associated with the residual sum of squares.
13:   fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

On entry, argument mean had an illegal value.
NE_INT_ARG_LT
On entry, $\mathbf{n} = \langle \mathit{value} \rangle$.
Constraint: $\mathbf{n} \ge 1$ if $\mathbf{mean} = \mathrm{Nag\_AboutZero}$.
On entry, $\mathbf{n} = \langle \mathit{value} \rangle$.
Constraint: $\mathbf{n} \ge 2$ if $\mathbf{mean} = \mathrm{Nag\_AboutMean}$.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_SW_LOW
On entry, the sum of elements of wt must be greater than 1.0 if $\mathbf{mean} = \mathrm{Nag\_AboutZero}$ or greater than 2.0 if $\mathbf{mean} = \mathrm{Nag\_AboutMean}$.
NE_WT_LOW
On entry, wt must contain at least 1 positive element if $\mathbf{mean} = \mathrm{Nag\_AboutZero}$ or at least 2 positive elements if $\mathbf{mean} = \mathrm{Nag\_AboutMean}$.
NE_X_OR_Y_IDEN
On entry, all elements of x and/or y are equal.
NE_ZERO_DOF_RESID
On entry, the degrees of freedom for the residual are zero, i.e., the designated number of arguments $=$ the effective number of observations.
Residual sum of squares is zero, i.e., a perfect fit was obtained.
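The input conditions listed in this section can be checked before any fitting is attempted. The following C sketch mirrors those conditions for illustration. The enumerations `FitMode` and `CheckResult` and the function `check_inputs` are hypothetical and do not correspond to NAG types or error codes. The order of the checks, and the decision to test every point for identical values regardless of its weight, are assumptions.

```c
#include <stddef.h>

typedef enum { FIT_ABOUT_MEAN, FIT_ABOUT_ZERO } FitMode; /* hypothetical */

typedef enum {                                           /* hypothetical */
    CHECK_OK,
    CHECK_N_TOO_SMALL,
    CHECK_NEG_WEIGHT,
    CHECK_TOO_FEW_POSITIVE,
    CHECK_WEIGHT_SUM_LOW,
    CHECK_ZERO_DOF,
    CHECK_X_OR_Y_IDENTICAL
} CheckResult;

/* Returns 1 if every element of v equals the first one. */
static int all_equal(size_t n, const double v[])
{
    for (size_t i = 1; i < n; i++)
        if (v[i] != v[0]) return 0;
    return 1;
}

CheckResult check_inputs(FitMode mode, size_t n, const double x[],
                         const double y[], const double wt[])
{
    /* Number of fitted parameters: 2 with a constant term, 1 without. */
    size_t p = (mode == FIT_ABOUT_MEAN) ? 2 : 1;

    if (n < p) return CHECK_N_TOO_SMALL;

    if (wt) {
        double sw = 0.0;
        size_t npos = 0;
        for (size_t i = 0; i < n; i++) {
            if (wt[i] < 0.0) return CHECK_NEG_WEIGHT;
            if (wt[i] > 0.0) npos++;
            sw += wt[i];
        }
        if (npos < p) return CHECK_TOO_FEW_POSITIVE;
        if (sw <= (double)p) return CHECK_WEIGHT_SUM_LOW;
    } else if (n == p) {
        /* Unit weights: df = n - p would be zero. */
        return CHECK_ZERO_DOF;
    }

    if (all_equal(n, x) || all_equal(n, y)) return CHECK_X_OR_Y_IDENTICAL;
    return CHECK_OK;
}
```

A perfect fit (zero residual sum of squares) can only be detected after the regression has been computed, so it is not covered by this pre-check.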

## 7  Accuracy

The computations are believed to be stable.

## 8  Further Comments

The time taken by the function depends on $n$. The function uses a two-pass algorithm.
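This document does not describe the two passes. A common two-pass scheme, assumed here purely for illustration, computes the mean in a first pass and accumulates squared deviations from it in a second. This avoids the cancellation that the one-pass formula $\sum x_i^2 - (\sum x_i)^2 / n$ can suffer when the data carry a large offset. The function names below are hypothetical, and unit weights are used to keep the sketch short.

```c
#include <stddef.h>

/* Two-pass sum of squares about the mean: first the mean, then the
   squared deviations from it. */
double sxx_two_pass(size_t n, const double x[])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    double mean = sum / (double)n;

    double sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        sxx += d * d;
    }
    return sxx;
}

/* One-pass alternative: algebraically identical, but it subtracts two
   large, nearly equal numbers when the values of x are far from zero. */
double sxx_one_pass(size_t n, const double x[])
{
    double sum = 0.0, sumsq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        sumsq += x[i] * x[i];
    }
    return sumsq - sum * sum / (double)n;
}
```

For values such as $10^9 + 1$, $10^9 + 2$, $10^9 + 3$ the two-pass version works with deviations of $-1$, $0$ and $1$, whereas the one-pass version forms squares near $10^{18}$ that cannot be held exactly in double precision.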

## 9  Example

This example program calculates the regression constant $\hat{a}$ and the regression coefficient $\hat{b}$, their standard errors, the coefficient of determination and the degrees of freedom about the regression.

### 9.1  Program Text

Program Text (g02cace.c)

### 9.2  Program Data

Program Data (g02cace.d)

### 9.3  Program Results

Program Results (g02cace.r)