NAG C Library Manual

NAG Library Function Document

nag_simple_linear_regression (g02cac)

1  Purpose

nag_simple_linear_regression (g02cac) performs a simple linear regression with or without a constant term. The data is optionally weighted.

2  Specification

#include <nag.h>
#include <nagg02.h>
void  nag_simple_linear_regression (Nag_SumSquare mean, Integer n, const double x[], const double y[], const double wt[], double *a, double *b, double *a_serr, double *b_serr, double *rsq, double *rss, double *df, NagError *fail)

3  Description

nag_simple_linear_regression (g02cac) fits a straight line model of the form
$$E(y) = a + bx,$$
where $E(y)$ is the expected value of the variable $y$, to the data points
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
such that
$$y_i = a + b x_i + e_i, \quad i = 1, 2, \ldots, n \quad (n > 2),$$
where the $e_i$ values are independent random errors. The $i$th data point may have an associated weight $w_i$; the weights may be used either when $\operatorname{var}(e_i) = \sigma^2 / w_i$, or to remove observations from the regression by giving them zero weight, or to represent observations that were observed with frequency $w_i$.
The regression coefficient, $b$, and the regression constant, $a$, are estimated by minimizing
$$\sum_{i=1}^{n} w_i e_i^2;$$
if the weights option is not selected, then $w_i = 1.0$.
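The criterion above can be written directly as a function of a trial pair $(a, b)$. The C sketch below is illustrative only: `weighted_sse` is a hypothetical helper, not part of the NAG Library, and it assumes that a NULL weight array means unit weights, mirroring the wt argument.

```c
#include <stddef.h>

/* Hypothetical helper (not a NAG routine): weighted sum of squared
 * residuals S(a, b) = sum_i w_i (y_i - a - b x_i)^2.
 * wt == NULL means every w_i = 1.0. */
double weighted_sse(size_t n, const double x[], const double y[],
                    const double wt[], double a, double b)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = (wt != NULL) ? wt[i] : 1.0;
        double e = y[i] - a - b * x[i];   /* residual e_i for this (a, b) */
        s += w * e * e;
    }
    return s;
}
```

The least-squares estimates are the $(a, b)$ that make this quantity as small as possible; for points lying exactly on a line it reaches zero at that line.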
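The following statistics are computed: the estimate of the regression coefficient $b$,
$$\hat b = \frac{\sum w_i (x_i - \bar x)(y_i - \bar y)}{\sum w_i (x_i - \bar x)^2},$$
the estimate of the regression constant $a$,
$$\hat a = \bar y - \hat b \bar x,$$
where the weighted means $\bar x$ and $\bar y$ are
$$\bar x = \frac{\sum w_i x_i}{\sum w_i} \quad \text{and} \quad \bar y = \frac{\sum w_i y_i}{\sum w_i},$$
and the residual sum of squares
$$\mathrm{rss} = \sum w_i (y_i - \hat a - \hat b x_i)^2.$$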
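A minimal two-pass sketch of these estimates, assuming the standard weighted least-squares formulas $\hat b = \sum w_i (x_i - \bar x)(y_i - \bar y) / \sum w_i (x_i - \bar x)^2$ and $\hat a = \bar y - \hat b \bar x$. The names `line_fit` and `fit_line` are hypothetical; this is not the library's implementation, and it performs none of the input checks listed in Section 6.

```c
#include <stddef.h>

/* Hypothetical result type (not a NAG type). */
typedef struct {
    double xbar, ybar;  /* weighted means */
    double sxx, sxy;    /* sum w (x - xbar)^2 and sum w (x - xbar)(y - ybar) */
    double sumw;        /* sum of the weights */
    double a, b;        /* estimated constant and coefficient */
} line_fit;

static double weight_at(const double wt[], size_t i)
{
    return (wt != NULL) ? wt[i] : 1.0;   /* NULL means unit weights */
}

/* Hypothetical helper: weighted fit of y = a + b x with a constant term. */
line_fit fit_line(size_t n, const double x[], const double y[], const double wt[])
{
    line_fit f = {0};
    /* Pass 1: weighted means */
    for (size_t i = 0; i < n; i++) {
        double w = weight_at(wt, i);
        f.sumw += w;
        f.xbar += w * x[i];
        f.ybar += w * y[i];
    }
    f.xbar /= f.sumw;
    f.ybar /= f.sumw;
    /* Pass 2: weighted sums of deviations about the means */
    for (size_t i = 0; i < n; i++) {
        double w = weight_at(wt, i);
        double dx = x[i] - f.xbar;
        f.sxx += w * dx * dx;
        f.sxy += w * dx * (y[i] - f.ybar);
    }
    f.b = f.sxy / f.sxx;
    f.a = f.ybar - f.b * f.xbar;
    return f;
}
```

With weights {1, 2, 1} on the points (1, 2), (2, 3), (3, 5), this sketch gives the same line ($\hat b = 1.5$, $\hat a = 0.25$) as listing the middle point twice with unit weights, which is the frequency interpretation of the weights.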
The number of degrees of freedom associated with rss is
$$\mathrm{df} = \sum w_i - 2,$$
which is $n - 2$ when wt is NULL.
Note: the weights should be scaled to give the correct degrees of freedom in the case $\operatorname{var}(e_i) = \sigma^2 / w_i$.
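One way to follow this advice, sketched under an assumption: rescale the weights so that they sum to the number of points carrying positive weight, which makes $\sum w_i - 2$ equal to (number of points used) $- 2$. `normalise_weights` is a hypothetical helper, and this particular normalization is one common convention rather than something this document prescribes.

```c
#include <stddef.h>

/* Hypothetical helper: rescale precision weights (var e_i = sigma^2 / w_i)
 * in place so that they sum to the number of points with positive weight.
 * A common rescaling leaves the fitted a and b unchanged but fixes df. */
void normalise_weights(size_t n, double wt[])
{
    double sumw = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (wt[i] > 0.0) {
            sumw += wt[i];
            used++;
        }
    }
    if (sumw <= 0.0)
        return;   /* nothing to scale; leave the weights untouched */
    double scale = (double)used / sumw;
    for (size_t i = 0; i < n; i++)
        wt[i] *= scale;
}
```

For example, the weights {0.5, 1.5, 0.0, 2.0} become {0.375, 1.125, 0.0, 1.5}, which sum to 3, the number of points actually used.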
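The $R^2$ value, or coefficient of determination, is
$$R^2 = \frac{\sum w_i (\hat y_i - \bar y)^2}{\sum w_i (y_i - \bar y)^2} = \frac{\sum w_i (y_i - \bar y)^2 - \mathrm{rss}}{\sum w_i (y_i - \bar y)^2},$$
where $\hat y_i = \hat a + \hat b x_i$ is the fitted value.
This measures the proportion of the total variation about the mean $\bar y$ that can be explained by the regression.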
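The second form of $R^2$ needs only the total and residual sums of squares. A sketch, with `r_squared` a hypothetical helper that takes an already fitted $a$ and $b$ and recomputes $\bar y$ and rss itself (NULL weights again assumed to mean unit weights):

```c
#include <stddef.h>

/* Hypothetical helper: R^2 = (sum w (y - ybar)^2 - rss) / sum w (y - ybar)^2
 * for a given line y = a + b x. wt == NULL means unit weights. */
double r_squared(size_t n, const double x[], const double y[],
                 const double wt[], double a, double b)
{
    double sumw = 0.0, ybar = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = (wt != NULL) ? wt[i] : 1.0;
        sumw += w;
        ybar += w * y[i];
    }
    ybar /= sumw;

    double syy = 0.0, rss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = (wt != NULL) ? wt[i] : 1.0;
        double dy = y[i] - ybar;            /* deviation about the mean */
        double e = y[i] - (a + b * x[i]);   /* residual about the line */
        syy += w * dy * dy;
        rss += w * e * e;
    }
    return (syy - rss) / syy;
}
```

For $x = (0, 1, 2)$, $y = (0, 2, 1)$ the least-squares line is $\hat a = \hat b = 0.5$, with total sum of squares 2 and rss 1.5, so $R^2 = 0.25$.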
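The standard error for the regression constant $\hat a$ is
$$\mathrm{a\_serr} = \sqrt{\frac{\mathrm{rss}}{\mathrm{df}} \left( \frac{1}{\sum w_i} + \frac{\bar x^2}{\sum w_i (x_i - \bar x)^2} \right)} = \sqrt{\frac{\mathrm{rss}}{\mathrm{df}} \, \frac{\sum w_i x_i^2}{\sum w_i \sum w_i (x_i - \bar x)^2}}.$$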
The standard error for the regression coefficient $\hat b$ is
$$\mathrm{b\_serr} = \sqrt{\frac{\mathrm{rss} / \mathrm{df}}{\sum w_i (x_i - \bar x)^2}}.$$
Similar formulae can be derived for the case when the line goes through the origin, that is a=0 .

5  Arguments

1:     mean – Nag_SumSquare     Input
On entry: indicates whether nag_simple_linear_regression (g02cac) is to include a constant term in the regression.
If mean=Nag_AboutMean, the regression constant $a$ is included.
If mean=Nag_AboutZero, the regression constant $a$ is not included, i.e., $a = 0$.
Constraint: mean=Nag_AboutMean or Nag_AboutZero.
2:     n – Integer     Input
On entry: the number of observations, $n$.
Constraints:
  • if mean=Nag_AboutMean, $n \ge 2$;
  • if mean=Nag_AboutZero, $n \ge 1$.
3:     x[n] – const double     Input
On entry: the values of the independent variable, with the $i$th value stored in x[i-1], for $i = 1, 2, \ldots, n$.
Constraint: the values of x must not all be identical.
4:     y[n] – const double     Input
On entry: the values of the dependent variable, with the $i$th value stored in y[i-1], for $i = 1, 2, \ldots, n$.
Constraint: the values of y must not all be identical.
5:     wt[n] – const double     Input
On entry: if weighted estimates are required, wt must contain the weights to be used in the weighted regression. Otherwise wt need not be defined and may be set to the null pointer NULL, i.e., (double *)0. Usually wt[i-1] will be an integral value corresponding to the number of observations associated with the $i$th data point, or zero if the $i$th data point is to be ignored. The sum of the weights therefore represents the effective total number of observations used to create the regression line. If wt = NULL, the effective number of observations is $n$.
Constraint: wt = NULL or wt[i-1] $\ge 0.0$, for $i = 1, 2, \ldots, n$.
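To illustrate the frequency interpretation: weights {1, 2, 0, 1} describe four effective observations, with the second point counted twice and the third ignored. The sketch below uses a hypothetical helper, `effective_observations` (not a NAG routine), which also rejects negative weights as the constraint requires.

```c
#include <stddef.h>

/* Hypothetical helper: effective number of observations implied by wt.
 * wt == NULL counts every point once; otherwise the weights are summed.
 * Returns -1.0 if any weight violates the constraint wt[i] >= 0. */
double effective_observations(size_t n, const double wt[])
{
    if (wt == NULL)
        return (double)n;
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (wt[i] < 0.0)
            return -1.0;   /* negative weight: invalid input */
        total += wt[i];
    }
    return total;
}
```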
6:     a – double *     Output
On exit: if mean=Nag_AboutMean, a is the regression constant $\hat a$; otherwise a is set to zero.
7:     b – double *     Output
On exit: the regression coefficient $\hat b$.
8:     a_serr – double *     Output
On exit: the standard error of the regression constant $\hat a$.
9:     b_serr – double *     Output
On exit: the standard error of the regression coefficient $\hat b$.
10:   rsq – double *     Output
On exit: the coefficient of determination, $R^2$.
11:   rss – double *     Output
On exit: the sum of squares of the residuals about the regression.
12:   df – double *     Output
On exit: the degrees of freedom associated with the residual sum of squares.
13:   fail – NagError *     Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

6  Error Indicators and Warnings

On entry, argument mean had an illegal value.
On entry, n = ⟨value⟩.
Constraint: $n \ge 1$ if mean=Nag_AboutZero.
On entry, n = ⟨value⟩.
Constraint: $n \ge 2$ if mean=Nag_AboutMean.
On entry, at least one of the weights is negative.
On entry, the sum of elements of wt must be greater than 1.0 if mean=Nag_AboutZero or greater than 2.0 if mean=Nag_AboutMean.
On entry, wt must contain at least 1 positive element if mean=Nag_AboutZero or at least 2 positive elements if mean=Nag_AboutMean.
On entry, all elements of x and/or y are equal.
On entry, the degrees of freedom for the residual are zero, i.e., the number of regression parameters equals the effective number of observations.
The residual sum of squares is zero, i.e., a perfect fit was obtained.
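The input conditions in this list can be checked before calling the function. This is a sketch under stated assumptions: the status names are hypothetical stand-ins (the NAG error codes are not reproduced in this section), the order of the checks is assumed, and the zero-df and perfect-fit conditions are omitted because they depend on the fit itself.

```c
#include <stddef.h>

/* Hypothetical status codes, not the NAG Library's NagError codes. */
typedef enum {
    CHECK_OK = 0,
    CHECK_N_TOO_SMALL,
    CHECK_NEGATIVE_WEIGHT,
    CHECK_WEIGHT_SUM_TOO_SMALL,
    CHECK_TOO_FEW_POSITIVE_WEIGHTS,
    CHECK_X_OR_Y_CONSTANT
} check_status;

static int all_equal(size_t n, const double v[])
{
    for (size_t i = 1; i < n; i++)
        if (v[i] != v[0])
            return 0;
    return 1;
}

/* with_constant != 0 corresponds to mean = Nag_AboutMean. */
check_status check_inputs(int with_constant, size_t n, const double x[],
                          const double y[], const double wt[])
{
    size_t npar = with_constant ? 2 : 1;   /* parameters in the model */
    if (n < npar)
        return CHECK_N_TOO_SMALL;
    if (wt != NULL) {
        double sumw = 0.0;
        size_t npos = 0;
        for (size_t i = 0; i < n; i++) {
            if (wt[i] < 0.0)
                return CHECK_NEGATIVE_WEIGHT;
            sumw += wt[i];
            if (wt[i] > 0.0)
                npos++;
        }
        if (sumw <= (double)npar)          /* must exceed 1.0 or 2.0 */
            return CHECK_WEIGHT_SUM_TOO_SMALL;
        if (npos < npar)                   /* need 1 or 2 positive weights */
            return CHECK_TOO_FEW_POSITIVE_WEIGHTS;
    }
    if (all_equal(n, x) || all_equal(n, y))
        return CHECK_X_OR_Y_CONSTANT;
    return CHECK_OK;
}
```

Both weight checks are needed: a single weight of 3.0 gives a sum above 2.0 but only one positive element, which is not enough to fit a line with a constant term.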

7  Accuracy

The computations are believed to be stable.

8  Further Comments

The time taken by the function is approximately proportional to $n$. The function uses a two-pass algorithm.
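To see why a two-pass scheme is preferable, compare the one-pass formula $\sum x_i^2 - n \bar x^2$ with summing squared deviations about a mean computed in a first pass. The unweighted sketch below uses hypothetical helpers `sxx_one_pass` and `sxx_two_pass`; this document describes the library's own algorithm only as two-pass, so the details here are assumptions.

```c
#include <stddef.h>

/* One pass: sum x^2 - n * xbar^2. Prone to cancellation when the data
 * have a large mean relative to their spread. */
double sxx_one_pass(size_t n, const double x[])
{
    double s = 0.0, s2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i];
        s2 += x[i] * x[i];
    }
    double mean = s / (double)n;
    return s2 - (double)n * mean * mean;
}

/* Two passes: form the mean first, then sum squared deviations from it. */
double sxx_two_pass(size_t n, const double x[])
{
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += x[i];
    mean /= (double)n;
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        s += d * d;
    }
    return s;
}
```

For $x = (10^8 + 1, 10^8 + 2, 10^8 + 3)$ the exact sum of squared deviations is 2; the two-pass version returns 2.0, whereas the one-pass version can lose most or all of its significant digits to cancellation, because the squares exceed the range in which doubles represent integers exactly.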

9  Example

This example calculates the regression constant $\hat a$ and the regression coefficient $\hat b$, their standard errors, the coefficient of determination and the degrees of freedom about the regression.

9.1  Program Text

Program Text (g02cace.c)

9.2  Program Data

Program Data (g02cace.d)

9.3  Program Results

Program Results (g02cace.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2012