nag_sum_sqs (g02buc) (PDF version)
g02 Chapter Contents
g02 Chapter Introduction
NAG C Library Manual

NAG Library Function Document

nag_sum_sqs (g02buc)

1  Purpose

nag_sum_sqs (g02buc) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

2  Specification

#include <nag.h>
#include <nagg02.h>
void  nag_sum_sqs (Nag_OrderType order, Nag_SumSquare mean, Integer n, Integer m, const double x[], Integer pdx, const double wt[], double *sw, double wmean[], double c[], NagError *fail)

3  Description

nag_sum_sqs (g02buc) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products, or sums of squares and cross-products of deviations from the (weighted) mean, for a sample of $n$ observations on $m$ variables $X_j$, for $j = 1, 2, \ldots, m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1, 2, \ldots, m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i, \qquad \bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right), \quad j = 1, 2, \ldots, m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right)\left(x_{ik} - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1, 2, \ldots, m \text{ and } k = j, j+1, \ldots, m.$$
The algorithm is initialized by taking $\bar{x}_j(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.
For the unweighted case $w_i = 1$ and $W_i = i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
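The update formulae above can be sketched in plain C as follows. This is a minimal illustration only, not the NAG implementation: the names `west_sum_sqs` and `packed_index` are hypothetical, the data are assumed to be stored row-major with stride m, and observations with zero weight are simply skipped (an assumption made here to avoid dividing by a zero sum of weights).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of element (j, k), 0-based with j <= k,
 * in an upper triangle packed by columns. */
size_t packed_index(size_t j, size_t k)
{
    return k * (k + 1) / 2 + j;
}

/* Hypothetical sketch (not the NAG routine): single-pass weighted means and
 * cross-products of deviations about the mean.
 * x holds n observations on m variables, row-major with stride m.
 * wt may be NULL, in which case every weight is taken as 1.
 * c must have room for m*(m+1)/2 elements. */
void west_sum_sqs(size_t n, size_t m, const double *x, const double *wt,
                  double *sw, double *wmean, double *c)
{
    assert(n >= 1 && m >= 1);

    double W = 0.0;
    for (size_t j = 0; j < m; ++j)
        wmean[j] = 0.0;
    for (size_t p = 0; p < m * (m + 1) / 2; ++p)
        c[p] = 0.0;

    for (size_t i = 0; i < n; ++i) {
        double w = wt ? wt[i] : 1.0;
        if (w == 0.0)
            continue;               /* assumption: zero weight adds nothing */

        double W_prev = W;          /* W_{i-1} */
        W += w;                     /* W_i = W_{i-1} + w_i */
        double r = w / W;           /* w_i / W_i */

        /* Update cross-products first: they use the means from step i-1. */
        for (size_t k = 0; k < m; ++k) {
            double dk = x[i * m + k] - wmean[k];
            for (size_t j = 0; j <= k; ++j) {
                double dj = x[i * m + j] - wmean[j];
                c[packed_index(j, k)] += r * dj * dk * W_prev;
            }
        }

        /* Then move each mean towards the new observation. */
        for (size_t j = 0; j < m; ++j)
            wmean[j] += r * (x[i * m + j] - wmean[j]);
    }
    *sw = W;
}
```

Starting from a zero sum of weights reproduces the initialization described above: for the first observation with non-zero weight the ratio w/W is 1 and the previous sum of weights is 0, so the means become that observation and the cross-products stay at zero.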

4  References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

5  Arguments

1:     order     Nag_OrderType     Input
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by order=Nag_RowMajor. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: order=Nag_RowMajor or Nag_ColMajor.
2:     mean     Nag_SumSquare     Input
On entry: indicates whether nag_sum_sqs (g02buc) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
mean=Nag_AboutMean
The sums of squares and cross-products of deviations about the mean are calculated.
mean=Nag_AboutZero
The sums of squares and cross-products are calculated.
Constraint: mean=Nag_AboutMean or Nag_AboutZero.
3:     n     Integer     Input
On entry: n, the number of observations in the dataset.
Constraint: n ≥ 1.
4:     m     Integer     Input
On entry: m, the number of variables.
Constraint: m ≥ 1.
5:     x[dim]     const double     Input
Note: the dimension, dim, of the array x must be at least
  • max(1, pdx×m) when order=Nag_ColMajor;
  • max(1, n×pdx) when order=Nag_RowMajor.
Where X(i,j) appears in this document, it refers to the array element
  • x[(j-1)×pdx+i-1] when order=Nag_ColMajor;
  • x[(i-1)×pdx+j-1] when order=Nag_RowMajor.
On entry: X(i,j) must contain the ith observation on the jth variable, for i = 1, 2, …, n and j = 1, 2, …, m.
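The two addressing rules for x can be written as one small C helper. This is an illustrative sketch: `storage_order` is a hypothetical stand-in for Nag_OrderType so that the snippet compiles without the NAG headers, and `x_offset` is not a NAG function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Nag_OrderType. */
typedef enum { ROW_MAJOR, COL_MAJOR } storage_order;

/* Offset into x of the ith observation on the jth variable.
 * i and j are 1-based, as in the text; pdx is the stride. */
size_t x_offset(storage_order order, size_t i, size_t j, size_t pdx)
{
    assert(i >= 1 && j >= 1);
    return order == COL_MAJOR ? (j - 1) * pdx + (i - 1)
                              : (i - 1) * pdx + (j - 1);
}
```

With n = 3 observations on m = 2 variables and the smallest permitted strides (pdx = 2 for row-major, pdx = 3 for column-major), the second observation on the first variable sits at offset 2 in row-major storage and at offset 1 in column-major storage.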
6:     pdx     Integer     Input
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
  • if order=Nag_ColMajor, pdx ≥ n;
  • if order=Nag_RowMajor, pdx ≥ m.
7:     wt[dim]     const double     Input
Note: the dimension, dim, of the array wt must be at least n.
On entry: the optional weights of each observation. If weights are not provided then wt must be set to the null pointer, i.e., (double *)0, otherwise wt[i-1] must contain the weight for the ith observation, for i = 1, 2, …, n.
Constraint: if wt is not NULL, wt[i] ≥ 0.0, for i = 0, 1, …, n-1.
8:     sw     double *     Output
On exit: the sum of weights.
If wt is NULL, sw contains the number of observations, n.
9:     wmean[m]     double     Output
On exit: the sample means. wmean[j-1] contains the mean for the jth variable.
10:   c[(m×m+m)/2]     double     Output
On exit: the cross-products.
If mean=Nag_AboutMean, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If mean=Nag_AboutZero, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the jth and kth variable, k ≥ j, is stored in c[k×(k-1)/2+j-1].
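The packed storage rule can be sketched in C as below. The helper names `c_offset` and `unpack_symmetric` are hypothetical and are not part of the NAG Library; the second function assumes the caller wants a full m by m symmetric matrix in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Offset into c of the cross-product between variables j and k.
 * j and k are 1-based with k >= j, as in the text. */
size_t c_offset(size_t j, size_t k)
{
    assert(j >= 1 && k >= j);
    return k * (k - 1) / 2 + j - 1;
}

/* Expand the packed upper triangle c into a full symmetric
 * m-by-m matrix, stored row-major in full (m*m elements). */
void unpack_symmetric(size_t m, const double *c, double *full)
{
    for (size_t k = 1; k <= m; ++k) {
        for (size_t j = 1; j <= k; ++j) {
            double v = c[c_offset(j, k)];
            full[(j - 1) * m + (k - 1)] = v;   /* upper triangle */
            full[(k - 1) * m + (j - 1)] = v;   /* mirrored lower triangle */
        }
    }
}
```

For m = 3 the packed order is (1,1), (1,2), (2,2), (1,3), (2,3), (3,3), so each column of the upper triangle is stored contiguously.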
11:   fail     NagError *     Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
NE_INT
On entry, m = ⟨value⟩.
Constraint: m ≥ 1.
On entry, n = ⟨value⟩.
Constraint: n ≥ 1.
On entry, pdx = ⟨value⟩.
Constraint: pdx > 0.
NE_INT_2
On entry, pdx = ⟨value⟩ and m = ⟨value⟩.
Constraint: pdx ≥ m.
On entry, pdx = ⟨value⟩ and n = ⟨value⟩.
Constraint: pdx ≥ n.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARRAY_ELEM_CONS
On entry, wt[⟨value⟩] < 0.0.

7  Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

8  Further Comments

nag_cov_to_corr (g02bwc) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
The means and cross-products produced by nag_sum_sqs (g02buc) may be updated by adding or removing observations using nag_sum_sqs_update (g02btc).
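The scaling to a variance-covariance matrix mentioned above amounts to dividing every packed cross-product of deviations by a common divisor. The sketch below is illustrative only: `scale_to_covariance` is a hypothetical helper, and the choice of divisor is left to the caller. A common choice in the unweighted case is sw - 1 (that is, n - 1), which is an assumption here rather than something this document prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: divide the packed cross-products of deviations
 * about the mean (m*(m+1)/2 elements) by a caller-chosen divisor,
 * writing the result to cov in the same packed layout. */
void scale_to_covariance(size_t m, const double *c, double divisor, double *cov)
{
    assert(m >= 1 && divisor > 0.0);
    size_t len = m * (m + 1) / 2;
    for (size_t p = 0; p < len; ++p)
        cov[p] = c[p] / divisor;
}
```

Because the output keeps the packed-by-columns layout of c, the same index rule locates the variance of the jth variable and the covariance between the jth and kth variables.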

9  Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of 3 observations of 3 variables.

9.1  Program Text

Program Text (g02buce.c)

9.2  Program Data

Program Data (g02buce.d)

9.3  Program Results

Program Results (g02buce.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2012