g02 Chapter Contents
g02 Chapter Introduction
NAG C Library Manual

# NAG Library Function Documentnag_sum_sqs (g02buc)

## 1  Purpose

nag_sum_sqs (g02buc) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

## 2  Specification

 #include #include
 void nag_sum_sqs (Nag_OrderType order, Nag_SumSquare mean, Integer n, Integer m, const double x[], Integer pdx, const double wt[], double *sw, double wmean[], double c[], NagError *fail)

## 3  Description

nag_sum_sqs (g02buc) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of $n$ observations on $m$ variables ${X}_{j}$, for $\mathit{j}=1,2,\dots ,m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be ${\stackrel{-}{x}}_{j}\left(i-1\right)$, the cross-product about the mean for the $j$th and $k$th variables be ${c}_{jk}\left(i-1\right)$ and the sum of weights be ${W}_{i-1}$. These are updated by the $i$th observation, ${x}_{ij}$, for $\mathit{j}=1,2,\dots ,m$, with weight ${w}_{i}$ as follows:
 $Wi = Wi-1 + wi x-j i = x-j i-1 + wiWi xj - x-j i-1 , j=1,2,…,m$
and
 $cjk i = cjk i- 1 + wi Wi xj - x-j i- 1 xk - x-k i-1 Wi-1 , j=1,2,…,m ​ and ​ k=j,j+ 1,…,m .$
The algorithm is initialized by taking ${\stackrel{-}{x}}_{j}\left(1\right)={x}_{1j}$, the first observation, and ${c}_{ij}\left(1\right)=0.0$.
For the unweighted case ${w}_{i}=1$ and ${W}_{i}=i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.

## 4  References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555

## 5  Arguments

1:     orderNag_OrderTypeInput
On entry: the order argument specifies the two-dimensional storage scheme being used, i.e., row-major ordering or column-major ordering. C language defined storage is specified by ${\mathbf{order}}=\mathrm{Nag_RowMajor}$. See Section 3.2.1.3 in the Essential Introduction for a more detailed explanation of the use of this argument.
Constraint: ${\mathbf{order}}=\mathrm{Nag_RowMajor}$ or Nag_ColMajor.
2:     meanNag_SumSquareInput
On entry: indicates whether nag_sum_sqs (g02buc) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
${\mathbf{mean}}=\mathrm{Nag_AboutMean}$
The sums of squares and cross-products of deviations about the mean are calculated.
${\mathbf{mean}}=\mathrm{Nag_AboutZero}$
The sums of squares and cross-products are calculated.
Constraint: ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$ or $\mathrm{Nag_AboutZero}$.
3:     nIntegerInput
On entry: $n$, the number of observations in the dataset.
Constraint: ${\mathbf{n}}\ge 1$.
4:     mIntegerInput
On entry: $m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
5:     x[$\mathit{dim}$]const doubleInput
Note: the dimension, dim, of the array x must be at least
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{pdx}}×{\mathbf{m}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{n}}×{\mathbf{pdx}}\right)$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
Where ${\mathbf{X}}\left(i,j\right)$ appears in this document, it refers to the array element
• ${\mathbf{x}}\left[\left(j-1\right)×{\mathbf{pdx}}+i-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_ColMajor}$;
• ${\mathbf{x}}\left[\left(i-1\right)×{\mathbf{pdx}}+j-1\right]$ when ${\mathbf{order}}=\mathrm{Nag_RowMajor}$.
On entry: ${\mathbf{X}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
6:     pdxIntegerInput
On entry: the stride separating row or column elements (depending on the value of order) in the array x.
Constraints:
• if ${\mathbf{order}}=\mathrm{Nag_ColMajor}$, ${\mathbf{pdx}}\ge {\mathbf{n}}$;
• if ${\mathbf{order}}=\mathrm{Nag_RowMajor}$, ${\mathbf{pdx}}\ge {\mathbf{m}}$.
7:     wt[$\mathit{dim}$]const doubleInput
Note: the dimension, dim, of the array wt must be at least ${\mathbf{n}}$.
On entry: the optional weights of each observation. If weights are not provided then wt must be set to the null pointer, i.e., (double *)0, otherwise ${\mathbf{wt}}\left[i-1\right]$ must contain the weight for the $i-1$th observation.
Constraint: if , ${\mathbf{wt}}\left[\mathit{i}\right]\ge 0.0$, for $\mathit{i}=0,1,\dots ,n-1$.
8:     swdouble *Output
On exit: the sum of weights.
If , sw contains the number of observations, $n$.
9:     wmean[m]doubleOutput
On exit: the sample means. ${\mathbf{wmean}}\left[j-1\right]$ contains the mean for the $j$th variable.
10:   c[$\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2$]doubleOutput
On exit: the cross-products.
If ${\mathbf{mean}}=\mathrm{Nag_AboutMean}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If ${\mathbf{mean}}=\mathrm{Nag_AboutZero}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k\ge j$, is stored in ${\mathbf{c}}\left[k×\left(k-1\right)/2+j-1\right]$.
11:   failNagError *Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_INT
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 1$.
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}>0$.
NE_INT_2
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{pdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{pdx}}\ge {\mathbf{n}}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARRAY_ELEM_CONS
On entry, ${\mathbf{wt}}\left[〈\mathit{\text{value}}〉\right]<0.0$.

## 7  Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

nag_cov_to_corr (g02bwc) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
The means and cross-products produced by nag_sum_sqs (g02buc) may be updated by adding or removing observations using nag_sum_sqs_update (g02btc).

## 9  Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of $3$ observations of $3$ variables.

### 9.1  Program Text

Program Text (g02buce.c)

### 9.2  Program Data

Program Data (g02buce.d)

### 9.3  Program Results

Program Results (g02buce.r)