# NAG Toolbox: nag_correg_ssqmat (g02bu)

## Purpose

nag_correg_ssqmat (g02bu) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

## Syntax

[sw, wmean, c, ifail] = g02bu(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 24: mean_p was made optional
 At Mark 22: n was made optional

## Description

nag_correg_ssqmat (g02bu) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of $n$ observations on $m$ variables ${X}_{j}$, for $\mathit{j}=1,2,\dots ,m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be ${\stackrel{-}{x}}_{j}\left(i-1\right)$, the cross-product about the mean for the $j$th and $k$th variables be ${c}_{jk}\left(i-1\right)$ and the sum of weights be ${W}_{i-1}$. These are updated by the $i$th observation, ${x}_{ij}$, for $\mathit{j}=1,2,\dots ,m$, with weight ${w}_{i}$ as follows:
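 ${W}_{i}={W}_{i-1}+{w}_{i}, \qquad {\stackrel{-}{x}}_{j}\left(i\right)={\stackrel{-}{x}}_{j}\left(i-1\right)+\frac{{w}_{i}}{{W}_{i}}\left({x}_{j}-{\stackrel{-}{x}}_{j}\left(i-1\right)\right), \quad j=1,2,\dots ,m$
and
 ${c}_{jk}\left(i\right)={c}_{jk}\left(i-1\right)+\frac{{w}_{i}}{{W}_{i}}\left({x}_{j}-{\stackrel{-}{x}}_{j}\left(i-1\right)\right)\left({x}_{k}-{\stackrel{-}{x}}_{k}\left(i-1\right)\right){W}_{i-1}, \quad j=1,2,\dots ,m \text{ and } k=j,j+1,\dots ,m.$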
The algorithm is initialized by taking ${\stackrel{-}{x}}_{j}\left(1\right)={x}_{1j}$, the first observation, and ${c}_{jk}\left(1\right)=0.0$.
For the unweighted case ${w}_{i}=1$ and ${W}_{i}=i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
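The recurrence above can be sketched in plain MATLAB as follows. This is an illustrative sketch only, not the NAG implementation: the variable names (W, xbar, d, idx) are chosen here for illustration, the data are those of the Example section arranged with one observation per row, and the first weight is assumed to be positive.

```matlab
% Data from the Example section: one observation per row, one variable per column.
x  = [9.1231  3.7011  4.5230;
      0.9310  0.0900  0.8870;
      0.0009  0.0099  0.0999];
wt = [0.1300  1.3070  0.3700];

[n, m] = size(x);
W    = wt(1);                 % running sum of weights, W_1 = w_1 (assumed > 0)
xbar = x(1, :);               % running means, initialised to the first observation
c    = zeros(1, m*(m+1)/2);   % packed upper triangle, initialised to zero

for i = 2:n
    Wprev = W;
    W     = Wprev + wt(i);
    d     = x(i, :) - xbar;   % deviations from the means of the first i-1 observations
    for k = 1:m
        for j = 1:k
            idx    = k*(k-1)/2 + j;   % packed-by-column position of element (j,k)
            c(idx) = c(idx) + (wt(i)/W) * d(j) * d(k) * Wprev;
        end
    end
    xbar = xbar + (wt(i)/W) * d;      % update the means after the cross-products
end
```

After the loop, W holds the sum of weights, xbar the weighted means and c the packed sums of squares and cross-products of deviations about the mean; for this data they should agree with the values printed in the Example section to the precision shown there.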

## References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.

### Optional Input Parameters

1:     $\mathrm{mean_p}$ – string (length ≥ 1)
Default: $\text{'M'}$
Indicates whether nag_correg_ssqmat (g02bu) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
${\mathbf{mean_p}}=\text{'M'}$
The sums of squares and cross-products of deviations about the mean are calculated.
${\mathbf{mean_p}}=\text{'Z'}$
The sums of squares and cross-products are calculated.
Constraint: ${\mathbf{mean_p}}=\text{'M'}$ or $\text{'Z'}$.
2:     $\mathrm{n}$ – int64, int32 or nag_int scalar
Default: the first dimension of the array x.
$n$, the number of observations in the dataset.
Constraint: ${\mathbf{n}}\ge 1$.
3:     $\mathrm{m}$ – int64, int32 or nag_int scalar
Default: the second dimension of the array x.
$m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
4:     $\mathrm{wt}\left(:\right)$ – double array
The dimension of the array wt must be at least ${\mathbf{n}}$ if $\mathit{weight}=\text{'W'}$, and at least $1$ otherwise.
The optional weights of each observation.
If $\mathit{weight}=\text{'U'}$, wt is not referenced.
If $\mathit{weight}=\text{'W'}$, ${\mathbf{wt}}\left(i\right)$ must contain the weight for the $i$th observation.
Constraint: if $\mathit{weight}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.

### Output Parameters

1:     $\mathrm{sw}$ – double scalar
The sum of weights.
If $\mathit{weight}=\text{'U'}$, sw contains the number of observations, $n$.
2:     $\mathrm{wmean}\left({\mathbf{m}}\right)$ – double array
The sample means. ${\mathbf{wmean}}\left(j\right)$ contains the mean for the $j$th variable.
3:     $\mathrm{c}\left(\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2\right)$ – double array
The cross-products.
If ${\mathbf{mean_p}}=\text{'M'}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If ${\mathbf{mean_p}}=\text{'Z'}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k\ge j$, is stored in ${\mathbf{c}}\left(k×\left(k-1\right)/2+j\right)$.
4:     $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
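The packed storage of c described above can be expanded into a full symmetric matrix with a short loop. The following is a minimal sketch in plain MATLAB; the values of c are copied from the Example results, and the name C for the full matrix is chosen here for illustration.

```matlab
% Packed upper triangle for m = 3, copied from the Example results.
m = 3;
c = [8.7569 3.6978 1.5905 4.0707 1.6861 1.9297];

C = zeros(m);
for k = 1:m
    for j = 1:k
        C(j, k) = c(k*(k-1)/2 + j);   % element (j,k) with k >= j
        C(k, j) = C(j, k);            % mirror into the lower triangle
    end
end
disp(C);
```

The diagonal element for the $k$th variable sits at position $k\left(k+1\right)/2$ of c, which follows from setting $j=k$ in the index formula.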

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{m}}<1$, or ${\mathbf{n}}<1$, or $\mathit{ldx}<{\mathbf{n}}$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{mean_p}}\ne \text{'M'}$ or $\text{'Z'}$.
${\mathbf{ifail}}=3$
 On entry, $\mathit{weight}\ne \text{'W'}$ or $\text{'U'}$.
${\mathbf{ifail}}=4$
 On entry, $\mathit{weight}=\text{'W'}$, and a value of ${\mathbf{wt}}<0.0$.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

## Further Comments

nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix, for example by dividing by ${\mathbf{sw}}-1$ as in the example below.
The means and cross-products produced by nag_correg_ssqmat (g02bu) may be updated by adding or removing observations using nag_correg_ssqmat_update (g02bt).
Two sets of means and cross-products, as produced by nag_correg_ssqmat (g02bu), can be combined using nag_correg_ssqmat_combine (g02bz).
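As a rough illustration of these post-processing steps, the sketch below scales the packed cross-products of deviations about the mean into a variance-covariance matrix and forms correlation coefficients from the usual definition $r_{jk}=c_{jk}/\sqrt{c_{jj}c_{kk}}$. It is written in plain MATLAB and does not call or describe the internals of nag_correg_ssqmat_to_corrmat (g02bw); the divisor sw - 1 is the one used in the Example, and the input values are copied from the Example results.

```matlab
m  = 3;
sw = 1.807;                                        % sum of the Example weights
c  = [8.7569 3.6978 1.5905 4.0707 1.6861 1.9297];  % packed upper triangle

% Variance-covariance matrix (still packed), using the Example's divisor.
v = c / (sw - 1);

% Correlation coefficients, stored in the same packed layout as c.
r = zeros(size(c));
for k = 1:m
    ckk = c(k*(k+1)/2);               % diagonal element for variable k
    for j = 1:k
        cjj = c(j*(j+1)/2);           % diagonal element for variable j
        idx = k*(k-1)/2 + j;
        r(idx) = c(idx) / sqrt(cjj * ckk);
    end
end
```

Because both v and r keep the packed-by-column layout, they can be printed or unpacked in the same way as c.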

## Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of $3$ observations of $3$ variables.
```matlab
function g02bu_example

fprintf('g02bu example results\n\n');

% Observation weights, one per observation.
wt = [0.1300  1.3070  0.3700];
% Data as entered: one variable per row, one observation per column.
x  = [9.1231  0.9310  0.0009;
      3.7011  0.0900  0.0099;
      4.5230  0.8870  0.0999];
% Number of variables, converted to the integer type passed to x04cc.
m = int64(size(x, 1));

% g02bu expects one observation per row, so pass the transpose of x.
[sw, wmean, c, ifail] = g02bu(x', 'wt', wt);

disp('Means');
disp(wmean');
disp('Weights');
disp(wt);

% Print the packed upper triangle returned in c.
mtitle = 'Sums of squares and cross-products:';
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
    uplo, diag, m, c, mtitle);

% Convert the sums of squares and cross-products to a variance matrix
v = c/(sw-1);
fprintf('\n');
mtitle = 'Variance matrix:';
[ifail] = x04cc( ...
    uplo, diag, m, v, mtitle);
```
```
g02bu example results

Means
    1.3299    0.3334    0.9874

Weights
    0.1300    1.3070    0.3700

Sums of squares and cross-products:
            1          2          3
1      8.7569     3.6978     4.0707
2                 1.5905     1.6861
3                            1.9297

Variance matrix:
            1          2          3
1     10.8512     4.5822     5.0443
2                 1.9709     2.0893
3                            2.3912
```