NAG Toolbox: nag_correg_ssqmat_update (g02bt)

Purpose

nag_correg_ssqmat_update (g02bt) updates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean, for a new observation. The data may be weighted.

Syntax

[sw, xbar, c, ifail] = g02bt(wt, x, sw, xbar, c, 'mean_p', mean_p, 'm', m, 'incx', incx)
[sw, xbar, c, ifail] = nag_correg_ssqmat_update(wt, x, sw, xbar, c, 'mean_p', mean_p, 'm', m, 'incx', incx)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 24: mean_p was made optional
 At Mark 23: incx was made optional (default 1)

Description

nag_correg_ssqmat_update (g02bt) is an adaptation of West's WV2 algorithm; see West (1979). This function updates the weighted means of variables and weighted sums of squares and cross-products or weighted sums of squares and cross-products of deviations about the mean for observations on $m$ variables ${X}_{j}$, for $j=1,2,\dots ,m$. For the first $i-1$ observations let the mean of the $j$th variable be ${\stackrel{-}{x}}_{j}\left(i-1\right)$, the cross-product about the mean for the $j$th and $k$th variables be ${c}_{jk}\left(i-1\right)$ and the sum of weights be ${W}_{i-1}$. These are updated by the $i$th observation, ${x}_{ij}$, for $\mathit{j}=1,2,\dots ,m$, with weight ${w}_{i}$ as follows:
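 ${W}_{i}={W}_{i-1}+{w}_{i},\quad {\stackrel{-}{x}}_{j}\left(i\right)={\stackrel{-}{x}}_{j}\left(i-1\right)+\frac{{w}_{i}}{{W}_{i}}\left({x}_{ij}-{\stackrel{-}{x}}_{j}\left(i-1\right)\right),\quad j=1,2,\dots ,m$
and
 ${c}_{jk}\left(i\right)={c}_{jk}\left(i-1\right)+\frac{{w}_{i}}{{W}_{i}}\left({x}_{ij}-{\stackrel{-}{x}}_{j}\left(i-1\right)\right)\left({x}_{ik}-{\stackrel{-}{x}}_{k}\left(i-1\right)\right){W}_{i-1},\quad j=1,2,\dots ,m;\; k=j,j+1,\dots ,m.$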
The algorithm is initialized by taking ${\stackrel{-}{x}}_{j}\left(1\right)={x}_{1j}$, the first observation, and ${c}_{jk}\left(1\right)=0.0$.
For the unweighted case ${w}_{i}=1$ and ${W}_{i}=i$ for all $i$.
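The update formulas above can be sketched in plain MATLAB without calling the NAG routine. This is only an illustration under stated assumptions: the variable names `swOld`, `d`, `f` and `idx` are made up for this sketch, the data are the weights and observations from the example program on this page (one observation per column), and the loop is not claimed to match the internals of g02bt.

```matlab
% Illustrative re-implementation of the update formulas (not the NAG code).
wt = [0.1300 1.3070 0.3700];
x  = [9.1231 0.9310 0.0009;
      3.7011 0.0900 0.0099;
      4.5230 0.8870 0.0999];
[m, n] = size(x);              % m variables (rows), n observations (columns)

sw   = 0;                      % W_0: no observations processed yet
xbar = zeros(m, 1);            % running weighted means
c    = zeros(m*(m+1)/2, 1);    % packed upper triangle, stored by column

for i = 1:n
    swOld = sw;                % W_{i-1}
    sw    = swOld + wt(i);     % W_i
    d     = x(:, i) - xbar;    % x_ij - xbar_j(i-1)
    f     = wt(i) / sw;        % w_i / W_i
    for k = 1:m
        for j = 1:k
            idx    = k*(k-1)/2 + j;               % packed index of (j,k)
            c(idx) = c(idx) + f * d(j) * d(k) * swOld;
        end
    end
    xbar = xbar + f * d;       % mean update uses the same deviation d
end
```

With `sw = 0` on the first pass the cross-product increment vanishes and `xbar` becomes the first observation, which matches the stated initialization. For this data the means and packed sums should agree, to the printed precision, with the example results at the end of this page.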

References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

Parameters

Compulsory Input Parameters

1:     $\mathrm{wt}$ – double scalar
The weight to use for the current observation, ${w}_{i}$.
For unweighted means and cross-products set ${\mathbf{wt}}=1.0$. A suitable negative value of wt, e.g., $-{w}_{i}$, has the effect of deleting the observation that was previously added with weight ${w}_{i}$.
2:     $\mathrm{x}\left({\mathbf{m}}×{\mathbf{incx}}\right)$ – double array
${\mathbf{x}}\left(\left(j-1\right)×{\mathbf{incx}}+1\right)$ must contain the value of the $j$th variable for the current observation, $j=1,2,\dots ,m$.
3:     $\mathrm{sw}$ – double scalar
The sum of weights for the previous observations, ${W}_{i-1}$.
${\mathbf{sw}}=0.0$
The update procedure is initialized.
${\mathbf{sw}}+{\mathbf{wt}}=0.0$
All elements of xbar and c are set to zero.
Constraint: ${\mathbf{sw}}\ge 0.0$ and ${\mathbf{sw}}+{\mathbf{wt}}\ge 0.0$.
4:     $\mathrm{xbar}\left({\mathbf{m}}\right)$ – double array
If ${\mathbf{sw}}=0.0$, xbar is initialized, otherwise ${\mathbf{xbar}}\left(\mathit{j}\right)$ must contain the weighted mean of the $\mathit{j}$th variable for the previous $\left(\mathit{i}-1\right)$ observations, ${\stackrel{-}{x}}_{\mathit{j}}\left(\mathit{i}-1\right)$, for $\mathit{j}=1,2,\dots ,m$.
5:     $\mathrm{c}\left(\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2\right)$ – double array
If ${\mathbf{sw}}\ne 0.0$, c must contain the upper triangular part of the matrix of weighted sums of squares and cross-products or weighted sums of squares and cross-products of deviations about the mean. It is stored in packed form by column, i.e., the cross-product between the $j$th and $k$th variable, $k\ge j$, is stored in ${\mathbf{c}}\left(k×\left(k-1\right)/2+j\right)$.
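As a small illustration of the packed storage rule for c, the following plain-MATLAB sketch expands a packed upper triangle into a full symmetric matrix. The packed values are taken from the example results printed on this page; the name `C` for the full matrix is an assumption of this sketch.

```matlab
% Expand a packed-by-column upper triangle into a full symmetric matrix.
m = 3;
c = [8.7569; 3.6978; 1.5905; 4.0707; 1.6861; 1.9297];  % (1,1),(1,2),(2,2),(1,3),(2,3),(3,3)

C = zeros(m);
for k = 1:m
    for j = 1:k
        C(j, k) = c(k*(k-1)/2 + j);   % element (j,k) with k >= j
        C(k, j) = C(j, k);            % mirror into the lower triangle
    end
end
```

The full matrix is convenient for inspection or plotting, while the packed vector is what the routine expects on input.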

Optional Input Parameters

1:     $\mathrm{mean_p}$ – string (length ≥ 1)
Default: $\text{'M'}$
Indicates whether nag_correg_ssqmat_update (g02bt) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
${\mathbf{mean_p}}=\text{'M'}$
The sums of squares and cross-products of deviations about the mean are calculated.
${\mathbf{mean_p}}=\text{'Z'}$
The sums of squares and cross-products are calculated.
Constraint: ${\mathbf{mean_p}}=\text{'M'}$ or $\text{'Z'}$.
2:     $\mathrm{m}$ – int64, int32 or nag_int scalar
Default: the dimension of the array xbar.
$m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
3:     $\mathrm{incx}$ – int64, int32 or nag_int scalar
Default: $1$
The increment of x. Two situations are common.
If ${\mathbf{incx}}=1$, the data values are to be found in consecutive locations in x, i.e., in a column.
If ${\mathbf{incx}}=\mathit{ldx}$, for some positive integer $\mathit{ldx}$, the data values are to be found as a row of an array with first dimension $\mathit{ldx}$.
Constraint: ${\mathbf{incx}}>0$.
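The indexing rule ${\mathbf{x}}\left(\left(j-1\right)×{\mathbf{incx}}+1\right)$ can be checked in plain MATLAB. The sketch below assumes a hypothetical data array `A` whose rows are observations, so that one observation is spread across memory with stride `ldx`; it only demonstrates the index arithmetic and does not call g02bt.

```matlab
% Read the first row of a column-major array using a stride of ldx.
A    = [9.1231 3.7011 4.5230;    % in this sketch each ROW is one observation
        0.9310 0.0900 0.8870;
        0.0009 0.0099 0.0999];
ldx  = size(A, 1);               % first dimension of the array
m    = size(A, 2);               % number of variables
incx = ldx;

x   = A(:);                      % column-major storage, length m*incx
obs = zeros(m, 1);
for j = 1:m
    obs(j) = x((j-1)*incx + 1);  % value of the j-th variable
end
```

Here `obs` holds the first row of `A`. With `incx = 1` the same loop would instead read `m` consecutive elements, i.e., the start of a column.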

Output Parameters

1:     $\mathrm{sw}$ – double scalar
Contains the updated sum of weights, ${W}_{i}$.
2:     $\mathrm{xbar}\left({\mathbf{m}}\right)$ – double array
${\mathbf{xbar}}\left(\mathit{j}\right)$ contains the weighted mean of the $\mathit{j}$th variable, ${\stackrel{-}{x}}_{\mathit{j}}\left(\mathit{i}\right)$, for $\mathit{j}=1,2,\dots ,m$.
3:     $\mathrm{c}\left(\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2\right)$ – double array
The updated sums of squares and cross-products, stored as on input.
4:     $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
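Because the outputs sw, xbar and c have the same form as the inputs, they can be fed back with a negative weight to remove an observation. The sketch below checks this algebraically in plain MATLAB, applying the update formulas from the Description with weight $-{w}_{i}$. It is an illustration only: it uses a full matrix `C` instead of packed storage, the names `swNew`, `d` and `C` are assumptions, and g02bt itself is not called.

```matlab
% Add a second observation, then delete it again with a negative weight.
x1 = [9.1231; 3.7011; 4.5230];  w1 = 0.13;
x2 = [0.9310; 0.0900; 0.8870];  w2 = 1.307;

% State after the first observation (the documented initialization)
sw = w1;  xbar = x1;  C = zeros(3);

% Add observation 2 with weight +w2
d     = x2 - xbar;
swNew = sw + w2;
C     = C + (w2/swNew) * sw * (d*d');
xbar  = xbar + (w2/swNew) * d;
sw    = swNew;

% Delete observation 2: the same update with weight -w2
d     = x2 - xbar;
swNew = sw - w2;
C     = C + (-w2/swNew) * sw * (d*d');
xbar  = xbar + (-w2/swNew) * d;
sw    = swNew;
```

After the second step the state returns, up to rounding, to the one that held after the first observation alone.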

Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{m}}<1$, or ${\mathbf{incx}}<1$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{sw}}<0.0$.
${\mathbf{ifail}}=3$
 On entry, $\left({\mathbf{sw}}+{\mathbf{wt}}\right)<0.0$, i.e., the current weight causes the sum of weights to be less than $0.0$.
${\mathbf{ifail}}=4$
 On entry, ${\mathbf{mean_p}}\ne \text{'M'}$ or $\text{'Z'}$.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
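The documented input conditions can be mirrored in a caller-side check before the routine is invoked. The following is a hypothetical sketch, not the routine's own validation: the variable `code` and the order in which the conditions are tested are assumptions, and the comparison of mean_p is a plain exact match against 'M' and 'Z'.

```matlab
% Hypothetical pre-check mirroring the conditions listed for ifail = 1..4.
m = 3;  incx = 1;  sw = 0.5;  wt = -0.75;  mean_p = 'M';

code = 0;                                   % 0 means no problem found
if m < 1 || incx < 1
    code = 1;                               % m < 1 or incx < 1
elseif sw < 0
    code = 2;                               % negative sum of weights
elseif sw + wt < 0
    code = 3;                               % weight drives the sum below zero
elseif ~any(strcmp(mean_p, {'M', 'Z'}))
    code = 4;                               % mean_p is neither 'M' nor 'Z'
end
```

With the values shown, the weight of -0.75 would take the sum of weights below zero, so the check flags the third condition.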

Accuracy

For a detailed discussion of the accuracy of this method see Chan et al. (1982) and West (1979).

Further Comments

nag_correg_ssqmat_update (g02bt) may be used to update the results returned by nag_correg_ssqmat (g02bu).
nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation matrix from the matrix of sums of squares and cross-products of deviations about the mean, and the matrix may be scaled to produce a variance-covariance matrix.
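For orientation, both conversions can be sketched directly on the packed vector in plain MATLAB. This is an assumption-laden illustration rather than a substitute for g02bw: the divisor `sw - 1` is the one used in the example program on this page, the correlation is the usual ratio of a cross-product to the square root of the two corresponding diagonal elements, and the input values are copied from the example results.

```matlab
% Scale packed sums of squares to variances and correlations.
sw = 1.807;                                             % sum of weights in the example
c  = [8.7569; 3.6978; 1.5905; 4.0707; 1.6861; 1.9297];  % packed upper triangle
m  = 3;

v = c / (sw - 1);                 % variance-covariance matrix, still packed

r = zeros(size(c));               % correlation matrix, packed the same way
for k = 1:m
    for j = 1:k
        cjj = c(j*(j+1)/2);       % diagonal element (j,j)
        ckk = c(k*(k+1)/2);       % diagonal element (k,k)
        r(k*(k-1)/2 + j) = c(k*(k-1)/2 + j) / sqrt(cjj * ckk);
    end
end
```

The diagonal of `r` is 1 by construction, and the scaling factor cancels in the ratio, so the correlations are the same whether they are computed from `c` or from `v`.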

Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of $3$ observations of $3$ variables.
```
function g02bt_example

fprintf('g02bt example results\n\n');

% Weights, one per observation
wt = [0.1300  1.3070  0.3700];
% Data: each column of x is one observation, each row one variable
x  = [9.1231  0.9310  0.0009;
      3.7011  0.0900  0.0099;
      4.5230  0.8870  0.0999];
[m,n] = size(x);          % m variables, n observations
cn = (m*(m+1))/2;         % length of the packed upper triangle

% sw = 0 initializes the update procedure on the first call
sw   = 0;
xbar = zeros(m,1);
c    = zeros(cn,1);
m = int64(m);

% Update one observation at a time
for j = 1:n
  [sw, xbar, c, ifail] = g02bt( ...
                               wt(j), x(:,j), sw, xbar, c);
end

disp('Means');
disp(xbar');

mtitle = 'Sums of squares and cross-products:';
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
                uplo, diag, m, c, mtitle);

% Convert the sums of squares and cross-products to a variance matrix
v = c/(sw-1);
fprintf('\n');
mtitle = 'Variance matrix:';
[ifail] = x04cc( ...
                uplo, diag, m, v, mtitle);

```
```
g02bt example results

Means
    1.3299    0.3334    0.9874

 Sums of squares and cross-products:
             1          2          3
 1      8.7569     3.6978     4.0707
 2                 1.5905     1.6861
 3                            1.9297

 Variance matrix:
             1          2          3
 1     10.8512     4.5822     5.0443
 2                 1.9709     2.0893
 3                            2.3912
```