
PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_correg_ssqmat (g02bu)

 Contents

    1  Purpose
    2  Syntax
    3  Description
    4  References
    5  Parameters
    6  Error Indicators and Warnings
    7  Accuracy
    8  Further Comments
    9  Example

Purpose

nag_correg_ssqmat (g02bu) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

Syntax

[sw, wmean, c, ifail] = g02bu(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'mean_p', mean_p, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 24: mean_p was made optional
At Mark 22: n was made optional

Description

nag_correg_ssqmat (g02bu) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of $n$ observations on $m$ variables $X_j$, for $j = 1, 2, \ldots, m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1, 2, \ldots, m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i, \qquad \bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_j - \bar{x}_j(i-1)\right), \quad j = 1, 2, \ldots, m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_j - \bar{x}_j(i-1)\right)\left(x_k - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1, 2, \ldots, m \text{ and } k = j, j+1, \ldots, m.$$
The algorithm is initialized by taking $\bar{x}_j(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.
For the unweighted case $w_i = 1$ and $W_i = i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
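
The update formulas above can be sketched in plain MATLAB without any Toolbox calls. This is a minimal illustration of the recurrences as stated, not the library's implementation. The names sw_prev, d and idx are introduced here for illustration, x_j is read as the jth entry of the ith observation, and skipping zero weights is an assumption made to avoid a 0/0 division when the leading weights are zero. The data are those of the Example section, arranged with one observation per row.

```matlab
% Data from the Example section, one observation per row.
x  = [9.1231  3.7011  4.5230;
      0.9310  0.0900  0.8870;
      0.0009  0.0099  0.0999];
wt = [0.1300  1.3070  0.3700];

[n, m] = size(x);
sw     = 0;                        % running sum of weights, W_i
wmean  = zeros(1, m);              % running weighted means
c      = zeros(1, m*(m+1)/2);      % upper triangle, packed by columns

for i = 1:n
    w = wt(i);
    if w == 0
        continue;                  % assumption: a zero weight changes nothing
    end
    sw_prev = sw;                  % W_{i-1}
    sw      = sw_prev + w;         % W_i
    d       = x(i, :) - wmean;     % deviations from the previous means
    for k = 1:m
        for j = 1:k
            idx    = k*(k-1)/2 + j;   % packed position of element (j,k)
            c(idx) = c(idx) + (w/sw) * d(j) * d(k) * sw_prev;
        end
    end
    wmean = wmean + (w/sw) * d;    % update the means last, after using d
end

disp(sw); disp(wmean); disp(c);
```

For the first observation sw_prev is zero, so c stays at zero and wmean becomes the first row, which matches the stated initialization. With these data the results should agree with the means and sums of squares printed in the Example section to the displayed precision.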

References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

Parameters

Compulsory Input Parameters

1:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the ith observation on the jth variable, for i = 1,2,…,n and j = 1,2,…,m.

Optional Input Parameters

1:     mean_p – string (length ≥ 1)
Default: 'M'
Indicates whether nag_correg_ssqmat (g02bu) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
mean_p='M'
The sums of squares and cross-products of deviations about the mean are calculated.
mean_p='Z'
The sums of squares and cross-products are calculated.
Constraint: mean_p='M' or 'Z'.
2:     n – int64, int32 or nag_int scalar
Default: the first dimension of the array x.
n, the number of observations in the dataset.
Constraint: n ≥ 1.
3:     m – int64, int32 or nag_int scalar
Default: the second dimension of the array x.
m, the number of variables.
Constraint: m ≥ 1.
4:     wt(:) – double array
The dimension of the array wt must be at least n if weight='W', and at least 1 otherwise.
The optional weights of each observation.
If weight='U', wt is not referenced.
If weight='W', wt(i) must contain the weight for the ith observation.
Constraint: if weight='W', wt(i) ≥ 0.0, for i = 1,2,…,n.
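
The two settings of mean_p described above are linked by a standard identity, written out here as a worked equation. It assumes that mean_p='Z' returns the weighted raw cross-products $\sum_i w_i x_{ij} x_{ik}$; the symbols $s_{jk}$ and $W$ (the total weight, returned as sw) are introduced here for illustration.

```latex
\begin{aligned}
c_{jk} &= \sum_{i=1}^{n} w_i \,(x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) \\
       &= \sum_{i=1}^{n} w_i x_{ij} x_{ik}
          - \bar{x}_k \sum_{i=1}^{n} w_i x_{ij}
          - \bar{x}_j \sum_{i=1}^{n} w_i x_{ik}
          + W \bar{x}_j \bar{x}_k \\
       &= s_{jk} - W \bar{x}_j \bar{x}_k ,
\qquad\text{where } s_{jk} = \sum_{i=1}^{n} w_i x_{ij} x_{ik},
\quad W = \sum_{i=1}^{n} w_i ,
\quad \bar{x}_j = \frac{1}{W}\sum_{i=1}^{n} w_i x_{ij} .
\end{aligned}
```

Under that assumption, the 'M' result can be recovered from the 'Z' result and the returned means, and vice versa, although forming $s_{jk} - W \bar{x}_j \bar{x}_k$ explicitly can lose accuracy when the means are large relative to the spread of the data.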

Output Parameters

1:     sw – double scalar
The sum of weights.
If weight='U', sw contains the number of observations, n.
2:     wmean(m) – double array
The sample means. wmean(j) contains the mean for the jth variable.
3:     c((m×m+m)/2) – double array
The cross-products.
If mean_p='M', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If mean_p='Z', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the jth and kth variable, k ≥ j, is stored in c(k×(k−1)/2+j).
4:     ifail – int64, int32 or nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).
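
The packed layout of c can be expanded into a full symmetric matrix with the index formula given above. The following is a small sketch in plain MATLAB; the packed values are copied from the printed output in the Example section (so they are rounded to four decimals), and the name C for the full matrix is introduced here for illustration.

```matlab
m = 3;
% Packed upper triangle, column by column: (1,1), (1,2), (2,2), (1,3), (2,3), (3,3).
c = [8.7569  3.6978  1.5905  4.0707  1.6861  1.9297];

C = zeros(m, m);
for k = 1:m
    for j = 1:k
        C(j, k) = c(k*(k-1)/2 + j);   % element (j,k) with k >= j
        C(k, j) = C(j, k);            % mirror into the lower triangle
    end
end

disp(C);
```

The diagonal element for variable j sits at position j×(j+1)/2, which is the formula above with k = j.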

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
On entry, m < 1,
or n < 1,
or ldx < n.
   ifail=2
On entry, mean_p ≠ 'M' or 'Z'.
   ifail=3
On entry, weight ≠ 'W' or 'U'.
   ifail=4
On entry, weight='W', and a value of wt<0.0.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

Further Comments

nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
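
As a rough illustration of both operations on the packed array, the sketch below scales the cross-products by 1/(sw−1), as the Example section does, and forms correlation coefficients with the textbook formula r_jk = c_jk / sqrt(c_jj × c_kk). It is written in plain MATLAB and is not a description of how nag_correg_ssqmat_to_corrmat (g02bw) works internally; the input values are the rounded figures printed in the Example section, and the names v, r, idx, dj and dk are introduced here.

```matlab
m  = 3;
sw = 1.8070;                                          % sum of the example weights
c  = [8.7569  3.6978  1.5905  4.0707  1.6861  1.9297];  % packed by columns

% Variance-covariance matrix (still packed), using the divisor from the Example.
v = c / (sw - 1);

% Correlation coefficients in the same packed layout.
r = zeros(size(c));
for k = 1:m
    for j = 1:k
        idx = k*(k-1)/2 + j;      % packed position of element (j,k)
        dj  = j*(j+1)/2;          % packed position of diagonal element (j,j)
        dk  = k*(k+1)/2;          % packed position of diagonal element (k,k)
        r(idx) = c(idx) / sqrt(c(dj) * c(dk));
    end
end

disp(v); disp(r);
```

The divisor sw−1 follows the Example; whether it is the appropriate choice for a given weighting scheme is a modelling decision that this sketch does not address.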
The means and cross-products produced by nag_correg_ssqmat (g02bu) may be updated by adding or removing observations using nag_correg_ssqmat_update (g02bt).
Two sets of means and cross-products, as produced by nag_correg_ssqmat (g02bu), can be combined using nag_correg_ssqmat_combine (g02bz).
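
The idea behind combining two sets of results can be sketched with the standard pairwise identity for weighted means and cross-products of deviations (in the spirit of Chan et al. (1982)). This is an illustration in plain MATLAB using full matrices rather than packed storage; it does not claim to reproduce the internals of nag_correg_ssqmat_combine (g02bz), and all variable names are introduced here. The example data are split into a first group (observations 1 and 2) and a second group (observation 3).

```matlab
x  = [9.1231  3.7011  4.5230;
      0.9310  0.0900  0.8870;
      0.0009  0.0099  0.0999];    % one observation per row
wt = [0.1300  1.3070  0.3700];

% Group A: observations 1 and 2, summarised directly.
xA = x(1:2, :);  wA = wt(1:2)';
WA = sum(wA);
mA = (wA' * xA) / WA;
dA = xA - repmat(mA, size(xA, 1), 1);
CA = dA' * diag(wA) * dA;

% Group B: observation 3 only, so its deviations are all zero.
xB = x(3, :);    wB = wt(3);
WB = wB;
mB = xB;
CB = zeros(3, 3);

% Combine the two summaries.
W     = WA + WB;
delta = mB - mA;
mAB   = mA + (WB / W) * delta;
CAB   = CA + CB + (WA * WB / W) * (delta' * delta);

% Reference: the same quantities computed from all the data at once.
w0 = wt';
m0 = (w0' * x) / sum(w0);
d0 = x - repmat(m0, size(x, 1), 1);
C0 = d0' * diag(w0) * d0;

disp(max(abs(CAB(:) - C0(:))));   % difference should be at rounding level
```

The correction term (WA×WB/W)×delta'×delta accounts for the shift between the two group means; without it the combined cross-products would be taken about the wrong centre.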

Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of 3 observations of 3 variables.
function g02bu_example


fprintf('g02bu example results\n\n');

wt = [0.1300  1.3070  0.3700];
x  = [9.1231  0.9310  0.0009;
      3.7011  0.0900  0.0099;
      4.5230  0.8870  0.0999];
[m,n] = size(x);
cn = (m*(m+1))/2;
m = int64(m);

[sw, wmean, c, ifail] = g02bu(x', 'wt', wt);

disp('Means');
disp(wmean');
disp('Weights');
disp(wt);

mtitle = 'Sums of squares and cross-products:';
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
                 uplo, diag, m, c, mtitle);

% Convert the sums of squares and cross-products to a variance matrix
v = c/(sw-1);
fprintf('\n');
mtitle = 'Variance matrix:';
[ifail] = x04cc( ...
                 uplo, diag, m, v, mtitle);



g02bu example results

Means
    1.3299    0.3334    0.9874

Weights
    0.1300    1.3070    0.3700

 Sums of squares and cross-products:
             1          2          3
 1      8.7569     3.6978     4.0707
 2                 1.5905     1.6861
 3                            1.9297

 Variance matrix:
             1          2          3
 1     10.8512     4.5822     5.0443
 2                 1.9709     2.0893
 3                            2.3912


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015