NAG Toolbox: nag_correg_ssqmat (g02bu)

Purpose

nag_correg_ssqmat (g02bu) calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

Syntax

[sw, wmean, c, ifail] = g02bu(x, 'mean', mean, 'n', n, 'm', m, 'wt', wt)
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'mean', mean, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: n has been made optional
Mark 24: mean has been made optional

Description

nag_correg_ssqmat (g02bu) is an adaptation of West's WV2 algorithm; see West (1979). This function calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of $n$ observations on $m$ variables $X_j$, for $j = 1,2,\ldots,m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be $\bar{x}_j(i-1)$, the cross-product about the mean for the $j$th and $k$th variables be $c_{jk}(i-1)$ and the sum of weights be $W_{i-1}$. These are updated by the $i$th observation, $x_{ij}$, for $j = 1,2,\ldots,m$, with weight $w_i$ as follows:
$$W_i = W_{i-1} + w_i$$
$$\bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_j - \bar{x}_j(i-1)\right), \quad j = 1,2,\ldots,m$$
and
$$c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_j - \bar{x}_j(i-1)\right)\left(x_k - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1,2,\ldots,m \text{ and } k = j,j+1,\ldots,m.$$
The algorithm is initialized by taking $\bar{x}_j(1) = x_{1j}$, the first observation, and $c_{jk}(1) = 0.0$.
For the unweighted case $w_i = 1$ and $W_i = i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
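The update formulas above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the NAG implementation: the function name `ssqmat`, the `about_mean` flag (standing in for mean = 'M' or 'Z') and the choice to skip zero-weight observations are assumptions made here. Starting from zero means and a zero sum of weights reproduces the stated initialization, because the first contributing observation then sets the means to its own values and adds nothing to the cross-products.

```python
def ssqmat(x, wt=None, about_mean=True):
    """Single-pass (weighted) means and column-packed cross-products.

    x  : list of n rows, each a list of m floats (one row per observation)
    wt : optional list of n non-negative weights; None means unit weights
    Returns (sw, wmean, c); c holds the upper triangle packed by column.
    """
    m = len(x[0])
    sw = 0.0                          # running sum of weights, W_i
    wmean = [0.0] * m                 # running means
    c = [0.0] * (m * (m + 1) // 2)    # packed upper triangle

    for i, row in enumerate(x):
        w = 1.0 if wt is None else wt[i]
        if w < 0.0:
            raise ValueError("weights must be non-negative")
        if w == 0.0:
            continue                  # assumption: zero weights are skipped
        sw_prev = sw                  # W_{i-1}
        sw += w                       # W_i = W_{i-1} + w_i
        d = [row[j] - wmean[j] for j in range(m)]  # deviations from old means
        for k in range(m):
            for j in range(k + 1):    # upper triangle only, j <= k
                if about_mean:
                    inc = (w / sw) * d[j] * d[k] * sw_prev
                else:
                    inc = w * row[j] * row[k]
                c[k * (k + 1) // 2 + j] += inc   # 0-based packed index
        for j in range(m):
            wmean[j] += (w / sw) * d[j]
    return sw, wmean, c


# Usage with the data from the Example section (rows are observations).
x = [[9.1231, 3.7011, 4.523],
     [0.931, 0.09, 0.887],
     [0.0009, 0.0099, 0.0999]]
wt = [0.13, 1.307, 0.37]
sw, wmean, c = ssqmat(x, wt)
print(sw, wmean, c)
```

For this data the sketch is expected to agree with the sw, wmean and c values printed in the Example section to about four decimal places.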

References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method. Comm. ACM 22 532–535

Parameters

Compulsory Input Parameters

1:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint $ldx \ge n$.
x(i,j) must contain $x_{ij}$, the $i$th observation on the $j$th variable, for $i = 1,2,\ldots,n$ and $j = 1,2,\ldots,m$.

Optional Input Parameters

1:     mean – string (length ≥ 1)
Indicates whether nag_correg_ssqmat (g02bu) is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
mean = 'M'
The sums of squares and cross-products of deviations about the mean are calculated.
mean = 'Z'
The sums of squares and cross-products are calculated.
Default: 'M'
Constraint: mean = 'M' or 'Z'.
2:     n – int64/int32/nag_int scalar
Default: The first dimension of the array x.
$n$, the number of observations in the dataset.
Constraint: $n \ge 1$.
3:     m – int64/int32/nag_int scalar
Default: The second dimension of the array x.
$m$, the number of variables.
Constraint: $m \ge 1$.
4:     wt(:) – double array
Note: the dimension of the array wt must be at least $n$ if weight = 'W', and at least $1$ otherwise.
The optional weights of each observation.
If weight = 'U', wt is not referenced.
If weight = 'W', wt(i) must contain the weight for the $i$th observation.
Constraint: if weight = 'W', wt(i) $\ge 0.0$, for $i = 1,2,\ldots,n$.

Input Parameters Omitted from the MATLAB Interface

weight, ldx

Output Parameters

1:     sw – double scalar
The sum of weights.
If weight = 'U', sw contains the number of observations, $n$.
2:     wmean(m) – double array
The sample means. wmean(j) contains the mean for the $j$th variable.
3:     c((m × m + m)/2) – double array
The cross-products.
If mean = 'M', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If mean = 'Z', c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k \ge j$, is stored in c(k × (k − 1)/2 + j).
4:     ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).
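The column-packed layout of c can be illustrated by expanding it into a full symmetric matrix. The following Python sketch is an illustration only and not part of the toolbox; the helper name `unpack_upper` is hypothetical. It uses the 1-based index formula quoted above and shifts it to 0-based list positions.

```python
def unpack_upper(c, m):
    """Expand a column-packed upper triangle into a full symmetric m-by-m matrix."""
    if len(c) != m * (m + 1) // 2:
        raise ValueError("c must have (m*m + m)/2 elements")
    full = [[0.0] * m for _ in range(m)]
    for k in range(1, m + 1):          # 1-based column index, as in the text
        for j in range(1, k + 1):      # 1-based row index, j <= k
            # element c(k*(k-1)/2 + j) in 1-based terms, shifted to 0-based
            value = c[k * (k - 1) // 2 + j - 1]
            full[j - 1][k - 1] = value
            full[k - 1][j - 1] = value   # mirror into the lower triangle
    return full


# For m = 3 the packed order is c11, c12, c22, c13, c23, c33.
print(unpack_upper([1, 2, 3, 4, 5, 6], 3))
```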

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, $m < 1$,
or $n < 1$,
or $ldx < n$.
  ifail = 2
On entry, mean $\ne$ 'M' or 'Z'.
  ifail = 3
On entry, weight $\ne$ 'W' or 'U'.
  ifail = 4
On entry, weight = 'W', and a value of wt $< 0.0$.
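As a rough illustration of how these conditions map onto the inputs visible in the MATLAB interface, the checks can be mirrored in a short Python sketch. The function name `check_inputs` is hypothetical and the order of the checks is an assumption. Since weight and ldx are omitted from the MATLAB interface, the sketch has no counterpart for ifail = 3 or for the ldx test.

```python
def check_inputs(x, mean="M", wt=None):
    """Return an ifail-style code for the documented input constraints."""
    n = len(x)                         # number of observations (rows)
    m = len(x[0]) if n > 0 else 0      # number of variables (columns)
    if m < 1 or n < 1:
        return 1
    if mean not in ("M", "Z"):
        return 2
    if wt is not None and any(w < 0.0 for w in wt):
        return 4
    return 0


print(check_inputs([[1.0, 2.0]], mean="M", wt=[0.5]))
```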

Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

Further Comments

nag_correg_ssqmat_to_corrmat (g02bw) may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled to give a variance-covariance matrix.
The means and cross-products produced by nag_correg_ssqmat (g02bu) may be updated by adding or removing observations using nag_correg_ssqmat_update (g02bt).
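The relationship between the packed cross-products of deviations and the correlation coefficients is $r_{jk} = c_{jk} / \sqrt{c_{jj}\,c_{kk}}$. The Python sketch below illustrates that formula only; it is not the implementation of nag_correg_ssqmat_to_corrmat (g02bw), whose output layout may differ, and the name `corr_from_packed` is hypothetical.

```python
import math


def corr_from_packed(c, m):
    """Correlation matrix from column-packed cross-products of deviations."""
    # Diagonal element c_kk sits at 0-based packed position k*(k+1)/2 + k.
    diag = [c[k * (k + 1) // 2 + k] for k in range(m)]
    r = [[0.0] * m for _ in range(m)]
    for k in range(m):
        for j in range(k + 1):
            denom = math.sqrt(diag[j] * diag[k])
            if denom == 0.0:
                raise ValueError("a variable has zero sum of squares")
            value = c[k * (k + 1) // 2 + j] / denom
            r[j][k] = value
            r[k][j] = value            # correlation matrix is symmetric
    return r


# Packed cross-products as printed in the Example section (m = 3).
c = [8.7569, 3.6978, 1.5905, 4.0707, 1.6861, 1.9297]
print(corr_from_packed(c, 3))
```

Because the same factor would scale every element of c, the correlations do not depend on how the cross-products are scaled into a variance-covariance matrix.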

Example

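function nag_correg_ssqmat_example
% Weights, one per observation (row of x)
wt = [0.13, 1.307, 0.37];
% Data: 3 observations (rows) on 3 variables (columns)
x = [9.1231, 3.7011, 4.523;
     0.931, 0.09, 0.887;
     0.0009, 0.0099, 0.0999];
% Weighted means and cross-products of deviations about the mean (default mean = 'M')
[sw, wmean, c, ifail] = nag_correg_ssqmat(x, 'wt', wt)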
 

sw =

    1.8070


wmean =

    1.3299
    0.3334
    0.9874


c =

    8.7569
    3.6978
    1.5905
    4.0707
    1.6861
    1.9297


ifail =

                    0


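function g02bu_example
% Weights, one per observation (row of x)
wt = [0.13, 1.307, 0.37];
% Data: 3 observations (rows) on 3 variables (columns)
x = [9.1231, 3.7011, 4.523;
     0.931, 0.09, 0.887;
     0.0009, 0.0099, 0.0999];
% Same computation using the short function name
[sw, wmean, c, ifail] = g02bu(x, 'wt', wt)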
 

sw =

    1.8070


wmean =

    1.3299
    0.3334
    0.9874


c =

    8.7569
    3.6978
    1.5905
    4.0707
    1.6861
    1.9297


ifail =

                    0




© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013