# NAG Toolbox: nag_correg_ssqmat_combine (g02bz)

## Purpose

nag_correg_ssqmat_combine (g02bz) combines two sets of sample means and sums of squares and cross-products matrices. It is designed to be used in conjunction with nag_correg_ssqmat (g02bu) to allow large datasets to be summarised.

## Syntax

[xsw, xmean, xc, ifail] = g02bz(xsw, xmean, xc, ysw, ymean, yc, 'mean', mean, 'm', m)
[xsw, xmean, xc, ifail] = nag_correg_ssqmat_combine(xsw, xmean, xc, ysw, ymean, yc, 'mean', mean, 'm', m)

## Description

Let $X$ and $Y$ denote two sets of data, each with $m$ variables and $n_x$ and $n_y$ observations respectively. Let $\mu_x$ denote the (optionally weighted) vector of $m$ means for the first dataset and $C_x$ denote either the sums of squares and cross-products of deviations from $\mu_x$,
$C_x = (X - e\mu_x^{\mathrm{T}})^{\mathrm{T}} D_x (X - e\mu_x^{\mathrm{T}})$
or the sums of squares and cross-products, in which case
$C_x = X^{\mathrm{T}} D_x X$
where $e$ is a vector of $n_x$ ones, $D_x$ is a diagonal matrix of (optional) weights, and $W_x$ is defined as the sum of the diagonal elements of $D_x$. Similarly, let $\mu_y$, $C_y$ and $W_y$ denote the same quantities for the second dataset.
Given $\mu_x$, $\mu_y$, $C_x$, $C_y$, $W_x$ and $W_y$, nag_correg_ssqmat_combine (g02bz) calculates $\mu_z$, $C_z$ and $W_z$ as if a dataset $Z$, with $m$ variables and $n_x + n_y$ observations, were supplied to nag_correg_ssqmat (g02bu), with $Z$ constructed as
$Z = \begin{pmatrix} X \\ Y \end{pmatrix}.$
nag_correg_ssqmat_combine (g02bz) is designed to combine the results of two calls to nag_correg_ssqmat (g02bu), so that large datasets, or datasets that are not all available at the same time, can be summarised.
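
The combination step can be illustrated without the toolbox. The following is a minimal sketch in plain MATLAB, assuming the standard pairwise update for weighted means and deviation cross-products: $W_z = W_x + W_y$, $\mu_z = \mu_x + (W_y/W_z)(\mu_y - \mu_x)$ and $C_z = C_x + C_y + (W_x W_y / W_z)(\mu_y - \mu_x)(\mu_y - \mu_x)^{\mathrm{T}}$. The source does not state that g02bz uses exactly these formulas internally, and the data and variable names are illustrative. The sketch works on full $m \times m$ matrices rather than packed storage, and compares the combined summary with a direct summary of the stacked data.

```matlab
% Minimal sketch in plain MATLAB (no toolbox calls); all names are illustrative.
X  = [1 2; 3 5; 4 1];   wx = [1; 1; 1];    % first block, unit weights
Y  = [2 0; 6 3];        wy = [0.5; 2];     % second block, weighted

% Per-block summaries: sum of weights, means, deviation cross-products
Wx  = sum(wx);
mux = (wx' * X) / Wx;
Rx  = X - ones(size(X, 1), 1) * mux;
Cx  = Rx' * diag(wx) * Rx;

Wy  = sum(wy);
muy = (wy' * Y) / Wy;
Ry  = Y - ones(size(Y, 1), 1) * muy;
Cy  = Ry' * diag(wy) * Ry;

% Assumed pairwise update (the mean = 'M' case)
Wz    = Wx + Wy;
delta = muy - mux;                                   % 1-by-m row vector
muz   = mux + (Wy / Wz) * delta;
Cz    = Cx + Cy + (Wx * Wy / Wz) * (delta' * delta);

% Reference: summarise the stacked data Z = [X; Y] directly
Z   = [X; Y];   wz = [wx; wy];
mud = (wz' * Z) / sum(wz);
Rz  = Z - ones(size(Z, 1), 1) * mud;
Cd  = Rz' * diag(wz) * Rz;

% Both differences should be at rounding-error level
err_mean = max(abs(muz - mud));
err_c    = max(abs(Cz(:) - Cd(:)));
fprintf('max |mean difference| = %g\n', err_mean);
fprintf('max |C difference|    = %g\n', err_c);
```

For raw sums of squares and cross-products (the mean = 'Z' case) the correction term is not needed, because $X^{\mathrm{T}} D_x X + Y^{\mathrm{T}} D_y Y = Z^{\mathrm{T}} D_z Z$, so the two matrices simply add. The sketch divides by $W_z$ and therefore does not handle the case where both sums of weights are zero.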

## References

Bennett J, Pebay P, Roe D and Thompson D (2009) Numerically stable, single-pass, parallel statistics algorithms. Proceedings of IEEE International Conference on Cluster Computing.

## Parameters

### Compulsory Input Parameters

1:     xsw – double scalar
$W_x$, the sum of weights, from the first set of data, $X$. If the data is unweighted then this will be the number of observations in the first dataset.
Constraint: $\mathbf{xsw} \ge 0$.
2:     xmean(m) – double array
m, the dimension of the array, must satisfy the constraint $\mathbf{m} \ge 1$.
$\mu_x$, the sample means for the first set of data, $X$.
3:     xc($(\mathbf{m} \times \mathbf{m} + \mathbf{m})/2$) – double array
$C_x$, the sums of squares and cross-products matrix for the first set of data, $X$, as returned by nag_correg_ssqmat (g02bu).
nag_correg_ssqmat (g02bu) returns this matrix packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k \ge j$, is stored in $\mathbf{xc}(k \times (k-1)/2 + j)$.
No check is made that $C_x$ is a valid cross-products matrix.
4:     ysw – double scalar
$W_y$, the sum of weights, from the second set of data, $Y$. If the data is unweighted then this will be the number of observations in the second dataset.
Constraint: $\mathbf{ysw} \ge 0$.
5:     ymean(m) – double array
m, the dimension of the array, must satisfy the constraint $\mathbf{m} \ge 1$.
$\mu_y$, the sample means for the second set of data, $Y$.
6:     yc($(\mathbf{m} \times \mathbf{m} + \mathbf{m})/2$) – double array
$C_y$, the sums of squares and cross-products matrix for the second set of data, $Y$, as returned by nag_correg_ssqmat (g02bu).
nag_correg_ssqmat (g02bu) returns this matrix packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k \ge j$, is stored in $\mathbf{yc}(k \times (k-1)/2 + j)$.
No check is made that $C_y$ is a valid cross-products matrix.
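
The packed-by-columns layout used for xc and yc can be sketched in plain MATLAB as follows. This is an illustration of the index formula $k \times (k-1)/2 + j$ only, not a toolbox routine; the matrix `C` and the names `packed` and `F` are made up for the example.

```matlab
% Illustrative only: pack and unpack a small symmetric matrix using the
% index formula k*(k-1)/2 + j for element (j,k) with k >= j.
m = 3;
C = [4 1 2;
     1 5 3;
     2 3 6];                          % example symmetric matrix
packed = zeros(m*(m+1)/2, 1);         % same length as xc: (m*m + m)/2

% Pack the upper triangle column by column
for k = 1:m
    for j = 1:k
        packed(k*(k-1)/2 + j) = C(j, k);
    end
end

% Unpack into a full symmetric matrix
F = zeros(m);
for k = 1:m
    for j = 1:k
        F(j, k) = packed(k*(k-1)/2 + j);
        F(k, j) = F(j, k);            % mirror into the lower triangle
    end
end

disp(packed');
disp(F);
```

Unpacking in this way is convenient when the combined matrix is to be passed to code that expects a full $m \times m$ array.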

### Optional Input Parameters

1:     mean – string (length ≥ 1)
Indicates whether the matrices supplied in xc and yc are sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
$\mathbf{mean} = \text{'M'}$
Sums of squares and cross-products of deviations about the mean have been supplied.
$\mathbf{mean} = \text{'Z'}$
Sums of squares and cross-products have been supplied.
Default: $\text{'M'}$
Constraint: $\mathbf{mean} = \text{'M'}$ or $\text{'Z'}$.
2:     m – int64, int32 or nag_int scalar
$m$, the number of variables.
Default: The dimension of the array xmean and the dimension of the array ymean. (An error is raised if these dimensions are not equal.)
Constraint: $\mathbf{m} \ge 1$.

### Output Parameters

1:     xsw – double scalar
$W_z$, the sum of weights, from the combined dataset, $Z$. If both datasets are unweighted then this will be the number of observations in the combined dataset.
2:     xmean(m) – double array
$\mu_z$, the sample means for the combined data, $Z$.
3:     xc($(\mathbf{m} \times \mathbf{m} + \mathbf{m})/2$) – double array
$C_z$, the sums of squares and cross-products matrix for the combined dataset, $Z$.
This matrix is again stored packed by columns.
4:     ifail – int64, int32 or nag_int scalar
$\mathbf{ifail} = 0$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
$\mathbf{ifail} = 11$
On entry, $\mathbf{mean} = \langle\mathit{value}\rangle$.
Constraint: $\mathbf{mean} = \text{'M'}$ or $\text{'Z'}$.
$\mathbf{ifail} = 21$
Constraint: $\mathbf{m} \ge 1$.
$\mathbf{ifail} = 31$
Constraint: $\mathbf{xsw} \ge 0.0$.
$\mathbf{ifail} = 61$
Constraint: $\mathbf{ysw} \ge 0.0$.

## Example

```
function nag_correg_ssqmat_combine_example
% Three blocks of data, each with 5 variables (columns)
x1 = [-1.10, 4.06, -0.95, 8.53,10.41;
       1.63,-3.22, -1.15,-1.30, 3.78;
      -2.23,-8.19, -3.50, 4.31,-1.11;
       0.92, 0.33, -1.60, 5.80,-1.15];

x2 = [ 2.12, 5.00,-11.69,-1.22, 2.86;
       4.82,-7.23, -4.67, 0.83, 3.46;
      -0.51,-1.12, -1.76, 1.45, 0.26;
      -4.32, 4.89,  1.34,-1.12,-2.49;
       0.02,-0.74,  0.94,-0.99,-2.61];

% Weights for the observations in x2
wt = [2; 0.89; 0.32; 4.19; 4.33];

x3 = [ 1.37, 0.00, -0.53,-7.98, 3.32;
       4.15,-2.81, -4.09,-7.96,-2.13;
      13.09,-1.43,  5.16,-1.83, 1.58];

for b=1:3

  switch b
    case 1
      % This is the first block of data, so summarise the data into xmean and xc
      [xsw, xmean, xc, ifail] = nag_correg_ssqmat(x1);
    case 2
      % Second block (weighted): summarise into ymean and yc
      [ysw, ymean, yc, ifail] = nag_correg_ssqmat(x2, 'wt', wt);
    case 3
      % Third block (unweighted): summarise into ymean and yc
      [ysw, ymean, yc, ifail] = nag_correg_ssqmat(x3);
  end

  if b ~= 1
    % Update the running summaries
    [xsw, xmean, xc, ifail] = nag_correg_ssqmat_combine(xsw, xmean, xc, ysw, ymean, yc);
  end
end

% Display results
fprintf('\nMeans\n');
disp(xmean');
nag_file_print_matrix_real_packed('u', 'non-unit', int64(5), xc, 'Sums of squares and cross-products');

if xsw > 1
  % Scale the sums of squares and cross-products matrix xc, and so convert it
  % to a covariance matrix
  nag_file_print_matrix_real_packed('u', 'non-unit', int64(5), xc/(xsw-1), 'Covariance Matrix');
end
```
```

Means
    0.4369    0.4929   -1.3387   -0.5684    0.0987

Sums of squares and cross-products
             1          2          3          4          5
 1    304.5052  -123.7700   -27.1830   -60.7092    83.4830
 2               298.9148   -17.3196    -2.1710     5.2072
 3                          332.1639    -3.9445   -96.9299
 4                                     264.7684    79.6211
 5                                                225.5948
Covariance Matrix
             1          2          3          4          5
 1     17.1746    -6.9808    -1.5332    -3.4241     4.7086
 2                16.8593    -0.9769    -0.1224     0.2937
 3                           18.7346    -0.2225    -5.4670
 4                                      14.9334     4.4908
 5                                                 12.7239

```
```
function g02bz_example
% Three blocks of data, each with 5 variables (columns)
x1 = [-1.10, 4.06, -0.95, 8.53,10.41;
       1.63,-3.22, -1.15,-1.30, 3.78;
      -2.23,-8.19, -3.50, 4.31,-1.11;
       0.92, 0.33, -1.60, 5.80,-1.15];

x2 = [ 2.12, 5.00,-11.69,-1.22, 2.86;
       4.82,-7.23, -4.67, 0.83, 3.46;
      -0.51,-1.12, -1.76, 1.45, 0.26;
      -4.32, 4.89,  1.34,-1.12,-2.49;
       0.02,-0.74,  0.94,-0.99,-2.61];

% Weights for the observations in x2
wt = [2; 0.89; 0.32; 4.19; 4.33];

x3 = [ 1.37, 0.00, -0.53,-7.98, 3.32;
       4.15,-2.81, -4.09,-7.96,-2.13;
      13.09,-1.43,  5.16,-1.83, 1.58];

for b=1:3

  switch b
    case 1
      % This is the first block of data, so summarise the data into xmean and xc
      [xsw, xmean, xc, ifail] = g02bu(x1);
    case 2
      % Second block (weighted): summarise into ymean and yc
      [ysw, ymean, yc, ifail] = g02bu(x2, 'wt', wt);
    case 3
      % Third block (unweighted): summarise into ymean and yc
      [ysw, ymean, yc, ifail] = g02bu(x3);
  end

  if b ~= 1
    % Update the running summaries
    [xsw, xmean, xc, ifail] = g02bz(xsw, xmean, xc, ysw, ymean, yc);
  end
end

% Display results
fprintf('\nMeans\n');
disp(xmean');
x04cc('u', 'non-unit', int64(5), xc, 'Sums of squares and cross-products');

if xsw > 1
  % Scale the sums of squares and cross-products matrix xc, and so convert it
  % to a covariance matrix
  x04cc('u', 'non-unit', int64(5), xc/(xsw-1), 'Covariance Matrix');
end
```
```

Means
    0.4369    0.4929   -1.3387   -0.5684    0.0987

Sums of squares and cross-products
             1          2          3          4          5
 1    304.5052  -123.7700   -27.1830   -60.7092    83.4830
 2               298.9148   -17.3196    -2.1710     5.2072
 3                          332.1639    -3.9445   -96.9299
 4                                     264.7684    79.6211
 5                                                225.5948
Covariance Matrix
             1          2          3          4          5
 1     17.1746    -6.9808    -1.5332    -3.4241     4.7086
 2                16.8593    -0.9769    -0.1224     0.2937
 3                           18.7346    -0.2225    -5.4670
 4                                      14.9334     4.4908
 5                                                 12.7239

```
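
The example results can be cross-checked without the toolbox by summarising all three blocks in one pass. The sketch below is plain MATLAB under two assumptions: the unweighted blocks are given unit weights (consistent with the description of xsw above), and the variable names (`Z`, `w`, `W`, `mu`, `C`) are illustrative. It prints full symmetric matrices rather than the packed upper triangle, and the values should agree with the output shown above to the printed precision.

```matlab
% Cross-check of the example in plain MATLAB (no toolbox calls).
x1 = [-1.10, 4.06, -0.95, 8.53,10.41;
       1.63,-3.22, -1.15,-1.30, 3.78;
      -2.23,-8.19, -3.50, 4.31,-1.11;
       0.92, 0.33, -1.60, 5.80,-1.15];
x2 = [ 2.12, 5.00,-11.69,-1.22, 2.86;
       4.82,-7.23, -4.67, 0.83, 3.46;
      -0.51,-1.12, -1.76, 1.45, 0.26;
      -4.32, 4.89,  1.34,-1.12,-2.49;
       0.02,-0.74,  0.94,-0.99,-2.61];
wt = [2; 0.89; 0.32; 4.19; 4.33];
x3 = [ 1.37, 0.00, -0.53,-7.98, 3.32;
       4.15,-2.81, -4.09,-7.96,-2.13;
      13.09,-1.43,  5.16,-1.83, 1.58];

% Stack the blocks; unweighted observations get weight 1 (assumption)
Z = [x1; x2; x3];
w = [ones(size(x1, 1), 1); wt; ones(size(x3, 1), 1)];

W  = sum(w);                          % combined sum of weights
mu = (w' * Z) / W;                    % weighted means (row vector)
R  = Z - ones(size(Z, 1), 1) * mu;    % deviations from the means
C  = R' * diag(w) * R;                % sums of squares and cross-products

fprintf('\nMeans\n');
disp(mu);
fprintf('Sums of squares and cross-products (full matrix)\n');
disp(C);
if W > 1
    fprintf('Covariance matrix (full matrix)\n');
    disp(C / (W - 1));
end
```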