NAG Library Manual

# NAG Library Function Document: nag_sum_sqs_combine (g02bzc)

## 1  Purpose

nag_sum_sqs_combine (g02bzc) combines two sets of sample means and sums of squares and cross-products matrices. It is designed to be used in conjunction with nag_sum_sqs (g02buc) to allow large datasets to be summarised.

## 2  Specification

```c
#include <nag.h>
#include <nagg02.h>

void nag_sum_sqs_combine (Nag_SumSquare mean, Integer m, double *xsw, double xmean[], double xc[], double ysw, const double ymean[], const double yc[], NagError *fail)
```

## 3  Description

Let $X$ and $Y$ denote two sets of data, each with $m$ variables and ${n}_{x}$ and ${n}_{y}$ observations respectively. Let ${\mu }_{x}$ denote the (optionally weighted) vector of $m$ means for the first dataset and ${C}_{x}$ denote either the sums of squares and cross-products of deviations from ${\mu }_{x}$
${C}_{x}={\left(X-e{\mu }_{x}^{T}\right)}^{T}{D}_{x}\left(X-e{\mu }_{x}^{T}\right)$
or the sums of squares and cross-products, in which case
${C}_{x}={X}^{T}{D}_{x}X$
where $e$ is a vector of ${n}_{x}$ ones, ${D}_{x}$ is a diagonal matrix of (optional) weights, and ${W}_{x}$ is the sum of the diagonal elements of ${D}_{x}$. Similarly, let ${\mu }_{y}$, ${C}_{y}$ and ${W}_{y}$ denote the same quantities for the second dataset.
Given ${\mu }_{x},{\mu }_{y},{C}_{x},{C}_{y},{W}_{x}$ and ${W}_{y}$, nag_sum_sqs_combine (g02bzc) calculates ${\mu }_{z}$, ${C}_{z}$ and ${W}_{z}$ as if a dataset $Z$, with $m$ variables and ${n}_{x}+{n}_{y}$ observations, were supplied to nag_sum_sqs (g02buc), with $Z$ constructed as
$Z=\left(\begin{array}{c}X\\ Y\end{array}\right).$
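For reference, the combined quantities have a closed form. The formulas below are the standard pairwise update described by Bennett et al. (2009); they are given as a mathematically equivalent statement of the result, not as the exact sequence of operations inside nag_sum_sqs_combine (g02bzc). Writing $\delta = \mu_y - \mu_x$:

$$
\begin{aligned}
W_z &= W_x + W_y,\\
\mu_z &= \frac{W_x\,\mu_x + W_y\,\mu_y}{W_z} = \mu_x + \frac{W_y}{W_z}\,\delta,\\
C_z &= C_x + C_y + \frac{W_x W_y}{W_z}\,\delta\,\delta^{T} \quad \text{(mean = Nag\_AboutMean)},\\
C_z &= C_x + C_y \quad \text{(mean = Nag\_AboutZero)}.
\end{aligned}
$$

The extra term in the first case comes from re-centring each block on $\mu_z$. Since $e^{T}D_x(X - e\mu_x^{T}) = 0$ and $e^{T}D_x e = W_x$, we have $(X - e\mu_z^{T})^{T}D_x(X - e\mu_z^{T}) = C_x + W_x(\mu_x - \mu_z)(\mu_x - \mu_z)^{T}$, and likewise for $Y$; substituting $\mu_x - \mu_z = -(W_y/W_z)\,\delta$ and $\mu_y - \mu_z = (W_x/W_z)\,\delta$ gives the factor $W_x W_y^2/W_z^2 + W_y W_x^2/W_z^2 = W_x W_y/W_z$.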
nag_sum_sqs_combine (g02bzc) has been designed to combine the results from two calls to nag_sum_sqs (g02buc), allowing large datasets, or cases where not all of the data is available at the same time, to be summarised.

## 4  References

Bennett J, Pebay P, Roe D and Thompson D (2009) Numerically stable, single-pass, parallel statistics algorithms Proceedings of IEEE International Conference on Cluster Computing

## 5  Arguments

1: mean (Nag_SumSquare, Input)
On entry: indicates whether the matrices supplied in xc and yc are sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
${\mathbf{mean}}=\mathrm{Nag\_AboutMean}$
Sums of squares and cross-products of deviations about the mean have been supplied.
${\mathbf{mean}}=\mathrm{Nag\_AboutZero}$
Sums of squares and cross-products have been supplied.
Constraint: ${\mathbf{mean}}=\mathrm{Nag\_AboutMean}$ or $\mathrm{Nag\_AboutZero}$.
2: m (Integer, Input)
On entry: $m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
3: xsw (double *, Input/Output)
On entry: ${W}_{x}$, the sum of weights, from the first set of data, $X$. If the data is unweighted then this will be the number of observations in the first dataset.
On exit: ${W}_{z}$, the sum of weights, from the combined dataset, $Z$. If both datasets are unweighted then this will be the number of observations in the combined dataset.
Constraint: ${\mathbf{xsw}}\ge 0$.
4: xmean[m] (double, Input/Output)
On entry: ${\mu }_{x}$, the sample means for the first set of data, $X$.
On exit: ${\mu }_{z}$, the sample means for the combined data, $Z$.
5: xc[$\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2$] (double, Input/Output)
On entry: ${C}_{x}$, the sums of squares and cross-products matrix for the first set of data, $X$, as returned by nag_sum_sqs (g02buc).
nag_sum_sqs (g02buc) returns this matrix packed by columns, i.e., the cross-product between the $j$th and $k$th variables (numbering variables from 1), $k\ge j$, is stored in ${\mathbf{xc}}\left[k×\left(k-1\right)/2+j-1\right]$.
No check is made that ${C}_{x}$ is a valid cross-products matrix.
On exit: ${C}_{z}$, the sums of squares and cross-products matrix for the combined dataset, $Z$.
This matrix is again stored packed by columns.
6: ysw (double, Input)
On entry: ${W}_{y}$, the sum of weights, from the second set of data, $Y$. If the data is unweighted then this will be the number of observations in the second dataset.
Constraint: ${\mathbf{ysw}}\ge 0$.
7: ymean[m] (const double, Input)
On entry: ${\mu }_{y}$, the sample means for the second set of data, $Y$.
8: yc[$\left({\mathbf{m}}×{\mathbf{m}}+{\mathbf{m}}\right)/2$] (const double, Input)
On entry: ${C}_{y}$, the sums of squares and cross-products matrix for the second set of data, $Y$, as returned by nag_sum_sqs (g02buc).
nag_sum_sqs (g02buc) returns this matrix packed by columns, i.e., the cross-product between the $j$th and $k$th variables (numbering variables from 1), $k\ge j$, is stored in ${\mathbf{yc}}\left[k×\left(k-1\right)/2+j-1\right]$.
No check is made that ${C}_{y}$ is a valid cross-products matrix.
9: fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_BAD_PARAM
On entry, argument $⟨\mathit{\text{value}}⟩$ had an illegal value.
NE_INT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL
On entry, ${\mathbf{xsw}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{xsw}}\ge 0.0$.
On entry, ${\mathbf{ysw}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ysw}}\ge 0.0$.

## 7  Accuracy

Not applicable.

## 8  Parallelism and Performance

nag_sum_sqs_combine (g02bzc) is not threaded by NAG in any implementation.
nag_sum_sqs_combine (g02bzc) makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.

## 9  Further Comments

None.

## 10  Example

This example illustrates the use of nag_sum_sqs_combine (g02bzc) by dividing a dataset into three blocks of $4$, $5$ and $3$ observations respectively. Each block of data is summarised using nag_sum_sqs (g02buc) and then the three summaries combined using nag_sum_sqs_combine (g02bzc).
The resulting sums of squares and cross-products matrix is then scaled to obtain the covariance matrix for the whole dataset.

### 10.1  Program Text

Program Text (g02bzce.c)

### 10.2  Program Data

Program Data (g02bzce.d)

### 10.3  Program Results

Program Results (g02bzce.r)