# NAG Toolbox: nag_correg_corrmat (g02bx)

## Purpose

nag_correg_corrmat (g02bx) calculates the sample means, the standard deviations, the variance-covariance matrix, and the matrix of Pearson product-moment correlation coefficients for a set of data. Weights may be used.

## Syntax

[xbar, std, v, r, ifail] = g02bx(x, 'nonzwt', nonzwt, 'n', n, 'm', m, 'wt', wt)
[xbar, std, v, r, ifail] = nag_correg_corrmat(x, 'nonzwt', nonzwt, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: nonzwt was added to the interface; weight was removed from the interface; wt was made optional.
 At Mark 22: n was made optional.

## Description

For $n$ observations on $m$ variables, the one-pass algorithm of West (1979), as implemented in nag_correg_ssqmat (g02bu), is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
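(a) The means
 ${\bar{x}}_{j}=\frac{\sum _{i=1}^{n}{w}_{i}{x}_{ij}}{\sum _{i=1}^{n}{w}_{i}},\quad j=1,\dots ,p$
(b) The variance-covariance matrix
 ${C}_{jk}=\frac{\sum _{i=1}^{n}{w}_{i}\left({x}_{ij}-{\bar{x}}_{j}\right)\left({x}_{ik}-{\bar{x}}_{k}\right)}{\sum _{i=1}^{n}{w}_{i}-1},\quad j,k=1,\dots ,p$
(c) The standard deviations
 ${s}_{j}=\sqrt{{C}_{jj}},\quad j=1,\dots ,p$
(d) The Pearson product-moment correlation coefficients
 ${R}_{jk}=\frac{{C}_{jk}}{\sqrt{{C}_{jj}{C}_{kk}}},\quad j,k=1,\dots ,p$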
where ${x}_{ij}$ is the value of the $i$th observation on the $j$th variable and ${w}_{i}$ is the weight for the $i$th observation which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is ${\sum }_{i=1}^{n}{w}_{i}-1$, so the weights should be scaled so that the sum of weights reflects the true sample size.
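
The definitions (a) to (d) can be checked independently of the toolbox. The following is a minimal sketch in plain Python, assuming a direct two-pass evaluation of the formulas rather than the one-pass algorithm used by the routine; the function name `weighted_summary` and the small data set are illustrative and not part of the NAG interface. Setting a correlation to zero when a variable has zero variance mirrors the behaviour described for ${\mathbf{ifail}}=5$.

```python
import math

def weighted_summary(x, wt=None):
    """Evaluate definitions (a)-(d) directly (two passes over the data).

    x  : list of n rows, each holding m values
    wt : optional list of n non-negative weights (defaults to all ones)
    """
    n, m = len(x), len(x[0])
    w = [1.0] * n if wt is None else list(wt)  # unweighted case: w_i = 1
    sw = sum(w)
    if sw <= 1.0:
        raise ValueError("the sum of weights must be greater than 1")

    # (a) weighted means
    xbar = [sum(w[i] * x[i][j] for i in range(n)) / sw for j in range(m)]

    # (b) variance-covariance matrix, divisor sum(w) - 1
    c = [[sum(w[i] * (x[i][j] - xbar[j]) * (x[i][k] - xbar[k])
              for i in range(n)) / (sw - 1.0)
          for k in range(m)]
         for j in range(m)]

    # (c) standard deviations
    s = [math.sqrt(c[j][j]) for j in range(m)]

    # (d) correlations; zero if either variable has zero variance
    r = [[c[j][k] / (s[j] * s[k]) if s[j] > 0.0 and s[k] > 0.0 else 0.0
          for k in range(m)]
         for j in range(m)]
    return xbar, s, c, r

# Illustrative data: three observations on two variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
xbar, s, c, r = weighted_summary(data)
print("means:", xbar)
print("covariance:", c)
print("correlation:", r)
```

The two-pass form is the easiest to compare term by term with the formulas, at the cost of reading the data twice.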

## References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.

### Optional Input Parameters

1:     $\mathrm{nonzwt}$ – string (length ≥ 1)
Default: $\text{'W'}$
The divisor used in the variance calculation is based on either the sum of the weights (${\mathbf{nonzwt}}=\text{'W'}$) or the number of observations with nonzero weights (${\mathbf{nonzwt}}=\text{'V'}$).
Constraint: ${\mathbf{nonzwt}}=\text{'W'}$ or $\text{'V'}$.
2:     $\mathrm{n}$ – int64/int32/nag_int scalar
Default: the first dimension of the array x.
The number of data observations in the sample.
Constraint: ${\mathbf{n}}>1$.
3:     $\mathrm{m}$ – int64/int32/nag_int scalar
Default: the second dimension of the array x.
The number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
4:     $\mathrm{wt}\left({\mathbf{n}}\right)$ – double array
The dimension of the array wt must be at least ${\mathbf{n}}$ if $\mathit{weight}=\text{'W'}$ or $\text{'V'}$, and at least $1$ otherwise.
$w$, the optional frequency weighting for each observation, with ${\mathbf{wt}}\left(i\right)={w}_{i}$. Usually ${w}_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If $\mathit{weight}=\text{'U'}$, ${w}_{i}$ is set to $1$ for all $i$ and wt is not referenced.
Constraint: if $\mathit{weight}=\text{'W'}$ or $\text{'V'}$, $\sum _{\mathit{i}=1}^{{\mathbf{n}}}{\mathbf{wt}}\left(\mathit{i}\right)>1.0$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
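
To illustrate how frequency weights behave, here is a small sketch in plain Python (independent of the toolbox; the helper name `weighted_mean_var` and the numbers are made up for illustration). It assumes the divisor $\sum {w}_{i}-1$ from the Description and shows that an integral weight acts like repeating an observation, while a zero weight drops it.

```python
def weighted_mean_var(values, weights):
    """Weighted mean and variance with divisor sum(weights) - 1."""
    sw = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / sw
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / (sw - 1.0)
    return mean, var

values = [1.0, 5.0, 4.0]
wt = [2.0, 0.0, 1.0]          # count 1.0 twice, ignore 5.0, count 4.0 once

# The same sample written out explicitly, with unit weights.
expanded = [1.0, 1.0, 4.0]

weighted = weighted_mean_var(values, wt)
replicated = weighted_mean_var(expanded, [1.0] * len(expanded))
print(weighted, replicated)   # prints (2.0, 3.0) (2.0, 3.0)
```

This equivalence holds only when the weights sum to the true sample size, which is why the scaling of the weights matters for the variance.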

### Output Parameters

1:     $\mathrm{xbar}\left({\mathbf{m}}\right)$ – double array
The sample means. ${\mathbf{xbar}}\left(j\right)$ contains the mean of the $j$th variable.
2:     $\mathrm{std}\left({\mathbf{m}}\right)$ – double array
The standard deviations. ${\mathbf{std}}\left(j\right)$ contains the standard deviation for the $j$th variable.
3:     $\mathrm{v}\left(\mathit{ldv},{\mathbf{m}}\right)$ – double array
The variance-covariance matrix. ${\mathbf{v}}\left(\mathit{j},\mathit{k}\right)$ contains the covariance between variables $\mathit{j}$ and $\mathit{k}$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$ and $\mathit{k}=1,2,\dots ,{\mathbf{m}}$.
4:     $\mathrm{r}\left(\mathit{ldv},{\mathbf{m}}\right)$ – double array
The matrix of Pearson product-moment correlation coefficients. ${\mathbf{r}}\left(j,k\right)$ contains the correlation coefficient between variables $j$ and $k$.
5:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Note: nag_correg_corrmat (g02bx) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=1$
 On entry, ${\mathbf{m}}<1$, or ${\mathbf{n}}\le 1$, or $\mathit{ldx}<{\mathbf{n}}$, or $\mathit{ldv}<{\mathbf{m}}$.
${\mathbf{ifail}}=2$
 On entry, $\mathit{weight}\ne \text{'U'}$, $\text{'V'}$ or $\text{'W'}$.
${\mathbf{ifail}}=3$
 On entry, $\mathit{weight}=\text{'W'}$ or $\text{'V'}$ and a value of ${\mathbf{wt}}<0.0$.
${\mathbf{ifail}}=4$
 $\mathit{weight}=\text{'W'}$ and the sum of weights is not greater than $1.0$, or $\mathit{weight}=\text{'V'}$ and fewer than $2$ observations have nonzero weights.
W  ${\mathbf{ifail}}=5$
 A variable has a zero variance. In this case v and std are returned as calculated but r will contain zero for any correlation involving a variable with zero variance.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
 Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
 Dynamic memory allocation failed.

## Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).
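
For orientation, a one-pass weighted update in the style commonly attributed to West (1979) can be sketched as follows. This is plain Python for a single variable, written from the general form of such updating formulae; it is not taken from the NAG implementation, and the function name `one_pass_weighted` is hypothetical. Because each step works with the deviation from the running mean rather than with raw sums of squares, a large common offset in the data does not swamp the result.

```python
def one_pass_weighted(values, weights):
    """Running weighted mean and variance in a single pass over the data."""
    sum_w = 0.0   # sum of weights seen so far
    mean = 0.0    # running weighted mean
    ssq = 0.0     # running weighted sum of squared deviations about the mean
    for x, w in zip(values, weights):
        if w == 0.0:
            continue                   # zero weight: observation is ignored
        new_sum_w = sum_w + w
        delta = x - mean               # deviation from the current mean
        incr = delta * w / new_sum_w   # shift applied to the mean
        mean += incr
        ssq += sum_w * delta * incr    # update uses the old weight total
        sum_w = new_sum_w
    return mean, ssq / (sum_w - 1.0)   # divisor: sum of weights - 1

# Data with a large offset; the variance of 4, 7, 13, 16 is 30.
shifted = [1.0e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
mean, var = one_pass_weighted(shifted, [1.0] * len(shifted))
print(mean, var)
```

The same function accepts frequency weights, so it can be compared directly with a two-pass evaluation of the definitions in the Description.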

## Further Comments

None.

## Example

The data are some of the results from the 1988 Olympic decathlon: the times (in seconds) for the 100 m and 400 m races and the distances (in metres) for the long jump, high jump and shot. Twenty observations are input, and the means, standard deviations and correlation matrix are computed and printed.
```
function g02bx_example

fprintf('g02bx example results\n\n');

x = [11.25  48.9  7.43  2.270  15.48;
10.87  47.7  7.45  1.971  14.97;
11.18  48.2  7.44  1.979  14.20;
10.62  49.0  7.38  2.026  15.02;
11.02  47.4  7.43  1.974  12.92;
10.83  48.3  7.72  2.124  13.58;
11.18  49.3  7.05  2.064  14.12;
11.05  48.2  6.95  2.001  15.34;
11.15  49.1  7.12  2.035  14.52;
11.23  48.6  7.28  1.970  15.25;
10.94  49.9  7.45  1.974  15.34;
11.18  49.0  7.34  1.942  14.48;
11.02  48.2  7.29  2.063  12.92;
10.99  47.8  7.37  1.973  13.61;
11.03  48.9  7.45  1.974  14.20;
11.09  48.8  7.08  2.039  14.51;
11.46  51.2  6.75  2.008  16.07;
11.57  49.8  7.00  1.944  16.60;
11.07  47.9  7.04  1.947  13.41;
10.89  49.6  7.07  1.798  15.84];

% Unweighted call: n and m default to the dimensions of x.
[xbar, std, v, r, ifail] = g02bx(x);

disp('   Means');
disp(xbar');
disp('   Standard deviations');
disp(std');
% Print the upper triangle of r, including its diagonal, using x04ca.
mtitle = '  Correlation matrix:';
matrix = 'Upper';
diag   = 'Non-unit';

[ifail] = x04ca( ...
matrix, diag, r, mtitle);

```
```
g02bx example results

   Means
   11.0810   48.7900    7.2545    2.0038   14.6190

   Standard deviations
    0.2132    0.9002    0.2349    0.0902    1.0249

  Correlation matrix:
          1       2       3       4       5
 1   1.0000  0.4416 -0.5427  0.0696  0.3912
 2           1.0000 -0.5058 -0.0678  0.7057
 3                   1.0000  0.2768 -0.4352
 4                           1.0000 -0.1494
 5                                   1.0000
```