NAG C Library Manual

# NAG Library Function Document: nag_corr_cov (g02bxc)

## 1  Purpose

nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

## 2  Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

## 3  Description

For $n$ observations on $m$ variables, a one-pass updating algorithm (see West (1979)) is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
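(a) The means
 $\bar{x}_{j} = \frac{\sum_{i=1}^{n} w_{i} x_{ij}}{\sum_{i=1}^{n} w_{i}}, \quad j = 1, \dots, p$
(b) The variance-covariance matrix
 $C_{jk} = \frac{\sum_{i=1}^{n} w_{i} \left(x_{ij} - \bar{x}_{j}\right) \left(x_{ik} - \bar{x}_{k}\right)}{\sum_{i=1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$
(c) The standard deviations
 $s_{j} = \sqrt{C_{jj}}, \quad j = 1, \dots, p$
(d) The Pearson product-moment correlation coefficients
 $R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \dots, p$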
where ${x}_{ij}$ is the value of the $i$th observation on the $j$th variable and ${w}_{i}$ is the weight for the $i$th observation which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is ${\sum }_{i=1}^{n}{w}_{i}-1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

## 4  References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

## 5  Arguments

1:     n (Integer, Input)
On entry: the number of observations in the dataset, $n$.
Constraint: ${\mathbf{n}}>1$.
2:     m (Integer, Input)
On entry: the total number of variables, $m$.
Constraint: ${\mathbf{m}}\ge 1$.
3:     x[${\mathbf{n}}×{\mathbf{tdx}}$] (const double, Input)
On entry: the data ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, ${x}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
4:     tdx (Integer, Input)
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
5:     sx[m] (const Integer, Input)
On entry: indicates which $p$ variables to include in the analysis.
${\mathbf{sx}}\left[j-1\right]>0$: the $j$th variable is to be included.
${\mathbf{sx}}\left[j-1\right]=0$: the $j$th variable is not to be included.
sx is set to the null pointer (Integer *)0: all variables are included in the analysis, i.e., $p=m$.
Constraint: ${\mathbf{sx}}\left[\mathit{i}-1\right]\ge 0$, for $\mathit{i}=1,2,\dots ,m$.
6:     wt[n] (const double, Input)
On entry: the optional frequency weighting for each observation. ${\mathbf{wt}}\left[i-1\right]$ contains the weight for the $i$th data value. Usually ${\mathbf{wt}}\left[i-1\right]$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is set to the null pointer (double *)0 then wt is not referenced.
Constraint: ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
7:     sw (double *, Output)
On exit: the sum of weights if wt is not the null pointer, otherwise sw contains the number of observations, $n$.
8:     wmean[m] (double, Output)
On exit: the sample means. ${\mathbf{wmean}}\left[j-1\right]$ contains the mean for the $j$th variable.
9:     std[m] (double, Output)
On exit: the standard deviations. ${\mathbf{std}}\left[j-1\right]$ contains the standard deviation for the $j$th variable.
10:   r[${\mathbf{m}}×{\mathbf{tdr}}$] (double, Output)
On exit: the matrix of Pearson product-moment correlation coefficients. ${\mathbf{r}}\left[\left(j-1\right)×{\mathbf{tdr}}+k-1\right]$ contains the correlation between variables $j$ and $k$, for $j,k=1,\dots ,p$.
11:   tdr (Integer, Input)
On entry: the stride separating matrix column elements in the array r.
Constraint: ${\mathbf{tdr}}\ge {\mathbf{m}}$.
12:   v[${\mathbf{m}}×{\mathbf{tdv}}$] (double, Output)
On exit: the variance-covariance matrix. ${\mathbf{v}}\left[\left(j-1\right)×{\mathbf{tdv}}+k-1\right]$ contains the covariance between variables $j$ and $k$, for $j,k=1,\dots ,p$.
13:   tdv (Integer, Input)
On entry: the stride separating matrix column elements in the array v.
Constraint: ${\mathbf{tdv}}\ge {\mathbf{m}}$.
14:   fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).
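
The storage and selection conventions described for x, tdx, sx and wt can be illustrated without calling the library. The sketch below is a set of hypothetical helpers (`obs_value`, `count_selected` and `drop_missing` are illustrative names, not NAG Library functions); `long` stands in for the NAG `Integer` type, and the use of a sentinel value to mark missing data is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Value of the i-th observation on the j-th variable (both 1-based),
 * following the documented layout x[(i-1)*tdx + j-1]. */
double obs_value(const double *x, size_t tdx, size_t i, size_t j)
{
    assert(x && i >= 1 && j >= 1 && j <= tdx);
    return x[(i - 1) * tdx + (j - 1)];
}

/* Number of selected variables p implied by sx; a null sx selects all m.
 * 'long' is a stand-in for the NAG Integer type (an assumption). */
size_t count_selected(const long *sx, size_t m)
{
    size_t p = 0;
    if (!sx) return m;
    for (size_t j = 0; j < m; ++j)
        if (sx[j] > 0) ++p;
    return p;
}

/* Build weights that remove observations containing a missing value,
 * flagged here by a caller-chosen sentinel (an assumption of this sketch).
 * Writes 1.0 or 0.0 into wt[0..n-1] and returns the resulting sum of weights. */
double drop_missing(const double *x, size_t n, size_t m, size_t tdx,
                    double missing, double *wt)
{
    double sw = 0.0;
    assert(x && wt && tdx >= m);
    for (size_t i = 0; i < n; ++i) {
        wt[i] = 1.0;
        for (size_t j = 0; j < m; ++j)
            if (x[i * tdx + j] == missing) { wt[i] = 0.0; break; }
        sw += wt[i];
    }
    return sw;
}
```

With tdx larger than m, the trailing elements of each row are simply skipped, which is why the constraint is ${\mathbf{tdx}}\ge {\mathbf{m}}$ rather than equality.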

## 6  Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{tdr}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdr}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdv}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdv}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
NE_INT_ARG_LT
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.
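
The input constraints behind these indicators can be checked before the call. The following is a rough sketch under stated assumptions: the status names (`CHK_...`) and the function `check_inputs` are hypothetical and are not the library's own error codes, `long` stands in for the NAG `Integer` type, and the order in which the conditions are tested here is a choice of this example, not a statement about the library's behaviour.

```c
#include <stddef.h>

/* Hypothetical status codes that mirror the indicators listed above;
 * these are not the NAG Library's own enum values. */
typedef enum {
    CHK_OK, CHK_N_TOO_SMALL, CHK_M_TOO_SMALL, CHK_STRIDE_TOO_SMALL,
    CHK_NEG_SX, CHK_NO_POS_SX, CHK_NEG_WEIGHT, CHK_SW_LT_ONE
} check_status;

/* Checks the documented constraints; sx and wt may be NULL, in which case
 * all variables are selected and all weights are taken as 1. */
check_status check_inputs(long n, long m, long tdx, long tdr, long tdv,
                          const long *sx, const double *wt)
{
    if (n <= 1) return CHK_N_TOO_SMALL;              /* n > 1 */
    if (m < 1) return CHK_M_TOO_SMALL;               /* m >= 1 */
    if (tdx < m || tdr < m || tdv < m)               /* strides >= m */
        return CHK_STRIDE_TOO_SMALL;
    if (sx) {
        int any_positive = 0;
        for (long j = 0; j < m; ++j) {
            if (sx[j] < 0) return CHK_NEG_SX;
            if (sx[j] > 0) any_positive = 1;
        }
        if (!any_positive) return CHK_NO_POS_SX;
    }
    if (wt) {
        double sw = 0.0;
        for (long i = 0; i < n; ++i) {
            if (wt[i] < 0.0) return CHK_NEG_WEIGHT;
            sw += wt[i];
        }
        if (sw < 1.0) return CHK_SW_LT_ONE;
    }
    return CHK_OK;
}
```

A zero-variance variable cannot be detected this way, since it depends on the data values rather than on the argument constraints.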

## 7  Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).
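
To give a feel for what a one-pass update looks like, the sketch below maintains a weighted mean and variance for a single variable, in the spirit of the updating scheme described by West (1979). It is an illustration under stated assumptions, not the library's implementation: the type `running_stats` and the functions `running_update` and `running_variance` are hypothetical names, and the variance uses the same $\sum w_i - 1$ denominator as the covariance definition in Section 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running state for one variable (not a NAG Library type). */
typedef struct {
    double sw;    /* running sum of weights */
    double mean;  /* running weighted mean */
    double t;     /* running weighted sum of squared deviations */
} running_stats;

/* Fold one observation x with weight w >= 0 into the running state.
 * Zero-weight observations are skipped, which removes them from the result. */
void running_update(running_stats *s, double x, double w)
{
    assert(s && w >= 0.0);
    if (w == 0.0) return;
    double q = x - s->mean;          /* deviation from the current mean */
    double new_sw = s->sw + w;
    double step = q * w / new_sw;    /* change in the mean */
    s->mean += step;
    s->t += step * s->sw * q;        /* uses the old sum of weights */
    s->sw = new_sw;
}

/* Variance with denominator sum(w) - 1; requires sum(w) > 1. */
double running_variance(const running_stats *s)
{
    assert(s && s->sw > 1.0);
    return s->t / (s->sw - 1.0);
}
```

Starting from a state initialised to zero, feeding in the value 1 with weight 2 and then the value 4 with weight 1 gives mean 2 and variance 3, the same as the unweighted sample 1, 1, 4. Each update works with the deviation from the current mean instead of accumulating raw sums of squares, which is the usual motivation for updating formulae of this kind.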

## 8  Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).

## 9  Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.

### 9.1  Program Text

Program Text (g02bxce.c)

### 9.2  Program Data

Program Data (g02bxce.d)

### 9.3  Program Results

Program Results (g02bxce.r)