# NAG CL Interface g02bxc (corrmat)

## 1 Purpose

g02bxc calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

## 2 Specification

 #include <nag.h>
 void g02bxc (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)
The function may be called by the names: g02bxc, nag_correg_corrmat or nag_corr_cov.

## 3 Description

For $n$ observations on $m$ variables, the one-pass algorithm of West (1979), as implemented in g02buc, is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
 $\bar{x}_{j} = \frac{\sum_{i=1}^{n} w_{i} x_{ij}}{\sum_{i=1}^{n} w_{i}}, \quad j = 1, \dots, p$
(b) The variance-covariance matrix
 $C_{jk} = \frac{\sum_{i=1}^{n} w_{i} (x_{ij} - \bar{x}_{j}) (x_{ik} - \bar{x}_{k})}{\sum_{i=1}^{n} w_{i} - 1}, \quad j, k = 1, \dots, p$
(c) The standard deviations
 $s_{j} = \sqrt{C_{jj}}, \quad j = 1, \dots, p$
(d) The Pearson product-moment correlation coefficients
 $R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \dots, p$
where ${x}_{ij}$ is the value of the $i$th observation on the $j$th variable and ${w}_{i}$ is the weight for the $i$th observation, which is 1 in the unweighted case.
Note that the denominator for the variance-covariance is ${\sum }_{i=1}^{n}{w}_{i}-1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

## 4 References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555

## 5 Arguments

1: $\mathbf{n}$ Integer Input
On entry: the number of observations in the dataset, $n$.
Constraint: ${\mathbf{n}}>1$.
2: $\mathbf{m}$ Integer Input
On entry: the total number of variables, $m$.
Constraint: ${\mathbf{m}}\ge 1$.
3: $\mathbf{x}\left[{\mathbf{n}}×{\mathbf{tdx}}\right]$ const double Input
On entry: the data ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, ${x}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
4: $\mathbf{tdx}$ Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
5: $\mathbf{sx}\left[{\mathbf{m}}\right]$ const Integer Input
On entry: indicates which $p$ variables to include in the analysis.
${\mathbf{sx}}\left[j-1\right]>0$
The $j$th variable is to be included.
${\mathbf{sx}}\left[j-1\right]=0$
The $j$th variable is not to be included.
sx is set to NULL
All variables are included in the analysis, i.e., $p=m$.
Constraint: ${\mathbf{sx}}\left[\mathit{i}-1\right]\ge 0$, for $\mathit{i}=1,2,\dots ,m$.
6: $\mathbf{wt}\left[{\mathbf{n}}\right]$ const double Input
On entry: $w$, the optional frequency weighting for each observation, with ${\mathbf{wt}}\left[i-1\right]={w}_{i}$. Usually ${w}_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then ${w}_{i}$ is set to $1$ for all $i$.
Constraints:
if wt is not NULL,
• ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$;
• $\sum _{\mathit{i}=1}^{{\mathbf{n}}}{\mathbf{wt}}\left[\mathit{i}-1\right]>1.0$.
7: $\mathbf{sw}$ double * Output
On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, $n$.
8: $\mathbf{wmean}\left[{\mathbf{m}}\right]$ double Output
On exit: the sample means. ${\mathbf{wmean}}\left[j-1\right]$ contains the mean for the $j$th variable.
9: $\mathbf{std}\left[{\mathbf{m}}\right]$ double Output
On exit: the standard deviations. ${\mathbf{std}}\left[j-1\right]$ contains the standard deviation for the $j$th variable.
10: $\mathbf{r}\left[{\mathbf{m}}×{\mathbf{tdr}}\right]$ double Output
On exit: the matrix of Pearson product-moment correlation coefficients. ${\mathbf{r}}\left[\left(j-1\right)×{\mathbf{tdr}}+k-1\right]$ contains the correlation between variables $j$ and $k$, for $j,k=1,\dots ,p$.
11: $\mathbf{tdr}$ Integer Input
On entry: the stride separating matrix column elements in the array r.
Constraint: ${\mathbf{tdr}}\ge {\mathbf{m}}$.
12: $\mathbf{v}\left[{\mathbf{m}}×{\mathbf{tdv}}\right]$ double Output
On exit: the variance-covariance matrix. ${\mathbf{v}}\left[\left(j-1\right)×{\mathbf{tdv}}+k-1\right]$ contains the covariance between variables $j$ and $k$, for $j,k=1,\dots ,p$.
13: $\mathbf{tdv}$ Integer Input
On entry: the stride separating matrix column elements in the array v.
Constraint: ${\mathbf{tdv}}\ge {\mathbf{m}}$.
14: $\mathbf{fail}$ NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

## 6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{tdr}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdr}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdv}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdv}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{tdx}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
NE_INT_ARG_LT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 1$.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

## 7 Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).

## 8 Parallelism and Performance

g02bxc is not threaded in any implementation.

## 9 Further Comments

Correlation coefficients based on ranks can be computed using g02brc.

## 10 Example

This example calculates the means, standard deviations, variance-covariance matrix and matrix of Pearson product-moment correlation coefficients for a set of 3 observations on 3 variables.

### 10.1 Program Text

Program Text (g02bxce.c)

### 10.2 Program Data

Program Data (g02bxce.d)

### 10.3 Program Results

Program Results (g02bxce.r)