NAG C Library Function Document

nag_corr_cov (g02bxc)

1 Purpose

nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2 Specification

#include <nag.h>
#include <nagg02.h>

void nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

3 Description

For $n$ observations on $m$ variables the one-pass algorithm of West (1979) as implemented in nag_sum_sqs (g02buc) is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:

(a) The means

$$\bar{x}_{j} = \frac{\sum_{i = 1}^{n} w_{i} x_{i j}}{\sum_{i = 1}^{n} w_{i}}, \quad j = 1, \dots, p$$

(b) The variance-covariance matrix

$$C_{j k} = \frac{\sum_{i = 1}^{n} w_{i} (x_{i j} - \bar{x}_{j}) (x_{i k} - \bar{x}_{k})}{\left(\sum_{i = 1}^{n} w_{i}\right) - 1}, \quad j, k = 1, \dots, p$$

(c) The standard deviations

$$s_{j} = \sqrt{C_{j j}}, \quad j = 1, \dots, p$$

(d) The Pearson product-moment correlation coefficients

$$R_{j k} = \frac{C_{j k}}{\sqrt{C_{j j} C_{k k}}}, \quad j, k = 1, \dots, p$$

where $x_{i j}$ is the value of the $i$th observation on the $j$th variable and $w_{i}$ is the weight for the $i$th observation, which will be 1 in the unweighted case.

Note that the denominator for the variance-covariance is $\left(\sum_{i = 1}^{n} w_{i}\right) - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.
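The definitions above can be evaluated in a single pass over the data. The following is a minimal sketch in plain C of a weighted one-pass update in the spirit of West (1979); it is not the NAG implementation, the function name onepass_cov is hypothetical, and for brevity it processes all m variables rather than a selected subset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: one-pass weighted means and
 * variance-covariance matrix for all m variables.
 * x is n by m in row-major order with stride tdx; wt may be NULL, in which
 * case every weight is 1. mean has m elements; c is m by m with stride m.
 * Returns the sum of weights. */
double onepass_cov(size_t n, size_t m, const double x[], size_t tdx,
                   const double wt[], double mean[], double c[])
{
    double sw = 0.0;
    size_t i, j, k;

    assert(tdx >= m);
    for (j = 0; j < m; ++j) {
        mean[j] = 0.0;
        for (k = 0; k < m; ++k)
            c[j * m + k] = 0.0;
    }

    for (i = 0; i < n; ++i) {
        const double w = (wt != NULL) ? wt[i] : 1.0;
        const double *row = x + i * tdx;
        double sw_new, f;

        assert(w >= 0.0);
        if (w == 0.0)
            continue; /* a zero weight drops the observation */

        sw_new = sw + w;
        f = w * sw / sw_new; /* scales products of deviations from the old means */

        /* Accumulate weighted cross-products about the running means. */
        for (j = 0; j < m; ++j)
            for (k = 0; k < m; ++k)
                c[j * m + k] += f * (row[j] - mean[j]) * (row[k] - mean[k]);

        /* Move each running mean towards the new observation. */
        for (j = 0; j < m; ++j)
            mean[j] += (w / sw_new) * (row[j] - mean[j]);

        sw = sw_new;
    }

    /* Divide by (sum of weights - 1), as in definition (b). */
    assert(sw > 1.0);
    for (j = 0; j < m * m; ++j)
        c[j] /= sw - 1.0;

    return sw;
}
```

The factor f follows from the identity that a deviation from the updated mean equals the deviation from the old mean multiplied by sw / sw_new, so no temporary copy of the old means is needed.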

4 References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag

West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–555

5 Arguments

1: $n$ – Integer, Input

On entry: the number of observations in the dataset, $n$.

Constraint: $n > 1$.

2: $m$ – Integer, Input

On entry: the total number of variables, $m$.

Constraint: $m \geq 1$.

3: $x [n \times tdx]$ – const double, Input

On entry: the data. $x [(i - 1) \times tdx + j - 1]$ must contain the $i$th observation on the $j$th variable, $x_{i j}$, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$.

4: $tdx$ – Integer, Input

On entry: the stride separating matrix column elements in the array x.

Constraint: $tdx \geq m$.
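The storage scheme for x can be illustrated with a small accessor. This is a sketch only: obs_value is a hypothetical name, not a library function, and it takes the 1-based indices $i$ and $j$ used in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor, not a NAG routine: returns x_{ij}, the i-th
 * observation on the j-th variable (i and j are 1-based, as in the text),
 * from a row-major array whose consecutive rows are tdx elements apart. */
double obs_value(const double x[], size_t tdx, size_t i, size_t j)
{
    assert(i >= 1 && j >= 1 && j <= tdx);
    return x[(i - 1) * tdx + (j - 1)];
}
```

With m = 3 and tdx = 4, for instance, each row of x holds three data values followed by one unused element, which is why the constraint is $tdx \geq m$ rather than equality.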

5: $sx [m]$ – const Integer, Input

On entry: indicates which $p$ variables to include in the analysis.

$sx [j - 1] > 0$: The $j$th variable is to be included.
$sx [j - 1] = 0$: The $j$th variable is not to be included.
sx is set to NULL: All variables are included in the analysis, i.e., $p = m$.

Constraint: $sx [i - 1] \geq 0$, for $i = 1, 2, \dots, m$.
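The selection rule for sx can be sketched as a small check that also yields $p$. The name count_selected is hypothetical, and long stands in for the library's Integer type as an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: returns p, the number of
 * variables selected by sx, or -1 if any element of sx is negative.
 * A NULL sx selects all m variables. */
long count_selected(long m, const long sx[])
{
    long j, p = 0;

    if (sx == NULL)
        return m;
    for (j = 0; j < m; ++j) {
        if (sx[j] < 0)
            return -1; /* violates sx[j] >= 0 */
        if (sx[j] > 0)
            ++p;       /* variable j + 1 is included */
    }
    return p;
}
```

A return value of 0 corresponds to the case in which no element of sx is positive, and -1 to the case in which an element is negative.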

6: $wt [n]$ – const double, Input

On entry: $w$, the optional frequency weighting for each observation, with $wt [i - 1] = w_{i}$. Usually $w_{i}$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then $w_{i}$ is set to $1$ for all $i$.

Constraint: if wt is not NULL, $\sum_{i = 1}^{n} wt [i - 1] > 1.0$ and $wt [i - 1] \geq 0.0$, for $i = 1, 2, \dots, n$.
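A caller may wish to check the weights before passing them on. The sketch below, with the hypothetical name sum_weights, returns the sum of weights under the conventions described for wt; the caller would then compare the result with 1.0.

```c
#include <stddef.h>

/* Hypothetical helper, not a NAG routine: returns the sum of weights,
 * or -1.0 if any weight is negative. A NULL wt means unit weights,
 * so the sum is simply the number of observations n. */
double sum_weights(size_t n, const double wt[])
{
    double sw = 0.0;
    size_t i;

    if (wt == NULL)
        return (double) n;
    for (i = 0; i < n; ++i) {
        if (wt[i] < 0.0)
            return -1.0; /* violates wt[i] >= 0.0 */
        sw += wt[i];     /* zero weights contribute nothing */
    }
    return sw;
}
```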

7: $sw$ – double *, Output

On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, $n$.

8: $wmean [m]$ – double, Output

On exit: the sample means. $wmean [j - 1]$ contains the mean for the $j$th variable.

9: $std [m]$ – double, Output

On exit: the standard deviations. $std [j - 1]$ contains the standard deviation for the $j$th variable.

10: $r [m \times tdr]$ – double, Output

On exit: the matrix of Pearson product-moment correlation coefficients. $r [(j - 1) \times tdr + k - 1]$ contains the correlation between variables $j$ and $k$, for $j, k = 1, \dots, p$.

11: $tdr$ – Integer, Input

On entry: the stride separating matrix column elements in the array r.

Constraint: $tdr \geq m$.

12: $v [m \times tdv]$ – double, Output

On exit: the variance-covariance matrix. $v [(j - 1) \times tdv + k - 1]$ contains the covariance between variables $j$ and $k$, for $j, k = 1, \dots, p$.

13: $tdv$ – Integer, Input

On entry: the stride separating matrix column elements in the array v.

Constraint: $tdv \geq m$.

14: $fail$ – NagError *, Input/Output

The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6 Error Indicators and Warnings

NE_2_INT_ARG_LT: On entry, $tdr = 〈value〉$ while $m = 〈value〉$. These arguments must satisfy $tdr \geq m$.
On entry, $tdv = 〈value〉$ while $m = 〈value〉$. These arguments must satisfy $tdv \geq m$.
On entry, $tdx = 〈value〉$ while $m = 〈value〉$. These arguments must satisfy $tdx \geq m$.
NE_ALLOC_FAIL: Dynamic memory allocation failed.
NE_INT_ARG_LE: On entry, n must be greater than 1: $n = 〈value〉$.
NE_INT_ARG_LT: On entry, $m = 〈value〉$. Constraint: $m \geq 1$.
NE_NEG_SX: On entry, at least one element of sx is negative.
NE_NEG_WEIGHT: On entry, at least one of the weights is negative.
NE_POS_SX: On entry, no element of sx is positive.
NE_SW_LT_ONE: On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO: At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

7 Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).

8 Parallelism and Performance

nag_corr_cov (g02bxc) is not threaded in any implementation.

9 Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).

10 Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.
