nag_corr_cov (g02bxc) (PDF version)
g02 Chapter Contents
g02 Chapter Introduction
NAG C Library Manual

NAG Library Function Document

nag_corr_cov (g02bxc)


1  Purpose

nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2  Specification

#include <nag.h>
#include <nagg02.h>
void  nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

3  Description

For n observations on m variables, a one-pass updating algorithm (see West (1979)) is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for p selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \qquad j = 1, \ldots, p$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i \left(x_{ij} - \bar{x}_j\right)\left(x_{ik} - \bar{x}_k\right)}{\sum_{i=1}^{n} w_i - 1}, \qquad j, k = 1, \ldots, p$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \qquad j = 1, \ldots, p$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \qquad j, k = 1, \ldots, p$$
where $x_{ij}$ is the value of the ith observation on the jth variable and $w_i$ is the weight for the ith observation, which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.

4  References

Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances. Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: an improved method. Comm. ACM 22 532–535

5  Arguments

1:     n    Integer    Input
On entry: the number of observations in the dataset, n.
Constraint: n > 1.
2:     m    Integer    Input
On entry: the total number of variables, m.
Constraint: m ≥ 1.
3:     x[n×tdx]    const double    Input
On entry: the data. x[(i-1)×tdx + j-1] must contain the ith observation on the jth variable, x_ij, for i = 1,2,…,n and j = 1,2,…,m.
4:     tdx    Integer    Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
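The storage convention for x can be sketched as a small accessor. This is only an illustration: the function name obs_value is hypothetical, plain int stands in for the NAG Integer type, and the indices i and j are 1-based to match the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor, NOT part of the NAG Library: value of the
 * i-th observation on the j-th variable (both 1-based, as in the text)
 * for row-by-row storage with stride tdx >= m. */
static double obs_value(const double *x, int tdx, int m, int i, int j)
{
    assert(tdx >= m && i >= 1 && j >= 1 && j <= m);
    return x[(size_t)(i - 1) * tdx + (j - 1)];
}
```

When tdx is larger than m, the trailing tdx - m entries of each row are padding and are never read by this accessor.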
5:     sx[m]    const Integer    Input
On entry: indicates which p variables to include in the analysis.
sx[j-1] > 0
The jth variable is to be included.
sx[j-1] = 0
The jth variable is not to be included.
sx is set to the null pointer (Integer *)0
All variables are included in the analysis, i.e., p = m.
Constraint: sx[j-1] ≥ 0, for j = 1,2,…,m.
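The selection rule for sx can be sketched as follows. The helper name select_variables and its return codes are hypothetical and chosen only for this illustration, and plain int stands in for the NAG Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the NAG Library: fills idx[] with the
 * 0-based positions of the selected variables and returns their count p.
 * sx == NULL selects all m variables.
 * Returns -1 if any entry is negative; a return of 0 means that no
 * entry is positive. Both cases are invalid input per the text. */
static int select_variables(int m, const int *sx, int *idx)
{
    int p = 0;
    assert(m >= 1);
    for (int j = 0; j < m; ++j) {
        if (sx != NULL && sx[j] < 0)
            return -1;
        if (sx == NULL || sx[j] > 0)
            idx[p++] = j;
    }
    return p;
}
```

With sx = {1, 0, 2, 0} the first and third variables are selected, so p = 2; any positive value has the same effect as 1.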
6:     wt[n]    const double    Input
On entry: the optional frequency weighting for each observation. wt[i-1] contains the weight for the ith data value. Usually wt[i-1] will be an integral value corresponding to the number of observations associated with the ith data value, or zero if the ith data value is to be ignored. If wt is set to the null pointer (double *)0 then wt is not referenced.
Constraint: wt[i-1] ≥ 0.0, for i = 1,2,…,n.
7:     sw    double *    Output
On exit: the sum of weights if wt is not the null pointer, otherwise sw contains the number of observations, n.
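The relationship between wt and sw can be sketched as below. The helper name sum_of_weights and its use of -1.0 as an error value are hypothetical conventions for this illustration only, and plain int stands in for the NAG Integer type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the NAG Library: returns the sum of
 * weights, or n when wt is NULL (every observation counts once).
 * Returns -1.0 if any weight is negative, which the text disallows. */
static double sum_of_weights(int n, const double *wt)
{
    double sw = 0.0;
    assert(n > 1);
    if (wt == NULL)
        return (double)n;
    for (int i = 0; i < n; ++i) {
        if (wt[i] < 0.0)
            return -1.0;
        sw += wt[i];
    }
    return sw;
}
```

A weight of zero removes an observation from the analysis, while an integral weight of 2 has the same effect on the sum as listing the observation twice.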
8:     wmean[m]    double    Output
On exit: the sample means. wmean[j-1] contains the mean for the jth variable.
9:     std[m]    double    Output
On exit: the standard deviations. std[j-1] contains the standard deviation for the jth variable.
10:   r[m×tdr]    double    Output
On exit: the matrix of Pearson product-moment correlation coefficients. r[(j-1)×tdr + k-1] contains the correlation between variables j and k, for j, k = 1,…,p.
11:   tdr    Integer    Input
On entry: the stride separating matrix column elements in the array r.
Constraint: tdr ≥ m.
12:   v[m×tdv]    double    Output
On exit: the variance-covariance matrix. v[(j-1)×tdv + k-1] contains the covariance between variables j and k, for j, k = 1,…,p.
13:   tdv    Integer    Input
On entry: the stride separating matrix column elements in the array v.
Constraint: tdv ≥ m.
14:   fail    NagError *    Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

6  Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdr = value while m = value. These arguments must satisfy tdr ≥ m.
On entry, tdv = value while m = value. These arguments must satisfy tdv ≥ m.
On entry, tdx = value while m = value. These arguments must satisfy tdx ≥ m.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: n=value .
NE_INT_ARG_LT
On entry, m=value.
Constraint: m ≥ 1.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.

7  Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
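The kind of accuracy issue discussed in those references can be illustrated with a short sketch. It compares the textbook sum-of-squares formula, used here only as an assumed baseline, with an unweighted updating recurrence. Neither function is taken from the library, and both names are hypothetical.

```c
#include <assert.h>

/* Textbook formula (sum x^2 - (sum x)^2 / n) / (n - 1): the two large
 * terms can cancel badly when the mean is large relative to the spread. */
static double var_textbook(int n, const double *x)
{
    double s = 0.0, ss = 0.0;
    assert(n > 1);
    for (int i = 0; i < n; ++i) {
        s += x[i];
        ss += x[i] * x[i];
    }
    return (ss - s * s / n) / (n - 1);
}

/* Unweighted updating recurrence: only deviations from the running
 * mean are squared, so large offsets in the data do not dominate. */
static double var_updating(int n, const double *x)
{
    double mean = 0.0, m2 = 0.0;
    assert(n > 1);
    for (int i = 0; i < n; ++i) {
        const double d = x[i] - mean;
        mean += d / (i + 1);
        m2 += d * (x[i] - mean);
    }
    return m2 / (n - 1);
}
```

For the values 4, 7, 13, 16 both functions return the sample variance 30. After adding 1e9 to every value the true variance is still 30, and the updating recurrence recovers it, whereas the textbook formula typically loses most of its significant digits in double precision.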

8  Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).

9  Example

This example calculates the means, standard deviations, variance-covariance matrix and matrix of Pearson product-moment correlation coefficients for a set of 3 observations on 3 variables.

9.1  Program Text

Program Text (g02bxce.c)

9.2  Program Data

Program Data (g02bxce.d)

9.3  Program Results

Program Results (g02bxce.r)



© The Numerical Algorithms Group Ltd, Oxford, UK. 2012