NAG CL Interface
g03adc (canon_corr)

1 Purpose

g03adc performs canonical correlation analysis upon input data matrices.

2 Specification

#include <nag.h>
void  g03adc (Integer n, Integer m, const double z[], Integer tdz, const Integer isz[], Integer nx, Integer ny, const double wt[], double e[], Integer tde, Integer *ncv, double cvx[], Integer tdcvx, double cvy[], Integer tdcvy, double tol, NagError *fail)
The function may be called by the names: g03adc or nag_mv_canon_corr.

3 Description

Let there be two sets of variables, $x$ and $y$. For a sample of $n$ observations on $n_x$ variables in a data matrix $X$ and $n_y$ variables in a data matrix $Y$, canonical correlation analysis seeks to find a small number of linear combinations of each set of variables in order to explain or summarise the relationships between them. The variables thus formed are known as canonical variates.
Let the variance-covariance matrix of the two datasets be
$$\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$$
and let
$$\Sigma = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$$
then the canonical correlations can be calculated from the eigenvalues of the matrix $\Sigma$. However, g03adc calculates the canonical correlations by means of a singular value decomposition (SVD) of a matrix $V$. If the rank of the data matrix $X$ is $k_x$ and the rank of the data matrix $Y$ is $k_y$, and both $X$ and $Y$ have had variable (column) means subtracted, then the $k_x$ by $k_y$ matrix $V$ is given by:
$$V = Q_x^T Q_y,$$
where $Q_x$ is the first $k_x$ columns of the orthogonal matrix $Q$ either from the QR decomposition of $X$ if $X$ is of full column rank, i.e., $k_x = n_x$:
$$X = Q_x R_x$$
or from the SVD of $X$ if $k_x < n_x$:
$$X = Q_x D_x P_x^T.$$
Similarly, $Q_y$ is the first $k_y$ columns of the orthogonal matrix $Q$ either from the QR decomposition of $Y$ if $Y$ is of full column rank, i.e., $k_y = n_y$:
$$Y = Q_y R_y$$
or from the SVD of $Y$ if $k_y < n_y$:
$$Y = Q_y D_y P_y^T.$$
Let the SVD of $V$ be:
$$V = U_x \Delta U_y^T$$
then the nonzero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1, 2, \ldots, l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k_x, k_y)$.
The eigenvalues, $\lambda_i^2$, of the matrix $\Sigma$ are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1 - \delta_i^2}.$$
The value of $\pi_i = \lambda_i^2 / \sum_{j=1}^{l} \lambda_j^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$ give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
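The proportion of variation can be sketched in a few lines of C. This is only an illustration of the ratio of each eigenvalue to the eigenvalue sum; the function name `variation_proportions` and its return convention are assumptions made here and are not part of the NAG Library.

```c
#include <stddef.h>

/* Hypothetical helper (not a NAG function).
   Stores lambda2[i] / (sum of all lambda2) in prop[i] for i = 0..l-1.
   Returns 0 on success, -1 if the eigenvalue sum is not positive. */
int variation_proportions(const double lambda2[], size_t l, double prop[])
{
    double total = 0.0;
    for (size_t i = 0; i < l; ++i)
        total += lambda2[i];
    if (!(total > 0.0))
        return -1;
    for (size_t i = 0; i < l; ++i)
        prop[i] = lambda2[i] / total;
    return 0;
}
```

By construction the proportions sum to one, so a first value close to one suggests that a single canonical variate describes most of the relationship.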
To test for a significant dimensionality greater than $i$ the $\chi^2$ statistic:
$$\left(n - \tfrac{1}{2}(k_x + k_y + 3)\right) \sum_{j=i+1}^{l} \log(1 + \lambda_j^2)$$
can be used. This is asymptotically distributed as a $\chi^2$ distribution with $(k_x - i)(k_y - i)$ degrees of freedom. If the test for $i = k_{\min}$ is not significant, then the remaining tests for $i > k_{\min}$ should be ignored.
The loadings for the canonical variates are calculated from the matrices $U_x$ and $U_y$ respectively. These matrices are scaled so that the canonical variates have unit variance.

4 References

Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

5 Arguments

1: n Integer Input
On entry: the number of observations, $n$.
Constraint: n > nx + ny.
2: m Integer Input
On entry: the total number of variables, $m$.
Constraint: m ≥ nx + ny.
3: z[n×tdz] const double Input
On entry: z[(i-1)×tdz+j-1] must contain the $i$th observation for the $j$th variable, for i = 1, 2, …, n and j = 1, 2, …, m.
Both x and y variables are to be included in z, the indicator array, isz, being used to assign the variables in z to the x or y sets as appropriate.
4: tdz Integer Input
On entry: the stride separating matrix column elements in the array z.
Constraint: tdz ≥ m.
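The storage rule for z can be illustrated with two small accessors. This is a sketch only: `z_get` and `z_set` are hypothetical helper names, not NAG functions, and plain `int` stands in for the NAG `Integer` type.

```c
/* Hypothetical helpers (not NAG functions) for the row-major layout of z.
   i is the 1-based observation and j the 1-based variable, as in the text;
   tdz is the number of array elements from one observation to the next. */
double z_get(const double z[], int tdz, int i, int j)
{
    return z[(i - 1) * tdz + (j - 1)];
}

void z_set(double z[], int tdz, int i, int j, double value)
{
    z[(i - 1) * tdz + (j - 1)] = value;
}
```

When tdz is larger than m, the trailing tdz - m elements of each row are simply skipped by this indexing.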
5: isz[m] const Integer Input
On entry: isz[j-1] indicates whether or not the $j$th variable is to be included in the analysis and to which set of variables it belongs.
If isz[j-1] > 0, then the variable contained in the $j$th column of z is included as an x variable in the analysis.
If isz[j-1] < 0, then the variable contained in the $j$th column of z is included as a y variable in the analysis.
If isz[j-1] = 0, then the variable contained in the $j$th column of z is not included in the analysis.
Constraint: exactly nx elements of isz must be > 0 and exactly ny elements of isz must be < 0.
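A caller may wish to check the indicator array before the call. The sketch below counts the positive and negative entries and compares them with nx and ny; `count_isz` and `isz_matches` are hypothetical names, and `int` is used in place of the NAG `Integer` type as an assumption.

```c
/* Hypothetical helpers (not NAG functions).
   Positive entries select x variables, negative entries select y variables,
   zero entries exclude the variable. */
void count_isz(const int isz[], int m, int *nx_found, int *ny_found)
{
    *nx_found = 0;
    *ny_found = 0;
    for (int j = 0; j < m; ++j) {
        if (isz[j] > 0)
            ++*nx_found;
        else if (isz[j] < 0)
            ++*ny_found;
    }
}

/* Returns 1 when the counts in isz agree with nx and ny, 0 otherwise. */
int isz_matches(const int isz[], int m, int nx, int ny)
{
    int cx, cy;
    count_isz(isz, m, &cx, &cy);
    return cx == nx && cy == ny;
}
```

For four variables where the second and third are x variables and the first and last are y variables, one valid choice is isz = {-1, 1, 1, -1} with nx = 2 and ny = 2.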
6: nx Integer Input
On entry: the number of x variables in the analysis, $n_x$.
Constraint: nx ≥ 1.
7: ny Integer Input
On entry: the number of y variables in the analysis, $n_y$.
Constraint: ny ≥ 1.
8: wt[n] const double Input
On entry: the elements of wt must contain the weights to be used in the analysis. The effective number of observations is the sum of the weights. If wt[i-1] = 0.0, then the $i$th observation is not included in the analysis.
If weights are not provided, then wt must be set to NULL and the effective number of observations is n.
Constraints:
  • if wt is not NULL, wt[i-1] ≥ 0.0, for i = 1, 2, …, n;
  • $\sum_{i=1}^{n}$ wt[i-1] ≥ nx + ny + 1.
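The two weight constraints can be checked ahead of the call with a short routine. This is a sketch under stated assumptions: the function names are hypothetical, `int` stands in for the NAG `Integer` type, and the return codes are chosen here for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not a NAG function).
   Stores the effective number of observations in *eff and returns 0,
   or returns -1 if any weight is negative.
   wt == NULL means unweighted data, so the effective number is n. */
int effective_observations(const double wt[], int n, double *eff)
{
    if (wt == NULL) {
        *eff = (double)n;
        return 0;
    }
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        if (wt[i] < 0.0)
            return -1;
        sum += wt[i];
    }
    *eff = sum;
    return 0;
}

/* Returns 1 when the effective number of observations is at least
   nx + ny + 1, 0 otherwise. */
int enough_observations(double eff, int nx, int ny)
{
    return eff >= (double)(nx + ny + 1);
}
```

For unweighted data the second check is equivalent to the constraint n > nx + ny, since all quantities are integers.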
9: e[min(nx,ny)×tde] double Output
On exit: the statistics of the canonical variate analysis.
e[(i-1)×tde], the canonical correlations, $\delta_i$, for i = 1, 2, …, l.
e[(i-1)×tde+1], the eigenvalues of $\Sigma$, $\lambda_i^2$, for i = 1, 2, …, l.
e[(i-1)×tde+2], the proportion of variation explained by the $i$th canonical variate, for i = 1, 2, …, l.
e[(i-1)×tde+3], the $\chi^2$ statistic for the $i$th canonical variate, for i = 1, 2, …, l.
e[(i-1)×tde+4], the degrees of freedom for the $\chi^2$ statistic for the $i$th canonical variate, for i = 1, 2, …, l.
e[(i-1)×tde+5], the significance level for the $\chi^2$ statistic for the $i$th canonical variate, for i = 1, 2, …, l.
10: tde Integer Input
On entry: the stride separating matrix column elements in the array e.
Constraint: tde ≥ 6.
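The six statistics stored for each canonical variate can be unpacked into a named structure, which may make downstream code easier to read. The type `CanonStats` and the function `canon_stats_row` are hypothetical names introduced for this sketch, and `int` stands in for the NAG `Integer` type.

```c
/* Hypothetical view of one row of e (not a NAG type). */
typedef struct {
    double correlation;  /* e[(i-1)*tde]     canonical correlation   */
    double eigenvalue;   /* e[(i-1)*tde + 1] eigenvalue              */
    double proportion;   /* e[(i-1)*tde + 2] proportion of variation */
    double chisq;        /* e[(i-1)*tde + 3] chi-square statistic    */
    double df;           /* e[(i-1)*tde + 4] degrees of freedom      */
    double significance; /* e[(i-1)*tde + 5] significance level      */
} CanonStats;

/* Copies the statistics for the i-th canonical variate (i is 1-based). */
CanonStats canon_stats_row(const double e[], int tde, int i)
{
    const double *row = e + (i - 1) * tde;
    CanonStats s = { row[0], row[1], row[2], row[3], row[4], row[5] };
    return s;
}
```

Only rows 1 to ncv hold results, so a caller would loop over i = 1, 2, …, ncv after a successful call.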
11: ncv Integer * Output
On exit: the number of canonical correlations, $l$. This will be the minimum of the rank of $X$ and the rank of $Y$.
12: cvx[nx×tdcvx] double Output
On exit: the canonical variate loadings for the x variables. cvx[(i-1)×tdcvx+j-1] contains the loading coefficient for the $i$th x variable on the $j$th canonical variate.
13: tdcvx Integer Input
On entry: the stride separating matrix column elements in the array cvx.
Constraint: tdcvx ≥ min(nx,ny).
14: cvy[ny×tdcvy] double Output
On exit: the canonical variate loadings for the y variables. cvy[(i-1)×tdcvy+j-1] contains the loading coefficient for the $i$th y variable on the $j$th canonical variate.
15: tdcvy Integer Input
On entry: the stride separating matrix column elements in the array cvy.
Constraint: tdcvy ≥ min(nx,ny).
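One common use of the loadings is to form canonical variate scores for an observation. The sketch below takes the weighted sum of an observation's values using one column of cvx or cvy. It assumes that each value has already had its variable mean subtracted, in line with the mean-centring described in Section 3; that assumption, the function name `variate_score`, and the use of `int` for the NAG `Integer` type are choices made for this illustration only.

```c
/* Hypothetical helper (not a NAG function).
   cv      : loadings array (cvx or cvy), row i = variable, column j = variate
   tdcv    : stride between rows of cv (tdcvx or tdcvy)
   nvar    : number of variables in the set (nx or ny)
   j       : 1-based index of the canonical variate
   centred : the observation's values for the set, means already subtracted
             (an assumption of this sketch) */
double variate_score(const double cv[], int tdcv, int nvar, int j,
                     const double centred[])
{
    double score = 0.0;
    for (int i = 0; i < nvar; ++i)
        score += cv[i * tdcv + (j - 1)] * centred[i];
    return score;
}
```

The same function serves both sets: pass cvx, tdcvx and nx for the x variables, or cvy, tdcvy and ny for the y variables.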
16: tol double Input
On entry: the value of tol is used to decide whether the variables are of full rank and, if not, what their rank is. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, then the square root of machine precision is used instead.
Constraint: tol ≥ 0.0.
17: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
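The scalar constraints listed in this section can be gathered into a single pre-call check. The following is a sketch, not library code: the enumeration, the function name `check_canon_corr_args` and the order of the tests are assumptions made here, and `int` stands in for the NAG `Integer` type. The checks on isz and wt are not included.

```c
/* Hypothetical result codes (not NAG error codes). */
typedef enum {
    ARGS_OK = 0, BAD_NX, BAD_NY, BAD_N, BAD_M, BAD_TDZ,
    BAD_TDE, BAD_TDCVX, BAD_TDCVY, BAD_TOL
} ArgCheck;

/* Hypothetical helper: returns the first violated constraint, or ARGS_OK. */
ArgCheck check_canon_corr_args(int n, int m, int tdz, int nx, int ny,
                               int tde, int tdcvx, int tdcvy, double tol)
{
    if (nx < 1) return BAD_NX;            /* nx >= 1                */
    if (ny < 1) return BAD_NY;            /* ny >= 1                */
    if (n <= nx + ny) return BAD_N;       /* n > nx + ny            */
    if (m < nx + ny) return BAD_M;        /* m >= nx + ny           */
    if (tdz < m) return BAD_TDZ;          /* tdz >= m               */
    if (tde < 6) return BAD_TDE;          /* tde >= 6               */
    int lmax = nx < ny ? nx : ny;         /* min(nx, ny)            */
    if (tdcvx < lmax) return BAD_TDCVX;   /* tdcvx >= min(nx, ny)   */
    if (tdcvy < lmax) return BAD_TDCVY;   /* tdcvy >= min(nx, ny)   */
    if (tol < 0.0) return BAD_TOL;        /* tol >= 0.0             */
    return ARGS_OK;
}
```

Such a check duplicates validation that the function itself reports through fail, so it is mainly useful for producing application-specific messages before the call.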

6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdz = value while m = value. These arguments must satisfy tdz ≥ m.
NE_3_INT_ARG_CONS
On entry, m = value, nx = value and ny = value. These arguments must satisfy m ≥ nx + ny.
On entry, n = value, nx = value and ny = value. These arguments must satisfy n > nx + ny.
On entry, tdcvx = value, nx = value and ny = value. These arguments must satisfy tdcvx ≥ min(nx,ny).
On entry, tdcvy = value, nx = value and ny = value. These arguments must satisfy tdcvy ≥ min(nx,ny).
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_CANON_CORR_1
A canonical correlation is equal to one. This will happen if the x and y variables are perfectly correlated.
NE_INT_ARG_LT
On entry, nx = value.
Constraint: nx ≥ 1.
On entry, ny = value.
Constraint: ny ≥ 1.
On entry, tde = value.
Constraint: tde ≥ 6.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_MAT_RANK_ZERO
The rank of the $X$ matrix or the rank of the $Y$ matrix is zero. This will happen if all the x variables or all the y variables are constants.
NE_NEG_WEIGHT_ELEMENT
On entry, wt[value] = value.
Constraint: when referenced, all elements of wt must be non-negative.
NE_OBSERV_LT_VAR
With weighted data, the effective number of observations given by the sum of weights = value, while the number of variables included in the analysis, nx + ny = value.
Constraint: effective number of observations ≥ nx + ny + 1.
NE_REAL_ARG_LT
On entry, tol must not be less than 0.0: tol = value.
NE_SVD_NOT_CONV
The singular value decomposition has failed to converge. This is an unlikely error exit.
NE_VAR_INCL_INDICATED
The number of x variables in the analysis, nx = value, while the number of x variables included in the analysis via array isz = value.
Constraint: these two numbers must be the same.
The number of y variables in the analysis, ny = value, while the number of y variables included in the analysis via array isz = value.
Constraint: these two numbers must be the same.

7 Accuracy

Because the computation uses orthogonal matrices and a singular value decomposition, rather than the traditional approach of forming a sums of squares matrix and computing an eigenvalue decomposition, g03adc should be less affected by ill-conditioned problems.

8 Parallelism and Performance

g03adc is not threaded in any implementation.

9 Further Comments

None.

10 Example

A sample of nine observations with two variables in each set is read in. The second and third variables are x variables while the first and last are y variables. Canonical variate analysis is performed and the results printed.

10.1 Program Text

Program Text (g03adce.c)

10.2 Program Data

Program Data (g03adce.d)

10.3 Program Results

Program Results (g03adce.r)