NAG Library Routine Document
g03acf (canon_var)
1
Purpose
g03acf performs a canonical variate (canonical discrimination) analysis.
2
Specification
Fortran Interface
Subroutine g03acf ( 
weight, n, m, x, ldx, isx, nx, ing, ng, wt, nig, cvm, ldcvm, e, lde, ncv, cvx, ldcvx, tol, irankx, wk, iwk, ifail) 
Integer, Intent (In)  ::  n, m, ldx, isx(m), nx, ing(n), ng, ldcvm, lde, ldcvx, iwk  Integer, Intent (Inout)  ::  ifail  Integer, Intent (Out)  ::  nig(ng), ncv, irankx  Real (Kind=nag_wp), Intent (In)  ::  x(ldx,m), wt(*), tol  Real (Kind=nag_wp), Intent (Inout)  ::  cvm(ldcvm,nx), e(lde,6), cvx(ldcvx,ng1)  Real (Kind=nag_wp), Intent (Out)  ::  wk(iwk)  Character (1), Intent (In)  ::  weight 

C Header Interface
#include nagmk26.h
void 
g03acf_ (const char *weight, const Integer *n, const Integer *m, const double x[], const Integer *ldx, const Integer isx[], const Integer *nx, const Integer ing[], const Integer *ng, const double wt[], Integer nig[], double cvm[], const Integer *ldcvm, double e[], const Integer *lde, Integer *ncv, double cvx[], const Integer *ldcvx, const double *tol, Integer *irankx, double wk[], const Integer *iwk, Integer *ifail, const Charlen length_weight) 

3
Description
Let a sample of $n$ observations on ${n}_{x}$ variables in a data matrix come from ${n}_{g}$ groups with ${n}_{1},{n}_{2},\dots ,{n}_{{n}_{g}}$ observations in each group, $\sum {n}_{i}=n$. Canonical variate analysis finds the linear combination of the ${n}_{x}$ variables that maximizes the ratio of betweengroup to withingroup variation. The variables formed, the canonical variates can then be used to discriminate between groups.
The canonical variates can be calculated from the eigenvectors of the withingroup sums of squares and crossproducts matrix. However,
g03acf calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix
$V$. Let the data matrix with variable (column) means subtracted be
$X$, and let its rank be
$k$; then the
$k$ by (
${n}_{g}1$) matrix
$V$ is given by:
where
${Q}_{g}$ is an
$n$ by
$\left({n}_{g}1\right)$ orthogonal matrix that defines the groups and
${Q}_{X}$ is the first
$k$ rows of the orthogonal matrix
$Q$ either from the
$QR$ decomposition of
$X$:
if
$X$ is of full column rank, i.e.,
$k={n}_{x}$, else from the SVD of
$X$:
Let the SVD of
$V$ be:
then the nonzero elements of the diagonal matrix
$\Delta $,
${\delta}_{\mathit{i}}$, for
$\mathit{i}=1,2,\dots ,l$, are the
$l$ canonical correlations associated with the
$l=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(k,{n}_{g}1\right)$ canonical variates, where
$l=\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(k,{n}_{g}\right)$.
The eigenvalues,
${\lambda}_{i}^{2}$, of the withingroup sums of squares matrix are given by:
and the value of
${\pi}_{i}={\lambda}_{i}^{2}/\sum {\lambda}_{i}^{2}$ gives the proportion of variation explained by the
$i$th canonical variate. The values of the
${\pi}_{i}$'s give an indication as to how many canonical variates are needed to adequately describe the data, i.e., the dimensionality of the problem.
To test for a significant dimensionality greater than
$i$ the
${\chi}^{2}$ statistic:
can be used. This is asymptotically distributed as a
${\chi}^{2}$distribution with
$\left(ki\right)\left({n}_{g}1i\right)$ degrees of freedom. If the test for
$i=h$ is not significant, then the remaining tests for
$i>h$ should be ignored.
The loadings for the canonical variates are calculated from the matrix ${U}_{x}$. This matrix is scaled so that the canonical variates have unit withingroup variance.
In addition to the canonical variates loadings the means for each canonical variate are calculated for each group.
Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by an amount $\sqrt{{w}_{i}}$, where ${w}_{i}$ is the weight for the $i$th observation (row).
4
References
Chatfield C and Collins A J (1980) Introduction to Multivariate Analysis Chapman and Hall
Gnanadesikan R (1977) Methods for Statistical Data Analysis of Multivariate Observations Wiley
Hammarling S (1985) The singular value decomposition in multivariate statistics SIGNUM Newsl. 20(3) 2–25
Kendall M G and Stuart A (1969) The Advanced Theory of Statistics (Volume 1) (3rd Edition) Griffin
5
Arguments
 1: $\mathbf{weight}$ – Character(1)Input

On entry: indicates if weights are to be used.
 ${\mathbf{weight}}=\text{'U'}$
 No weights are used.
 ${\mathbf{weight}}=\text{'W'}$ or $\text{'V'}$
 Weights are used and must be supplied in wt.
If ${\mathbf{weight}}=\text{'W'}$, the weights are treated as frequencies and the effective number of observations is the sum of the weights.
If ${\mathbf{weight}}=\text{'V'}$, the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with nonzero weights.
Constraint:
${\mathbf{weight}}=\text{'U'}$, $\text{'W'}$ or $\text{'V'}$.
 2: $\mathbf{n}$ – IntegerInput

On entry: $n$, the number of observations.
Constraint:
${\mathbf{n}}\ge {\mathbf{nx}}+{\mathbf{ng}}$.
 3: $\mathbf{m}$ – IntegerInput

On entry: $m$, the total number of variables.
Constraint:
${\mathbf{m}}\ge {\mathbf{nx}}$.
 4: $\mathbf{x}\left({\mathbf{ldx}},{\mathbf{m}}\right)$ – Real (Kind=nag_wp) arrayInput

On entry: ${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
 5: $\mathbf{ldx}$ – IntegerInput

On entry: the first dimension of the array
x as declared in the (sub)program from which
g03acf is called.
Constraint:
${\mathbf{ldx}}\ge {\mathbf{n}}$.
 6: $\mathbf{isx}\left({\mathbf{m}}\right)$ – Integer arrayInput

On entry:
${\mathbf{isx}}\left(j\right)$ indicates whether or not the
$j$th variable is to be included in the analysis.
If
${\mathbf{isx}}\left(\mathit{j}\right)>0$, the variables contained in the
$\mathit{j}$th column of
x is included in the canonical variate analysis, for
$\mathit{j}=1,2,\dots ,m$.
Constraint:
${\mathbf{isx}}\left(j\right)>0$ for
nx values of
$j$.
 7: $\mathbf{nx}$ – IntegerInput

On entry: the number of variables in the analysis, ${n}_{x}$.
Constraint:
${\mathbf{nx}}\ge 1$.
 8: $\mathbf{ing}\left({\mathbf{n}}\right)$ – Integer arrayInput

On entry: ${\mathbf{ing}}\left(\mathit{i}\right)$ indicates which group the $\mathit{i}$th observation is in, for $\mathit{i}=1,2,\dots ,n$. The effective number of groups is the number of groups with nonzero membership.
Constraint:
$1\le {\mathbf{ing}}\left(\mathit{i}\right)\le {\mathbf{ng}}$, for $\mathit{i}=1,2,\dots ,n$.
 9: $\mathbf{ng}$ – IntegerInput

On entry: the number of groups, ${n}_{g}$.
Constraint:
${\mathbf{ng}}\ge 2$.
 10: $\mathbf{wt}\left(*\right)$ – Real (Kind=nag_wp) arrayInput

Note: the dimension of the array
wt
must be at least
${\mathbf{n}}$ if
${\mathbf{weight}}=\text{'W'}$ or
$\text{'V'}$, and at least
$1$ otherwise.
On entry: if
${\mathbf{weight}}=\text{'W'}$ or
$\text{'V'}$, the first
$n$ elements of
wt must contain the weights to be used in the analysis.
If ${\mathbf{wt}}\left(i\right)=0.0$, the $i$th observation is not included in the analysis.
If
${\mathbf{weight}}=\text{'U'}$,
wt is not referenced.
Constraints:
 ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$;
 $\sum _{1}^{n}}{\mathbf{wt}}\left(i\right)\ge {\mathbf{nx}}+\text{effective number of groups$.
 11: $\mathbf{nig}\left({\mathbf{ng}}\right)$ – Integer arrayOutput

On exit: ${\mathbf{nig}}\left(\mathit{j}\right)$ gives the number of observations in group $\mathit{j}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
 12: $\mathbf{cvm}\left({\mathbf{ldcvm}},{\mathbf{nx}}\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: ${\mathbf{cvm}}\left(\mathit{i},\mathit{j}\right)$ contains the mean of the $\mathit{j}$th canonical variate for the $\mathit{i}$th group, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,l$; the remaining columns, if any, are used as workspace.
 13: $\mathbf{ldcvm}$ – IntegerInput

On entry: the first dimension of the array
cvm as declared in the (sub)program from which
g03acf is called.
Constraint:
${\mathbf{ldcvm}}\ge {\mathbf{ng}}$.
 14: $\mathbf{e}\left({\mathbf{lde}},6\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the statistics of the canonical variate analysis.
 ${\mathbf{e}}\left(i,1\right)$
 The canonical correlations,
${\delta}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,l$.
 ${\mathbf{e}}\left(i,2\right)$
 The eigenvalues of the withingroup sum of squares matrix,
${\lambda}_{\mathit{i}}^{2}$, for $\mathit{i}=1,2,\dots ,l$.
 ${\mathbf{e}}\left(i,3\right)$
 The proportion of variation explained by the
$\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
 ${\mathbf{e}}\left(i,4\right)$
 The ${\chi}^{2}$ statistic for the
$\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
 ${\mathbf{e}}\left(i,5\right)$
 The degrees of freedom for ${\chi}^{2}$ statistic for the
$\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
 ${\mathbf{e}}\left(i,6\right)$
 The significance level for the ${\chi}^{2}$ statistic for the
$\mathit{i}$th canonical variate, for $\mathit{i}=1,2,\dots ,l$.
 15: $\mathbf{lde}$ – IntegerInput

On entry: the first dimension of the array
e as declared in the (sub)program from which
g03acf is called.
Constraint:
${\mathbf{lde}}\ge \mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ng}}1\right)$.
 16: $\mathbf{ncv}$ – IntegerOutput

On exit: the number of canonical variates,
$l$. This will be the minimum of
${n}_{g}1$ and the rank of
x.
 17: $\mathbf{cvx}\left({\mathbf{ldcvx}},{\mathbf{ng}}1\right)$ – Real (Kind=nag_wp) arrayOutput

On exit: the canonical variate loadings.
${\mathbf{cvx}}\left(\mathit{i},\mathit{j}\right)$ contains the loading coefficient for the $\mathit{i}$th variable on the $\mathit{j}$th canonical variate, for $\mathit{i}=1,2,\dots ,{n}_{x}$ and $\mathit{j}=1,2,\dots ,l$; the remaining columns, if any, are used as workspace.
 18: $\mathbf{ldcvx}$ – IntegerInput

On entry: the first dimension of the array
cvx as declared in the (sub)program from which
g03acf is called.
Constraint:
${\mathbf{ldcvx}}\ge {\mathbf{nx}}$.
 19: $\mathbf{tol}$ – Real (Kind=nag_wp)Input

On entry: the value of
tol is used to decide if the variables are of full rank and, if not, what is the rank of the variables. The smaller the value of
tol the stricter the criterion for selecting the singular value decomposition. If a nonnegative value of
tol less than
machine precision is entered, the square root of
machine precision is used instead.
Constraint:
${\mathbf{tol}}\ge 0.0$.
 20: $\mathbf{irankx}$ – IntegerOutput

On exit: the rank of the dependent variables.
If the variables are of full rank then ${\mathbf{irankx}}={\mathbf{nx}}$.
If the variables are not of full rank then
irankx is an estimate of the rank of the dependent variables.
irankx is calculated as the number of singular values greater than
${\mathbf{tol}}\times \text{(largest singular value)}$.
 21: $\mathbf{wk}\left({\mathbf{iwk}}\right)$ – Real (Kind=nag_wp) arrayWorkspace
 22: $\mathbf{iwk}$ – IntegerInput

On entry: the dimension of the array
wk as declared in the (sub)program from which
g03acf is called.
Constraints:
 if ${\mathbf{nx}}\ge {\mathbf{ng}}1$, ${\mathbf{iwk}}\ge {\mathbf{n}}\times {\mathbf{nx}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5\times \left({\mathbf{nx}}1\right)+\left({\mathbf{nx}}+1\right)\times {\mathbf{nx}},{\mathbf{n}}\right)+1$;
 if ${\mathbf{nx}}<{\mathbf{ng}}1$, ${\mathbf{iwk}}\ge {\mathbf{n}}\times {\mathbf{nx}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5\times \left({\mathbf{nx}}1\right)+\left({\mathbf{ng}}1\right)\times {\mathbf{nx}},{\mathbf{n}}\right)+1$.
 23: $\mathbf{ifail}$ – IntegerInput/Output

On entry:
ifail must be set to
$0$,
$1\text{ or}1$. If you are unfamiliar with this argument you should refer to
Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$1\text{ or}1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $\mathbf{1}\text{ or}\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry,  ${\mathbf{nx}}<1$, 
or  ${\mathbf{ng}}<2$, 
or  ${\mathbf{m}}<{\mathbf{nx}}$, 
or  ${\mathbf{n}}<{\mathbf{nx}}+{\mathbf{ng}}$, 
or  ${\mathbf{ldx}}<{\mathbf{n}}$, 
or  ${\mathbf{ldcvx}}<{\mathbf{nx}}$, 
or  ${\mathbf{ldcvm}}<{\mathbf{ng}}$, 
or  ${\mathbf{lde}}<\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{nx}},{\mathbf{ng}}1\right)$, 
or  ${\mathbf{nx}}\ge {\mathbf{ng}}1$ and ${\mathbf{iwk}}<{\mathbf{n}}\times {\mathbf{nx}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5\times \left({\mathbf{nx}}1\right)+\left({\mathbf{nx}}+1\right)\times {\mathbf{nx}},{\mathbf{n}}\right)$, 
or  ${\mathbf{nx}}<{\mathbf{ng}}1$ and ${\mathbf{iwk}}<{\mathbf{n}}\times {\mathbf{nx}}+\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(5\times \left({\mathbf{nx}}1\right)+\left({\mathbf{ng}}1\right)\times {\mathbf{nx}},{\mathbf{n}}\right)$, 
or  ${\mathbf{weight}}\ne \text{'U'}$, $\text{'W'}$ or $\text{'V'}$, 
or  ${\mathbf{tol}}<0.0$. 
 ${\mathbf{ifail}}=2$

On entry,  ${\mathbf{weight}}=\text{'W'}$ or $\text{'V'}$ and a value of ${\mathbf{wt}}<0.0$. 
 ${\mathbf{ifail}}=3$

On entry,  a value of ${\mathbf{ing}}<1$, 
or  a value of ${\mathbf{ing}}>{\mathbf{ng}}$. 
 ${\mathbf{ifail}}=4$

On entry, the number of variables to be included in the analysis as indicated by
isx is not equal to
nx.
 ${\mathbf{ifail}}=5$

A singular value decomposition has failed to converge. This is an unlikely error exit.
 ${\mathbf{ifail}}=6$

A canonical correlation is equal to $1$. This will happen if the variables provide an exact indication as to which group every observation is allocated.
 ${\mathbf{ifail}}=7$

On entry,  less than two groups have nonzero membership, i.e., the effective number of groups is less than $2$, 
or  the effective number of groups plus the number of variables, nx, is greater than the effective number of observations. 
 ${\mathbf{ifail}}=8$

The rank of the variables is $0$. This will happen if all the variables are constants.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 3.9 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
See
Section 3.8 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
See
Section 3.7 in How to Use the NAG Library and its Documentation for further information.
7
Accuracy
As the computation involves the use of orthogonal matrices and a singular value decomposition rather than the traditional computing of a sum of squares matrix and the use of an eigenvalue decomposition, g03acf should be less affected by illconditioned problems.
8
Parallelism and Performance
g03acf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g03acf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementationspecific information.
None.
10
Example
This example uses a sample of nine observations, each consisting of three variables plus a group indicator. There are three groups. An unweighted canonical variate analysis is performed and the results printed.
10.1
Program Text
Program Text (g03acfe.f90)
10.2
Program Data
Program Data (g03acfe.d)
10.3
Program Results
Program Results (g03acfe.r)