NAG Toolbox: nag_mv_canon_var (g03ac)

Purpose

nag_mv_canon_var (g03ac) performs a canonical variate (canonical discrimination) analysis.

Syntax

[nig, cvm, e, ncv, cvx, irankx, ifail] = g03ac(weight, x, isx, nx, ing, ng, wt, tol, 'n', n, 'm', m)
[nig, cvm, e, ncv, cvx, irankx, ifail] = nag_mv_canon_var(weight, x, isx, nx, ing, ng, wt, tol, 'n', n, 'm', m)

Description

Let a sample of $n$ observations on $n_x$ variables in a data matrix come from $n_g$ groups with $n_1,n_2,\ldots,n_{n_g}$ observations in each group, $\sum n_i = n$. Canonical variate analysis finds the linear combination of the $n_x$ variables that maximizes the ratio of between-group to within-group variation. The variables formed, the canonical variates, can then be used to discriminate between groups.
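To make the criterion concrete, the following plain-Python sketch (standard library only; it is not the NAG algorithm, which works through an SVD) computes the ratio of between-group to within-group sums of squares for the scores of a single linear combination. The data and group labels are copied from the example later in this document, and the coefficient vector `a1` is the first column of `cvx` printed there; the function name `between_within_ratio` is hypothetical.

```python
# Ratio of between-group to within-group variation for the scores z = x a.
# Illustrative sketch only; nag_mv_canon_var itself works through an SVD.

# Data and group labels copied from the example in this document.
X = [
    [13.3, 10.6, 21.2], [13.6, 10.2, 21.0], [14.2, 10.7, 21.1],
    [13.4, 9.4, 21.0], [13.2, 9.6, 20.1], [13.9, 10.4, 19.8],
    [12.9, 10.0, 20.5], [12.2, 9.9, 20.7], [13.9, 11.0, 19.1],
]
GROUPS = [1, 2, 3, 1, 2, 3, 1, 2, 3]


def between_within_ratio(x, groups, a):
    """Return between-group SS / within-group SS of the linear combination x a."""
    z = [sum(ai * xi for ai, xi in zip(a, row)) for row in x]
    overall = sum(z) / len(z)
    members = {}  # group label -> scores of the observations in that group
    for g, zi in zip(groups, z):
        members.setdefault(g, []).append(zi)
    between = within = 0.0
    for scores in members.values():
        mean_g = sum(scores) / len(scores)
        between += len(scores) * (mean_g - overall) ** 2
        within += sum((s - mean_g) ** 2 for s in scores)
    return between / within


a1 = [-1.7070, -1.3481, 0.9327]  # first column of cvx in the example output
print(round(between_within_ratio(X, GROUPS, a1), 4))
```

The printed ratio should be close to e(1,2) = 3.5238 from the example output, i.e., the first canonical variate attains $\lambda_1^2$. The ratio is unchanged if `a1` is multiplied by a nonzero constant, so the scaling of the loadings is a free choice.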
The canonical variates can be calculated from the eigenvectors of the generalized eigenproblem $B a = \lambda^2 W a$, where $W$ and $B$ are the within-group and between-group sums of squares and cross-products matrices. However, nag_mv_canon_var (g03ac) calculates the canonical variates by means of a singular value decomposition (SVD) of a matrix $V$. Let the data matrix with variable (column) means subtracted be $X$, and let its rank be $k$; then the $k$ by $(n_g-1)$ matrix $V$ is given by:
$$V = Q_X^{T} Q_g,$$
where $Q_g$ is an $n$ by $(n_g-1)$ orthogonal matrix that defines the groups and $Q_X$ consists of the first $k$ columns of the orthogonal matrix $Q$, taken either from the $QR$ decomposition of $X$:
$$X = QR$$
if $X$ is of full column rank, i.e., $k = n_x$, or otherwise from the SVD of $X$:
$$X = QDP^{T}.$$
Let the SVD of $V$ be:
$$V = U_x \Delta U_g^{T},$$
then the nonzero elements of the diagonal matrix $\Delta$, $\delta_i$, for $i = 1,2,\ldots,l$, are the $l$ canonical correlations associated with the $l$ canonical variates, where $l = \min(k, n_g-1)$.
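The eigenvalues, $\lambda_i^2$, associated with the within-group sums of squares matrix, which equal the ratios of between-group to within-group variation attained by the canonical variates, are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1-\delta_i^2}$$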
and the value of $\pi_i = \lambda_i^2 / \sum_j \lambda_j^2$ gives the proportion of variation explained by the $i$th canonical variate. The values of the $\pi_i$ indicate how many canonical variates are needed to describe the data adequately, i.e., the dimensionality of the problem.
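A minimal sketch of the two formulas above, in plain Python; the function name is hypothetical, and the inputs are the canonical correlations e(:,1) printed in the example output.

```python
def eigenvalues_and_proportions(deltas):
    """Map canonical correlations delta_i to (lambda_i^2, pi_i).

    Assumes every |delta_i| < 1; delta_i = 1 would divide by zero.
    """
    lam2 = [d * d / (1.0 - d * d) for d in deltas]  # lambda_i^2 = delta^2 / (1 - delta^2)
    total = sum(lam2)
    return lam2, [v / total for v in lam2]          # pi_i = lambda_i^2 / sum of lambda_j^2


lam2, pi = eigenvalues_and_proportions([0.8826, 0.2623])  # e(:,1) in the example
print([round(v, 4) for v in lam2], [round(p, 4) for p in pi])
```

The results should agree with e(:,2) and e(:,3) of the example to about three decimal places; the small differences come from $\delta_i$ being printed to four decimals. A correlation of exactly 1 is the situation reported by the warning ifail = 6.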
To test for a significant dimensionality greater than $i$, the $\chi^2$ statistic
$$\left(n - 1 - n_g - \tfrac{1}{2}(k - n_g)\right) \sum_{j=i+1}^{l} \log\left(1 + \lambda_j^2\right)$$
can be used. This is asymptotically distributed as a $\chi^2$-distribution with $(k-i)(n_g-1-i)$ degrees of freedom. If the test for $i = h$ is not significant, then the remaining tests for $i > h$ should be ignored.
The loadings for the canonical variates are calculated from the matrix $U_x$. This matrix is scaled so that the canonical variates have unit within-group variance.
In addition to the canonical variate loadings, the mean of each canonical variate is calculated for each group.
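Both properties can be checked numerically. The plain-Python sketch below centres the example data at its column means, forms the canonical variate scores with the loadings `cvx` printed in the example output, and then computes each group's mean score and the pooled within-group variance. The divisor n − ng for the pooled variance is an assumption about what "unit within-group variance" means, chosen because it gives a value of about 1 for this data; function names are hypothetical.

```python
# Group means and pooled within-group variance of the canonical variates,
# recomputed from the example's data and printed loadings (illustration only).

X = [
    [13.3, 10.6, 21.2], [13.6, 10.2, 21.0], [14.2, 10.7, 21.1],
    [13.4, 9.4, 21.0], [13.2, 9.6, 20.1], [13.9, 10.4, 19.8],
    [12.9, 10.0, 20.5], [12.2, 9.9, 20.7], [13.9, 11.0, 19.1],
]
GROUPS = [1, 2, 3, 1, 2, 3, 1, 2, 3]
CVX = [[-1.7070, 0.7277], [-1.3481, 0.3138], [0.9327, 1.2199]]  # cvx from the example


def canonical_scores(x, cvx):
    """Subtract the column means from x, then multiply by the loadings cvx."""
    n, m = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(m)]
    ncv = len(cvx[0])
    return [
        [sum((row[i] - means[i]) * cvx[i][j] for i in range(m)) for j in range(ncv)]
        for row in x
    ]


def group_summary(scores, groups):
    """Return ({group: mean score per variate}, pooled within-group variance per variate)."""
    ncv = len(scores[0])
    labels = sorted(set(groups))
    means = {}
    for g in labels:
        rows = [s for s, gi in zip(scores, groups) if gi == g]
        means[g] = [sum(r[j] for r in rows) / len(rows) for j in range(ncv)]
    within = [0.0] * ncv
    for s, g in zip(scores, groups):
        for j in range(ncv):
            within[j] += (s[j] - means[g][j]) ** 2
    dof = len(scores) - len(labels)  # assumed divisor: n - ng
    return means, [w / dof for w in within]


means, variances = group_summary(canonical_scores(X, CVX), GROUPS)
for g in sorted(means):
    print(g, [round(v, 4) for v in means[g]])
print([round(v, 4) for v in variances])
```

The group means should reproduce the first two columns of cvm in the example (0.9841, 0.2797; 1.1805, −0.2632; −2.1646, −0.0164), and both variances should be close to 1. The third column of cvm is not reproduced because, with ncv = 2, it is workspace.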
Weights can be used with the analysis, in which case the weighted means are subtracted from each column and then each row is scaled by $\sqrt{w_i}$, where $w_i$ is the weight for the $i$th observation (row).
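A minimal plain-Python sketch of this preprocessing step, assuming the weighted mean of column $j$ is $\sum_i w_i x_{ij} / \sum_i w_i$; the function name is hypothetical and the toy data are made up for illustration.

```python
import math


def weight_and_centre(x, w):
    """Subtract weighted column means, then scale row i by sqrt(w[i]).

    Assumes w[i] >= 0 and sum(w) > 0; a zero-weight row becomes a zero row.
    """
    total_w = sum(w)
    m = len(x[0])
    means = [sum(wi * row[j] for wi, row in zip(w, x)) / total_w for j in range(m)]
    return [[math.sqrt(wi) * (row[j] - means[j]) for j in range(m)]
            for wi, row in zip(w, x)]


y = weight_and_centre([[1.0], [5.0], [3.0]], [1.0, 0.0, 3.0])
print(y)
```

Here the weighted mean is 2.5, and the sum of squares of the scaled column, 3.0, equals $\sum_i w_i (x_i - 2.5)^2$. The zero-weight row becomes a zero row and so contributes nothing, which matches the rule that an observation with wt(i) = 0.0 is not included.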

Parameters

Compulsory Input Parameters

1:     weight – string (length ≥ 1)
Indicates if weights are to be used.
weight = 'U'
No weights are used.
weight = 'W' or 'V'
Weights are used and must be supplied in wt.
If weight = 'W', the weights are treated as frequencies and the effective number of observations is the sum of the weights.
If weight = 'V', the weights are treated as being inversely proportional to the variance of the observations and the effective number of observations is the number of observations with nonzero weights.
Constraint: weight = 'U', 'W' or 'V'.
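The three options lead to different effective numbers of observations; a plain-Python sketch of the rules just stated (the function name and the sample weights are hypothetical):

```python
def effective_observations(weight, wt, n):
    """Effective number of observations for weight = 'U', 'W' or 'V'."""
    if weight == 'U':
        return n                                   # wt is not referenced
    if weight == 'W':
        return sum(wt[:n])                         # weights act as frequencies
    if weight == 'V':
        return sum(1 for v in wt[:n] if v != 0.0)  # count the nonzero weights
    raise ValueError("weight must be 'U', 'W' or 'V'")


wt = [1.0, 2.0, 0.0, 3.0]
for option in ('U', 'W', 'V'):
    print(option, effective_observations(option, wt, len(wt)))
```

For these weights the sketch gives 4 for 'U', 6.0 for 'W' and 3 for 'V'.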
2:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(i,j) must contain the ith observation for the jth variable, for i = 1,2,…,n and j = 1,2,…,m.
3:     isx(m) – int64, int32 or nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ nx.
isx(j) indicates whether or not the jth variable is to be included in the analysis.
If isx(j) > 0, the variable contained in the jth column of x is included in the canonical variate analysis, for j = 1,2,…,m.
Constraint: isx(j) > 0 for nx values of j.
4:     nx – int64, int32 or nag_int scalar
The number of variables in the analysis, nx.
Constraint: nx ≥ 1.
5:     ing(n) – int64, int32 or nag_int array
n, the dimension of the array, must satisfy the constraint n ≥ nx + ng.
ing(i) indicates which group the ith observation is in, for i = 1,2,…,n. The effective number of groups is the number of groups with nonzero membership.
Constraint: 1 ≤ ing(i) ≤ ng, for i = 1,2,…,n.
6:     ng – int64, int32 or nag_int scalar
The number of groups, ng.
Constraint: ng ≥ 2.
7:     wt(:) – double array
Note: the dimension of the array wt must be at least n if weight = 'W' or 'V', and at least 1 otherwise.
If weight = 'W' or 'V', the first n elements of wt must contain the weights to be used in the analysis.
If wt(i) = 0.0, the ith observation is not included in the analysis.
If weight = 'U', wt is not referenced.
Constraints:
  • wt(i) ≥ 0.0, for i = 1,2,…,n;
  • ∑ wt(i) ≥ nx + effective number of groups, summing over i = 1,2,…,n.
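A plain-Python sketch that checks the two constraints on wt. It assumes, as an interpretation of "nonzero membership", that an observation with zero weight does not count towards its group; the function name is hypothetical.

```python
def weight_constraint_violations(wt, ing, nx):
    """List which of the documented wt constraints fail (weight = 'W' or 'V')."""
    problems = []
    if any(v < 0.0 for v in wt):
        problems.append("wt(i) < 0.0 for some i")
    # Assumption: a group has nonzero membership only through observations
    # whose weight is nonzero (zero-weight observations are excluded).
    effective_groups = len({g for g, v in zip(ing, wt) if v > 0.0})
    if sum(wt) < nx + effective_groups:
        problems.append("sum of wt(i) < nx + effective number of groups")
    return problems


ing = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(weight_constraint_violations([1.0] * 9, ing, nx=3))  # []
print(weight_constraint_violations([1.0, 1.0, 1.0] + [0.0] * 6, ing, nx=3))
```

In the second call the weights sum to 3, which is less than nx plus the three effective groups, so the sum constraint is reported.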
8:     tol – double scalar
The value of tol is used to decide whether the variables are of full rank and, if not, what the rank of the variables is. The smaller the value of tol, the stricter the criterion for selecting the singular value decomposition. If a non-negative value of tol less than machine precision is entered, the square root of machine precision is used instead.
Constraint: tol ≥ 0.0.

Optional Input Parameters

1:     n – int64, int32 or nag_int scalar
Default: The dimension of the array ing and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraint: n ≥ nx + ng.
2:     m – int64, int32 or nag_int scalar
Default: The dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
m, the total number of variables.
Constraint: m ≥ nx.

Input Parameters Omitted from the MATLAB Interface

ldx, ldcvm, lde, ldcvx, wk, iwk

Output Parameters

1:     nig(ng) – int64, int32 or nag_int array
nig(j) gives the number of observations in group j, for j = 1,2,…,ng.
2:     cvm(ldcvm,nx) – double array
ldcvm ≥ ng.
cvm(i,j) contains the mean of the jth canonical variate for the ith group, for i = 1,2,…,ng and j = 1,2,…,l; the remaining columns, if any, are used as workspace.
3:     e(lde,6) – double array
lde ≥ min(nx, ng-1).
The statistics of the canonical variate analysis.
e(i,1)
The canonical correlations, δi, for i = 1,2,…,l.
e(i,2)
The eigenvalues of the within-group sum of squares matrix, λi², for i = 1,2,…,l.
e(i,3)
The proportion of variation explained by the ith canonical variate, for i = 1,2,…,l.
e(i,4)
The χ² statistic for the ith canonical variate, for i = 1,2,…,l.
e(i,5)
The degrees of freedom for the χ² statistic for the ith canonical variate, for i = 1,2,…,l.
e(i,6)
The significance level for the χ² statistic for the ith canonical variate, for i = 1,2,…,l.
4:     ncv – int64, int32 or nag_int scalar
The number of canonical variates, l. This will be the minimum of ng-1 and the rank of x.
5:     cvx(ldcvx,ng-1) – double array
ldcvx ≥ nx.
The canonical variate loadings. cvx(i,j) contains the loading coefficient for the ith variable on the jth canonical variate, for i = 1,2,…,nx and j = 1,2,…,l; the remaining columns, if any, are used as workspace.
6:     irankx – int64, int32 or nag_int scalar
The rank of the dependent variables.
If the variables are of full rank then irankx = nx.
If the variables are not of full rank then irankx is an estimate of the rank of the dependent variables. irankx is calculated as the number of singular values greater than tol × (largest singular value).
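The rank rule, together with the documented treatment of small tol values and the relation ncv = min(ng − 1, rank), can be sketched in plain Python. The sketch starts from a given list of singular values; how the routine decides between its QR and SVD paths is not modelled, and machine precision is taken as Python's `sys.float_info.epsilon`, which may differ from the value the NAG Library uses. Function names are hypothetical.

```python
import math
import sys


def estimate_rank(singular_values, tol):
    """Count singular values greater than tol * (largest singular value)."""
    if tol < 0.0:
        raise ValueError("tol must be >= 0.0")  # the routine reports ifail = 1
    eps = sys.float_info.epsilon                 # assumed machine precision
    if tol < eps:
        tol = math.sqrt(eps)                     # documented replacement value
    if not singular_values:
        return 0
    threshold = tol * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)


def number_of_canonical_variates(irankx, ng):
    """ncv = min(ng - 1, rank of x)."""
    return min(ng - 1, irankx)


s = [5.0, 1.0, 1e-7]
print(estimate_rank(s, 1e-6), estimate_rank(s, 0.0))
```

With tol = 10⁻⁶ the smallest singular value is treated as zero (rank 2), while tol = 0.0 is replaced by the square root of machine precision and the same value is kept (rank 3). A rank of 0 corresponds to the warning ifail = 8.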
7:     ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 1
On entry, nx < 1,
or ng < 2,
or m < nx,
or n < nx + ng,
or ldx < n,
or ldcvx < nx,
or ldcvm < ng,
or lde < min(nx, ng-1),
or nx ≥ ng-1 and iwk < n × nx + max(5 × (nx-1) + (nx+1) × nx, n),
or nx < ng-1 and iwk < n × nx + max(5 × (nx-1) + (ng-1) × nx, n),
or weight ≠ 'U', 'W' or 'V',
or tol < 0.0.
  ifail = 2
On entry, weight = 'W' or 'V' and a value of wt < 0.0.
  ifail = 3
On entry, a value of ing < 1,
or a value of ing > ng.
  ifail = 4
On entry, the number of variables to be included in the analysis as indicated by isx is not equal to nx.
  ifail = 5
A singular value decomposition has failed to converge. This is an unlikely error exit.
W ifail = 6
A canonical correlation is equal to 1. This will happen if the variables provide an exact indication as to which group every observation is allocated.
  ifail = 7
On entry, less than two groups have nonzero membership, i.e., the effective number of groups is less than 2,
or the effective number of groups plus the number of variables, nx, is greater than the effective number of observations.
W ifail = 8
The rank of the variables is 0. This will happen if all the variables are constants.

Accuracy

As the computation uses orthogonal matrices and a singular value decomposition, rather than the traditional approach of forming a sum of squares matrix and computing its eigenvalue decomposition, nag_mv_canon_var (g03ac) should be less affected by ill-conditioned problems.

Further Comments

None.

Example

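function nag_mv_canon_var_example
% Unweighted analysis: wt is not referenced when weight = 'U'.
weight = 'U';
% 9 observations (rows) on 3 variables (columns).
x = [13.3, 10.6, 21.2;
     13.6, 10.2, 21;
     14.2, 10.7, 21.1;
     13.4, 9.4, 21;
     13.2, 9.6, 20.1;
     13.9, 10.4, 19.8;
     12.9, 10, 20.5;
     12.2, 9.9, 20.7;
     13.9, 11, 19.1];
% Include all three variables (isx(j) > 0) in the analysis.
isx = [int64(1);1;1];
nx = int64(3);
% Group membership of each observation; there are 3 groups.
ing = [int64(1);2;3;1;2;3;1;2;3];
ng = int64(3);
wt = [];
tol = 1e-06;
[nig, cvm, e, ncv, cvx, irankx, ifail] = nag_mv_canon_var(weight, x, isx, nx, ing, ng, wt, tol)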
 

nig =

                    3
                    3
                    3


cvm =

    0.9841    0.2797   -0.1653
    1.1805   -0.2632    0.0177
   -2.1646   -0.0164    0.1476


e =

    0.8826    3.5238    0.9795    7.9032    6.0000    0.2453
    0.2623    0.0739    0.0205    0.3564    2.0000    0.8368


ncv =

                    2


cvx =

   -1.7070    0.7277
   -1.3481    0.3138
    0.9327    1.2199


irankx =

                    3


ifail =

                    0



© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013