
NAG Toolbox: nag_mv_discrim (g03da)

Purpose

nag_mv_discrim (g03da) computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.

Syntax

[nig, gmn, det, gc, stat, df, sig, ifail] = g03da(x, isx, nvar, ing, ng, 'n', n, 'm', m, 'wt', wt)
[nig, gmn, det, gc, stat, df, sig, ifail] = nag_mv_discrim(x, isx, nvar, ing, ng, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 24: weight dropped; wt made optional.

Description

Let a sample of $n$ observations on $p$ variables come from $n_g$ groups with $n_j$ observations in the $j$th group and $\sum_j n_j = n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_j$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
$$G = C \left\{ (n - n_g)\log|S| - \sum_{j=1}^{n_g} (n_j - 1)\log|S_j| \right\},$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(n_g - 1)} \left( \sum_{j=1}^{n_g} \frac{1}{n_j - 1} - \frac{1}{n - n_g} \right),$$
and $S_j$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
$$S = \frac{\sum_{j=1}^{n_g} (n_j - 1) S_j}{n - n_g}.$$
For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p + 1)(n_g - 1)$ degrees of freedom, see Morrison (1967) for further comments. If weights are used, then $S$ and $S_j$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.
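As a hedged illustration of the formulas above, the following base-MATLAB sketch recomputes the statistic from group sizes and log-determinants. The numeric inputs are the rounded values printed in the Example section of this page (nig, det, and the two diagonal entries of the pooled matrix stored at the start of gc), so the results agree with the printed stat, df and sig only to about three significant figures. The variable names are illustrative and not part of the NAG interface, and the significance level is obtained from the standard identity between the chi-square upper tail and the regularized incomplete gamma function rather than from any NAG routine.

```matlab
% Summary quantities taken from the Example output on this page (rounded).
p        = 2;                            % number of variables
nj       = [6; 10; 5];                   % group sizes (nig)
logdetSj = [-0.8273; -3.0460; -2.2877];  % log|S_j| (det)
% log|S| from the diagonal of the pooled R, i.e. gc(1) and gc(3)
logdetS  = 2 * (log(abs(-0.5100)) + log(abs(-1.2173)));

n  = sum(nj);        % total number of observations
ng = numel(nj);      % number of groups

% Correction factor C
C = 1 - (2*p^2 + 3*p - 1) / (6*(p + 1)*(ng - 1)) ...
      * (sum(1 ./ (nj - 1)) - 1/(n - ng));

% Likelihood-ratio statistic G
G = C * ((n - ng)*logdetS - sum((nj - 1) .* logdetSj));

% Degrees of freedom and upper-tail chi-square probability
df  = p*(p + 1)*(ng - 1)/2;
sig = gammainc(G/2, df/2, 'upper');

fprintf('C = %.4f, G = %.3f, df = %d, sig = %.4f\n', C, G, df, sig);
```

Working from log-determinants rather than determinants keeps the arithmetic stable when the covariance matrices are close to singular.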
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, nag_mv_discrim (g03da) uses a $QR$ decomposition. The group means are subtracted from the data and then, for each group, a $QR$ decomposition is computed to give an upper triangular matrix $R_j^*$. This matrix can be scaled to give a matrix $R_j$ such that $S_j = R_j^T R_j$. The pooled $R$ matrix is then computed from the $R_j$ matrices. The values of $|S|$ and the $|S_j|$ can then be calculated from the diagonal elements of $R$ and the $R_j$.
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^T z$, where $R_j^T z = (x - \bar{x}_j)$, $\bar{x}_j$ being the vector of means of the $j$th group. These distances can be calculated by nag_mv_discrim_mahal (g03db). The distances are used in discriminant analysis, and nag_mv_discrim_group (g03dc) uses the results of nag_mv_discrim (g03da) to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
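The QR route described above can be sketched in base MATLAB for a single group. This is an illustration under stated assumptions, not the routine's actual implementation: the data matrix X and the point x are made up, the scaling by the square root of $n_j - 1$ assumes the unweighted case, and the distance is obtained by solving the transposed triangular system, which is what $S_j = R_j^T R_j$ implies for an upper triangular $R_j$. The last two lines cross-check against the direct covariance formulas.

```matlab
% Hypothetical single group: 6 observations on 2 variables.
X = [1.1 2.4; 1.0 0.3; 0.6 -2.3; 1.3 -3.2; 1.4 0.1; 0.6 -0.9];
nj   = size(X, 1);
xbar = mean(X, 1);                  % group mean (row vector)
Xc   = X - repmat(xbar, nj, 1);     % subtract the group mean

% Economy-size QR: Rstar is upper triangular with Xc'*Xc = Rstar'*Rstar
[~, Rstar] = qr(Xc, 0);
Rj = Rstar / sqrt(nj - 1);          % scale so that Sj = Rj'*Rj

% log|Sj| from the diagonal of Rj, with no determinant evaluation
logdetSj = 2 * sum(log(abs(diag(Rj))));

% Mahalanobis squared distance of a point x from the group mean:
% solve the triangular system Rj'*z = (x - xbar)' and form z'*z
x  = [1.5 0.5];
z  = Rj' \ (x - xbar)';
d2 = z' * z;

% Cross-check against the direct formulas
Sj        = cov(X);
d2_direct = (x - xbar) / Sj * (x - xbar)';
```

The absolute value on the diagonal matters because a QR factor may have negative diagonal entries, as the gc values in the Example show.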

References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

Parameters

Compulsory Input Parameters

1:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ n.
x(k,l) must contain the $k$th observation for the $l$th variable, for $k = 1,2,\ldots,n$ and $l = 1,2,\ldots,m$.
2:     isx(m) – int64 / int32 / nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ nvar.
isx(l) indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.
If isx(l) > 0 the $l$th variable is included, for $l = 1,2,\ldots,m$; otherwise it is not referenced.
Constraint: isx(l) > 0 for nvar values of $l$.
3:     nvar – int64 / int32 / nag_int scalar
$p$, the number of variables in the variance-covariance matrices.
Constraint: nvar ≥ 1.
4:     ing(n) – int64 / int32 / nag_int array
n, the dimension of the array, must satisfy the constraint n ≥ 1.
ing(k) indicates to which group the $k$th observation belongs, for $k = 1,2,\ldots,n$.
Constraint: 1 ≤ ing(k) ≤ ng, for $k = 1,2,\ldots,n$.
The values of ing must be such that each group has at least nvar members.
5:     ng – int64 / int32 / nag_int scalar
The number of groups, $n_g$.
Constraint: ng ≥ 2.
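The constraints on isx and ing listed above can be checked before the call. The following base-MATLAB sketch is one hypothetical way to do so; the input values and the variable names nSelected, inRange, counts, okIsx and okGroups are made up for illustration and are not part of the NAG interface.

```matlab
% Hypothetical inputs, shaped like those in the Example section.
isx  = int64([1; 0; 1]);           % select variables 1 and 3 of 3
nvar = int64(2);
ing  = int64([1; 1; 1; 2; 2; 3; 3; 3]);
ng   = int64(3);

nSelected = nnz(isx > 0);                  % variables flagged for inclusion
inRange   = all(ing >= 1 & ing <= ng);     % every label lies in 1..ng

% Members per group (only meaningful when all labels are in range)
if inRange
    counts = accumarray(double(ing), 1, [double(ng), 1]);
else
    counts = [];
end

okIsx    = (nSelected == nvar);                         % exactly nvar selected
okGroups = inRange && all(counts >= double(nvar));      % each group large enough

fprintf('isx ok: %d, ing ok: %d, group sizes: %s\n', ...
        okIsx, okGroups, mat2str(counts'));
```

A failed check here corresponds to the conditions reported under ifail = 3.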

Optional Input Parameters

1:     n – int64 / int32 / nag_int scalar
Default: The dimension of the array ing and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: n ≥ 1.
2:     m – int64 / int32 / nag_int scalar
Default: The dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables in the data array x.
Constraint: m ≥ nvar.
3:     wt(:) – double array
Note: the dimension of the array wt must be at least n if weight = 'W', and at least 1 otherwise.
If weight = 'W' the first $n$ elements of wt must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If wt(k) = 0.0 the $k$th observation is excluded from the calculations.
If weight = 'U', wt is not referenced and the effective number of observations for a group is the number of observations in that group.
Constraint: if weight = 'W', wt(k) ≥ 0.0, for $k = 1,2,\ldots,n$.
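To make the effect of weights on group sizes concrete, the sketch below computes the effective number of observations per group as the sum of the weights, and counts how many observations survive once zero weights are excluded. The labels, weights and variable names are hypothetical, and the sketch does not call the NAG routine.

```matlab
% Hypothetical group labels and weights; a zero weight drops the observation.
ing = [1; 1; 1; 2; 2; 2; 2];
wt  = [1.0; 2.0; 0.0; 0.5; 1.5; 1.0; 1.0];

assert(all(wt >= 0), 'weights must be non-negative');

ng    = max(ing);
nEff  = accumarray(ing, wt, [ng, 1]);              % sum of weights per group
nUsed = accumarray(ing, double(wt > 0), [ng, 1]);  % observations with wt > 0

disp([nEff, nUsed])
```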

Input Parameters Omitted from the MATLAB Interface

weight, ldx, ldgmn, wk, iwk

Output Parameters

1:     nig(ng) – int64 / int32 / nag_int array
nig(j) contains the number of observations in the $j$th group, for $j = 1,2,\ldots,n_g$.
2:     gmn(ldgmn,nvar) – double array
ldgmn ≥ ng.
The $j$th row of gmn contains the means of the $p$ selected variables for the $j$th group, for $j = 1,2,\ldots,n_g$.
3:     det(ng) – double array
The logarithms of the determinants of the within-group variance-covariance matrices.
4:     gc((ng + 1) × nvar × (nvar + 1) / 2) – double array
The first $p(p + 1)/2$ elements of gc contain $R$ and the remaining $n_g$ blocks of $p(p + 1)/2$ elements contain the $R_j$ matrices. All are stored in packed form by columns.
5:     stat – double scalar
The likelihood-ratio test statistic, $G$.
6:     df – double scalar
The degrees of freedom for the distribution of $G$.
7:     sig – double scalar
The significance level for $G$.
8:     ifail – int64 / int32 / nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).
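The packed layout of gc can be unpacked with a few lines of base MATLAB. The sketch below is an illustration only: it uses the rounded gc values printed in the Example section (where $p = 2$ and $n_g = 3$), fills the upper triangle of each block column by column, and recovers the log-determinants from the diagonals. The array name R and the helper variables are not part of the NAG interface.

```matlab
% gc as printed in the Example on this page (p = 2, ng = 3), rounded.
gc = [-0.5100; -0.2797; -1.2173; -0.3327; -0.3724; -1.9876; ...
      -0.4603; -0.7042;  0.4737;  0.7451; -0.3251; -0.4276];
p   = 2;
ng  = 3;
len = p*(p + 1)/2;                 % packed length of one triangle

R    = zeros(p, p, ng + 1);        % R(:,:,1) pooled, R(:,:,j+1) group j
mask = triu(true(p));              % upper triangle, traversed by columns
for b = 1:ng + 1
    Rb = zeros(p);
    Rb(mask) = gc((b - 1)*len + (1:len));
    R(:, :, b) = Rb;
end

% Log-determinants of the within-group matrices from the diagonals
logdetSj = zeros(ng, 1);
for j = 1:ng
    logdetSj(j) = 2 * sum(log(abs(diag(R(:, :, j + 1)))));
end
disp(logdetSj')
```

Up to rounding of the printed values, logdetSj reproduces the det output of the Example.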

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, nvar < 1,
or n < 1,
or ng < 2,
or m < nvar,
or ldx < n,
or ldgmn < ng,
or weight ≠ 'U' or 'W'.
  ifail = 2
On entry, weight = 'W' and a value of wt < 0.0.
  ifail = 3
On entry, there are not exactly nvar elements of isx > 0,
or a value of ing is not in the range 1 to ng,
or the effective number of observations for a group is less than 1,
or a group has fewer than nvar members.
  ifail = 4
$R$ or one of the $R_j$ is not of full rank.

Accuracy

The accuracy is dependent on the accuracy of the computation of the $QR$ decomposition. See nag_lapack_dgeqrf (f08ae) for further details.

Further Comments

The time taken will be approximately proportional to $np^2$.

Example

function nag_mv_discrim_example
x = [1.1314, 2.4596;
     1.0986, 0.2624;
     0.6419, -2.3026;
     1.335, -3.2189;
     1.411, 0.0953;
     0.6419, -0.9163;
     2.1163, 0;
     1.335, -1.6094;
     1.361, -0.5108;
     2.0541, 0.1823;
     2.2083, -0.5108;
     2.7344, 1.2809;
     2.0412, 0.47;
     1.8718, -0.9163;
     1.7405, -0.9163;
     2.6101, 0.47;
     2.3224, 1.8563;
     2.2192, 2.0669;
     2.2618, 1.1314;
     3.9853, 0.9163;
     2.76, 2.0281];
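% isx > 0 marks a variable for inclusion; here both columns of x are used
isx = [int64(1);1];
nvar = int64(2);   % number of selected variables, p
% group label (1..ng) for each of the 21 observations
ing = [int64(1);1;1;1;1;1;2;2;2;2;2;2;2;2;2;2;3;3;3;3;3];
ng = int64(3);     % number of groups
% note: the output named det shadows MATLAB's built-in det inside this function
[nig, gmean, det, gc, stat, df, sig, ifail] = ...
    nag_mv_discrim(x, isx, nvar, ing, ng)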
 

nig =

                    6
                   10
                    5


gmean =

    1.0433   -0.6034
    2.0073   -0.2060
    2.7097    1.5998


det =

   -0.8273
   -3.0460
   -2.2877


gc =

   -0.5100
   -0.2797
   -1.2173
   -0.3327
   -0.3724
   -1.9876
   -0.4603
   -0.7042
    0.4737
    0.7451
   -0.3251
   -0.4276


stat =

   19.2410


df =

     6


sig =

    0.0038


ifail =

                    0


The short-name form of the call, g03da(x, isx, nvar, ing, ng), takes the same inputs and returns the same results as shown above.


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013