NAG Toolbox: nag_mv_discrim_mahal (g03db)

Purpose

nag_mv_discrim_mahal (g03db) computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after nag_mv_discrim (g03da).

Syntax

[d, ifail] = g03db(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
[d, ifail] = nag_mv_discrim_mahal(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: ng has been made optional.

Description

Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group, and let $x_k$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, $D_{kj}$:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j).$$
If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j).$$
Instead of using the variance-covariance matrices $S$ and $S_j$, nag_mv_discrim_mahal (g03db) uses the upper triangular matrices $R$ and $R_j$ supplied by nag_mv_discrim (g03da) such that $S = R^T R$ and $S_j = R_j^T R_j$. $D_{kj}^2$ can then be calculated as $z^T z$, where $z$ solves the triangular system $R_j^T z = (x_k - \bar{x}_j)$ or $R^T z = (x_k - \bar{x}_j)$ as appropriate, since $S^{-1} = R^{-1} R^{-T}$.
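The triangular-solve formulation can be illustrated without the toolbox. The following is a minimal sketch in plain MATLAB, not the routine's actual implementation: it takes the packed factor $R_1$ and the first sample point from the Example on this page, solves $R_1^T z = (x_1 - \bar{x}_1)$, and compares $z^T z$ with the explicit quadratic form. The variable names (r, R, xk, xbar, D2, D2_direct) are illustrative only.

```matlab
% Packed upper triangle of R_1 (elements 4 to 6 of gc in the Example), stored by column.
r = [-0.3326727521153484; -0.3723518779712077; -1.987589395382754];
p = 2;

% Unpack into a p-by-p upper triangular matrix (column-major order matches the packing).
R = zeros(p);
R(triu(ones(p)) == 1) = r;

xk   = [1.6292; -0.9163];      % first sample point from the Example
xbar = [1.0433; -0.603417];    % mean of group 1 from the Example

z  = R' \ (xk - xbar);         % solve the lower triangular system R' * z = xk - xbar
D2 = z' * z;                   % Mahalanobis squared distance

% Cross-check against the explicit quadratic form with S = R' * R.
S = R' * R;
D2_direct = (xk - xbar)' * (S \ (xk - xbar));

fprintf('%.4f %.4f\n', D2, D2_direct);   % both agree with d(1,1) = 3.3393 in the Example
```

Working with the factor avoids forming and inverting $S$ explicitly; the only operation needed per point is one substitution with a triangular matrix.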
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S_j^{-1} (\bar{x}_i - \bar{x}_j)$$
or
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j).$$
Note: $D_{jj}^2 = 0$ and, when the pooled variance-covariance matrix is used, $D_{ij}^2 = D_{ji}^2$, so in this case only the lower triangular values of $D_{ij}^2$, $i > j$, are computed.
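The lower triangular layout for the pooled case can be sketched as follows. This is an illustrative plain MATLAB loop, not the toolbox routine: it uses the group means and the pooled factor $R$ (the first three elements of gc) from the Example on this page, fills only the entries with $i > j$, and checks one symmetric pair. The names D and D12 are illustrative.

```matlab
% Group means from the Example (one row per group).
gmn = [1.0433, -0.603417;
       2.00727, -0.20604;
       2.70974,  1.5998];
ng = size(gmn, 1);
p  = size(gmn, 2);

% Pooled factor R: first p*(p+1)/2 elements of gc in the Example, packed by column.
r = [-0.5099642881287538; -0.279705472386133; -1.217327847040481];
R = zeros(p);
R(triu(ones(p)) == 1) = r;

% Fill only the strictly lower triangle, i > j; the rest is left at zero here.
D = zeros(ng);
for i = 2:ng
    for j = 1:i-1
        z = R' \ (gmn(i,:) - gmn(j,:))';
        D(i,j) = z' * z;
    end
end

% Swapping i and j only changes the sign of the difference, so the distance is the same.
z12 = R' \ (gmn(1,:) - gmn(2,:))';
D12 = z12' * z12;
disp(D);
```

With unequal within-group matrices the two directions use different factors ($R_j$ versus $R_i$), which is why the symmetry holds only in the pooled case.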

References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

Parameters

Compulsory Input Parameters

1:     equal – string (length ≥ 1)
Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
equal = 'E'
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p(p+1)/2$ elements of gc is used.
equal = 'U'
The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_j$, for $j = 1,2,\ldots,n_g$, stored in the remainder of gc are used.
Constraint: equal = 'E' or 'U'.
2:     mode – string (length ≥ 1)
Indicates whether distances from sample points are to be calculated or distances between the group means.
mode = 'S'
The distances between the sample points given in x and the group means are calculated.
mode = 'M'
The distances between the group means will be calculated.
Constraint: mode = 'M' or 'S'.
3:     gmn(ldgmn,nvar) – double array
ldgmn, the first dimension of the array, must satisfy the constraint ldgmn ≥ ng.
The $j$th row of gmn contains the means of the $p$ selected variables for the $j$th group, for $j = 1,2,\ldots,n_g$. These are returned by nag_mv_discrim (g03da).
4:     gc((ng + 1) × nvar × (nvar + 1) / 2) – double array
The first $p(p+1)/2$ elements of gc should contain the upper triangular matrix $R$ and the next $n_g$ blocks of $p(p+1)/2$ elements should contain the upper triangular matrices $R_j$. All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03da). If equal = 'E', only the first $p(p+1)/2$ elements are referenced; if equal = 'U', only the elements $p(p+1)/2 + 1$ to $(n_g+1)p(p+1)/2$ are referenced.
Constraints:
  • if equal = 'E', the diagonal elements of $R$ must not be 0.0;
  • if equal = 'U', the diagonal elements of the $R_j$ must not be 0.0, for $j = 1,2,\ldots,n_g$.
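To make the packed layout of gc concrete, the sketch below unpacks the gc vector from the Example on this page into the pooled factor $R$ and the three group factors $R_j$, then checks the nonzero-diagonal constraint. It is plain MATLAB and purely illustrative; the names blk, mask, R and Rj are not part of the toolbox interface.

```matlab
% gc from the Example: (ng+1)*p*(p+1)/2 = 12 elements for ng = 3, p = 2.
gc = [-0.5099642881287538; -0.279705472386133;  -1.217327847040481;
      -0.3326727521153484; -0.3723518779712077; -1.987589395382754;
      -0.4603014906920608; -0.7041634974247672;  0.4737334252803499;
       0.7451327720614629; -0.3251057349548681; -0.4275545007358186];
p  = 2;
ng = 3;
blk  = p*(p+1)/2;              % length of one packed upper triangle
mask = triu(ones(p)) == 1;     % upper triangle, traversed column by column

% Pooled factor R occupies the first block.
R = zeros(p);
R(mask) = gc(1:blk);

% Group factors R_1, ..., R_ng occupy the following ng blocks.
Rj = cell(ng, 1);
for j = 1:ng
    Rj{j} = zeros(p);
    Rj{j}(mask) = gc(j*blk + (1:blk));
end

% Constraint check: every diagonal element must be nonzero.
diag_ok = all(diag(R) ~= 0);
for j = 1:ng
    diag_ok = diag_ok && all(diag(Rj{j}) ~= 0);
end
disp(diag_ok);
```

Because MATLAB stores and indexes matrices column by column, assigning a packed block through an upper triangular logical mask reproduces the "packed by column" order directly.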
5:     nobs – int64/int32/nag_int scalar
If mode = 'S', the number of sample points in x for which distances are to be calculated.
If mode = 'M', nobs is not referenced.
Constraint: if mode = 'S', nobs ≥ 1.
6:     isx(:) – int64/int32/nag_int array
Note: the dimension of the array isx must be at least max(1,m).
If mode = 'S', isx(l) indicates if the $l$th variable in x is to be included in the distance calculations. If isx(l) > 0 the $l$th variable is included, for $l = 1,2,\ldots,m$; otherwise the $l$th variable is not referenced.
If mode = 'M', isx is not referenced.
Constraint: if mode = 'S', isx(l) > 0 for nvar values of $l$.
7:     x(ldx,:) – double array
The first dimension, ldx, of the array x must satisfy
  • if mode = 'S', ldx ≥ nobs;
  • otherwise 1.
The second dimension of the array must be at least max(1,m).
If mode = 'S', the $k$th row of x must contain $x_k$. That is, x(k,l) must contain the $k$th sample value for the $l$th variable, for $k = 1,2,\ldots,\text{nobs}$ and $l = 1,2,\ldots,m$. Otherwise x is not referenced.
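The role of isx can be pictured as a column selection on x. The sketch below is an assumption-labelled illustration in plain MATLAB, not the routine's internal code: the data matrix has a made-up middle column (value 99) that isx excludes, and the selected columns are the first two rows of x from the Example. The names sel and xsel are illustrative.

```matlab
% Two sample points with m = 3 variables; the middle column is a made-up
% variable that should be left out of the distance calculations.
x = [1.6292, 99, -0.9163;
     2.5572, 99,  1.6094];
isx  = int64([1; 0; 1]);   % isx(l) > 0 marks variable l as included
nvar = 2;                  % number of variables in the factors from g03da

sel = isx > 0;             % logical selector over the m columns of x

% The documented constraint: exactly nvar variables must be selected.
if nnz(sel) ~= nvar
    error('isx must select exactly nvar variables');
end

xsel = x(:, sel);          % nobs-by-nvar matrix of the selected variables
disp(xsel);
```

The selected columns must correspond, in order, to the columns of gmn, since each row of xsel is compared with a row of group means.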

Optional Input Parameters

1:     nvar – int64/int32/nag_int scalar
Default: The second dimension of the array gmn.
$p$, the number of variables in the variance-covariance matrices as specified to nag_mv_discrim (g03da).
Constraint: nvar ≥ 1.
2:     ng – int64/int32/nag_int scalar
Default: The first dimension of the array gmn.
The number of groups, $n_g$.
Constraint: ng ≥ 2.
3:     m – int64/int32/nag_int scalar
Default: The dimension of the array isx and the second dimension of the array x.
If mode = 'S', the number of variables in the data array x.
If mode = 'M', m is not referenced.
Constraint: if mode = 'S', m ≥ nvar.

Input Parameters Omitted from the MATLAB Interface

ldgmn ldx ldd wk

Output Parameters

1:     d(ldd,ng) – double array
The squared distances.
If mode = 'S', d(k,j) contains the squared distance of the $k$th sample point from the $j$th group mean, $D_{kj}^2$, for $k = 1,2,\ldots,\text{nobs}$ and $j = 1,2,\ldots,n_g$.
If mode = 'M' and equal = 'U', d(i,j) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1,2,\ldots,n_g$ and $j = 1,2,\ldots,i-1,i+1,\ldots,n_g$. The elements d(i,i) are not referenced, for $i = 1,2,\ldots,n_g$.
If mode = 'M' and equal = 'E', d(i,j) contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^2$, for $i = 1,2,\ldots,n_g$ and $j = 1,2,\ldots,i-1$. Since $D_{ij} = D_{ji}$ the elements d(i,j) are not referenced, for $i = 1,2,\ldots,n_g$ and $j = i+1,\ldots,n_g$.
2:     ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, nvar < 1,
or ng < 2,
or ldgmn < ng,
or mode = 'S' and nobs < 1,
or mode = 'S' and m < nvar,
or mode = 'S' and ldx < nobs,
or mode = 'S' and ldd < nobs,
or mode = 'M' and ldd < ng,
or equal ≠ 'E' or 'U',
or mode ≠ 'M' or 'S'.
  ifail = 2
On entry, mode = 'S' and the number of variables indicated by isx is not equal to nvar,
or equal = 'E' and a diagonal element of $R$ is zero,
or equal = 'U' and a diagonal element of $R_j$ for some $j$ is zero.
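The conditions above can be mirrored in a pre-check before calling the routine. The following plain MATLAB sketch is hypothetical: it is not how the toolbox validates its arguments, it skips the leading-dimension checks (ldgmn, ldx, ldd are omitted from the MATLAB interface), and it assumes that only the first character of equal and mode is examined, which is consistent with the Example passing mode = 'Sample points'. The variable code is an illustrative stand-in for ifail.

```matlab
% Inputs taken from the Example on this page.
equal = 'U';
mode  = 'Sample points';
nvar = 2; ng = 3; nobs = 6; m = 2;
isx = [1; 1];
gc = [-0.5099642881287538; -0.279705472386133;  -1.217327847040481;
      -0.3326727521153484; -0.3723518779712077; -1.987589395382754;
      -0.4603014906920608; -0.7041634974247672;  0.4737334252803499;
       0.7451327720614629; -0.3251057349548681; -0.4275545007358186];

e = equal(1);   % assumption: only the first character matters
s = mode(1);

code = 0;       % hypothetical stand-in for ifail
if nvar < 1 || ng < 2 || ~(e == 'E' || e == 'U') || ~(s == 'M' || s == 'S') ...
        || (s == 'S' && (nobs < 1 || m < nvar))
    code = 1;
else
    blk  = nvar*(nvar+1)/2;
    % Position of diagonal element k inside one column-packed block: k*(k+1)/2.
    dpos = ((1:nvar) .* ((1:nvar) + 1) / 2)';
    if e == 'E'
        diagvals = gc(dpos);                          % diagonal of the pooled R
    else
        diagvals = [];
        for j = 1:ng
            diagvals = [diagvals; gc(j*blk + dpos)];  % diagonals of each R_j
        end
    end
    if (s == 'S' && nnz(isx > 0) ~= nvar) || any(diagvals == 0)
        code = 2;
    end
end
disp(code);
```

A zero diagonal element matters because the distance is obtained by solving a triangular system with $R$ or $R_j$, which would then be singular.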

Accuracy

The accuracy will depend upon the accuracy of the input $R$ or $R_j$ matrices.

Further Comments

If the distances are to be used for discrimination, see also nag_mv_discrim_group (g03dc).

Example

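function nag_mv_discrim_mahal_example
% Unequal within-group variance-covariance matrices: the R_j blocks of gc are used.
equal = 'U';
% mode begins with 'S': distances from the sample points in x to the group means.
mode = 'Sample points';
% Group means: one row per group (ng = 3), one column per variable (nvar = 2).
gmean = [1.0433, -0.603417;
     2.00727, -0.20604;
     2.70974, 1.5998];
% Upper triangular factors packed by column: R first, then R_1, R_2, R_3.
gc = [-0.5099642881287538;
     -0.279705472386133;
     -1.217327847040481;
     -0.3326727521153484;
     -0.3723518779712077;
     -1.987589395382754;
     -0.4603014906920608;
     -0.7041634974247672;
     0.4737334252803499;
     0.7451327720614629;
     -0.3251057349548681;
     -0.4275545007358186];
nobs = int64(6);
% Both variables in x are included in the distance calculations.
isx = [int64(1);1];
% Sample points: one row per observation, one column per variable.
x = [1.6292, -0.9163;
     2.5572, 1.6094;
     2.5649, -0.2231;
     0.9555, -2.3026;
     3.4012, -2.3026;
     3.0204, -0.2231];
[d, ifail] = nag_mv_discrim_mahal(equal, mode, gmean, gc, nobs, isx, x)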

d =

    3.3393    0.7521   50.9283
   20.7771    5.6559    0.0597
   21.3631    4.8411   19.4978
    0.7184    6.2803  124.7323
   55.0003   88.8604   71.7852
   36.1703   15.7849   15.7489


ifail =

                    0



© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013