
PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_mv_discrim_group (g03dc)

Purpose

nag_mv_discrim_group (g03dc) allocates observations to groups according to selected rules. It is intended for use after nag_mv_discrim (g03da).

Syntax

[prior, p, iag, ati, ifail] = g03dc(typ, equal, priors, nig, gmn, gc, det, isx, x, prior, atiq, 'nvar', nvar, 'ng', ng, 'nobs', nobs, 'm', m)
[prior, p, iag, ati, ifail] = nag_mv_discrim_group(typ, equal, priors, nig, gmn, gc, det, isx, x, prior, atiq, 'nvar', nvar, 'ng', ng, 'nobs', nobs, 'm', m)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: nobs has been made optional.

Description

Discriminant analysis is concerned with the allocation of observations to groups using information from other observations whose group membership is known, $X_t$; these are called the training set. Consider $p$ variables observed on $n_g$ populations or groups. Let $\bar{x}_j$ be the sample mean and $S_j$ the within-group variance-covariance matrix for the $j$th group; these are calculated from a training set of $n$ observations with $n_j$ observations in the $j$th group, and let $x_k$ be the $k$th observation from the set of observations to be allocated to the $n_g$ groups. The observation can be allocated to a group according to a selected rule. The allocation rule or discriminant function will be based on the distance of the observation from an estimate of the location of the groups, usually the group means. A measure of the distance of the observation from the $j$th group mean is given by the Mahalanobis distance, $D_{kj}$:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S_j^{-1} (x_k - \bar{x}_j). \qquad (1)$$
If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:
$$D_{kj}^2 = (x_k - \bar{x}_j)^T S^{-1} (x_k - \bar{x}_j). \qquad (2)$$
Instead of using the variance-covariance matrices $S$ and $S_j$, nag_mv_discrim_group (g03dc) uses the upper triangular matrices $R$ and $R_j$ supplied by nag_mv_discrim (g03da) such that $S = R^T R$ and $S_j = R_j^T R_j$. $D_{kj}^2$ can then be calculated as $z^T z$, where $R_j^T z = (x_k - \bar{x}_j)$ or $R^T z = (x_k - \bar{x}_j)$ as appropriate.
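As an illustration of the triangular-solve approach, the following Python sketch (not the NAG implementation) computes $D_{kj}^2$ by forward substitution on $R^T z = x_k - \bar{x}_j$. The function name `mahalanobis_sq` is hypothetical, and the factor is assumed to be supplied as a dense list of rows rather than in the packed form used by gc.

```python
def mahalanobis_sq(x, mean, R):
    """Return D^2 = z^T z, where R^T z = (x - mean) and S = R^T R.

    R is a dense p x p upper triangular matrix given as a list of rows.
    """
    p = len(mean)
    d = [x[i] - mean[i] for i in range(p)]
    z = [0.0] * p
    # R^T is lower triangular with (R^T)[i][k] = R[k][i], so solve top-down.
    for i in range(p):
        s = d[i]
        for k in range(i):
            s -= R[k][i] * z[k]
        z[i] = s / R[i][i]
    return sum(zi * zi for zi in z)
```

For example, with $R_1$ taken from elements 4 to 6 of the example's gc, the first example observation and the group 1 mean, this gives $D_{11}^2 \approx 3.339$.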
In addition to the distances, a set of prior probabilities of group membership, $\pi_j$, for $j = 1,2,\ldots,n_g$, may be used, with $\sum_{j=1}^{n_g} \pi_j = 1$. The prior probabilities reflect your view as to the likelihood of the observations coming from the different groups. Two common cases for prior probabilities are $\pi_1 = \pi_2 = \cdots = \pi_{n_g}$, that is, equal prior probabilities, and $\pi_j = n_j / n$, for $j = 1,2,\ldots,n_g$, that is, prior probabilities proportional to the number of observations in the groups in the training set.
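The two common choices of prior can be sketched in Python as below. `prior_probabilities` is a hypothetical helper, not a toolbox function; its `kind` argument simply mirrors the 'E', 'P' and 'I' values of the priors parameter.

```python
def prior_probabilities(nig, kind="E", prior=None):
    """Return the prior probabilities pi_j for the n_g groups.

    kind 'E': equal priors 1/n_g
    kind 'P': priors proportional to the training-set group sizes, n_j / n
    kind 'I': the supplied list `prior`, returned unchanged
    """
    ng = len(nig)
    if kind == "E":
        return [1.0 / ng] * ng
    if kind == "P":
        n = sum(nig)
        return [nj / n for nj in nig]
    if kind == "I":
        return list(prior)
    raise ValueError("kind must be 'E', 'P' or 'I'")
```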
nag_mv_discrim_group (g03dc) uses one of four allocation rules. In all four rules the $p$ variables are assumed to follow a multivariate Normal distribution with mean $\mu_j$ and variance-covariance matrix $\Sigma_j$ if the observation comes from the $j$th group. The different rules depend on whether or not the within-group variance-covariance matrices are assumed equal, i.e., $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g}$, and whether a predictive or estimative approach is used. If $p(x_k \mid \mu_j, \Sigma_j)$ is the probability density of the observation $x_k$ under group $j$, then the posterior probability of belonging to group $j$ is:
$$p(j \mid x_k, \mu_j, \Sigma_j) \propto p(x_k \mid \mu_j, \Sigma_j)\,\pi_j. \qquad (3)$$
In the estimative approach, the parameters $\mu_j$ and $\Sigma_j$ in (3) are replaced by their estimates calculated from $X_t$. In the predictive approach, a non-informative prior distribution is used for the parameters and a posterior distribution for the parameters, $p(\mu_j, \Sigma_j \mid X_t)$, is found. A predictive distribution is then obtained by integrating $p(x_k \mid \mu_j, \Sigma_j)\, p(\mu_j, \Sigma_j \mid X_t)$ over the parameter space. This predictive distribution then replaces $p(x_k \mid \mu_j, \Sigma_j)$ in (3). See Aitchison and Dunsmore (1975), Aitchison et al. (1977) and Moran and Murphy (1979) for further details.
The observation is allocated to the group with the highest posterior probability. Denoting the posterior probabilities, $p(j \mid x_k, \mu_j, \Sigma_j)$, by $q_j$, the four allocation rules are:
(i) Estimative with equal variance-covariance matrices – Linear Discrimination
$$\log q_j \propto -\tfrac{1}{2} D_{kj}^2 + \log \pi_j$$
(ii) Estimative with unequal variance-covariance matrices – Quadratic Discrimination
$$\log q_j \propto -\tfrac{1}{2} D_{kj}^2 + \log \pi_j - \tfrac{1}{2} \log |S_j|$$
(iii) Predictive with equal variance-covariance matrices
$$q_j^{-1} \propto \left(\frac{n_j + 1}{n_j}\right)^{p/2} \left\{1 + \frac{n_j}{(n - n_g)(n_j + 1)} D_{kj}^2\right\}^{(n + 1 - n_g)/2}$$
(iv) Predictive with unequal variance-covariance matrices
$$q_j^{-1} \propto C \left\{\left(\frac{n_j^2 - 1}{n_j}\right)^{p} |S_j|\right\}^{1/2} \left\{1 + \frac{n_j}{n_j^2 - 1} D_{kj}^2\right\}^{n_j/2},$$
where
$$C = \frac{\Gamma\left(\tfrac{1}{2}(n_j - p)\right)}{\Gamma\left(\tfrac{1}{2} n_j\right)}.$$
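The four rules can be sketched in Python as functions returning unnormalized $\log q_j$. This is an illustration, not the NAG implementation; the names `estimative_log_q` and `predictive_log_q` are hypothetical. In the predictive rules the prior $\pi_j$ is multiplied in as in (3), and rule (iv) is coded with the factor $\{((n_j^2-1)/n_j)^p |S_j|\}^{1/2}$, so $|S_j|$ enters with exponent $1/2$, as in the estimative rule (ii).

```python
import math

def estimative_log_q(d2, priors, logdet=None):
    """Unnormalized log q_j: rule (i) if logdet is None, else rule (ii).

    d2[j]: D_kj^2; priors[j]: pi_j; logdet[j]: log|S_j|.
    """
    out = []
    for j, (dj, pj) in enumerate(zip(d2, priors)):
        v = -0.5 * dj + math.log(pj)
        if logdet is not None:
            v -= 0.5 * logdet[j]          # the extra -1/2 log|S_j| term of rule (ii)
        out.append(v)
    return out

def predictive_log_q(d2, nig, p, priors, logdet=None):
    """Unnormalized log q_j: rule (iii) if logdet is None, else rule (iv).

    nig[j]: n_j; p: number of variables; logdet[j]: log|S_j| (rule (iv) only).
    """
    n, ng = sum(nig), len(nig)
    out = []
    for j, (dj, nj) in enumerate(zip(d2, nig)):
        if logdet is None:
            # rule (iii): log of q_j^{-1} for equal variance-covariance matrices
            log_inv = (0.5 * p * math.log((nj + 1) / nj)
                       + 0.5 * (n + 1 - ng)
                       * math.log1p(nj * dj / ((n - ng) * (nj + 1))))
        else:
            # rule (iv): k = (n_j^2 - 1)/n_j and C = Gamma((n_j - p)/2) / Gamma(n_j/2)
            k = (nj * nj - 1) / nj
            log_c = math.lgamma(0.5 * (nj - p)) - math.lgamma(0.5 * nj)
            log_inv = (log_c + 0.5 * p * math.log(k) + 0.5 * logdet[j]
                       + 0.5 * nj * math.log1p(dj / k))
        out.append(math.log(priors[j]) - log_inv)
    return out
```

As a check against the example: with equal priors, the det values, $n_j = 6, 10, 5$, $p = 2$ and the squared distances of the first example observation (about 3.339, 0.752 and 50.93 under $R_1$, $R_2$ and $R_3$), normalizing the rule (iv) weights gives approximately 0.0939, 0.9046 and 0.0015, the first row of p in the example output.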
In rules (i) to (iv), the appropriate value of $D_{kj}^2$ from (1) or (2) is used. The values of the $q_j$ are standardized so that
$$\sum_{j=1}^{n_g} q_j = 1.$$
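A numerically safe way to carry out this standardization, and the allocation to the group with the highest $q_j$, is to work with $\log q_j$ and subtract the maximum before exponentiating. The sketch below is an assumption about how one might do this, not the toolbox's code; `posterior_and_group` is a hypothetical name, and the returned group number is 1-based, like the values in iag.

```python
import math

def posterior_and_group(log_q):
    """Standardize unnormalized log q_j so that sum_j q_j = 1, then allocate.

    Returns (q, group), where group is the 1-based index of the largest q_j
    (the first such index if there are ties).
    """
    m = max(log_q)                        # shift by the maximum to avoid overflow
    w = [math.exp(v - m) for v in log_q]
    total = sum(w)
    q = [v / total for v in w]
    group = max(range(len(q)), key=q.__getitem__) + 1
    return q, group
```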
Moran and Murphy (1979) show the similarity between the predictive methods and methods based upon likelihood ratio tests.
In addition to allocating the observation to a group, nag_mv_discrim_group (g03dc) computes an atypicality index, $I_j(x_k)$. The predictive atypicality index is returned, irrespective of the value of the parameter typ. This represents the probability of obtaining an observation more typical of group $j$ than the observed $x_k$ (see Aitchison and Dunsmore (1975) and Aitchison et al. (1977)). The atypicality index is computed for unequal within-group variance-covariance matrices as:
$$I_j(x_k) = P\left(B \le z : \tfrac{1}{2}p, \tfrac{1}{2}(n_j - p)\right)$$
where $P(B \le \beta : a, b)$ is the lower tail probability from a beta distribution and
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n_j^2 - 1)/n_j},$$
and for equal within-group variance-covariance matrices as:
$$I_j(x_k) = P\left(B \le z : \tfrac{1}{2}p, \tfrac{1}{2}(n - n_g - p + 1)\right),$$
with
$$z = \frac{D_{kj}^2}{D_{kj}^2 + (n - n_g)(n_j + 1)/n_j}.$$
If $I_j(x_k)$ is close to 1 for all groups, it indicates that the observation may come from a grouping not represented in the training set. Moran and Murphy (1979) provide a frequentist interpretation of $I_j(x_k)$.
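The atypicality index needs the regularized incomplete beta function $P(B \le z : a, b)$. The Python sketch below evaluates it with the standard continued-fraction expansion (modified Lentz method); this is a swapped-in textbook technique, not necessarily what NAG uses internally, and all names are hypothetical. `atypicality` expects $D_{kj}^2$ computed with $S_j$ in the unequal case, or with the pooled $S$ in the equal case (pass `n` and `ng`).

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Lower tail probability P(B <= x) for B ~ Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b   # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)

def atypicality(d2, nj, p, n=None, ng=None):
    """Predictive atypicality index I_j(x_k).

    Unequal covariance matrices: leave n and ng as None.
    Equal covariance matrices: pass the training-set size n and group count ng.
    """
    if n is None:
        z = d2 / (d2 + (nj * nj - 1) / nj)
        return reg_inc_beta(0.5 * p, 0.5 * (nj - p), z)
    z = d2 / (d2 + (n - ng) * (nj + 1) / nj)
    return reg_inc_beta(0.5 * p, 0.5 * (n - ng - p + 1), z)
```

With the squared distances of the first example observation (about 3.339, 0.752 and 50.93 for groups 1 to 3), $n_j = 6, 10, 5$ and $p = 2$, this gives approximately 0.5956, 0.2539 and 0.9747, the first row of ati in the example output.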

References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Aitchison J, Habbema J D F and Kay J W (1977) A critical comparison of two methods of statistical discrimination Appl. Statist. 26 15–25
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Moran M A and Murphy B J (1979) A closer look at two alternative methods of statistical discrimination Appl. Statist. 28 223–232
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

Parameters

Compulsory Input Parameters

1:     typ – string (length ≥ 1)
Whether the estimative or predictive approach is used.
typ = 'E'
The estimative approach is used.
typ = 'P'
The predictive approach is used.
Constraint: typ = 'E' or 'P'.
2:     equal – string (length ≥ 1)
Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
equal = 'E'
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p(p+1)/2$ elements of gc is used.
equal = 'U'
The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_i$, for $i = 1,2,\ldots,n_g$, stored in the remainder of gc are used.
Constraint: equal = 'E' or 'U'.
3:     priors – string (length ≥ 1)
Indicates the form of the prior probabilities to be used.
priors = 'E'
Equal prior probabilities are used.
priors = 'P'
Prior probabilities proportional to the group sizes in the training set, $n_j$, are used.
priors = 'I'
The prior probabilities are input in prior.
Constraint: priors = 'E', 'I' or 'P'.
4:     nig(ng) – int64/int32/nag_int array
ng, the dimension of the array, must satisfy the constraint ng ≥ 2.
The number of observations in each group in the training set, $n_j$.
Constraints:
  • if equal = 'E', nig(j) > 0, for $j = 1,2,\ldots,n_g$, and $\sum_{j=1}^{n_g}$ nig(j) > ng + nvar;
  • if equal = 'U', nig(j) > nvar, for $j = 1,2,\ldots,n_g$.
5:     gmn(ldgmn,nvar) – double array
ldgmn, the first dimension of the array, must satisfy the constraint ldgmn ≥ ng.
The $j$th row of gmn contains the means of the $p$ variables for the $j$th group, for $j = 1,2,\ldots,n_g$. These are returned by nag_mv_discrim (g03da).
6:     gc((ng + 1) × nvar × (nvar + 1)/2) – double array
The first $p(p+1)/2$ elements of gc should contain the upper triangular matrix $R$ and the next $n_g$ blocks of $p(p+1)/2$ elements should contain the upper triangular matrices $R_j$.
All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03da). If equal = 'E', only the first $p(p+1)/2$ elements are referenced; if equal = 'U', only the elements $p(p+1)/2 + 1$ to $(n_g + 1)p(p+1)/2$ are referenced.
Constraints:
  • if equal = 'E', the diagonal elements of $R$ must be ≠ 0.0;
  • if equal = 'U', the diagonal elements of the $R_j$ must be ≠ 0.0, for $j = 1,2,\ldots,n_g$.
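The packed layout of gc can be unpacked as sketched below. These are hypothetical Python helpers, not part of the toolbox (and they use 0-based indexing). They also show that, because $S_j = R_j^T R_j$, the value $\log|S_j|$ expected in det equals $2\sum_i \log|r_{ii}|$ over the diagonal elements $r_{ii}$ of $R_j$.

```python
import math

def unpack_upper(packed, p):
    """Unpack a p x p upper triangular matrix stored packed by column."""
    R = [[0.0] * p for _ in range(p)]
    pos = 0
    for j in range(p):            # column j holds R[0][j], ..., R[j][j]
        for i in range(j + 1):
            R[i][j] = packed[pos]
            pos += 1
    return R

def split_gc(gc, p, ng):
    """Return (R, [R_1, ..., R_ng]) from a gc-style list of (ng+1)*p*(p+1)/2 values."""
    size = p * (p + 1) // 2
    blocks = [unpack_upper(gc[b * size:(b + 1) * size], p) for b in range(ng + 1)]
    return blocks[0], blocks[1:]

def log_det_from_r(R):
    """log|S| for S = R^T R, i.e. 2 * sum of log|r_ii|."""
    return 2.0 * sum(math.log(abs(R[i][i])) for i in range(len(R)))
```

Applied to the gc array of the example (p = 2, ng = 3), the $R_1$ and $R_2$ blocks reproduce det(1) and det(2).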
7:     det(ng) – double array
ng, the dimension of the array, must satisfy the constraint ng ≥ 2.
If equal = 'U', the logarithms of the determinants of the within-group variance-covariance matrices as returned by nag_mv_discrim (g03da). Otherwise det is not referenced.
8:     isx(m) – int64/int32/nag_int array
m, the dimension of the array, must satisfy the constraint m ≥ nvar.
isx(l) indicates whether the $l$th variable in x is to be included in the distance calculations.
If isx(l) > 0, the $l$th variable is included, for $l = 1,2,\ldots,m$; otherwise the $l$th variable is not referenced.
Constraint: isx(l) > 0 for nvar values of $l$.
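Selecting the variables flagged by isx amounts to keeping the columns with isx(l) > 0. A minimal Python sketch, using a hypothetical function name and 0-based column indices, that also performs the count check behind the nvar constraint:

```python
def select_variables(x_rows, isx, nvar):
    """Return x restricted to the columns l with isx[l] > 0 (0-based l).

    Raises ValueError if the number of selected columns is not nvar,
    mirroring the isx constraint above.
    """
    keep = [l for l, flag in enumerate(isx) if flag > 0]
    if len(keep) != nvar:
        raise ValueError("isx must select exactly nvar variables")
    return [[row[l] for l in keep] for row in x_rows]
```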
9:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldx ≥ nobs.
x(k,l) must contain the $k$th observation for the $l$th variable, for $k = 1,2,\ldots,\mathrm{nobs}$ and $l = 1,2,\ldots,m$.
10:   prior(ng) – double array
ng, the dimension of the array, must satisfy the constraint ng ≥ 2.
If priors = 'I', the prior probabilities for the $n_g$ groups.
Constraint: if priors = 'I', prior(j) > 0.0, for $j = 1,2,\ldots,n_g$, and $\left|1 - \sum_{j=1}^{n_g} \mathrm{prior}(j)\right| \le 10 \times$ machine precision.
11:   atiq – logical scalar
atiq must be true if atypicality indices are required. If atiq is false the array ati is not set.

Optional Input Parameters

1:     nvar – int64/int32/nag_int scalar
Default: the second dimension of the array gmn.
$p$, the number of variables in the variance-covariance matrices.
Constraint: nvar ≥ 1.
2:     ng – int64/int32/nag_int scalar
Default: the dimension of the arrays nig, det, prior and the first dimension of the array gmn. (An error is raised if these dimensions are not equal.)
The number of groups, $n_g$.
Constraint: ng ≥ 2.
3:     nobs – int64/int32/nag_int scalar
Default: the first dimension of the array x.
The number of observations in x which are to be allocated.
Constraint: nobs ≥ 1.
4:     m – int64/int32/nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables in the data array x.
Constraint: m ≥ nvar.

Input Parameters Omitted from the MATLAB Interface

ldgmn ldx ldp wk

Output Parameters

1:     prior(ng) – double array
If priors = 'P', the computed prior probabilities in proportion to group sizes for the $n_g$ groups.
If priors = 'I', the input prior probabilities will be unchanged.
If priors = 'E', prior is not set.
2:     p(ldp,ng) – double array
ldp ≥ nobs.
p(k,j) contains the posterior probability $p_{kj}$ for allocating the $k$th observation to the $j$th group, for $k = 1,2,\ldots,\mathrm{nobs}$ and $j = 1,2,\ldots,n_g$.
3:     iag(nobs) – int64/int32/nag_int array
The groups to which the observations have been allocated.
4:     ati(ldp, :) – double array
The first dimension of the array ati will be nobs.
The second dimension of the array will be $n_g$ if atiq = true, and at least 1 otherwise.
ldp ≥ nobs.
If atiq is true, ati(k,j) will contain the predictive atypicality index for the $k$th observation with respect to the $j$th group, for $k = 1,2,\ldots,\mathrm{nobs}$ and $j = 1,2,\ldots,n_g$.
If atiq is false, ati is not set.
5:     ifail – int64/int32/nag_int scalar
ifail = 0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, nvar < 1,
or ng < 2,
or nobs < 1,
or m < nvar,
or ldgmn < ng,
or ldx < nobs,
or ldp < nobs,
or typ ≠ 'E' or 'P',
or equal ≠ 'E' or 'U',
or priors ≠ 'E', 'I' or 'P'.
  ifail = 2
On entry, the number of variables indicated by isx is not equal to nvar,
or equal = 'E' and nig(j) ≤ 0, for some $j$,
or equal = 'E' and $\sum_{j=1}^{n_g}$ nig(j) ≤ ng + nvar,
or equal = 'U' and nig(j) ≤ nvar for some $j$.
  ifail = 3
On entry, priors = 'I' and prior(j) ≤ 0.0 for some $j$,
or priors = 'I' and $\sum_{j=1}^{n_g}$ prior(j) is not within 10 × machine precision of 1.
  ifail = 4
On entry, equal = 'E' and a diagonal element of $R$ is zero,
or equal = 'U' and a diagonal element of $R_j$ for some $j$ is zero.
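The ifail = 2, 3 and 4 checks listed above can be mirrored in a short Python sketch, for example to validate inputs before calling the toolbox. The function `check_inputs` is hypothetical; it returns the first failing code in the order listed, and it assumes machine precision is `sys.float_info.epsilon`, which may differ by a factor of 2 from the value NAG uses.

```python
import sys

def check_inputs(nig, nvar, isx, equal, priors, prior, r_diagonals):
    """Return an ifail-style code (0 = OK) for the ifail = 2, 3 and 4 checks.

    r_diagonals: diagonal elements of R (equal == 'E') or of all R_j (equal == 'U').
    Machine precision is assumed to be sys.float_info.epsilon.
    """
    ng = len(nig)
    if sum(1 for v in isx if v > 0) != nvar:
        return 2
    if equal == "E" and (any(nj <= 0 for nj in nig) or sum(nig) <= ng + nvar):
        return 2
    if equal == "U" and any(nj <= nvar for nj in nig):
        return 2
    if priors == "I":
        eps = sys.float_info.epsilon
        if any(v <= 0.0 for v in prior) or abs(1.0 - sum(prior)) > 10.0 * eps:
            return 3
    if any(r == 0.0 for r in r_diagonals):
        return 4
    return 0
```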

Accuracy

The accuracy of the returned posterior probabilities will depend on the accuracy of the input $R$ or $R_j$ matrices. The atypicality index should be accurate to four significant places.

Further Comments

The distances $D_{kj}^2$ can be computed using nag_mv_discrim_mahal (g03db) if other forms of discrimination are required.

Example

function nag_mv_discrim_group_example
typ = 'P';
equal = 'U';
priors = 'Equal priors';
nig = [int64(6);10;5];
gmean = [1.0433, -0.6034166666666667;
     2.00727, -0.20604;
     2.70974, 1.5998];
gc = [-0.5099642881287538;
     -0.279705472386133;
     -1.217327847040481;
     -0.3326727521153484;
     -0.3723518779712077;
     -1.987589395382754;
     -0.4603014906920608;
     -0.7041634974247672;
     0.4737334252803499;
     0.7451327720614629;
     -0.3251057349548681;
     -0.4275545007358186];
det = [-0.8273469064608421;
     -3.045968198109008;
     -2.287732741158105];
isx = [int64(1);1];
x = [1.6292, -0.9163;
     2.5572, 1.6094;
     2.5649, -0.2231;
     0.9555, -2.3026;
     3.4012, -2.3026;
     3.0204, -0.2231];
prior = zeros(3, 1);
atiq = true;
[priorOut, p, iag, ati, ifail] = ...
    nag_mv_discrim_group(typ, equal, priors, nig, gmean, gc, det, isx, x, prior, atiq)
 

priorOut =

     0
     0
     0


p =

    0.0939    0.9046    0.0015
    0.0047    0.1682    0.8270
    0.0186    0.9196    0.0618
    0.6969    0.3026    0.0005
    0.3174    0.0130    0.6696
    0.0323    0.3664    0.6013


iag =

                    2
                    3
                    2
                    1
                    3
                    3


ati =

    0.5956    0.2539    0.9747
    0.9519    0.8360    0.0184
    0.9540    0.7966    0.9122
    0.2073    0.8599    0.9929
    0.9908    0.9999    0.9843
    0.9807    0.9779    0.8871


ifail =

                    0



© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013