

# NAG Toolbox: nag_mv_discrim (g03da)

## Purpose

nag_mv_discrim (g03da) computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.

## Syntax

[nig, gmn, det, gc, stat, df, sig, ifail] = g03da(x, isx, nvar, ing, ng, 'n', n, 'm', m, 'wt', wt)
[nig, gmn, det, gc, stat, df, sig, ifail] = nag_mv_discrim(x, isx, nvar, ing, ng, 'n', n, 'm', m, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 24: weight was removed from the interface; wt was made optional

## Description

Let a sample of $n$ observations on $p$ variables come from ${n}_{g}$ groups with ${n}_{j}$ observations in the $j$th group and $\sum {n}_{j}=n$. If the data are assumed to follow a multivariate Normal distribution with variance-covariance matrix ${\Sigma }_{j}$ for the $j$th group, then to test for equality of the variance-covariance matrices between groups, that is, ${\Sigma }_{1}={\Sigma }_{2}=\cdots ={\Sigma }_{{n}_{g}}=\Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
 $G=C\left\{\left(n-{n}_{g}\right)\log\left|S\right|-\sum _{j=1}^{{n}_{g}}\left({n}_{j}-1\right)\log\left|{S}_{j}\right|\right\},$
where
 $C=1-\frac{2{p}^{2}+3p-1}{6\left(p+1\right)\left({n}_{g}-1\right)}\left(\sum _{j=1}^{{n}_{g}}\frac{1}{{n}_{j}-1}-\frac{1}{n-{n}_{g}}\right),$
and ${S}_{j}$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
 $S=\frac{\sum _{j=1}^{{n}_{g}}\left({n}_{j}-1\right){S}_{j}}{n-{n}_{g}}.$
For large $n$, $G$ is approximately distributed as a ${\chi }^{2}$ variable with $\frac{1}{2}p\left(p+1\right)\left({n}_{g}-1\right)$ degrees of freedom, see Morrison (1967) for further comments. If weights are used, then $S$ and ${S}_{j}$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.
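As a cross-check of the formulas for $G$, $C$ and $S$, the following is a minimal NumPy sketch that computes the statistic, its degrees of freedom and its significance directly from the within-group covariance matrices, rather than through the $QR$ route that nag_mv_discrim (g03da) uses. It assumes unweighted data; the helper names `chi2_sf` and `equal_covariance_test` are hypothetical, and the ${\chi }^{2}$ tail probability is taken from the series expansion of the regularized incomplete gamma function. Applied to the data of the Example section, it should reproduce the printed Stat, DF and SIG values up to rounding.

```python
import math

import numpy as np


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for a chi-squared variable with df
    degrees of freedom, from the series for the regularized lower incomplete
    gamma function P(a, x), a = df/2, x = stat/2.  Adequate for moderate stat
    (the partial sums overflow once stat exceeds roughly 1400)."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def equal_covariance_test(x, ing, ng):
    """Hypothetical helper: likelihood-ratio test that the ng within-group
    covariance matrices are equal (unweighted case only)."""
    x = np.asarray(x, dtype=float)
    ing = np.asarray(ing)
    n, p = x.shape
    nj = np.array([np.count_nonzero(ing == j) for j in range(1, ng + 1)])
    # Within-group covariance matrices S_j, divisor n_j - 1
    s_j = [np.atleast_2d(np.cov(x[ing == j], rowvar=False))
           for j in range(1, ng + 1)]
    logdet_j = np.array([np.linalg.slogdet(s)[1] for s in s_j])
    # Pooled matrix S = sum_j (n_j - 1) S_j / (n - n_g)
    s_pooled = sum((m - 1) * s for m, s in zip(nj, s_j)) / (n - ng)
    logdet = np.linalg.slogdet(s_pooled)[1]
    c = 1.0 - (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (ng - 1)) * (
        np.sum(1.0 / (nj - 1)) - 1.0 / (n - ng))
    stat = c * ((n - ng) * logdet - np.sum((nj - 1) * logdet_j))
    df = 0.5 * p * (p + 1) * (ng - 1)
    return logdet_j, stat, df, chi2_sf(stat, df)


# Cushing's syndrome data from the Example section (21 patients, 2 variables)
x = np.array([
    [1.1314, 2.4596], [1.0986, 0.2624], [0.6419, -2.3026], [1.3350, -3.2189],
    [1.4110, 0.0953], [0.6419, -0.9163], [2.1163, 0.0000], [1.3350, -1.6094],
    [1.3610, -0.5108], [2.0541, 0.1823], [2.2083, -0.5108], [2.7344, 1.2809],
    [2.0412, 0.4700], [1.8718, -0.9163], [1.7405, -0.9163], [2.6101, 0.4700],
    [2.3224, 1.8563], [2.2192, 2.0669], [2.2618, 1.1314], [3.9853, 0.9163],
    [2.7600, 2.0281],
])
ing = np.array([1] * 6 + [2] * 10 + [3] * 5)
logdet_j, stat, df, sig = equal_covariance_test(x, ing, 3)
print(np.round(logdet_j, 4), round(float(stat), 4), df, round(sig, 4))
```

Forming each ${S}_{j}$ explicitly is the simplest way to see the formula at work; the $QR$ approach described next avoids forming and factorizing these matrices and is numerically preferable.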
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, nag_mv_discrim (g03da) uses a $QR$ decomposition. The group means are subtracted from the data and then for each group, a $QR$ decomposition is computed to give an upper triangular matrix ${R}_{j}^{*}$. This matrix can be scaled to give a matrix ${R}_{j}$ such that ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. The pooled $R$ matrix is then computed from the ${R}_{j}$ matrices. The values of $\left|S\right|$ and the $\left|{S}_{j}\right|$ can then be calculated from the diagonal elements of $R$ and the ${R}_{j}$.
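The $QR$ route can be sketched in NumPy as follows. This is an illustration under stated assumptions, not NAG's code: it handles only the unweighted case, uses `numpy.linalg.qr` in place of the NAG/LAPACK factorization, and forms the pooled $R$ by re-factorizing the stacked ${R}_{j}^{*}$ blocks, which is one way (not necessarily the one g03da uses) of computing it from the ${R}_{j}$. The helper names `r_factors` and `logdet_from_r` are hypothetical.

```python
import numpy as np


def r_factors(x, ing, ng):
    """Return the pooled R and the list of R_j, each upper triangular with
    S = R^T R and S_j = R_j^T R_j (unweighted case)."""
    x = np.asarray(x, dtype=float)
    ing = np.asarray(ing)
    n = x.shape[0]
    r_star, r_j = [], []
    for j in range(1, ng + 1):
        xj = x[ing == j]
        centred = xj - xj.mean(axis=0)             # subtract the group means
        rs = np.linalg.qr(centred, mode="r")       # R_j^*: centred = Q R_j^*
        r_star.append(rs)
        r_j.append(rs / np.sqrt(xj.shape[0] - 1))  # scale so S_j = R_j^T R_j
    # Stacking the R_j^* and re-factorizing gives R^* with
    # R^*T R^* = sum_j R_j^*T R_j^* = (n - n_g) S
    r_pooled = np.linalg.qr(np.vstack(r_star), mode="r") / np.sqrt(n - ng)
    return r_pooled, r_j


def logdet_from_r(r):
    """log|R^T R| = 2 * sum(log|r_ii|), read off the diagonal of R."""
    return 2.0 * np.sum(np.log(np.abs(np.diag(r))))


rng = np.random.default_rng(1)
x = rng.normal(size=(30, 3))
ing = np.repeat([1, 2, 3], 10)
r_pooled, r_j = r_factors(x, ing, 3)
s_1 = np.cov(x[ing == 1], rowvar=False)
print(logdet_from_r(r_j[0]), np.linalg.slogdet(s_1)[1])
```

Because $R$ is triangular, its determinant is the product of its diagonal elements, so $\log\left|{S}_{j}\right|$ costs only $p$ logarithms once ${R}_{j}$ is available.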
This approach means that the squared Mahalanobis distance of a vector observation $x$ from the $j$th group can be computed as ${z}^{\mathrm{T}}z$, where ${R}_{j}^{\mathrm{T}}z=\left(x-{\stackrel{-}{x}}_{j}\right)$ and ${\stackrel{-}{x}}_{j}$ is the vector of means of the $j$th group; since ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$, ${\left(x-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left(x-{\stackrel{-}{x}}_{j}\right)={z}^{\mathrm{T}}z$. These distances can be calculated by nag_mv_discrim_mahal (g03db). They are used in discriminant analysis, and nag_mv_discrim_group (g03dc) uses the results of nag_mv_discrim (g03da) to perform several different types of discriminant analysis. The differences between these discriminant methods are due, in part, to whether or not the within-group variance-covariance matrices are assumed to be equal.
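The link between ${R}_{j}$ and the Mahalanobis distance can be illustrated with a small NumPy sketch. Here `mahalanobis_sq` is a hypothetical helper, not the g03db interface, and ${R}_{j}$ is taken from a Cholesky factorization of an arbitrary example ${S}_{j}$ rather than from g03da output. The sketch solves ${R}_{j}^{\mathrm{T}}z=x-{\stackrel{-}{x}}_{j}$, which is what ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$ requires, and never forms ${S}_{j}^{-1}$.

```python
import numpy as np


def mahalanobis_sq(x, mean_j, r_j):
    """Squared Mahalanobis distance of x from a group with mean mean_j and
    upper-triangular factor r_j (S_j = r_j^T r_j): solve r_j^T z = x - mean_j
    and return z^T z.  np.linalg.solve does not exploit the triangular
    structure; a dedicated triangular solver would do the same in O(p^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean_j, dtype=float)
    z = np.linalg.solve(r_j.T, d)
    return float(z @ z)


s_j = np.array([[4.0, 2.0], [2.0, 3.0]])
r_j = np.linalg.cholesky(s_j).T      # upper triangular, s_j = r_j.T @ r_j
mean_j = np.array([1.0, -1.0])
x = np.array([3.0, 0.0])
d2 = mahalanobis_sq(x, mean_j, r_j)  # equal to 1 up to rounding for this data
print(d2)
```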

## References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{k},\mathit{l}\right)$ must contain the $\mathit{k}$th observation for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,n$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$.
2:     $\mathrm{isx}\left({\mathbf{m}}\right)$ – int64/int32/nag_int array
${\mathbf{isx}}\left(l\right)$ indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.
If ${\mathbf{isx}}\left(\mathit{l}\right)>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise it is not referenced.
Constraint: ${\mathbf{isx}}\left(l\right)>0$ for nvar values of $l$.
3:     $\mathrm{nvar}$ – int64/int32/nag_int scalar
$p$, the number of variables in the variance-covariance matrices.
Constraint: ${\mathbf{nvar}}\ge 1$.
4:     $\mathrm{ing}\left({\mathbf{n}}\right)$ – int64/int32/nag_int array
${\mathbf{ing}}\left(\mathit{k}\right)$ indicates to which group the $\mathit{k}$th observation belongs, for $\mathit{k}=1,2,\dots ,n$.
Constraint: $1\le {\mathbf{ing}}\left(\mathit{k}\right)\le {\mathbf{ng}}$, for $\mathit{k}=1,2,\dots ,n$.
The values of ing must be such that each group has at least nvar members.
5:     $\mathrm{ng}$ – int64/int32/nag_int scalar
The number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
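It can be convenient to check these constraints in the calling code before invoking the function. The sketch below is a hypothetical NumPy pre-check (`check_inputs` is not part of the NAG Toolbox) that mirrors the constraints listed for isx, nvar, ing and ng; the function itself reports violations of these through ifail $=1$ (nvar, ng) or ifail $=3$ (isx, ing, group sizes).

```python
import numpy as np


def check_inputs(isx, nvar, ing, ng):
    """Hypothetical pre-check of the constraints on isx, nvar, ing and ng;
    returns a list of messages (empty if all constraints are satisfied)."""
    isx = np.asarray(isx)
    ing = np.asarray(ing, dtype=int)
    problems = []
    if nvar < 1:
        problems.append("nvar must be at least 1")
    if ng < 2:
        problems.append("ng must be at least 2")
    if np.count_nonzero(isx > 0) != nvar:
        problems.append("exactly nvar elements of isx must be positive")
    if np.any((ing < 1) | (ing > ng)):
        problems.append("every ing(k) must lie in 1..ng")
    else:
        # Number of observations in each of the groups 1..ng
        counts = np.bincount(ing, minlength=ng + 1)[1:]
        if np.any(counts < nvar):
            problems.append("each group needs at least nvar members")
    return problems


print(check_inputs([1, 1], 2, [1] * 6 + [2] * 10 + [3] * 5, 3))
```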

### Optional Input Parameters

1:     $\mathrm{n}$ – int64/int32/nag_int scalar
Default: the dimension of the array ing and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
2:     $\mathrm{m}$ – int64/int32/nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
The number of variables in the data array x.
Constraint: ${\mathbf{m}}\ge {\mathbf{nvar}}$.
3:     $\mathrm{wt}\left(:\right)$ – double array
The dimension of the array wt must be at least ${\mathbf{n}}$ if $\mathit{weight}=\text{'W'}$, and at least $1$ otherwise.
If $\mathit{weight}=\text{'W'}$ the first $n$ elements of wt must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If ${\mathbf{wt}}\left(k\right)=0.0$ the $k$th observation is excluded from the calculations.
If $\mathit{weight}=\text{'U'}$, wt is not referenced and the effective number of observations for a group is the number of observations in that group.
Constraint: if $\mathit{weight}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{k}\right)\ge 0.0$, for $\mathit{k}=1,2,\dots ,n$.

### Output Parameters

1:     $\mathrm{nig}\left({\mathbf{ng}}\right)$ – int64/int32/nag_int array
${\mathbf{nig}}\left(\mathit{j}\right)$ contains the number of observations in the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
2:     $\mathrm{gmn}\left(\mathit{ldgmn},{\mathbf{nvar}}\right)$ – double array
The $\mathit{j}$th row of gmn contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
3:     $\mathrm{det}\left({\mathbf{ng}}\right)$ – double array
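${\mathbf{det}}\left(\mathit{j}\right)$ contains the logarithm of the determinant of the $\mathit{j}$th within-group variance-covariance matrix, $\log\left|{S}_{\mathit{j}}\right|$, for $\mathit{j}=1,2,\dots ,{n}_{g}$.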
4:     $\mathrm{gc}\left(\left({\mathbf{ng}}+1\right)×{\mathbf{nvar}}×\left({\mathbf{nvar}}+1\right)/2\right)$ – double array
The first $p\left(p+1\right)/2$ elements of gc contain $R$ and the remaining ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements contain the ${R}_{j}$ matrices. All are stored in packed form by columns.
5:     $\mathrm{stat}$ – double scalar
The likelihood-ratio test statistic, $G$.
6:     $\mathrm{df}$ – double scalar
The degrees of freedom for the distribution of $G$.
7:     $\mathrm{sig}$ – double scalar
The significance level for $G$, that is, the probability that a ${\chi }^{2}$ variable with df degrees of freedom exceeds $G$.
8:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
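When gc is post-processed outside the NAG functions, its blocks can be unpacked as below. This NumPy sketch assumes the standard column-wise packing of an upper triangular matrix (as in LAPACK packed storage): within each block of $p\left(p+1\right)/2$ elements, the element in row $i$ and column $k$, $i\le k$, is stored at 1-based position $k\left(k-1\right)/2+i$. The helper `unpack_gc` is hypothetical.

```python
import numpy as np


def unpack_gc(gc, p, ng):
    """Split gc into the pooled R and the ng matrices R_j, assuming each
    block holds an upper-triangular p-by-p matrix packed column by column."""
    gc = np.asarray(gc, dtype=float)
    block = p * (p + 1) // 2
    if gc.size < (ng + 1) * block:
        raise ValueError("gc is too short for the given p and ng")
    mats = []
    for b in range(ng + 1):
        r = np.zeros((p, p))
        pos = b * block
        for col in range(p):           # column by column ...
            for row in range(col + 1):  # ... rows 0..col of the upper triangle
                r[row, col] = gc[pos]
                pos += 1
        mats.append(r)
    return mats[0], mats[1:]  # pooled R, then R_1 .. R_ng


r, r_groups = unpack_gc([1, 2, 3, 4, 5, 6], p=2, ng=1)
print(r)
print(r_groups[0])
```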

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{nvar}}<1$, or ${\mathbf{n}}<1$, or ${\mathbf{ng}}<2$, or ${\mathbf{m}}<{\mathbf{nvar}}$, or $\mathit{ldx}<{\mathbf{n}}$, or $\mathit{ldgmn}<{\mathbf{ng}}$, or $\mathit{weight}\ne \text{'U'}$ or $\text{'W'}$.
${\mathbf{ifail}}=2$
 On entry, $\mathit{weight}=\text{'W'}$ and a value of ${\mathbf{wt}}<0.0$.
${\mathbf{ifail}}=3$
 On entry, there are not exactly nvar elements of ${\mathbf{isx}}>0$, or a value of ing is not in the range $1$ to ng, or the effective number of observations for a group is less than $1$, or a group has fewer than nvar members.
${\mathbf{ifail}}=4$
$R$ or one of the ${R}_{j}$ is not of full rank.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

The accuracy is dependent on the accuracy of the computation of the $QR$ decomposition. See nag_lapack_dgeqrf (f08ae) for further details.

## Further Comments

The time taken will be approximately proportional to $n{p}^{2}$.

## Example

The data, taken from Aitchison and Dunsmore (1975), are concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of $21$ patients are input, and the statistics are computed by nag_mv_discrim (g03da). The small significance level in the printed results (SIG $=0.0038$) is evidence that the within-group variance-covariance matrices are not equal.
```
function g03da_example

fprintf('g03da example results\n\n');

x = [1.1314,  2.4596;
1.0986,  0.2624;
0.6419, -2.3026;
1.3350, -3.2189;
1.4110,  0.0953;
0.6419, -0.9163;
2.1163,  0.0000;
1.3350, -1.6094;
1.3610, -0.5108;
2.0541,  0.1823;
2.2083, -0.5108;
2.7344,  1.2809;
2.0412,  0.4700;
1.8718, -0.9163;
1.7405, -0.9163;
2.6101,  0.4700;
2.3224,  1.8563;
2.2192,  2.0669;
2.2618,  1.1314;
3.9853,  0.9163;
2.7600,  2.0281];
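[n,m] = size(x);
% Use both variables (isx > 0 for every column of x)
isx  = ones(m,1,'int64');
nvar = int64(m);
% Observations 1-6, 7-16 and 17-21 belong to groups 1, 2 and 3
ing  = ones(n,1,'int64');
ing(7:16) = int64(2);
ing(17:n) = int64(3);
ng        = int64(3);

% The optional argument wt is omitted, so no weights are used
[nig, gmean, det, gc, stat, df, sig, ifail] = ...
  g03da( ...
    x, isx, nvar, ing, ng);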

mtitle = 'Group means';
matrix = 'General';
diag   = ' ';
[ifail] = x04ca( ...
matrix, diag, gmean, mtitle);
fprintf('\nLog of determinants\n\n');
fprintf('%10.4f%10.4f%10.4f\n\n', det);
fprintf(' Stat = %7.4f\n', stat);
fprintf('   DF = %7.4f\n', df);
fprintf('  SIG = %7.4f\n', sig);

```
```
g03da example results

Group means
             1          2
 1      1.0433    -0.6034
 2      2.0073    -0.2060
 3      2.7097     1.5998

Log of determinants

   -0.8273   -3.0460   -2.2877

 Stat = 19.2410
   DF =  6.0000
  SIG =  0.0038
```