# NAG Toolbox: nag_mv_discrim_mahal (g03db)

## Purpose

nag_mv_discrim_mahal (g03db) computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after nag_mv_discrim (g03da).

## Syntax

[d, ifail] = g03db(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
[d, ifail] = nag_mv_discrim_mahal(equal, mode, gmn, gc, nobs, isx, x, 'nvar', nvar, 'ng', ng, 'm', m)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 22: ng was made optional

## Description

Consider $p$ variables observed on ${n}_{g}$ populations or groups. Let ${\stackrel{-}{x}}_{j}$ be the sample mean and ${S}_{j}$ the within-group variance-covariance matrix for the $j$th group and let ${x}_{k}$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, ${D}_{kj}$:
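 ${D}_{kj}^{2}={\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({x}_{k}-{\stackrel{-}{x}}_{j}\right).$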
If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:
 ${D}_{kj}^{2}={\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({x}_{k}-{\stackrel{-}{x}}_{j}\right).$
Instead of using the variance-covariance matrices $S$ and ${S}_{j}$, nag_mv_discrim_mahal (g03db) uses the upper triangular matrices $R$ and ${R}_{j}$ supplied by nag_mv_discrim (g03da) such that $S={R}^{\mathrm{T}}R$ and ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. ${D}_{kj}^{2}$ can then be calculated as ${z}^{\mathrm{T}}z$ where ${R}_{j}^{\mathrm{T}}z=\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)$ or ${R}^{\mathrm{T}}z=\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)$ as appropriate.
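The triangular-solve formulation can be illustrated with a minimal sketch in plain MATLAB. The matrix `S`, the mean `xbar` and the point `xk` are made-up values, and `chol` stands in for the factor that nag_mv_discrim (g03da) would supply; this is not a call into the NAG routine itself. Since $S={R}^{\mathrm{T}}R$, the system solved is ${R}^{\mathrm{T}}z={x}_{k}-\stackrel{-}{x}$.

```matlab
% Hypothetical two-variable inputs (illustrative values only)
S    = [2.0, 0.6; 0.6, 1.0];   % assumed variance-covariance matrix
xbar = [1.0; 0.5];             % assumed group mean
xk   = [2.0; -0.5];            % assumed sample point

R  = chol(S);                  % upper triangular factor with S = R'*R
z  = R' \ (xk - xbar);         % solve R'*z = xk - xbar
d2 = z' * z;                   % squared Mahalanobis distance

% Cross-check against the defining formula using S directly
d2_direct = (xk - xbar)' * (S \ (xk - xbar));
fprintf('%.6f  %.6f\n', d2, d2_direct);
```

Working with the triangular factor avoids forming ${S}^{-1}$ explicitly; the two printed values should agree to rounding error.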
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:
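 ${D}_{ij}^{2}={\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)$
or
 ${D}_{ij}^{2}={\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right).$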
Note: ${D}_{jj}^{2}=0$ and, when the pooled variance-covariance matrix is used, ${D}_{ij}^{2}={D}_{ji}^{2}$, so in this case only the lower triangular values of ${D}_{ij}^{2}$, $i>j$, are computed.
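As a rough illustration of the pooled case, the sketch below fills only the strictly lower triangle of the matrix of squared distances between group means. The means in `gmn` and the pooled matrix `S` are hypothetical values, and `chol` is used in place of the $R$ returned by nag_mv_discrim (g03da).

```matlab
% Hypothetical group means (one row per group) and pooled covariance
gmn = [0 0; 1 -1; 2 1];
S   = [2.0, 0.6; 0.6, 1.0];
R   = chol(S);                 % upper triangular, S = R'*R

ng = size(gmn, 1);
D  = zeros(ng);                % the diagonal stays zero, since D_jj^2 = 0
for i = 2:ng
    for j = 1:i-1              % only i > j is needed, because D_ij^2 = D_ji^2
        z = R' \ (gmn(i,:) - gmn(j,:))';
        D(i,j) = z' * z;
    end
end
disp(D)
```

With unequal within-group matrices the symmetry is lost, because ${S}_{j}$ depends on the second index, so both triangles would have to be computed.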

## References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{equal}$ – string (length ≥ 1)
Indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
${\mathbf{equal}}=\text{'E'}$
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p\left(p+1\right)/2$ elements of gc is used.
${\mathbf{equal}}=\text{'U'}$
The within-group variance-covariance matrices are assumed to be unequal and the matrices ${R}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, stored in the remainder of gc are used.
Constraint: ${\mathbf{equal}}=\text{'E'}$ or $\text{'U'}$.
2:     $\mathrm{mode}$ – string (length ≥ 1)
Indicates whether distances from sample points are to be calculated or distances between the group means.
${\mathbf{mode}}=\text{'S'}$
The distances between the sample points given in x and the group means are calculated.
${\mathbf{mode}}=\text{'M'}$
The distances between the group means will be calculated.
Constraint: ${\mathbf{mode}}=\text{'M'}$ or $\text{'S'}$.
3:     $\mathrm{gmn}\left(\mathit{ldgmn},{\mathbf{nvar}}\right)$ – double array
ldgmn, the first dimension of the array, must satisfy the constraint $\mathit{ldgmn}\ge {\mathbf{ng}}$.
The $\mathit{j}$th row of gmn contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$. These are returned by nag_mv_discrim (g03da).
4:     $\mathrm{gc}\left(\left({\mathbf{ng}}+1\right)×{\mathbf{nvar}}×\left({\mathbf{nvar}}+1\right)/2\right)$ – double array
The first $p\left(p+1\right)/2$ elements of gc should contain the upper triangular matrix $R$ and the next ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements should contain the upper triangular matrices ${R}_{j}$. All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03da). If ${\mathbf{equal}}=\text{'E'}$ only the first $p\left(p+1\right)/2$ elements are referenced, if ${\mathbf{equal}}=\text{'U'}$ only the elements $p\left(p+1\right)/2+1$ to $\left({n}_{g}+1\right)p\left(p+1\right)/2$ are referenced.
Constraints:
• if ${\mathbf{equal}}=\text{'E'}$, the diagonal elements of $R\ne 0.0$;
• if ${\mathbf{equal}}=\text{'U'}$, the diagonal elements of the ${R}_{\mathit{j}}\ne 0.0$, for $\mathit{j}=1,2,\dots ,{\mathbf{ng}}$.
5:     $\mathrm{nobs}$ – int64, int32 or nag_int scalar
If ${\mathbf{mode}}=\text{'S'}$, the number of sample points in x for which distances are to be calculated.
If ${\mathbf{mode}}=\text{'M'}$, nobs is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{nobs}}\ge 1$.
6:     $\mathrm{isx}\left(:\right)$ – int64, int32 or nag_int array
The dimension of the array isx must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}\right)$.
If ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(\mathit{l}\right)$ indicates if the $\mathit{l}$th variable in x is to be included in the distance calculations. If ${\mathbf{isx}}\left(\mathit{l}\right)>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise the $\mathit{l}$th variable is not referenced.
If ${\mathbf{mode}}=\text{'M'}$, isx is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(l\right)>0$ for nvar values of $l$.
7:     $\mathrm{x}\left(\mathit{ldx},:\right)$ – double array
The first dimension, $\mathit{ldx}$, of the array x must satisfy
• if ${\mathbf{mode}}=\text{'S'}$, $\mathit{ldx}\ge {\mathbf{nobs}}$;
• otherwise $1$.
The second dimension of the array x must be at least $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}\right)$.
If ${\mathbf{mode}}=\text{'S'}$ the $\mathit{k}$th row of x must contain ${x}_{\mathit{k}}$. That is ${\mathbf{x}}\left(\mathit{k},\mathit{l}\right)$ must contain the $\mathit{k}$th sample value for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$. Otherwise x is not referenced.
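The packed layout of gc described above can be unpacked as in the following sketch. The sizes and the contents of `gc` are invented for illustration (real values come from nag_mv_discrim (g03da)); the index arithmetic follows the description of gc, with block 0 holding the pooled $R$ and block $j$ holding ${R}_{j}$.

```matlab
% Hypothetical sizes and contents, for illustration only
p   = 3;                        % number of variables (nvar)
ng  = 2;                        % number of groups
blk = p*(p+1)/2;                % length of one packed triangle
gc  = (1:(ng+1)*blk)';          % made-up packed values, not real g03da output

j      = 1;                     % j = 0 selects the pooled R, j >= 1 selects R_j
packed = gc(j*blk + (1:blk));   % elements j*blk+1 .. (j+1)*blk

Rj = zeros(p);
Rj(triu(true(p))) = packed;     % column-major fill of the upper triangle
disp(Rj)
```

Logical indexing visits elements in column-major order, which coincides with "packed by column" storage for an upper triangular matrix.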

### Optional Input Parameters

1:     $\mathrm{nvar}$ – int64, int32 or nag_int scalar
Default: the second dimension of the array gmn.
$p$, the number of variables in the variance-covariance matrices as specified to nag_mv_discrim (g03da).
Constraint: ${\mathbf{nvar}}\ge 1$.
2:     $\mathrm{ng}$ – int64, int32 or nag_int scalar
Default: the first dimension of the array gmn.
The number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
3:     $\mathrm{m}$ – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x.
If ${\mathbf{mode}}=\text{'S'}$, the number of variables in the data array x.
If ${\mathbf{mode}}=\text{'M'}$, m is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{m}}\ge {\mathbf{nvar}}$.
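The relationship between isx, m and nvar for ${\mathbf{mode}}=\text{'S'}$ can be sketched as follows. The data are hypothetical, and the two checks merely mirror the documented constraints; they are not the routine's own validation code.

```matlab
% Hypothetical data: 3 sample points, m = 4 variables, nvar = 2 selected
x    = [1 2 3 4; 5 6 7 8; 9 10 11 12];
isx  = int64([1; 0; 1; 0]);     % include variables 1 and 3 only
nvar = 2;

m = numel(isx);
assert(m >= nvar, 'm must be at least nvar');
assert(nnz(isx > 0) == nvar, 'isx must flag exactly nvar variables');

xsel = x(:, isx > 0);           % columns that enter the distance calculation
disp(xsel)
```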

### Output Parameters

1:     $\mathrm{d}\left(\mathit{ldd},{\mathbf{ng}}\right)$ – double array
The squared distances.
If ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{d}}\left(\mathit{k},\mathit{j}\right)$ contains the squared distance of the $\mathit{k}$th sample point from the $\mathit{j}$th group mean, ${D}_{\mathit{k}\mathit{j}}^{2}$, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'U'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1,\mathit{i}+1,\dots ,{n}_{g}$. The elements ${\mathbf{d}}\left(\mathit{i},\mathit{i}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'E'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1$. Since ${D}_{\mathit{i}\mathit{j}}={D}_{\mathit{j}\mathit{i}}$ the elements ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=\mathit{i}+1,\dots ,{n}_{g}$.
2:     $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
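When ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'E'}$, only the strictly lower triangle of d is set. A full symmetric matrix can be rebuilt as in the sketch below; the matrix `d` is a hypothetical output, with NaN standing in for the entries the routine does not reference.

```matlab
% Hypothetical output for ng = 3 (NaN marks entries that are not referenced)
d = [NaN NaN NaN;
     1.5 NaN NaN;
     4.0 2.5 NaN];

ng    = size(d, 1);
mask  = tril(true(ng), -1);     % strictly lower triangle
Dfull = zeros(ng);              % the diagonal is zero, since D_ii^2 = 0
Dfull(mask) = d(mask);          % copy only the entries that were computed
Dfull = Dfull + Dfull';         % mirror into the upper triangle
disp(Dfull)
```

Copying through a logical mask avoids touching the unreferenced entries, whatever they happen to contain.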

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry,
• ${\mathbf{nvar}}<1$,
• or ${\mathbf{ng}}<2$,
• or $\mathit{ldgmn}<{\mathbf{ng}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and ${\mathbf{nobs}}<1$,
• or ${\mathbf{mode}}=\text{'S'}$ and ${\mathbf{m}}<{\mathbf{nvar}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and $\mathit{ldx}<{\mathbf{nobs}}$,
• or ${\mathbf{mode}}=\text{'S'}$ and $\mathit{ldd}<{\mathbf{nobs}}$,
• or ${\mathbf{mode}}=\text{'M'}$ and $\mathit{ldd}<{\mathbf{ng}}$,
• or ${\mathbf{equal}}\ne \text{'E'}$ or $\text{'U'}$,
• or ${\mathbf{mode}}\ne \text{'M'}$ or $\text{'S'}$.
${\mathbf{ifail}}=2$
 On entry,
• ${\mathbf{mode}}=\text{'S'}$ and the number of variables indicated by isx is not equal to nvar,
• or ${\mathbf{equal}}=\text{'E'}$ and a diagonal element of $R$ is zero,
• or ${\mathbf{equal}}=\text{'U'}$ and a diagonal element of ${R}_{j}$ for some $j$ is zero.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

The accuracy will depend upon the accuracy of the input $R$ or ${R}_{j}$ matrices.

## Further Comments

If the distances are to be used for discrimination, see also nag_mv_discrim_group (g03dc).

## Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of $21$ patients are input and the group means and $R$ matrices are computed by nag_mv_discrim (g03da). A further six observations of unknown type are input, and the distances from the group means of the $21$ patients of known type are computed under the assumption that the within-group variance-covariance matrices are not equal. These results are printed and indicate that the first four are close to one of the groups while observations $5$ and $6$ are some distance from any group.
```
function g03db_example

fprintf('g03db example results\n\n');

x = [1.1314,  2.4596;
1.0986,  0.2624;
0.6419, -2.3026;
1.3350, -3.2189;
1.4110,  0.0953;
0.6419, -0.9163;
2.1163,  0.0000;
1.3350, -1.6094;
1.3610, -0.5108;
2.0541,  0.1823;
2.2083, -0.5108;
2.7344,  1.2809;
2.0412,  0.4700;
1.8718, -0.9163;
1.7405, -0.9163;
2.6101,  0.4700;
2.3224,  1.8563;
2.2192,  2.0669;
2.2618,  1.1314;
3.9853,  0.9163;
2.7600,  2.0281];
[n,m] = size(x);
isx  = ones(m,1,'int64');
nvar = int64(m);
ing  = ones(n,1,'int64');
ing(7:16) = int64(2);
ing(17:n) = int64(3);
ng        = int64(3);

% Compute the group means (gmean) and the packed R matrices (gc)
[nig, gmean, det, gc, stat, df, sig, ifail] = ...
g03da( ...
x, isx, nvar, ing, ng);

equal = 'U';
mode = 'Sample points';
nobs = int64(6);

% Data from which to compute distances
x = [1.6292, -0.9163;
2.5572, 1.6094;
2.5649, -0.2231;
0.9555, -2.3026;
3.4012, -2.3026;
3.0204, -0.2231];

% Compute distances
[d, ifail] = g03db( ...
equal, mode, gmean, gc, nobs, isx, x);

mtitle = 'Distances';
matrix = 'General';
diag   = ' ';
[ifail] = x04ca( ...
matrix, diag, d, mtitle);

```
```
g03db example results

 Distances
             1          2          3
 1      3.3393     0.7521    50.9283
 2     20.7771     5.6559     0.0597
 3     21.3631     4.8411    19.4978
 4      0.7184     6.2803   124.7323
 5     55.0003    88.8604    71.7852
 6     36.1703    15.7849    15.7489
```
```
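For readers without the toolbox, the distances can be approximated in plain MATLAB. This sketch assumes that the ${R}_{j}$ from nag_mv_discrim (g03da) satisfy ${R}_{j}^{\mathrm{T}}{R}_{j}={S}_{j}$ with ${S}_{j}$ the usual sample covariance (divisor ${n}_{j}-1$); it uses `cov` and `chol` in place of any NAG call, and the names `xnew` and `xj` are introduced here for illustration only.

```matlab
% Training data and group sizes as in the example above
x = [1.1314  2.4596; 1.0986  0.2624; 0.6419 -2.3026; 1.3350 -3.2189;
     1.4110  0.0953; 0.6419 -0.9163; 2.1163  0.0000; 1.3350 -1.6094;
     1.3610 -0.5108; 2.0541  0.1823; 2.2083 -0.5108; 2.7344  1.2809;
     2.0412  0.4700; 1.8718 -0.9163; 1.7405 -0.9163; 2.6101  0.4700;
     2.3224  1.8563; 2.2192  2.0669; 2.2618  1.1314; 3.9853  0.9163;
     2.7600  2.0281];
ing = [ones(6,1); 2*ones(10,1); 3*ones(5,1)];

% Observations of unknown type
xnew = [1.6292 -0.9163; 2.5572  1.6094; 2.5649 -0.2231;
        0.9555 -2.3026; 3.4012 -2.3026; 3.0204 -0.2231];

ng = 3;
d  = zeros(size(xnew,1), ng);
for j = 1:ng
    xj  = x(ing == j, :);                              % rows of group j
    Rj  = chol(cov(xj));                               % assumed: S_j = Rj'*Rj
    dev = xnew - repmat(mean(xj,1), size(xnew,1), 1);  % x_k - xbar_j, one per row
    z   = Rj' \ dev';                                  % column k solves Rj'*z = dev_k
    d(:,j) = sum(z.^2, 1)';                            % squared distances to group j
end
disp(d)
```

A hand calculation for observation 2 against group 3 under these assumptions gives roughly 0.0597, in line with the corresponding entry printed above.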