# NAG Library Function Document

## 1 Purpose

nag_mv_discrim_mahaldist (g03dbc) computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after nag_mv_discrim (g03dac).

## 2 Specification

 #include <nag.h>
 #include <nagg03.h>
 void nag_mv_discrim_mahaldist (Nag_GroupCovars equal, Nag_MahalDist mode, Integer nvar, Integer ng, const double gmean[], Integer tdg, const double gc[], Integer nobs, Integer m, const Integer isx[], const double x[], Integer tdx, double d[], Integer tdd, NagError *fail)

## 3 Description

Consider $p$ variables observed on ${n}_{g}$ populations or groups. Let ${\stackrel{-}{x}}_{j}$ be the sample mean and ${S}_{j}$ the within-group variance-covariance matrix for the $j$th group and let ${x}_{k}$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, ${D}_{kj}^{2}$:
 ${D}_{kj}^{2}={\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({x}_{k}-{\stackrel{-}{x}}_{j}\right).$
If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:
 ${D}_{kj}^{2}={\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({x}_{k}-{\stackrel{-}{x}}_{j}\right).$
Instead of using the variance-covariance matrices $S$ and ${S}_{j}$, nag_mv_discrim_mahaldist (g03dbc) uses the upper triangular matrices $R$ and ${R}_{j}$ supplied by nag_mv_discrim (g03dac) such that $S={R}^{\mathrm{T}}R$ and ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. Since ${S}^{-1}={R}^{-1}{R}^{-\mathrm{T}}$, ${D}_{kj}^{2}$ can then be calculated as ${z}^{\mathrm{T}}z$ where ${R}_{j}^{\mathrm{T}}z=\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)$ or ${R}^{\mathrm{T}}z=\left({x}_{k}-{\stackrel{-}{x}}_{j}\right)$ as appropriate.
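The triangular-solve formulation can be sketched in plain C as follows. This is only an illustration, not the library's implementation: `mahal_sq_packed` is a hypothetical helper, and it assumes $R$ is held packed by column, so that the 0-based element $(i,j)$ with $i\le j$ sits at index $j(j+1)/2+i$. Because column $j$ of $R$ is row $j$ of ${R}^{\mathrm{T}}$, the lower triangular system ${R}^{\mathrm{T}}z=d$ can be solved by forward substitution while reading each packed column contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Mahalanobis distance d^T S^{-1} d with S = R^T R and d = x - mean.
 * r    : upper triangular R, packed by column (assumed layout, see lead-in)
 * z    : caller-provided workspace of length p
 * Hypothetical helper for illustration; not part of the NAG Library. */
double mahal_sq_packed(const double *r, const double *x,
                       const double *mean, double *z, size_t p)
{
    double d2 = 0.0;
    for (size_t j = 0; j < p; ++j) {
        /* Column j of R, which is row j of the lower triangular R^T. */
        const double *col = r + j * (j + 1) / 2;
        double s = x[j] - mean[j];
        for (size_t i = 0; i < j; ++i)
            s -= col[i] * z[i];      /* forward substitution on R^T z = d */
        assert(col[j] != 0.0);       /* diagonal of R must be nonzero */
        z[j] = s / col[j];
        d2 += z[j] * z[j];           /* accumulate z^T z */
    }
    return d2;
}
```

For example, with $R=\left(\begin{array}{cc}2& 1\\ 0& 3\end{array}\right)$ (packed as `{2, 1, 3}`) and ${x}_{k}-{\stackrel{-}{x}}_{j}={\left(2,4\right)}^{\mathrm{T}}$, the solve gives $z={\left(1,1\right)}^{\mathrm{T}}$ and a squared distance of 2, which agrees with evaluating the quadratic form with $S={R}^{\mathrm{T}}R$ directly.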
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis distance between the $i$th and $j$th groups is:
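 ${D}_{ij}^{2}={\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)$
or
 ${D}_{ij}^{2}={\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({\stackrel{-}{x}}_{i}-{\stackrel{-}{x}}_{j}\right).$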
Note: ${D}_{jj}^{2}=0$ and, when the pooled variance-covariance matrix is used, ${D}_{ij}^{2}={D}_{ji}^{2}$, so in this case only the lower triangular values of ${D}_{ij}^{2}$, $i>j$, are computed.

## 4 References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

## 5 Arguments

1:    $\mathbf{equal}$ – Nag_GroupCovars    Input
On entry: indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p\left(p+1\right)/2$ elements of gc is used.
${\mathbf{equal}}=\mathrm{Nag_NotEqualCovar}$
The within-group variance-covariance matrices are assumed to be unequal and the matrices ${R}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, stored in the remainder of gc are used.
Constraint: ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$ or $\mathrm{Nag_NotEqualCovar}$.
2:    $\mathbf{mode}$ – Nag_MahalDist    Input
On entry: indicates whether distances from sample points are to be calculated or distances between the group means.
${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$
The distances between the sample points given in x and the group means are calculated.
${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$
The distances between the group means will be calculated.
Constraint: ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$ or $\mathrm{Nag_GroupMeans}$.
3:    $\mathbf{nvar}$ – Integer    Input
On entry: the number of variables, $p$, in the variance-covariance matrices as specified to nag_mv_discrim (g03dac).
Constraint: ${\mathbf{nvar}}\ge 1$.
4:    $\mathbf{ng}$ – Integer    Input
On entry: the number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
5:    $\mathbf{gmean}\left[{\mathbf{ng}}×{\mathbf{tdg}}\right]$ – const double    Input
Note: the $\left(i,j\right)$th element of the matrix is stored in ${\mathbf{gmean}}\left[\left(i-1\right)×{\mathbf{tdg}}+j-1\right]$.
On entry: the $\mathit{j}$th row of gmean contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$. These are returned by nag_mv_discrim (g03dac).
6:    $\mathbf{tdg}$ – Integer    Input
On entry: the stride separating matrix column elements in the array gmean.
Constraint: ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
7:    $\mathbf{gc}\left[\mathit{dim}\right]$ – const double    Input
Note: the dimension, dim, of the array gc must be at least $\left({\mathbf{ng}}+1\right)×{\mathbf{nvar}}×\left({\mathbf{nvar}}+1\right)/2$.
On entry: the first $p\left(p+1\right)/2$ elements of gc should contain the upper triangular matrix $R$ and the next ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements should contain the upper triangular matrices ${R}_{j}$. All matrices must be stored packed by column. These matrices are returned by nag_mv_discrim (g03dac).
If ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$ only the first $p\left(p+1\right)/2$ elements are referenced.
If ${\mathbf{equal}}=\mathrm{Nag_NotEqualCovar}$ only the elements $p\left(p+1\right)/2$ to $\left({n}_{g}+1\right)p\left(p+1\right)/2-1$ are referenced.
Constraints:
• if ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$, the diagonal elements of $R\ne 0.0$;
• if ${\mathbf{equal}}=\mathrm{Nag_NotEqualCovar}$, the diagonal elements of the ${R}_{j}\ne 0.0$, for $\mathit{j}=1,2,\dots ,{\mathbf{ng}}$.
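The layout of gc described above can be illustrated with a few index helpers. This is a minimal sketch under stated assumptions: all function names are hypothetical (they are not NAG routines), group numbers are 1-based as in the text, and "packed by column" is taken to mean that the 0-based element $(i,k)$ with $i\le k$ of an upper triangular block is stored at offset $k(k+1)/2+i$ within that block.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in one packed upper triangular p-by-p block. */
size_t gc_block_len(size_t p) { return p * (p + 1) / 2; }

/* Minimum length of gc: one pooled block R plus ng group blocks R_j. */
size_t gc_min_len(size_t ng, size_t p) { return (ng + 1) * gc_block_len(p); }

/* Start of the pooled factor R (the first block of gc). */
const double *gc_pooled(const double *gc) { return gc; }

/* Start of R_j for group j = 1..ng; block j follows the pooled block. */
const double *gc_group(const double *gc, size_t j, size_t ng, size_t p)
{
    assert(j >= 1 && j <= ng);
    return gc + j * gc_block_len(p);
}

/* Element (i, k), i <= k (0-based), of a block packed by column. */
double packed_upper_get(const double *blk, size_t i, size_t k)
{
    assert(i <= k);
    return blk[k * (k + 1) / 2 + i];
}
```

With $p=2$ and ${n}_{g}=2$, each block holds 3 elements, so $R$ occupies `gc[0..2]`, ${R}_{1}$ occupies `gc[3..5]` and ${R}_{2}$ occupies `gc[6..8]`, which matches the referenced ranges stated for each value of equal.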
8:    $\mathbf{nobs}$ – Integer    Input
On entry: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, the number of sample points in x for which distances are to be calculated.
If ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$, nobs is not referenced.
Constraint: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, ${\mathbf{nobs}}\ge 1$.
9:    $\mathbf{m}$ – Integer    Input
On entry: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, the number of variables in the data array x.
If ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$, then m is not referenced.
Constraint: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, ${\mathbf{m}}\ge {\mathbf{nvar}}$.
10:  $\mathbf{isx}\left[{\mathbf{m}}\right]$ – const Integer    Input
On entry: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, ${\mathbf{isx}}\left[\mathit{l}-1\right]$ indicates if the $\mathit{l}$th variable in x is to be included in the distance calculations. If ${\mathbf{isx}}\left[\mathit{l}-1\right]>0$, the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise the $l$th variable is not referenced.
If ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$, then isx is not referenced and may be set to the NULL pointer (Integer *)0.
Constraint: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, ${\mathbf{isx}}\left[l-1\right]>0$ for nvar values of $l$.
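The selection rule for isx can be sketched as below. The helpers `count_selected` and `gather_selected` are hypothetical names used only for illustration, and plain `int` stands in for the library's `Integer` type, whose actual width depends on the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Count the variables flagged for inclusion, i.e. those with isx[l] > 0.
 * For a valid call in sample-point mode this count must equal nvar. */
size_t count_selected(const int *isx, size_t m)
{
    size_t n = 0;
    for (size_t l = 0; l < m; ++l)
        if (isx[l] > 0)
            ++n;
    return n;
}

/* Copy the selected entries of one row of x (m values) into out, keeping
 * their original order. out must have room for count_selected(isx, m)
 * values. Returns the number of values copied. */
size_t gather_selected(const double *row, const int *isx, size_t m,
                       double *out)
{
    size_t n = 0;
    assert(row != NULL && isx != NULL && out != NULL);
    for (size_t l = 0; l < m; ++l)
        if (isx[l] > 0)
            out[n++] = row[l];
    return n;
}
```

With `isx = {1, 0, 1}` and a row `{1.5, 9.0, 2.5}`, two variables are selected and the gathered vector is `{1.5, 2.5}`; the second variable is never read into the result.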
11:  $\mathbf{x}\left[{\mathbf{nobs}}×{\mathbf{tdx}}\right]$ – const double    Input
On entry: if ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, the $k$th row of x must contain ${x}_{k}$. That is, ${\mathbf{x}}\left[\left(k-1\right)×{\mathbf{tdx}}+l-1\right]$ must contain the $k$th sample value for the $l$th variable for $k=1,2,\dots ,{\mathbf{nobs}}$ and $l=1,2,\dots ,{\mathbf{m}}$. Otherwise x is not referenced and may be set to the NULL pointer (double *)0.
12:  $\mathbf{tdx}$ – Integer    Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}\right)$.
13:  $\mathbf{d}\left[\mathit{dim1}×{\mathbf{tdd}}\right]$ – double    Output
On exit: the squared distances.
If ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$, ${\mathbf{d}}\left[\left(\mathit{k}-1\right)×{\mathbf{tdd}}+\mathit{j}-1\right]$ contains the squared distance of the $\mathit{k}$th sample point from the $\mathit{j}$th group mean, ${D}_{\mathit{k}\mathit{j}}^{2}$, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$ and ${\mathbf{equal}}=\mathrm{Nag_NotEqualCovar}$, ${\mathbf{d}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdd}}+\mathit{j}-1\right]$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1,\mathit{i}+1,\dots ,{n}_{g}$. The elements ${\mathbf{d}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdd}}+\mathit{i}-1\right]$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$ and ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$, ${\mathbf{d}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdd}}+\mathit{j}-1\right]$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1$. Since ${D}_{\mathit{i}\mathit{j}}^{2}={D}_{\mathit{j}\mathit{i}}^{2}$ the elements ${\mathbf{d}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdd}}+\mathit{j}-1\right]$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=\mathit{i},\dots ,{n}_{g}$.
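Because only the strict lower triangle of d is set in the pooled group-means case, a caller that wants a full symmetric lookup has to mirror indices and supply the zero diagonal itself. A minimal sketch, assuming d was filled by a successful call with ${\mathbf{mode}}=\mathrm{Nag_GroupMeans}$ and ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$; the accessor `pooled_mean_dist` is a hypothetical helper and uses 1-based group indices as in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Squared distance between group means i and j (1-based) read from d,
 * where only entries with i > j were written by the library call.
 * Hypothetical accessor for illustration; not part of the NAG Library. */
double pooled_mean_dist(const double *d, size_t tdd, size_t ng,
                        size_t i, size_t j)
{
    assert(tdd >= ng);
    assert(i >= 1 && i <= ng && j >= 1 && j <= ng);
    if (i == j)
        return 0.0;                 /* D^2_jj = 0; diagonal is never set */
    if (i < j) {                    /* mirror: D^2_ij = D^2_ji */
        size_t t = i;
        i = j;
        j = t;
    }
    return d[(i - 1) * tdd + (j - 1)];
}
```

The same row-major index expression, `(i - 1) * tdd + (j - 1)`, applies unchanged in the other two cases; there no mirroring is needed because every off-diagonal (or every sample-by-group) entry is written.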
14:  $\mathbf{tdd}$ – Integer    Input
On entry: the stride separating matrix column elements in the array d.
Constraint: ${\mathbf{tdd}}\ge {\mathbf{ng}}$.
15:  $\mathbf{fail}$ – NagError *    Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

## 6 Error Indicators and Warnings

NE_2_INT_ARG_ENUM_CONS
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$ and ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$. These arguments must satisfy ${\mathbf{m}}\ge {\mathbf{nvar}}$ when ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$.
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ while ${\mathbf{m}}=〈\mathit{\text{value}}〉$ and ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$. These arguments must satisfy ${\mathbf{tdx}}\ge \mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{m}}\right)$ when ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$.
NE_2_INT_ARG_LT
On entry, ${\mathbf{tdd}}=〈\mathit{\text{value}}〉$ while ${\mathbf{ng}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdd}}\ge {\mathbf{ng}}$.
On entry, ${\mathbf{tdg}}=〈\mathit{\text{value}}〉$ while ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument equal had an illegal value.
On entry, argument mode had an illegal value.
NE_DIAG_0_COND
A diagonal element of $R$ is zero when ${\mathbf{equal}}=\mathrm{Nag_EqualCovar}$.
NE_DIAG_0_J_COND
A diagonal element of ${R}_{j}$ is zero for some $j$, when ${\mathbf{equal}}=\mathrm{Nag_NotEqualCovar}$.
NE_INT_ARG_ENUM_CONS
On entry, ${\mathbf{nobs}}=〈\mathit{\text{value}}〉$ while ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$. These arguments must satisfy ${\mathbf{nobs}}\ge 1$ when ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$.
NE_INT_ARG_LT
On entry, ${\mathbf{ng}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ng}}\ge 2$.
On entry, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nvar}}\ge 1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_VAR_INCL_COND
The number of variables in the analysis, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$, while the number of variables included in the analysis via array isx $=〈\mathit{\text{value}}〉$.
Constraint: these two numbers must be the same when ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$.
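The integer constraints behind several of the error indicators above can be checked before the library is called. The following is a hedged sketch only: `check_mahaldist_args` is a hypothetical helper, its return codes 1 to 8 are invented for this illustration and are not NAG error codes, `int` stands in for `Integer`, and a nonzero `sample_points` flag stands in for ${\mathbf{mode}}=\mathrm{Nag_SamplePoints}$. It does not inspect the diagonal of $R$ or ${R}_{j}$.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if the documented integer constraints hold, otherwise a small
 * positive code identifying the first violated constraint (hypothetical
 * numbering, chosen for this sketch only). */
int check_mahaldist_args(int sample_points, int nvar, int ng, int tdg,
                         int nobs, int m, const int *isx, int tdx, int tdd)
{
    if (nvar < 1)   return 1;               /* nvar >= 1 */
    if (ng < 2)     return 2;               /* ng >= 2 */
    if (tdg < nvar) return 3;               /* tdg >= nvar */
    if (tdd < ng)   return 4;               /* tdd >= ng */
    if (sample_points) {
        int count = 0;
        if (nobs < 1) return 5;             /* nobs >= 1 */
        if (m < nvar) return 6;             /* m >= nvar */
        if (tdx < (m > 1 ? m : 1)) return 7; /* tdx >= max(1, m) */
        assert(isx != NULL);                /* isx is read in this mode */
        for (int l = 0; l < m; ++l)
            if (isx[l] > 0)
                ++count;
        if (count != nvar) return 8;        /* exactly nvar variables chosen */
    }
    return 0;
}
```

In group-means mode the arguments nobs, m, isx and tdx are skipped entirely, mirroring the statement that they are not referenced in that case.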

## 7 Accuracy

The accuracy will depend upon the accuracy of the input $R$ or ${R}_{j}$ matrices.

## 8 Parallelism and Performance

nag_mv_discrim_mahaldist (g03dbc) is not threaded in any implementation.

## 9 Further Comments

If the distances are to be used for discrimination, see also nag_mv_discrim_group (g03dcc).

## 10 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the group means and $R$ matrices are computed by nag_mv_discrim (g03dac). A further six observations of unknown type are input, and the distances from the group means of the 21 patients of known type are computed under the assumption that the within-group variance-covariance matrices are not equal. These results are printed and indicate that the first four are close to one of the groups while observations 5 and 6 are some distance from any group.

### 10.1 Program Text

Program Text (g03dbce.c)

### 10.2 Program Data

Program Data (g03dbce.d)

### 10.3 Program Results

Program Results (g03dbce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017