# NAG FL Interface g03dbf (discrim_mahal)

## 1 Purpose

g03dbf computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after g03daf.

## 2 Specification

Fortran Interface
 Subroutine g03dbf ( equal, mode, nvar, ng, gmn, ldgmn, gc, nobs, m, isx, x, ldx, d, ldd, wk, ifail)
 Integer, Intent (In) :: nvar, ng, ldgmn, nobs, m, isx(*), ldx, ldd
 Integer, Intent (Inout) :: ifail
 Real (Kind=nag_wp), Intent (In) :: gmn(ldgmn,nvar), gc((ng+1)*nvar*(nvar+1)/2), x(ldx,*)
 Real (Kind=nag_wp), Intent (Inout) :: d(ldd,ng)
 Real (Kind=nag_wp), Intent (Out) :: wk(2*nvar)
 Character (1), Intent (In) :: equal, mode
C Header Interface
#include <nag.h>
 void g03dbf_ (const char *equal, const char *mode, const Integer *nvar, const Integer *ng, const double gmn[], const Integer *ldgmn, const double gc[], const Integer *nobs, const Integer *m, const Integer isx[], const double x[], const Integer *ldx, double d[], const Integer *ldd, double wk[], Integer *ifail, const Charlen length_equal, const Charlen length_mode)
The routine may be called by the names g03dbf or nagf_mv_discrim_mahal.

## 3 Description

Consider $p$ variables observed on ${n}_{g}$ populations or groups. Let ${\overline{x}}_{j}$ be the sample mean and ${S}_{j}$ the within-group variance-covariance matrix for the $j$th group and let ${x}_{k}$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, ${D}_{kj}$:
 ${D}_{kj}^{2}={\left({x}_{k}-{\overline{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({x}_{k}-{\overline{x}}_{j}\right).$
If the pooled estimate of the variance-covariance matrix, $S$, is used rather than the within-group variance-covariance matrices, then the distance is:
 ${D}_{kj}^{2}={\left({x}_{k}-{\overline{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({x}_{k}-{\overline{x}}_{j}\right).$
Instead of using the variance-covariance matrices $S$ and ${S}_{j}$, g03dbf uses the upper triangular matrices $R$ and ${R}_{j}$ supplied by g03daf such that $S={R}^{\mathrm{T}}R$ and ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. Since ${S}^{-1}={R}^{-1}{R}^{-\mathrm{T}}$, ${D}_{kj}^{2}$ can then be calculated as ${z}^{\mathrm{T}}z$ where ${R}_{j}^{\mathrm{T}}z=\left({x}_{k}-{\overline{x}}_{j}\right)$ or ${R}^{\mathrm{T}}z=\left({x}_{k}-{\overline{x}}_{j}\right)$ as appropriate.
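The triangular-solve formulation can be illustrated with a minimal Python sketch. This is not the NAG implementation; the function names `packed_index` and `mahalanobis_sq` and the $2\times 2$ test matrix are hypothetical. The sketch assumes $S={R}^{\mathrm{T}}R$ with $R$ upper triangular and packed by column, so that ${S}^{-1}={R}^{-1}{R}^{-\mathrm{T}}$ and the squared distance equals ${z}^{\mathrm{T}}z$ with $z={R}^{-\mathrm{T}}\left(x-\overline{x}\right)$, obtained by forward substitution.

```python
def packed_index(i, j):
    """0-based position of R[i][j] (i <= j) in column-packed upper triangular storage."""
    return j * (j + 1) // 2 + i


def mahalanobis_sq(r_packed, p, x, mean):
    """Return z'z where R'z = x - mean, with R upper triangular and packed by column."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = [0.0] * p
    # R' is lower triangular, so the system is solved by forward substitution.
    for i in range(p):
        s = diff[i]
        for k in range(i):
            # Element (i, k) of R' is element (k, i) of R.
            s -= r_packed[packed_index(k, i)] * z[k]
        z[i] = s / r_packed[packed_index(i, i)]
    return sum(v * v for v in z)


# Illustrative values only: S = [[4, 2], [2, 5]] factorizes as S = R'R
# with R = [[2, 1], [0, 2]], stored as R11, R12, R22.
r = [2.0, 1.0, 2.0]
dist = mahalanobis_sq(r, 2, [1.0, 3.0], [0.0, 1.0])
print(dist)  # 0.8125, matching (x - mean)' S^{-1} (x - mean) = 13/16
```

Working with the factor avoids forming or inverting $S$ explicitly; the only divisions are by diagonal elements of $R$, which is why these must be nonzero.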
A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:
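 ${D}_{ij}^{2}={\left({\overline{x}}_{i}-{\overline{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left({\overline{x}}_{i}-{\overline{x}}_{j}\right)$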
or
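 ${D}_{ij}^{2}={\left({\overline{x}}_{i}-{\overline{x}}_{j}\right)}^{\mathrm{T}}{S}^{-1}\left({\overline{x}}_{i}-{\overline{x}}_{j}\right).$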
Note: ${D}_{jj}^{2}=0$ and, when the pooled variance-covariance matrix is used, ${D}_{ij}^{2}={D}_{ji}^{2}$, so in this case only the lower triangular values of ${D}_{ij}^{2}$, $i>j$, are computed.
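The difference between the two cases is easiest to see for a single variable ($p=1$), where each variance-covariance matrix reduces to a scalar variance. The following Python sketch is illustrative only: the function names and the numerical values are hypothetical, and `None` is used to mark entries that are not computed.

```python
def mean_distances_unequal(means, variances):
    """Scalar case: D_ij^2 = (m_i - m_j)^2 / s_j^2, using the variance of group j."""
    ng = len(means)
    d = [[None] * ng for _ in range(ng)]
    for i in range(ng):
        for j in range(ng):
            if i != j:  # the diagonal is known to be zero and is left unset
                d[i][j] = (means[i] - means[j]) ** 2 / variances[j]
    return d


def mean_distances_pooled(means, pooled_variance):
    """Scalar case: D_ij^2 = (m_i - m_j)^2 / s^2, strictly lower triangle only."""
    ng = len(means)
    d = [[None] * ng for _ in range(ng)]
    for i in range(ng):
        for j in range(i):  # symmetric, so only i > j is filled
            d[i][j] = (means[i] - means[j]) ** 2 / pooled_variance
    return d


means = [0.0, 2.0, 5.0]  # hypothetical group means
unequal = mean_distances_unequal(means, [1.0, 4.0, 9.0])
pooled = mean_distances_pooled(means, 2.0)
print(unequal[0][1], unequal[1][0])  # 1.0 4.0 (not symmetric)
print(pooled[1][0], pooled[0][1])    # 2.0 None
```

With group-specific variances the distance from mean $i$ to group $j$ is scaled by ${S}_{j}$, so swapping $i$ and $j$ changes the scaling; with a pooled matrix the scaling is the same in both directions.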

## 4 References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

## 5 Arguments

1: $\mathbf{equal}$ Character(1) Input
On entry: indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
${\mathbf{equal}}=\text{'E'}$
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p\left(p+1\right)/2$ elements of gc is used.
${\mathbf{equal}}=\text{'U'}$
The within-group variance-covariance matrices are assumed to be unequal and the matrices ${R}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, stored in the remainder of gc are used.
Constraint: ${\mathbf{equal}}=\text{'E'}$ or $\text{'U'}$.
2: $\mathbf{mode}$ Character(1) Input
On entry: indicates whether distances from sample points are to be calculated or distances between the group means.
${\mathbf{mode}}=\text{'S'}$
The distances between the sample points given in x and the group means are calculated.
${\mathbf{mode}}=\text{'M'}$
The distances between the group means will be calculated.
Constraint: ${\mathbf{mode}}=\text{'M'}$ or $\text{'S'}$.
3: $\mathbf{nvar}$ Integer Input
On entry: $p$, the number of variables in the variance-covariance matrices as specified to g03daf.
Constraint: ${\mathbf{nvar}}\ge 1$.
4: $\mathbf{ng}$ Integer Input
On entry: the number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
5: $\mathbf{gmn}\left({\mathbf{ldgmn}},{\mathbf{nvar}}\right)$ Real (Kind=nag_wp) array Input
On entry: the $\mathit{j}$th row of gmn contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$. These are returned by g03daf.
6: $\mathbf{ldgmn}$ Integer Input
On entry: the first dimension of the array gmn as declared in the (sub)program from which g03dbf is called.
Constraint: ${\mathbf{ldgmn}}\ge {\mathbf{ng}}$.
7: $\mathbf{gc}\left(\left({\mathbf{ng}}+1\right)\times {\mathbf{nvar}}\times \left({\mathbf{nvar}}+1\right)/2\right)$ Real (Kind=nag_wp) array Input
On entry: the first $p\left(p+1\right)/2$ elements of gc should contain the upper triangular matrix $R$ and the next ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements should contain the upper triangular matrices ${R}_{j}$. All matrices must be stored packed by column. These matrices are returned by g03daf. If ${\mathbf{equal}}=\text{'E'}$, only the first $p\left(p+1\right)/2$ elements are referenced; if ${\mathbf{equal}}=\text{'U'}$, only the elements $p\left(p+1\right)/2+1$ to $\left({n}_{g}+1\right)p\left(p+1\right)/2$ are referenced.
Constraints:
• if ${\mathbf{equal}}=\text{'E'}$, the diagonal elements of $R\ne 0.0$;
• if ${\mathbf{equal}}=\text{'U'}$, the diagonal elements of the ${R}_{\mathit{j}}\ne 0.0$, for $\mathit{j}=1,2,\dots ,{\mathbf{ng}}$.
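The layout of gc described above can be sketched as an index calculation. The Python helpers below are hypothetical (they are not part of the NAG Library) and assume the 1-based element numbering used in this document, with block $0$ holding the pooled $R$ and blocks $1,\dots ,{n}_{g}$ holding the ${R}_{j}$.

```python
def block_size(p):
    """Number of stored elements in one packed p-by-p upper triangular matrix."""
    return p * (p + 1) // 2


def gc_position(p, block, row, col):
    """1-based position in gc of element (row, col), row <= col, of a given block.

    block = 0 selects the pooled R; block = j (1 <= j <= ng) selects R_j.
    Within a block the upper triangle is packed by column.
    """
    if not 1 <= row <= col <= p:
        raise ValueError("need 1 <= row <= col <= p for an upper triangular element")
    return block * block_size(p) + col * (col - 1) // 2 + row


def diagonal_positions(p, block):
    """Positions in gc of the diagonal elements that must be nonzero."""
    return [gc_position(p, block, i, i) for i in range(1, p + 1)]


# For p = 2 and ng = 3, gc has (3 + 1) * 3 = 12 elements.
print(gc_position(2, 0, 1, 1))   # 1: first element of the pooled R
print(gc_position(2, 1, 1, 1))   # 4: first element of R_1, i.e. p(p+1)/2 + 1
print(gc_position(2, 3, 2, 2))   # 12: last element of R_3
print(diagonal_positions(3, 0))  # [1, 3, 6]
```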
8: $\mathbf{nobs}$ Integer Input
On entry: if ${\mathbf{mode}}=\text{'S'}$, the number of sample points in x for which distances are to be calculated.
If ${\mathbf{mode}}=\text{'M'}$, nobs is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{nobs}}\ge 1$.
9: $\mathbf{m}$ Integer Input
On entry: if ${\mathbf{mode}}=\text{'S'}$, the number of variables in the data array x.
If ${\mathbf{mode}}=\text{'M'}$, m is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{m}}\ge {\mathbf{nvar}}$.
10: $\mathbf{isx}\left(*\right)$ Integer array Input
Note: the dimension of the array isx must be at least $\mathrm{max}\left(1,{\mathbf{m}}\right)$.
On entry: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(\mathit{l}\right)$ indicates if the $\mathit{l}$th variable in x is to be included in the distance calculations. If ${\mathbf{isx}}\left(\mathit{l}\right)>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise the $\mathit{l}$th variable is not referenced.
If ${\mathbf{mode}}=\text{'M'}$, isx is not referenced.
Constraint: if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{isx}}\left(l\right)>0$ for nvar values of $l$.
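The role of isx can be sketched as a simple column filter. The Python helper below is hypothetical and only illustrates the rule stated above; it assumes that the selected variables keep their original order, so that they line up with the columns of gmn.

```python
def select_variables(row, isx, nvar):
    """Return the values of one sample point whose isx flag is positive.

    row  : the m values of one observation (one row of x)
    isx  : m integer flags; a variable is included when its flag is > 0
    nvar : the number of variables expected by the distance calculation
    """
    if len(row) != len(isx):
        raise ValueError("row and isx must both have m entries")
    selected = [value for value, flag in zip(row, isx) if flag > 0]
    if len(selected) != nvar:
        # Mirrors the constraint that exactly nvar elements of isx are > 0.
        raise ValueError("exactly nvar elements of isx must be > 0")
    return selected


# Illustrative values: m = 3 variables in x, of which the first and third are used.
point = select_variables([1.1, 9.9, 2.2], [1, 0, 1], 2)
print(point)  # [1.1, 2.2]
```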
11: $\mathbf{x}\left({\mathbf{ldx}},*\right)$ Real (Kind=nag_wp) array Input
Note: the second dimension of the array x must be at least $\mathrm{max}\left(1,{\mathbf{m}}\right)$.
On entry: if ${\mathbf{mode}}=\text{'S'}$ the $\mathit{k}$th row of x must contain ${x}_{\mathit{k}}$. That is ${\mathbf{x}}\left(\mathit{k},\mathit{l}\right)$ must contain the $\mathit{k}$th sample value for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$. Otherwise x is not referenced.
12: $\mathbf{ldx}$ Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g03dbf is called.
Constraints:
• if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{ldx}}\ge {\mathbf{nobs}}$;
• otherwise ${\mathbf{ldx}}\ge 1$.
13: $\mathbf{d}\left({\mathbf{ldd}},{\mathbf{ng}}\right)$ Real (Kind=nag_wp) array Output
On exit: the squared distances.
If ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{d}}\left(\mathit{k},\mathit{j}\right)$ contains the squared distance of the $\mathit{k}$th sample point from the $\mathit{j}$th group mean, ${D}_{\mathit{k}\mathit{j}}^{2}$, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'U'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1,\mathit{i}+1,\dots ,{n}_{g}$. The elements ${\mathbf{d}}\left(\mathit{i},\mathit{i}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$.
If ${\mathbf{mode}}=\text{'M'}$ and ${\mathbf{equal}}=\text{'E'}$, ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ contains the squared distance between the $\mathit{i}$th mean and the $\mathit{j}$th mean, ${D}_{\mathit{i}\mathit{j}}^{2}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=1,2,\dots ,\mathit{i}-1$. Since ${D}_{\mathit{i}\mathit{j}}={D}_{\mathit{j}\mathit{i}}$ the elements ${\mathbf{d}}\left(\mathit{i},\mathit{j}\right)$ are not referenced, for $\mathit{i}=1,2,\dots ,{n}_{g}$ and $\mathit{j}=\mathit{i}+1,\dots ,{n}_{g}$.
14: $\mathbf{ldd}$ Integer Input
On entry: the first dimension of the array d as declared in the (sub)program from which g03dbf is called.
Constraints:
• if ${\mathbf{mode}}=\text{'S'}$, ${\mathbf{ldd}}\ge {\mathbf{nobs}}$;
• if ${\mathbf{mode}}=\text{'M'}$, ${\mathbf{ldd}}\ge {\mathbf{ng}}$.
15: $\mathbf{wk}\left(2\times {\mathbf{nvar}}\right)$ Real (Kind=nag_wp) array Workspace
16: $\mathbf{ifail}$ Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{equal}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{equal}}=\text{'E'}$ or $\text{'U'}$.
On entry, ${\mathbf{ldd}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldd}}\ge {\mathbf{ng}}$.
On entry, ${\mathbf{ldd}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nobs}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldd}}\ge {\mathbf{nobs}}$.
On entry, ${\mathbf{ldgmn}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldgmn}}\ge {\mathbf{ng}}$.
On entry, ${\mathbf{ldx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nobs}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{nobs}}$.
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{mode}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{mode}}=\text{'M'}$ or $\text{'S'}$.
On entry, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ng}}\ge 2$.
On entry, ${\mathbf{nobs}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nobs}}\ge 1$.
On entry, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nvar}}\ge 1$.
${\mathbf{ifail}}=2$
On entry, diagonal element $⟨\mathit{\text{value}}⟩$ of $R=0$.
On entry, diagonal element $⟨\mathit{\text{value}}⟩$ of ${R}_{j}=0$ for $j=⟨\mathit{\text{value}}⟩$.
On entry, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$ and $⟨\mathit{\text{value}}⟩$ values of ${\mathbf{isx}}>0$.
Constraint: exactly nvar elements of ${\mathbf{isx}}>0$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7 Accuracy

The accuracy will depend upon the accuracy of the input $R$ or ${R}_{j}$ matrices.

## 8 Parallelism and Performance

g03dbf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

## 9 Further Comments

If the distances are to be used for discrimination, see also g03dcf.

## 10 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of $21$ patients are input and the group means and $R$ matrices are computed by g03daf. A further six observations of unknown type are input, and the distances from the group means of the $21$ patients of known type are computed under the assumption that the within-group variance-covariance matrices are not equal. These results are printed and indicate that the first four are close to one of the groups while observations $5$ and $6$ are some distance from any group.

### 10.1 Program Text

Program Text (g03dbfe.f90)

### 10.2 Program Data

Program Data (g03dbfe.d)

### 10.3 Program Results

Program Results (g03dbfe.r)