naginterfaces.library.mv.discrim_mahal

naginterfaces.library.mv.discrim_mahal(equal, mode, gmn, gc, nobs, isx, x)

discrim_mahal computes Mahalanobis squared distances for group or pooled variance-covariance matrices. It is intended for use after discrim().

For full information please refer to the NAG Library document for g03db:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g03/g03dbf.html

Parameters

equal : str, length 1

Indicates whether the within-group variance-covariance matrices are assumed to be equal, in which case the pooled variance-covariance matrix is used.

$equal = \text{'E'}$

The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p (p + 1) / 2$ elements of $gc$ is used.

$equal = \text{'U'}$

The within-group variance-covariance matrices are assumed to be unequal and the matrices $R_{j}$, for $j = 1, 2, \dots, n_{g}$, stored in the remainder of $gc$ are used.

mode : str, length 1

Indicates whether distances from sample points or distances between the group means are to be calculated.

$mode = \text{'S'}$

The distances between the sample points given in $x$ and the group means are calculated.

$mode = \text{'M'}$

The distances between the group means are calculated.

gmn : float, array-like, shape $(ng, nvar)$

The $j$th row of $gmn$ contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \dots, n_{g}$. These are returned by discrim().

gc : float, array-like, shape $((ng + 1) \times nvar \times (nvar + 1) / 2)$

The first $p (p + 1) / 2$ elements of $gc$ should contain the upper triangular matrix $R$ and the next $n_{g}$ blocks of $p (p + 1) / 2$ elements should contain the upper triangular matrices $R_{j}$. All matrices must be stored packed by column. These matrices are returned by discrim(). If $equal = \text{'E'}$, only the first $p (p + 1) / 2$ elements are referenced. If $equal = \text{'U'}$, only the elements $p (p + 1) / 2 + 1$ to $(n_{g} + 1) p (p + 1) / 2$ are referenced.

nobs : int

If $mode = \text{'S'}$, the number of sample points in $x$ for which distances are to be calculated.

If $mode = \text{'M'}$, $nobs$ is not referenced.

isx : int, array-like, shape $(m)$

If $mode = \text{'S'}$, $isx[l - 1]$ indicates whether the $l$th variable in $x$ is to be included in the distance calculations. If $isx[l - 1] > 0$ the $l$th variable is included, for $l = 1, 2, \dots, m$; otherwise the $l$th variable is not referenced.

If $mode = \text{'M'}$, $isx$ is not referenced.

x : float, array-like, shape $(:, m)$

Note: the required extent for this argument in dimension 1 is determined as follows: if $mode = \text{'S'}$: $nobs$; otherwise: $1$.

If $mode = \text{'S'}$, the $k$th row of $x$ must contain $x_{k}$. That is, $x[k - 1, l - 1]$ must contain the $k$th sample value for the $l$th variable, for $l = 1, 2, \dots, m$, for $k = 1, 2, \dots, nobs$. Otherwise $x$ is not referenced.
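The storage conventions described for $gc$ and $isx$ can be illustrated with a short NumPy-only sketch. It does not call the NAG Library; the helper names `unpack_upper_by_column`, `split_gc` and `select_variables` are hypothetical and exist only for this illustration. It assumes "packed by column" means the upper triangle is listed column by column ($R_{11}, R_{12}, R_{22}, R_{13}, \dots$), as stated above.

```python
import numpy as np


def unpack_upper_by_column(packed, p):
    """Expand p*(p+1)/2 column-packed values into a dense upper triangular matrix."""
    packed = np.asarray(packed, dtype=float)
    if packed.size != p * (p + 1) // 2:
        raise ValueError("expected p*(p+1)/2 packed elements")
    dense = np.zeros((p, p))
    # tril_indices walks the lower triangle row by row; swapping the index
    # roles walks the upper triangle column by column.
    rows, cols = np.tril_indices(p)
    dense[cols, rows] = packed
    return dense


def split_gc(gc, ng, p):
    """Split gc into the pooled factor R and the list of group factors R_j."""
    gc = np.asarray(gc, dtype=float)
    block = p * (p + 1) // 2
    if gc.size != (ng + 1) * block:
        raise ValueError("expected (ng+1)*p*(p+1)/2 elements in gc")
    mats = [
        unpack_upper_by_column(gc[b * block:(b + 1) * block], p)
        for b in range(ng + 1)
    ]
    return mats[0], mats[1:]


def select_variables(x, isx):
    """Keep only the columns of x whose isx entry is positive."""
    x = np.asarray(x, dtype=float)
    return x[:, np.asarray(isx) > 0]


# Illustrative data: ng = 2 groups, p = 2 selected variables out of m = 3.
pooled, per_group = split_gc([1, 2, 3, 4, 5, 6, 7, 8, 9], ng=2, p=2)
x_selected = select_variables([[1, 2, 3], [4, 5, 6]], isx=[1, 0, 1])
```

Here `pooled` is the dense form of $R$, `per_group[j - 1]` is the dense form of $R_{j}$, and `x_selected` holds only the columns flagged by positive entries of `isx`.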

Returns

d : float, ndarray, shape $(:, ng)$

The squared distances.

If $mode = \text{'S'}$, $d[k - 1, j - 1]$ contains the squared distance of the $k$th sample point from the $j$th group mean, $D_{kj}^{2}$, for $j = 1, 2, \dots, n_{g}$, for $k = 1, 2, \dots, nobs$.

If $mode = \text{'M'}$ and $equal = \text{'U'}$, $d[i - 1, j - 1]$ contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^{2}$, for $j = 1, 2, \dots, i - 1, i + 1, \dots, n_{g}$, for $i = 1, 2, \dots, n_{g}$.

The elements $d[i - 1, i - 1]$ are not referenced, for $i = 1, 2, \dots, n_{g}$.

If $mode = \text{'M'}$ and $equal = \text{'E'}$, $d[i - 1, j - 1]$ contains the squared distance between the $i$th mean and the $j$th mean, $D_{ij}^{2}$, for $j = 1, 2, \dots, i - 1$, for $i = 1, 2, \dots, n_{g}$.

Since $D_{ij} = D_{ji}$ the elements $d[i - 1, j - 1]$ are not referenced, for $j = i + 1, \dots, n_{g}$, for $i = 1, 2, \dots, n_{g}$.
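When the pooled matrix is used for distances between group means, only the strictly lower triangle of $d$ carries results. If a full symmetric matrix is more convenient, it can be rebuilt as in the following sketch. This is plain NumPy, and the helper name `full_symmetric_distances` is hypothetical; it assumes the diagonal should be zero because $D_{jj}^{2} = 0$.

```python
import numpy as np


def full_symmetric_distances(d):
    """Mirror the strictly lower triangle of d into a full symmetric matrix.

    Entries on or above the diagonal of the input are ignored, since they
    are documented as not referenced in this case.
    """
    d = np.asarray(d, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("d must be a square matrix")
    lower = np.tril(d, k=-1)  # zero the diagonal and the upper triangle
    return lower + lower.T


# Illustrative values only; NaN marks entries that were never set.
nan = np.nan
d_lower = np.array([[nan, nan, nan],
                    [2.0, nan, nan],
                    [5.0, 3.0, nan]])
d_full = full_symmetric_distances(d_lower)
```

The unset entries are marked with NaN here purely to show that they do not leak into the result.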

Raises

NagValueError

(errno $1$)

On entry, $mode = \langle value \rangle$.

Constraint: $mode = \text{'M'}$ or $\text{'S'}$.

(errno $1$)

On entry, $equal = \langle value \rangle$.

Constraint: $equal = \text{'E'}$ or $\text{'U'}$.

(errno $1$)

On entry, $m = \langle value \rangle$ and $nvar = \langle value \rangle$.

Constraint: $m \geq nvar$.

(errno $1$)

On entry, $nobs = \langle value \rangle$.

Constraint: $nobs \geq 1$.

(errno $1$)

On entry, $ng = \langle value \rangle$.

Constraint: $ng \geq 2$.

(errno $1$)

On entry, $nvar = \langle value \rangle$.

Constraint: $nvar \geq 1$.

(errno $2$)

On entry, $nvar = \langle value \rangle$ and $\langle value \rangle$ values of $isx > 0$.

Constraint: exactly $nvar$ elements of $isx > 0$.

(errno $2$)

On entry, diagonal element $\langle value \rangle$ of $R_{j} = 0$ for $j = \langle value \rangle$.

(errno $2$)

On entry, diagonal element $\langle value \rangle$ of $R = 0$.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Consider $p$ variables observed on $n_{g}$ populations or groups. Let $\bar{x}_{j}$ be the sample mean and $S_{j}$ the within-group variance-covariance matrix for the $j$th group and let $x_{k}$ be the $k$th sample point in a dataset. A measure of the distance of the point from the $j$th population or group is given by the Mahalanobis distance, $D_{kj}$:

$$D_{kj}^{2} = (x_{k} - \bar{x}_{j})^{T} S_{j}^{-1} (x_{k} - \bar{x}_{j}) .$$

If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:

$$D_{kj}^{2} = (x_{k} - \bar{x}_{j})^{T} S^{-1} (x_{k} - \bar{x}_{j}) .$$

Instead of using the variance-covariance matrices $S$ and $S_{j}$, discrim_mahal uses the upper triangular matrices $R$ and $R_{j}$ supplied by discrim() such that $S = R^{T} R$ and $S_{j} = R_{j}^{T} R_{j}$. Since $S^{-1} = R^{-1} R^{-T}$, $D_{kj}^{2}$ can then be calculated as $z^{T} z$ where $R_{j}^{T} z = (x_{k} - \bar{x}_{j})$ or $R^{T} z = (x_{k} - \bar{x}_{j})$ as appropriate.
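The factor-based calculation can be checked numerically with a small NumPy sketch. It is an illustration of the algebra only, not the routine's implementation, and the function name `mahalanobis_sq_from_factor` is hypothetical. With $S = R^{T} R$ we have $S^{-1} = R^{-1} R^{-T}$, so the quadratic form equals $z^{T} z$ with $z = R^{-T} (x_{k} - \bar{x}_{j})$; the sketch therefore solves a system with the transpose of the factor. `np.linalg.solve` is a general solver and does not exploit the triangular structure.

```python
import numpy as np


def mahalanobis_sq_from_factor(r, diff):
    """Return diff^T (R^T R)^{-1} diff using a solve with R^T instead of an inverse."""
    r = np.asarray(r, dtype=float)
    diff = np.asarray(diff, dtype=float)
    if np.any(np.diag(r) == 0.0):
        # Mirrors the documented failure when a diagonal element of R is zero.
        raise ValueError("factor has a zero diagonal element")
    z = np.linalg.solve(r.T, diff)  # z = R^{-T} diff
    return float(z @ z)


# Illustrative values only.
r = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # upper triangular factor, S = R^T R
x_k = np.array([3.0, 5.0])          # a sample point
mean_j = np.array([2.0, 3.0])       # a group mean
d2 = mahalanobis_sq_from_factor(r, x_k - mean_j)

# Cross-check against the explicit quadratic form with S^{-1}.
s = r.T @ r
d2_direct = float((x_k - mean_j) @ np.linalg.inv(s) @ (x_k - mean_j))
```

Both `d2` and `d2_direct` evaluate the same quadratic form; the solve-based version avoids forming $S$ or its inverse.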

A particular case is when the distance between the group or population means is to be estimated. The Mahalanobis squared distance between the $i$th and $j$th groups is:

$$D_{ij}^{2} = (\bar{x}_{i} - \bar{x}_{j})^{T} S_{j}^{-1} (\bar{x}_{i} - \bar{x}_{j})$$

or

$$D_{ij}^{2} = (\bar{x}_{i} - \bar{x}_{j})^{T} S^{-1} (\bar{x}_{i} - \bar{x}_{j}) .$$

Note: $D_{jj}^{2} = 0$, and when the pooled variance-covariance matrix is used $D_{ij}^{2} = D_{ji}^{2}$, so in this case only the lower triangular values of $D_{ij}^{2}$, $i > j$, are computed.
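The symmetry remark can be seen in a short NumPy sketch that evaluates the between-means formulas directly. It is an illustration under stated assumptions rather than the routine itself: the function name `group_mean_distances` is hypothetical, each factor is a dense upper triangular matrix with $S_{j} = R_{j}^{T} R_{j}$, and the pooled case is modelled by passing the same factor for every group.

```python
import numpy as np


def group_mean_distances(gmn, factors):
    """Return d with d[i, j] = (m_i - m_j)^T S_j^{-1} (m_i - m_j), S_j = R_j^T R_j.

    gmn holds one group mean per row; factors[j] is the factor R_j used
    for distances measured relative to group j.
    """
    gmn = np.asarray(gmn, dtype=float)
    ng = gmn.shape[0]
    d = np.zeros((ng, ng))
    for j in range(ng):
        r_j = np.asarray(factors[j], dtype=float)
        diffs = gmn - gmn[j]                 # row i is m_i - m_j
        z = np.linalg.solve(r_j.T, diffs.T)  # column i is R_j^{-T} (m_i - m_j)
        d[:, j] = np.sum(z * z, axis=0)
    return d


# Illustrative values only.
means = np.array([[0.0, 0.0],
                  [1.0, 2.0]])
r_pooled = np.array([[2.0, 1.0],
                     [0.0, 3.0]])
r_groups = [r_pooled, np.eye(2)]

d_equal = group_mean_distances(means, [r_pooled, r_pooled])  # symmetric
d_unequal = group_mean_distances(means, r_groups)            # generally not symmetric
```

With a common factor the result is symmetric, which is why only the entries with $i > j$ need to be stored; with group-specific factors $D_{ij}^{2}$ and $D_{ji}^{2}$ generally differ.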

