# NAG CL Interface g03dcc (discrim_group)


## 1 Purpose

g03dcc allocates observations to groups according to selected rules. It is intended for use after g03dac.

## 2 Specification

    #include <nag.h>
    void g03dcc (Nag_DiscrimMethod type, Nag_GroupCovars equal, Nag_PriorProbability priors, Integer nvar, Integer ng, const Integer nig[], const double gmean[], Integer tdg, const double gc[], const double det[], Integer nobs, Integer m, const Integer isx[], const double x[], Integer tdx, double prior[], double p[], Integer tdp, Integer iag[], Nag_Boolean atiq, double ati[], NagError *fail)
The function may be called by the names: g03dcc or nag_mv_discrim_group.

## 3 Description

Discriminant analysis is concerned with allocating observations to groups using information from other observations, ${X}_{t}$, whose group membership is known; these observations form the training set. Consider $p$ variables observed on ${n}_{g}$ populations or groups. Let ${\overline{x}}_{j}$ be the sample mean and ${S}_{j}$ the within-group variance-covariance matrix for the $j$th group, calculated from a training set of $n$ observations with ${n}_{j}$ observations in the $j$th group, and let ${x}_{k}$ be the $k$th observation from the set of observations to be allocated to the ${n}_{g}$ groups. The observation is allocated to a group according to a selected rule. The allocation rule, or discriminant function, is based on the distance of the observation from an estimate of the location of each group, usually the group mean. A measure of the distance of the observation from the $j$th group mean is given by the Mahalanobis distance, ${D}_{kj}^{2}$:
$${D}_{kj}^{2}=\left({x}_{k}-{\overline{x}}_{j}\right)^{\mathrm{T}}{S}_{j}^{-1}\left({x}_{k}-{\overline{x}}_{j}\right). \qquad (1)$$
If the pooled estimate of the variance-covariance matrix $S$ is used rather than the within-group variance-covariance matrices, then the distance is:
$${D}_{kj}^{2}=\left({x}_{k}-{\overline{x}}_{j}\right)^{\mathrm{T}}{S}^{-1}\left({x}_{k}-{\overline{x}}_{j}\right). \qquad (2)$$
Instead of using the variance-covariance matrices $S$ and ${S}_{j}$, g03dcc uses the upper triangular matrices $R$ and ${R}_{j}$ supplied by g03dac such that $S={R}^{\mathrm{T}}R$ and ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. ${D}_{kj}^{2}$ can then be calculated as ${z}^{\mathrm{T}}z$, where ${R}_{j}^{\mathrm{T}}z=\left({x}_{k}-{\overline{x}}_{j}\right)$ or ${R}^{\mathrm{T}}z=\left({x}_{k}-{\overline{x}}_{j}\right)$ as appropriate: for example, ${S}_{j}^{-1}={R}_{j}^{-1}{R}_{j}^{-\mathrm{T}}$, so (1) equals ${z}^{\mathrm{T}}z$ with $z={R}_{j}^{-\mathrm{T}}\left({x}_{k}-{\overline{x}}_{j}\right)$.
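The triangular-solve route can be sketched in C as follows. This is a minimal illustration, not the g03dcc implementation: it assumes $R$ (or $R_j$) is upper triangular and packed by column, as described for gc in Section 5, and it solves $R^{\mathrm{T}}z = x_k - \overline{x}_j$ by forward substitution, which is the system whose solution satisfies $z^{\mathrm{T}}z = D_{kj}^2$ when $S = R^{\mathrm{T}}R$. The names `mahalanobis_sq_packed` and `packed_upper_index` are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Index of element (i, j), 0-based with i <= j, of an upper triangular
   matrix packed by column: column j starts at offset j*(j+1)/2. */
static size_t packed_upper_index(size_t i, size_t j)
{
    return i + j * (j + 1) / 2;
}

/* Squared Mahalanobis distance d^T (R^T R)^{-1} d, obtained by solving
   R^T z = d by forward substitution and returning z^T z.
   r:    upper triangular R (p x p), packed by column (p(p+1)/2 values).
   d:    difference vector x_k - xbar_j, length p.
   work: caller-supplied scratch array of length p (receives z).
   Returns -1.0 if a diagonal element of R is zero. */
double mahalanobis_sq_packed(size_t p, const double r[], const double d[],
                             double work[])
{
    assert(p > 0);
    double dist = 0.0;
    for (size_t j = 0; j < p; ++j) {
        /* Row j of R^T is column j of R, which is stored contiguously. */
        double s = d[j];
        for (size_t i = 0; i < j; ++i)
            s -= r[packed_upper_index(i, j)] * work[i];
        double diag = r[packed_upper_index(j, j)];
        if (diag == 0.0)
            return -1.0;
        work[j] = s / diag;
        dist += work[j] * work[j];
    }
    return dist;
}
```

For example, with $R = \begin{pmatrix}2&1\\0&3\end{pmatrix}$ (packed as `{2, 1, 3}`) and $d = (2, 5)^{\mathrm{T}}$, the solve gives $z = (1, 4/3)^{\mathrm{T}}$ and $D^2 = 25/9$, which agrees with $d^{\mathrm{T}}S^{-1}d$ for $S = R^{\mathrm{T}}R$.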
In addition to the distances, a set of prior probabilities of group membership, ${\pi }_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, may be used, with $\sum {\pi }_{j}=1$. The prior probabilities reflect your view as to the likelihood of the observations coming from the different groups. Two common cases for prior probabilities are ${\pi }_{1}={\pi }_{2}=\cdots ={\pi }_{{n}_{g}}$, that is, equal prior probabilities, and ${\pi }_{\mathit{j}}={n}_{\mathit{j}}/n$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, that is, prior probabilities proportional to the number of observations in the groups in the training set.
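The two common choices of prior can be computed directly from the group sizes $n_j$. A minimal sketch; `set_priors` is a hypothetical helper, and `long` stands in for the NAG `Integer` type.

```c
#include <assert.h>
#include <stddef.h>

/* Fill prior[0..ng-1] with equal probabilities 1/n_g
   (proportional_to_size == 0) or with n_j / n, where n is the total
   training-set size (proportional_to_size != 0). */
void set_priors(size_t ng, const long nig[], int proportional_to_size,
                double prior[])
{
    assert(ng > 0);
    if (!proportional_to_size) {
        for (size_t j = 0; j < ng; ++j)
            prior[j] = 1.0 / (double)ng;
        return;
    }
    long n = 0;
    for (size_t j = 0; j < ng; ++j)
        n += nig[j];
    assert(n > 0);
    for (size_t j = 0; j < ng; ++j)
        prior[j] = (double)nig[j] / (double)n;
}
```

With group sizes 10, 5 and 5, the size-proportional priors are 0.5, 0.25 and 0.25; both choices sum to 1 as required.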
g03dcc uses one of four allocation rules. In all four rules the $p$ variables are assumed to follow a multivariate Normal distribution with mean ${\mu }_{j}$ and variance-covariance matrix ${\Sigma }_{j}$ if the observation comes from the $j$th group. The different rules depend on whether or not the within-group variance-covariance matrices are assumed equal, i.e., ${\Sigma }_{1}={\Sigma }_{2}=\cdots ={\Sigma }_{{n}_{g}}$, and whether a predictive or estimative approach is used. If $p\left({x}_{k}\mid {\mu }_{j},{\Sigma }_{j}\right)$ is the probability of observing the observation ${x}_{k}$ from group $j$, then the posterior probability of belonging to group $j$ is:
$$p\left(j\mid {x}_{k},{\mu }_{j},{\Sigma }_{j}\right)\propto p\left({x}_{k}\mid {\mu }_{j},{\Sigma }_{j}\right){\pi }_{j}. \qquad (3)$$
In the estimative approach, the parameters ${\mu }_{j}$ and ${\Sigma }_{j}$ in (3) are replaced by their estimates calculated from ${X}_{t}$. In the predictive approach, a non-informative prior distribution is used for the parameters and a posterior distribution for the parameters, $p\left({\mu }_{j},{\Sigma }_{j}\mid {X}_{t}\right)$, is found. A predictive distribution is then obtained by integrating $p\left({x}_{k}\mid {\mu }_{j},{\Sigma }_{j}\right)p\left({\mu }_{j},{\Sigma }_{j}\mid {X}_{t}\right)$ over the parameter space. This predictive distribution then replaces $p\left({x}_{k}\mid {\mu }_{j},{\Sigma }_{j}\right)$ in (3). See Aitchison and Dunsmore (1975), Aitchison et al. (1977) and Moran and Murphy (1979) for further details.
The observation is allocated to the group with the highest posterior probability. Denoting the posterior probabilities, $p\left(j\mid {x}_{k},{\mu }_{j},{\Sigma }_{j}\right)$, by ${q}_{j}$, the four allocation rules are:
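(i) Estimative with equal variance-covariance matrices: linear discrimination.
$$\log {q}_{j}\propto -\tfrac{1}{2}{D}_{kj}^{2}+\log {\pi }_{j}$$
(ii) Estimative with unequal variance-covariance matrices: quadratic discrimination.
$$\log {q}_{j}\propto -\tfrac{1}{2}{D}_{kj}^{2}+\log {\pi }_{j}-\tfrac{1}{2}\log \left|{S}_{j}\right|$$
(iii) Predictive with equal variance-covariance matrices.
$${q}_{j}^{-1}\propto {\left(\frac{{n}_{j}+1}{{n}_{j}}\right)}^{p/2}{\left\{1+\left[\frac{{n}_{j}}{\left(n-{n}_{g}\right)\left({n}_{j}+1\right)}\right]{D}_{kj}^{2}\right\}}^{\left(n+1-{n}_{g}\right)/2}$$
(iv) Predictive with unequal variance-covariance matrices.
$${q}_{j}^{-1}\propto C{\left\{\left(\frac{{n}_{j}^{2}-1}{{n}_{j}}\right)\left|{S}_{j}\right|\right\}}^{p/2}{\left\{1+\left(\frac{{n}_{j}}{{n}_{j}^{2}-1}\right){D}_{kj}^{2}\right\}}^{{n}_{j}/2}$$
where
$$C=\frac{\Gamma \left(\tfrac{1}{2}\left({n}_{j}-p\right)\right)}{\Gamma \left(\tfrac{1}{2}{n}_{j}\right)}.$$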
In the above, the appropriate value of ${D}_{kj}^{2}$ from (1) or (2) is used. The values of the ${q}_{j}$ are standardized so that
$$\sum _{j=1}^{{n}_{g}}{q}_{j}=1.$$
Moran and Murphy (1979) show the similarity between the predictive methods and methods based upon likelihood ratio tests.
In addition to allocating the observation to a group, g03dcc computes an atypicality index, ${I}_{j}\left({x}_{k}\right)$. This represents the probability of obtaining an observation more typical of group $j$ than the observed ${x}_{k}$ (see Aitchison and Dunsmore (1975) and Aitchison et al. (1977)). The atypicality index is computed as:
$${I}_{j}\left({x}_{k}\right)=P\left(B\le z:\tfrac{1}{2}p,\tfrac{1}{2}\left({n}_{j}-d\right)\right)$$
where $P\left(B\le \beta :a,b\right)$ is the lower tail probability of a beta distribution with parameters $a$ and $b$, and, for unequal within-group variance-covariance matrices,
$$z={D}_{kj}^{2}\big/\left({D}_{kj}^{2}+\left({n}_{j}^{2}-1\right)/{n}_{j}\right),$$
and, for equal within-group variance-covariance matrices,
$$z={D}_{kj}^{2}\big/\left({D}_{kj}^{2}+\left(n-{n}_{g}\right)\left({n}_{j}-1\right)/{n}_{j}\right).$$
If ${I}_{j}\left({x}_{k}\right)$ is close to 1 for all groups it indicates that the observation may come from a grouping not represented in the training set. Moran and Murphy (1979) provide a frequentist interpretation of ${I}_{j}\left({x}_{k}\right)$.

## 4 References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Aitchison J, Habbema J D F and Kay J W (1977) A critical comparison of two methods of statistical discrimination Appl. Statist. 26 15–25
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Moran M A and Murphy B J (1979) A closer look at two alternative methods of statistical discrimination Appl. Statist. 28 223–232
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

## 5 Arguments

1: $\mathbf{type}$ (Nag_DiscrimMethod, Input)
On entry: indicates whether the estimative or predictive approach is to be used.
${\mathbf{type}}=\mathrm{Nag\_DiscrimEstimate}$
The estimative approach is used.
${\mathbf{type}}=\mathrm{Nag\_DiscrimPredict}$
The predictive approach is used.
Constraint: ${\mathbf{type}}=\mathrm{Nag\_DiscrimEstimate}$ or $\mathrm{Nag\_DiscrimPredict}$.
2: $\mathbf{equal}$ (Nag_GroupCovars, Input)
On entry: indicates whether or not the within-group variance-covariance matrices are assumed to be equal and the pooled variance-covariance matrix used.
${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$
The within-group variance-covariance matrices are assumed equal and the matrix $R$ stored in the first $p\left(p+1\right)/2$ elements of gc is used.
${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$
The within-group variance-covariance matrices are assumed to be unequal and the matrices ${R}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,{n}_{g}$, stored in the remainder of gc are used.
Constraint: ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$ or $\mathrm{Nag\_NotEqualCovar}$.
3: $\mathbf{priors}$ (Nag_PriorProbability, Input)
On entry: indicates the form of the prior probabilities to be used.
${\mathbf{priors}}=\mathrm{Nag\_EqualPrior}$
Equal prior probabilities are used.
${\mathbf{priors}}=\mathrm{Nag\_GroupSizePrior}$
Prior probabilities proportional to the group sizes in the training set, ${n}_{j}$, are used.
${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$
The prior probabilities are input in prior.
Constraint: ${\mathbf{priors}}=\mathrm{Nag\_EqualPrior}$, $\mathrm{Nag\_GroupSizePrior}$ or $\mathrm{Nag\_UserPrior}$.
4: $\mathbf{nvar}$ (Integer, Input)
On entry: the number of variables, $p$, in the variance-covariance matrices as specified to g03dac.
Constraint: ${\mathbf{nvar}}\ge 1$.
5: $\mathbf{ng}$ (Integer, Input)
On entry: the number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
6: $\mathbf{nig}\left[{\mathbf{ng}}\right]$ (const Integer, Input)
On entry: the number of observations in each group training set, ${n}_{j}$.
Constraints:
• if ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$, ${\mathbf{nig}}\left[\mathit{j}-1\right]>0$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, and ${\sum }_{\mathit{j}=1}^{{n}_{g}}{\mathbf{nig}}\left[\mathit{j}-1\right]>{\mathbf{ng}}+{\mathbf{nvar}}$;
• if ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$, ${\mathbf{nig}}\left[\mathit{j}-1\right]>{\mathbf{nvar}}$, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
7: $\mathbf{gmean}\left[{\mathbf{ng}}×{\mathbf{tdg}}\right]$ (const double, Input)
Note: the $\left(i,j\right)$th element of the matrix is stored in ${\mathbf{gmean}}\left[\left(i-1\right)×{\mathbf{tdg}}+j-1\right]$.
On entry: the $\mathit{j}$th row of gmean contains the means of the $p$ variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$. These are returned by g03dac.
8: $\mathbf{tdg}$ (Integer, Input)
On entry: the stride separating matrix column elements in the array gmean.
Constraint: ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
9: $\mathbf{gc}\left[\mathit{dim}\right]$ (const double, Input)
Note: the dimension, dim, of the array gc must be at least $\left({\mathbf{ng}}+1\right)×{\mathbf{nvar}}×\left({\mathbf{nvar}}+1\right)/2$.
On entry: the first $p\left(p+1\right)/2$ elements of gc should contain the upper triangular matrix $R$ and the next ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements should contain the upper triangular matrices ${R}_{j}$.
All matrices must be stored packed by column. These matrices are returned by g03dac. If ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$, only the first $p\left(p+1\right)/2$ elements are referenced; if ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$, only the elements $p\left(p+1\right)/2$ to $\left({n}_{g}+1\right)p\left(p+1\right)/2-1$ are referenced.
Constraints:
• if ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$, the diagonal elements of $R$ must be $\ne 0.0$;
• if ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$, the diagonal elements of the ${R}_{j}$ must be $\ne 0.0$, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
10: $\mathbf{det}\left[{\mathbf{ng}}\right]$ (const double, Input)
On entry: if ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$, the logarithms of the determinants of the within-group variance-covariance matrices as returned by g03dac. Otherwise det is not referenced.
11: $\mathbf{nobs}$ (Integer, Input)
On entry: the number of observations in x which are to be allocated.
Constraint: ${\mathbf{nobs}}\ge 1$.
12: $\mathbf{m}$ (Integer, Input)
On entry: the number of variables in the data array x.
Constraint: ${\mathbf{m}}\ge {\mathbf{nvar}}$.
13: $\mathbf{isx}\left[{\mathbf{m}}\right]$ (const Integer, Input)
On entry: ${\mathbf{isx}}\left[\mathit{l}-1\right]$ indicates whether the $\mathit{l}$th variable in x is to be included in the distance calculations. If ${\mathbf{isx}}\left[\mathit{l}-1\right]>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise the $l$th variable is not referenced.
Constraint: ${\mathbf{isx}}\left[l-1\right]>0$ for nvar values of $l$.
14: $\mathbf{x}\left[{\mathbf{nobs}}×{\mathbf{tdx}}\right]$ (const double, Input)
On entry: ${\mathbf{x}}\left[\left(\mathit{k}-1\right)×{\mathbf{tdx}}+\mathit{l}-1\right]$ must contain the $\mathit{k}$th observation for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$.
15: $\mathbf{tdx}$ (Integer, Input)
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
16: $\mathbf{prior}\left[{\mathbf{ng}}\right]$ (double, Input/Output)
On entry: if ${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$, the prior probabilities for the ${n}_{g}$ groups.
Constraint: if ${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$, ${\mathbf{prior}}\left[\mathit{j}-1\right]>0.0$, for $\mathit{j}=1,2,\dots ,{n}_{g}$, and ${\sum }_{j=1}^{{n}_{g}}{\mathbf{prior}}\left[j-1\right]$ must be within $10×$ machine precision of 1.
On exit: if ${\mathbf{priors}}=\mathrm{Nag\_GroupSizePrior}$, the computed prior probabilities in proportion to group sizes for the ${n}_{g}$ groups.
If ${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$, the input prior probabilities will be unchanged.
If ${\mathbf{priors}}=\mathrm{Nag\_EqualPrior}$, prior is not set.
17: $\mathbf{p}\left[{\mathbf{nobs}}×{\mathbf{tdp}}\right]$ (double, Output)
On exit: ${\mathbf{p}}\left[\left(\mathit{k}-1\right)×{\mathbf{tdp}}+\mathit{j}-1\right]$ contains the posterior probability ${p}_{\mathit{k}\mathit{j}}$ for allocating the $\mathit{k}$th observation to the $\mathit{j}$th group, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$.
18: $\mathbf{tdp}$ (Integer, Input)
On entry: the stride separating matrix column elements in the arrays p and ati.
Constraint: ${\mathbf{tdp}}\ge {\mathbf{ng}}$.
19: $\mathbf{iag}\left[{\mathbf{nobs}}\right]$ (Integer, Output)
On exit: the groups to which the observations have been allocated.
20: $\mathbf{atiq}$ (Nag_Boolean, Input)
On entry: atiq must be Nag_TRUE if atypicality indices are required. If atiq is Nag_FALSE, the array ati is not set.
21: $\mathbf{ati}\left[{\mathbf{nobs}}×{\mathbf{tdp}}\right]$ (double, Output)
On exit: if atiq is Nag_TRUE, ${\mathbf{ati}}\left[\left(\mathit{k}-1\right)×{\mathbf{tdp}}+\mathit{j}-1\right]$ will contain the atypicality index for the $\mathit{k}$th observation with respect to the $\mathit{j}$th group, for $\mathit{k}=1,2,\dots ,{\mathbf{nobs}}$ and $\mathit{j}=1,2,\dots ,{n}_{g}$. If atiq is Nag_FALSE, ati is not set.
22: $\mathbf{fail}$ (NagError *, Input/Output)
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
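The storage conventions above can be made concrete with a few hypothetical C helpers: locating a packed block in gc, gathering the variables selected by isx, recovering an allocation from a row of p, and checking user-supplied priors. They assume the row-major strides documented above and use `long` in place of the NAG `Integer` type. Two further assumptions are labelled in the comments: `DBL_EPSILON` is treated as machine precision, which may differ from the NAG definition, and ties in `allocated_group` go to the lower-numbered group, which is not documented g03dcc behaviour.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Start of a packed block in gc: block 0 is the pooled R and block j
   (1 <= j <= ng) is R_j; each block holds p(p+1)/2 values. */
const double *gc_block(const double gc[], size_t p, size_t block)
{
    return gc + block * (p * (p + 1) / 2);
}

/* Copy the variables of observation k (0-based) flagged by isx[l] > 0
   into out[]; returns how many were copied (this should equal nvar). */
size_t select_variables(const double x[], size_t tdx, size_t k, size_t m,
                        const long isx[], double out[])
{
    size_t count = 0;
    for (size_t l = 0; l < m; ++l)
        if (isx[l] > 0)
            out[count++] = x[k * tdx + l];
    return count;
}

/* 1-based group with the highest posterior probability in row k of p.
   Assumption: ties are resolved in favour of the lower group number. */
long allocated_group(const double p[], size_t tdp, size_t k, size_t ng)
{
    assert(ng > 0);
    size_t best = 0;
    for (size_t j = 1; j < ng; ++j)
        if (p[k * tdp + j] > p[k * tdp + best])
            best = j;
    return (long)best + 1;
}

/* User priors: each must be > 0 and the sum within 10 * machine
   precision of 1 (assumption: DBL_EPSILON used as machine precision). */
int user_priors_valid(const double prior[], size_t ng)
{
    double sum = 0.0;
    for (size_t j = 0; j < ng; ++j) {
        if (!(prior[j] > 0.0))
            return 0;
        sum += prior[j];
    }
    return fabs(sum - 1.0) <= 10.0 * DBL_EPSILON;
}
```

For instance, with $p = 2$ each block holds 3 values, so $R_2$ starts at offset 6 of gc.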

## 6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{m}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{tdg}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{tdp}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdp}}\ge {\mathbf{ng}}$.
On entry, ${\mathbf{tdx}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument equal had an illegal value.
On entry, argument priors had an illegal value.
On entry, argument type had an illegal value.
NE_DIAG_0_COND
A diagonal element of $R$ is zero when ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$.
NE_DIAG_0_J_COND
A diagonal element of ${R}_{j}$ is zero for some $j$ when ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$.
NE_GROUP_SUM
On entry, ${\sum }_{j=1}^{{\mathbf{ng}}}{\mathbf{nig}}\left[j-1\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\sum }_{j=1}^{{\mathbf{ng}}}{\mathbf{nig}}\left[j-1\right]>{\mathbf{ng}}+{\mathbf{nvar}}$ when ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$.
NE_INT_ARG_LT
On entry, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ng}}\ge 2$.
On entry, ${\mathbf{nobs}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nobs}}\ge 1$.
On entry, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nvar}}\ge 1$.
NE_INTARR
On entry, ${\mathbf{nig}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nig}}\left[\mathit{i}-1\right]>0$, for $\mathit{i}=1,2,\dots ,{\mathbf{ng}}$, when ${\mathbf{equal}}=\mathrm{Nag\_EqualCovar}$.
NE_INTARR_INT
On entry, ${\mathbf{nig}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nig}}\left[i-1\right]>{\mathbf{nvar}}$, for $i=1,2,\dots ,{\mathbf{ng}}$, when ${\mathbf{equal}}=\mathrm{Nag\_NotEqualCovar}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_PRIOR_SUM
On entry, ${\sum }_{j=1}^{{\mathbf{ng}}}{\mathbf{prior}}\left[j-1\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\sum }_{j=1}^{{\mathbf{ng}}}{\mathbf{prior}}\left[j-1\right]$ must be within $10×$ machine precision of 1 when ${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$.
NE_REALARR
On entry, ${\mathbf{prior}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{prior}}\left[j-1\right]>0$, for $j=1,2,\dots ,{\mathbf{ng}}$, when ${\mathbf{priors}}=\mathrm{Nag\_UserPrior}$.
NE_VAR_INCL_INDICATED
The number of variables in the analysis, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$, differs from the number of variables included in the analysis via array isx, $⟨\mathit{\text{value}}⟩$.
Constraint: these two numbers must be the same.

## 7 Accuracy

The accuracy of the returned posterior probabilities depends on the accuracy of the input $R$ or ${R}_{j}$ matrices. The atypicality index should be accurate to four significant figures.

## 8 Parallelism and Performance

g03dcc is not threaded in any implementation.

## 9 Further Comments

The distances ${D}_{kj}^{2}$ can be computed using g03dbc if other forms of discrimination are required.

## 10 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the group means and $R$ matrices are computed by g03dac. A further six observations of unknown type are input and allocations made using the predictive approach and under the assumption that the within-group covariance matrices are not equal. The posterior probabilities of group membership, ${q}_{j}$, and the atypicality index are printed along with the allocated group. The atypicality index shows that observations 5 and 6 do not seem to be typical of the three types present in the initial 21 observations.

### 10.1 Program Text

Program Text (g03dcce.c)

### 10.2 Program Data

Program Data (g03dcce.d)

### 10.3 Program Results

Program Results (g03dcce.r)