# NAG CL Interface g03dac (discrim)

## 1 Purpose

g03dac computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.

## 2 Specification

 #include <nag.h>
 void g03dac (Integer n, Integer m, const double x[], Integer tdx, const Integer isx[], Integer nvar, const Integer ing[], Integer ng, const double wt[], Integer nig[], double gmean[], Integer tdg, double det[], double gc[], double *stat, double *df, double *sig, NagError *fail)
The function may be called by the names: g03dac or nag_mv_discrim.

## 3 Description

Let a sample of $n$ observations on $p$ variables come from ${n}_{g}$ groups with ${n}_{j}$ observations in the $j$th group and $\sum {n}_{j}=n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group ${\Sigma }_{j}$, then to test for equality of the variance-covariance matrices between groups, that is, ${\Sigma }_{1}={\Sigma }_{2}=\cdots ={\Sigma }_{{n}_{g}}=\Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
 $G=C\left\{\left(n-{n}_{g}\right)\mathrm{log}|S|-\sum _{j=1}^{{n}_{g}}\left({n}_{j}-1\right)\mathrm{log}|{S}_{j}|\right\},$
where
 $C=1-\frac{2{p}^{2}+3p-1}{6\left(p+1\right)\left({n}_{g}-1\right)}\left(\sum _{j=1}^{{n}_{g}}\frac{1}{\left({n}_{j}-1\right)}-\frac{1}{\left(n-{n}_{g}\right)}\right),$
and ${S}_{j}$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
 $S=\sum _{j=1}^{{n}_{g}}\frac{\left({n}_{j}-1\right){S}_{j}}{\left(n-{n}_{g}\right)}.$
For large $n$, $G$ is approximately distributed as a ${\chi }^{2}$ variable with $\frac{1}{2}p\left(p+1\right)\left({n}_{g}-1\right)$ degrees of freedom, see Morrison (1967) for further comments. If weights are used, then $S$ and ${S}_{j}$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.
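The formulas for $C$, $G$ and the degrees of freedom can be evaluated directly once the log-determinants are known. The following is a minimal sketch, not the library's implementation; the function name `cov_equality_statistic` and its argument layout are hypothetical, and the group sizes are passed as `double` so that effective (weighted) counts could also be used.

```c
#include <math.h>

/* Hypothetical helper, not part of the NAG Library.
 * p          number of variables
 * ng         number of groups
 * nj[j]      (effective) number of observations in group j+1
 * logdet_sj  log|S_j| for each group
 * logdet_s   log|S| for the pooled matrix
 * Returns G and writes the chi-square degrees of freedom to *df. */
double cov_equality_statistic(int p, int ng, const double nj[],
                              const double logdet_sj[], double logdet_s,
                              double *df)
{
    double n = 0.0;        /* total number of observations      */
    double sum_inv = 0.0;  /* sum over groups of 1 / (n_j - 1)  */
    double weighted = 0.0; /* sum of (n_j - 1) * log|S_j|       */

    for (int j = 0; j < ng; ++j) {
        n += nj[j];
        sum_inv += 1.0 / (nj[j] - 1.0);
        weighted += (nj[j] - 1.0) * logdet_sj[j];
    }

    double nu = n - ng; /* pooled degrees of freedom, n - n_g */
    double c = 1.0 - (2.0 * p * p + 3.0 * p - 1.0)
                         / (6.0 * (p + 1) * (ng - 1))
                         * (sum_inv - 1.0 / nu);

    *df = 0.5 * p * (p + 1) * (ng - 1);
    return c * (nu * logdet_s - weighted);
}
```

Every group needs more than one effective observation here, otherwise $1/\left({n}_{j}-1\right)$ is undefined; the sketch does not check this.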
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, g03dac uses a $QR$ decomposition. The group means are subtracted from the data and then for each group, a $QR$ decomposition is computed to give an upper triangular matrix ${R}_{j}^{*}$. This matrix can be scaled to give a matrix ${R}_{j}$ such that ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. The pooled $R$ matrix is then computed from the ${R}_{j}$ matrices. The values of $|S|$ and the $|{S}_{j}|$ can then be calculated from the diagonal elements of $R$ and the ${R}_{j}$.
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as ${z}^{\mathrm{T}}z$, where ${R}_{j}^{\mathrm{T}}z=\left(x-{\overline{x}}_{j}\right)$, ${\overline{x}}_{j}$ being the vector of means of the $j$th group. These distances can be calculated by g03dbc. The distances are used in discriminant analysis and g03dcc uses the results of g03dac to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
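To illustrate how a triangular factor yields the distance without forming ${S}_{j}^{-1}$: with ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$, the squared distance ${\left(x-{\overline{x}}_{j}\right)}^{\mathrm{T}}{S}_{j}^{-1}\left(x-{\overline{x}}_{j}\right)$ equals ${z}^{\mathrm{T}}z$ when $z$ solves the lower triangular system ${R}_{j}^{\mathrm{T}}z=x-{\overline{x}}_{j}$. The sketch below uses plain forward substitution. It is not how g03dbc is implemented; the name `mahalanobis_sq` is hypothetical, and for readability $R$ is assumed to be held as a full row-major $p×p$ array rather than in packed form.

```c
/* Hypothetical helper, not part of the NAG Library.
 * r     upper triangular factor, full row-major p-by-p storage (assumed)
 * x     observation vector of length p
 * mean  group mean vector of length p
 * z     workspace of length p, receives the solution of R^T z = x - mean
 * Returns the squared Mahalanobis distance z^T z. */
double mahalanobis_sq(int p, const double r[], const double x[],
                      const double mean[], double z[])
{
    double dist = 0.0;

    for (int i = 0; i < p; ++i) {
        double s = x[i] - mean[i];
        /* Row i of R^T is column i of R: entries r[k][i] for k < i. */
        for (int k = 0; k < i; ++k)
            s -= r[k * p + i] * z[k];
        z[i] = s / r[i * p + i];
        dist += z[i] * z[i];
    }
    return dist;
}
```

The sketch assumes a nonsingular factor, that is, no zero on the diagonal of $R$.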

## 4 References

Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill

## 5 Arguments

1: $\mathbf{n}$ Integer Input
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}\ge 1$.
2: $\mathbf{m}$ Integer Input
On entry: the number of variables in the data array x.
Constraint: ${\mathbf{m}}\ge {\mathbf{nvar}}$.
3: $\mathbf{x}\left[{\mathbf{n}}×{\mathbf{tdx}}\right]$ const double Input
On entry: ${\mathbf{x}}\left[\left(\mathit{k}-1\right)×{\mathbf{tdx}}+\mathit{l}-1\right]$ must contain the $\mathit{k}$th observation for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,n$ and $\mathit{l}=1,2,\dots ,{\mathbf{m}}$.
4: $\mathbf{tdx}$ Integer Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
5: $\mathbf{isx}\left[{\mathbf{m}}\right]$ const Integer Input
On entry: ${\mathbf{isx}}\left[l-1\right]$ indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.
If ${\mathbf{isx}}\left[\mathit{l}-1\right]>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{m}}$; otherwise it is not referenced.
Constraint: ${\mathbf{isx}}\left[l-1\right]>0$ for nvar values of $l$.
6: $\mathbf{nvar}$ Integer Input
On entry: the number of variables in the variance-covariance matrices, $p$.
Constraint: ${\mathbf{nvar}}\ge 1$.
7: $\mathbf{ing}\left[{\mathbf{n}}\right]$ const Integer Input
On entry: ${\mathbf{ing}}\left[\mathit{k}-1\right]$ indicates to which group the $\mathit{k}$th observation belongs, for $\mathit{k}=1,2,\dots ,n$.
Constraint: $1\le {\mathbf{ing}}\left[\mathit{k}-1\right]\le {\mathbf{ng}}$, for $\mathit{k}=1,2,\dots ,n$.
The values of ing must be such that each group has at least nvar members.
8: $\mathbf{ng}$ Integer Input
On entry: the number of groups, ${n}_{g}$.
Constraint: ${\mathbf{ng}}\ge 2$.
9: $\mathbf{wt}\left[{\mathbf{n}}\right]$ const double Input
On entry: the elements of wt must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If ${\mathbf{wt}}\left[k-1\right]=0.0$ then the $k$th observation is excluded from the calculations.
If weights are not provided then wt must be set to NULL and the effective number of observations for a group is the number of observations in that group.
Constraints:
• if wt is not NULL, ${\mathbf{wt}}\left[\mathit{k}-1\right]\ge 0.0$, for $\mathit{k}=1,2,\dots ,n$;
• the effective number of observations for each group must be greater than 1.
10: $\mathbf{nig}\left[{\mathbf{ng}}\right]$ Integer Output
On exit: ${\mathbf{nig}}\left[\mathit{j}-1\right]$ contains the number of observations in the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
11: $\mathbf{gmean}\left[{\mathbf{ng}}×{\mathbf{tdg}}\right]$ double Output
Note: the $\left(i,j\right)$th element of the matrix is stored in ${\mathbf{gmean}}\left[\left(i-1\right)×{\mathbf{tdg}}+j-1\right]$.
On exit: the $\mathit{j}$th row of gmean contains the means of the $p$ selected variables for the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
12: $\mathbf{tdg}$ Integer Input
On entry: the stride separating matrix column elements in the array gmean.
Constraint: ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
13: $\mathbf{det}\left[{\mathbf{ng}}\right]$ double Output
On exit: the logarithms of the determinants of the within-group variance-covariance matrices.
14: $\mathbf{gc}\left[\mathit{dim}\right]$ double Output
Note: the dimension, dim, of the array gc must be at least $\left({\mathbf{ng}}+1\right)×{\mathbf{nvar}}×\left({\mathbf{nvar}}+1\right)/2$.
On exit: the first $p\left(p+1\right)/2$ elements of gc contain $R$ and the remaining ${n}_{g}$ blocks of $p\left(p+1\right)/2$ elements contain the ${R}_{j}$ matrices. All are stored in packed form by columns.
15: $\mathbf{stat}$ double * Output
On exit: the likelihood-ratio test statistic, $G$.
16: $\mathbf{df}$ double * Output
On exit: the degrees of freedom for the distribution of $G$.
17: $\mathbf{sig}$ double * Output
On exit: the significance level for $G$.
18: $\mathbf{fail}$ NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
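The input conventions for x, tdx, isx, ing and wt described above can be checked before a call with a few small loops. The following is a minimal sketch only: the helper names are hypothetical, `int` stands in for the library's Integer type, and the checks merely mirror the documented constraints rather than reproduce the library's own validation.

```c
#include <stddef.h>

/* Count the variables flagged for inclusion (isx[l] > 0).
 * The documented constraint is that this count equals nvar. */
int count_selected(int m, const int isx[])
{
    int count = 0;
    for (int l = 0; l < m; ++l)
        if (isx[l] > 0)
            ++count;
    return count;
}

/* Copy the selected variables of observation k (0-based) into out[],
 * reading x with row stride tdx. Returns the number of values copied. */
int gather_selected(int m, int tdx, const double x[], const int isx[],
                    int k, double out[])
{
    int count = 0;
    for (int l = 0; l < m; ++l)
        if (isx[l] > 0)
            out[count++] = x[k * tdx + l];
    return count;
}

/* Accumulate the effective number of observations per group into eff[].
 * A NULL wt means unit weights. Returns 0 on success, or -1 if a group
 * label is outside 1..ng or a weight is negative. */
int effective_group_sizes(int n, int ng, const int ing[],
                          const double wt[], double eff[])
{
    for (int j = 0; j < ng; ++j)
        eff[j] = 0.0;

    for (int k = 0; k < n; ++k) {
        if (ing[k] < 1 || ing[k] > ng)
            return -1;
        double w = (wt != NULL) ? wt[k] : 1.0;
        if (w < 0.0)
            return -1;
        eff[ing[k] - 1] += w; /* zero weights add nothing */
    }
    return 0;
}
```

A caller could compare `count_selected` against nvar and inspect each entry of `eff` before invoking g03dac, so that constraint violations are caught with application-specific messages.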

## 6 Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{m}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{tdg}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdg}}\ge {\mathbf{nvar}}$.
On entry, ${\mathbf{tdx}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_GROUP_OBSERV
On entry, group $⟨\mathit{\text{value}}⟩$ has $⟨\mathit{\text{value}}⟩$ effective observations.
Constraint: in each group the effective number of observations must be $\ge 1$.
NE_GROUP_VAR
On entry, group $⟨\mathit{\text{value}}⟩$ has $⟨\mathit{\text{value}}⟩$ members, while ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: number of members in each group $\ge {\mathbf{nvar}}$.
NE_GROUP_VAR_RANK
The variables in group $⟨\mathit{\text{value}}⟩$ are not of full rank.
NE_INT_ARG_LT
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 1$.
On entry, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ng}}\ge 2$.
On entry, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nvar}}\ge 1$.
NE_INTARR_INT
On entry, ${\mathbf{ing}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{ng}}=⟨\mathit{\text{value}}⟩$.
Constraint: $1\le {\mathbf{ing}}\left[\mathit{i}-1\right]\le {\mathbf{ng}}$, for $\mathit{i}=1,2,\dots ,n$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NEG_WEIGHT_ELEMENT
On entry, ${\mathbf{wt}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: when referenced, all elements of wt must be non-negative.
NE_VAR_INCL_INDICATED
The number of variables in the analysis, ${\mathbf{nvar}}=⟨\mathit{\text{value}}⟩$, while the number of variables included in the analysis via array isx $=⟨\mathit{\text{value}}⟩$.
Constraint: these two numbers must be the same.
NE_VAR_RANK
The variables are not of full rank.

## 7 Accuracy

The accuracy is dependent on the accuracy of the computation of the $QR$ decomposition.

## 8 Parallelism and Performance

g03dac is not threaded in any implementation.

## 9 Further Comments

The time will be approximately proportional to $n{p}^{2}$.

## 10 Example

The data, taken from Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of 21 patients are input and the statistics computed by g03dac. The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.

### 10.1 Program Text

Program Text (g03dace.c)

### 10.2 Program Data

Program Data (g03dace.d)

### 10.3 Program Results

Program Results (g03dace.r)