G03DAF computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.
Let a sample of
$n$ observations on
$p$ variables come from
${n}_{g}$ groups with
${n}_{j}$ observations in the
$j$th group and
$\sum {n}_{j}=n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the
$j$th group
${\Sigma}_{j}$, then to test for equality of the variance-covariance matrices between groups, that is,
${\Sigma}_{1}={\Sigma}_{2}=\cdots ={\Sigma}_{{n}_{g}}=\Sigma $, the following likelihood-ratio test statistic,
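$G$, can be used:
$$G=C\left\{\left(n-{n}_{g}\right)\log\left|S\right|-\sum _{j=1}^{{n}_{g}}\left({n}_{j}-1\right)\log\left|{S}_{j}\right|\right\},$$
where
$$C=1-\frac{2{p}^{2}+3p-1}{6\left(p+1\right)\left({n}_{g}-1\right)}\left\{\sum _{j=1}^{{n}_{g}}\frac{1}{{n}_{j}-1}-\frac{1}{n-{n}_{g}}\right\},$$
and
${S}_{j}$ are the within-group variance-covariance matrices and
$S$ is the pooled variance-covariance matrix given by
$$S=\frac{\sum _{j=1}^{{n}_{g}}\left({n}_{j}-1\right){S}_{j}}{n-{n}_{g}}.$$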
For large
$n$,
$G$ is approximately distributed as a
${\chi}^{2}$ variable with
$\frac{1}{2}p\left(p+1\right)\left({n}_{g}-1\right)$ degrees of freedom; see
Morrison (1967) for further comments. If weights are used, then
$S$ and
${S}_{j}$ are the weighted pooled and within-group variance-covariance matrices and
$n$ is the effective number of observations, that is, the sum of the weights.
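As an illustration of the test described above, the following Python sketch computes the statistic directly from the within-group matrices for the unweighted case. It assumes the usual Box-corrected form, $G=C\{(n-{n}_{g})\log|S|-\sum ({n}_{j}-1)\log|{S}_{j}|\}$, with the pooled matrix formed as the $({n}_{j}-1)$-weighted average of the ${S}_{j}$. The function names `covariance`, `log_det_spd` and `equality_test` are hypothetical helpers for this sketch and are not part of the NAG Library. It is the direct determinant-based calculation, not the method G03DAF itself uses.

```python
import math

def covariance(rows):
    """Unbiased sample covariance matrix of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    p = len(a)
    l = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return 2.0 * sum(math.log(l[i][i]) for i in range(p))

def equality_test(groups):
    """Return (G, df) for a list of groups, each a list of observation vectors.

    Assumption: unweighted data and the Box-corrected statistic stated in the
    lead-in; every group needs more than p observations for S_j to be
    positive definite.
    """
    ng = len(groups)
    p = len(groups[0][0])
    n = sum(len(g) for g in groups)
    covs = [covariance(g) for g in groups]
    # Pooled matrix: S = sum_j (n_j - 1) S_j / (n - n_g).
    pooled = [[sum((len(g) - 1) * s[a][b] for g, s in zip(groups, covs)) / (n - ng)
               for b in range(p)] for a in range(p)]
    # Correction factor C.
    c = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (ng - 1)) * (
        sum(1.0 / (len(g) - 1) for g in groups) - 1.0 / (n - ng))
    g_stat = c * ((n - ng) * log_det_spd(pooled)
                  - sum((len(g) - 1) * log_det_spd(s) for g, s in zip(groups, covs)))
    # Degrees of freedom of the approximating chi-square distribution.
    df = 0.5 * p * (p + 1) * (ng - 1)
    return g_stat, df
```

When two groups differ only by a shift in location, their covariance matrices coincide and the statistic is zero up to rounding; any difference between the matrices makes it positive.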
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, G03DAF uses a $QR$ decomposition. The group means are subtracted from the data and then for each group, a $QR$ decomposition is computed to give an upper triangular matrix ${R}_{j}^{*}$. This matrix can be scaled to give a matrix ${R}_{j}$ such that ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$. The pooled $R$ matrix is then computed from the ${R}_{j}$ matrices. The values of $\left|S\right|$ and the $\left|{S}_{j}\right|$ can then be calculated from the diagonal elements of $R$ and the ${R}_{j}$.
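The $QR$ route can be sketched in a few lines of Python. This is an illustration only: it uses modified Gram-Schmidt as a stand-in for the Householder-based factorization a library routine would use, and it assumes the scaling factor is $1/\sqrt{{n}_{j}-1}$ (unweighted data), so that ${R}_{j}^{\mathrm{T}}{R}_{j}$ is the usual unbiased sample covariance. The names `centred_r_factor` and `log_det_from_r` are hypothetical.

```python
import math

def centred_r_factor(rows):
    """Upper triangular R with S = R^T R, from a QR of the mean-centred data."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(p)]
    # Columns of the mean-centred data matrix.
    cols = [[r[k] - mean[k] for r in rows] for k in range(p)]
    r = [[0.0] * p for _ in range(p)]
    # Modified Gram-Schmidt: cols[k] is overwritten by the k-th orthonormal vector.
    for k in range(p):
        r[k][k] = math.sqrt(sum(v * v for v in cols[k]))
        cols[k] = [v / r[k][k] for v in cols[k]]
        for j in range(k + 1, p):
            r[k][j] = sum(a * b for a, b in zip(cols[k], cols[j]))
            cols[j] = [b - r[k][j] * a for a, b in zip(cols[k], cols[j])]
    # Assumed scaling: R = R* / sqrt(n - 1), so that R^T R is the sample covariance.
    scale = 1.0 / math.sqrt(n - 1)
    return [[scale * v for v in row] for row in r]

def log_det_from_r(r):
    """log|S| = 2 * sum(log|r_ii|) when S = R^T R with R triangular."""
    return 2.0 * sum(math.log(abs(row[i])) for i, row in enumerate(r))
```

The absolute value in `log_det_from_r` is deliberate: a Householder-based factorization may return negative diagonal elements, which leaves ${R}^{\mathrm{T}}R$ unchanged. Working from the diagonal of the triangular factor avoids forming ${S}_{j}$ explicitly.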
This approach means that the Mahalanobis squared distances for a vector observation
$x$ can be computed as
${z}^{\mathrm{T}}z$, where
${R}_{j}^{\mathrm{T}}z=\left(x-{\bar{x}}_{j}\right)$,
${\bar{x}}_{j}$ being the vector of means of the
$j$th group. These distances can be calculated by
G03DBF. The distances are used in discriminant analysis and
G03DCF uses the results of G03DAF to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
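The distance calculation can be illustrated with a short Python sketch. Under the convention ${S}_{j}={R}_{j}^{\mathrm{T}}{R}_{j}$ with ${R}_{j}$ upper triangular, the squared distance $(x-{\bar{x}}_{j})^{\mathrm{T}}{S}_{j}^{-1}(x-{\bar{x}}_{j})$ equals ${z}^{\mathrm{T}}z$ where $z$ solves the lower triangular system with ${R}_{j}^{\mathrm{T}}$, so a single forward substitution suffices. The function name `mahalanobis_sq` is hypothetical, and this is not the G03DBF interface.

```python
def mahalanobis_sq(r, x, mean):
    """Squared Mahalanobis distance z^T z, where R^T z = x - mean and S = R^T R.

    r is a full p-by-p upper triangular matrix given as a list of rows.
    """
    p = len(x)
    d = [x[i] - mean[i] for i in range(p)]
    z = [0.0] * p
    # R^T is lower triangular with (R^T)[i][k] = r[k][i]: forward substitution.
    for i in range(p):
        z[i] = (d[i] - sum(r[k][i] * z[k] for k in range(i))) / r[i][i]
    return sum(v * v for v in z)
```

For example, with $R$ having rows $(2,1)$ and $(0,3)$, so that $S$ has rows $(4,2)$ and $(2,10)$, and $x-\bar{x}=(2,4)$, the forward substitution gives $z=(1,1)$ and a squared distance of $2$, which agrees with the direct evaluation using ${S}^{-1}$.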
 1: WEIGHT – CHARACTER(1) Input
On entry: indicates if weights are to be used.
 ${\mathbf{WEIGHT}}=\text{'U'}$
 No weights are used.
 ${\mathbf{WEIGHT}}=\text{'W'}$
 Weights are to be used and must be supplied in WT.
Constraint:
${\mathbf{WEIGHT}}=\text{'U'}$ or $\text{'W'}$.
 2: N – INTEGER Input
On entry: $n$, the number of observations.
Constraint:
${\mathbf{N}}\ge 1$.
 3: M – INTEGER Input
On entry: the number of variables in the data array
X.
Constraint:
${\mathbf{M}}\ge {\mathbf{NVAR}}$.
 4: X(LDX,M) – REAL (KIND=nag_wp) array Input
On entry: ${\mathbf{X}}\left(\mathit{k},\mathit{l}\right)$ must contain the $\mathit{k}$th observation for the $\mathit{l}$th variable, for $\mathit{k}=1,2,\dots ,n$ and $\mathit{l}=1,2,\dots ,{\mathbf{M}}$.
 5: LDX – INTEGER Input
On entry: the first dimension of the array
X as declared in the (sub)program from which G03DAF is called.
Constraint:
${\mathbf{LDX}}\ge {\mathbf{N}}$.
 6: ISX(M) – INTEGER array Input
On entry:
${\mathbf{ISX}}\left(l\right)$ indicates whether or not the
$l$th variable in
X is to be included in the variance-covariance matrices.
If
${\mathbf{ISX}}\left(\mathit{l}\right)>0$ the $\mathit{l}$th variable is included, for $\mathit{l}=1,2,\dots ,{\mathbf{M}}$; otherwise it is not referenced.
Constraint:
${\mathbf{ISX}}\left(l\right)>0$ for
NVAR values of
$l$.
 7: NVAR – INTEGER Input
On entry: $p$, the number of variables in the variance-covariance matrices.
Constraint:
${\mathbf{NVAR}}\ge 1$.
 8: ING(N) – INTEGER array Input
On entry: ${\mathbf{ING}}\left(\mathit{k}\right)$ indicates to which group the $\mathit{k}$th observation belongs, for $\mathit{k}=1,2,\dots ,n$.
Constraint:
$1\le {\mathbf{ING}}\left(\mathit{k}\right)\le {\mathbf{NG}}$, for
$\mathit{k}=1,2,\dots ,n$. The values of
ING must be such that each group has at least
NVAR members.
 9: NG – INTEGER Input
On entry: the number of groups, ${n}_{g}$.
Constraint:
${\mathbf{NG}}\ge 2$.
 10: WT($*$) – REAL (KIND=nag_wp) array Input

Note: the dimension of the array
WT
must be at least
${\mathbf{N}}$ if
${\mathbf{WEIGHT}}=\text{'W'}$, and at least
$1$ otherwise.
On entry: if
${\mathbf{WEIGHT}}=\text{'W'}$ the first
$n$ elements of
WT must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If
${\mathbf{WT}}\left(k\right)=0.0$ the
$k$th observation is excluded from the calculations.
If
${\mathbf{WEIGHT}}=\text{'U'}$,
WT is not referenced and the effective number of observations for a group is the number of observations in that group.
Constraint:
if ${\mathbf{WEIGHT}}=\text{'W'}$, ${\mathbf{WT}}\left(\mathit{k}\right)\ge 0.0$, for $\mathit{k}=1,2,\dots ,n$.
 11: NIG(NG) – INTEGER array Output
On exit: ${\mathbf{NIG}}\left(\mathit{j}\right)$ contains the number of observations in the $\mathit{j}$th group, for $\mathit{j}=1,2,\dots ,{n}_{g}$.
 12: GMN(LDGMN,NVAR) – REAL (KIND=nag_wp) array Output
On exit: the
$\mathit{j}$th row of
GMN contains the means of the
$p$ selected variables for the
$\mathit{j}$th group, for
$\mathit{j}=1,2,\dots ,{n}_{g}$.
 13: LDGMN – INTEGER Input
On entry: the first dimension of the array
GMN as declared in the (sub)program from which G03DAF is called.
Constraint:
${\mathbf{LDGMN}}\ge {\mathbf{NG}}$.
 14: DET(NG) – REAL (KIND=nag_wp) array Output
On exit: the logarithms of the determinants of the within-group variance-covariance matrices.
 15: GC($\left({\mathbf{NG}}+1\right)\times {\mathbf{NVAR}}\times \left({\mathbf{NVAR}}+1\right)/2$) – REAL (KIND=nag_wp) array Output
On exit: the first
$p\left(p+1\right)/2$ elements of
GC contain
$R$ and the remaining
${n}_{g}$ blocks of
$p\left(p+1\right)/2$ elements contain the
${R}_{j}$ matrices. All are stored in packed form by columns.
 16: STAT – REAL (KIND=nag_wp) Output
On exit: the likelihood-ratio test statistic, $G$.
 17: DF – REAL (KIND=nag_wp) Output
On exit: the degrees of freedom for the distribution of $G$.
 18: SIG – REAL (KIND=nag_wp) Output
On exit: the significance level for $G$.
 19: WK(${\mathbf{N}}\times \left({\mathbf{NVAR}}+1\right)$) – REAL (KIND=nag_wp) array Workspace
 20: IWK(NG) – INTEGER array Workspace
 21: IFAIL – INTEGER Input/Output

On entry:
IFAIL must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this parameter you should refer to
Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is
$0$.
When the value $-\mathbf{1}\text{ or }\mathbf{1}$ is used it is essential to test the value of IFAIL on exit.
On exit:
${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
If on entry
${\mathbf{IFAIL}}={\mathbf{0}}$ or
$-{\mathbf{1}}$, explanatory error messages are output on the current error message unit (as defined by
X04AAF).
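The layout of GC described under argument 15 can be unpacked as in the following Python sketch. It assumes that "packed form by columns" means the conventional upper triangular packed scheme, in which element $(i,j)$ with $i\le j$ (1-based) is stored at position $i+j(j-1)/2$ within its block. The function name `unpack_gc` is hypothetical and the sketch operates on a plain list of values, not on the Fortran array itself.

```python
def unpack_gc(gc, ng, nvar):
    """Split a GC-style array into [R, R_1, ..., R_ng] as full upper triangular matrices.

    Assumption: each block of nvar*(nvar+1)/2 values holds an upper triangular
    matrix packed column by column.
    """
    size = nvar * (nvar + 1) // 2
    if len(gc) < (ng + 1) * size:
        raise ValueError("gc is shorter than (ng + 1) * nvar * (nvar + 1) / 2")
    mats = []
    for b in range(ng + 1):
        block = gc[b * size:(b + 1) * size]
        m = [[0.0] * nvar for _ in range(nvar)]
        for j in range(nvar):
            for i in range(j + 1):
                # 0-based packed index of element (i, j), i <= j.
                m[i][j] = block[i + j * (j + 1) // 2]
        mats.append(m)
    return mats
```

The first matrix returned corresponds to the pooled $R$ and the remaining ${n}_{g}$ matrices to the ${R}_{j}$, in group order.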
The accuracy is dependent on the accuracy of the computation of the
$QR$ decomposition. See
F08AEF (DGEQRF) for further details.
The data, taken from
Aitchison and Dunsmore (1975), is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of
$21$ patients are input and the statistics computed by G03DAF. The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.