NAG Library Routine Document
G03DAF computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.
|SUBROUTINE G03DAF (
||WEIGHT, N, M, X, LDX, ISX, NVAR, ING, NG, WT, NIG, GMN, LDGMN, DET, GC, STAT, DF, SIG, WK, IWK, IFAIL)
||N, M, LDX, ISX(M), NVAR, ING(N), NG, NIG(NG), LDGMN, IWK(NG), IFAIL
||X(LDX,M), WT(*), GMN(LDGMN,NVAR), DET(NG), GC((NG+1)*NVAR*(NVAR+1)/2), STAT, DF, SIG, WK(N*(NVAR+1))
Let a sample of
variables come from
observations in the
th group and
. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the
, then to test for equality of the variance-covariance matrices between groups, that is,
, the following likelihood-ratio test statistic,
, can be used;
are the within-group variance-covariance matrices and
is the pooled variance-covariance matrix given by
is approximately distributed as a
degrees of freedom, see Morrison (1967)
for further comments. If weights are used, then
are the weighted pooled and within-group variance-covariance matrices and
is the effective number of observations, that is, the sum of the weights.
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, G03DAF uses a decomposition. The group means are subtracted from the data and then for each group, a decomposition is computed to give an upper triangular matrix . This matrix can be scaled to give a matrix such that . The pooled matrix is then computed from the matrices. The values of and the can then be calculated from the diagonal elements of and the .
This approach means that the Mahalanobis squared distances for a vector observation
can be computed as
being the vector of means of the
th group. These distances can be calculated by G03DBF
. The distances are used in discriminant analysis and G03DCF
uses the results of G03DAF to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
Aitchison J and Dunsmore I R (1975) Statistical Prediction Analysis Cambridge
Kendall M G and Stuart A (1976) The Advanced Theory of Statistics (Volume 3) (3rd Edition) Griffin
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press
Morrison D F (1967) Multivariate Statistical Methods McGraw–Hill
- 1: WEIGHT – CHARACTER(1)Input
: indicates if weights are to be used.
- No weights are used.
- Weights are to be used and must be supplied in WT.
- 2: N – INTEGERInput
On entry: , the number of observations.
- 3: M – INTEGERInput
: the number of variables in the data array X
- 4: X(LDX,M) – REAL (KIND=nag_wp) arrayInput
On entry: must contain the th observation for the th variable, for and .
- 5: LDX – INTEGERInput
: the first dimension of the array X
as declared in the (sub)program from which G03DAF is called.
- 6: ISX(M) – INTEGER arrayInput
indicates whether or not the
th variable in X
is to be included in the variance-covariance matrices.
the th variable is included, for ; otherwise it is not referenced.
- 7: NVAR – INTEGERInput
On entry: , the number of variables in the variance-covariance matrices.
- 8: ING(N) – INTEGER arrayInput
On entry: indicates to which group the th observation belongs, for .
The values of ING
must be such that each group has at least NVAR
- 9: NG – INTEGERInput
On entry: the number of groups, .
- 10: WT() – REAL (KIND=nag_wp) arrayInput
the dimension of the array WT
must be at least
, and at least
elements of WT
must contain the weights to be used in the analysis and the effective number of observations for a group is the sum of the weights of the observations in that group. If
th observation is excluded from the calculations.
is not referenced and the effective number of observations for a group is the number of observations in that group.
if , , for .
- 11: NIG(NG) – INTEGER arrayOutput
On exit: contains the number of observations in the th group, for .
- 12: GMN(LDGMN,NVAR) – REAL (KIND=nag_wp) arrayOutput
th row of GMN
contains the means of the
selected variables for the
th group, for
- 13: LDGMN – INTEGERInput
: the first dimension of the array GMN
as declared in the (sub)program from which G03DAF is called.
- 14: DET(NG) – REAL (KIND=nag_wp) arrayOutput
On exit: the logarithm of the determinants of the within-group variance-covariance matrices.
- 15: GC() – REAL (KIND=nag_wp) arrayOutput
: the first
elements of GC
and the remaining
elements contain the
matrices. All are stored in packed form by columns.
- 16: STAT – REAL (KIND=nag_wp)Output
On exit: the likelihood-ratio test statistic, .
- 17: DF – REAL (KIND=nag_wp)Output
On exit: the degrees of freedom for the distribution of .
- 18: SIG – REAL (KIND=nag_wp)Output
On exit: the significance level for .
- 19: WK() – REAL (KIND=nag_wp) arrayWorkspace
- 20: IWK(NG) – INTEGER arrayWorkspace
- 21: IFAIL – INTEGERInput/Output
must be set to
. If you are unfamiliar with this parameter you should refer to Section 3.3
in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
is recommended. If the output of error messages is undesirable, then the value
is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is
. When the value is used it is essential to test the value of IFAIL on exit.
unless the routine detects an error or a warning has been flagged (see Section 6
6 Error Indicators and Warnings
If on entry
, explanatory error messages are output on the current error message unit (as defined by X04AAF
Errors or warnings detected by the routine:
|or|| or .|
|On entry,|| and a value of .|
|On entry,||there are not exactly NVAR elements of ,|
|or||a value of ING is not in the range to NG,|
|or||the effective number of observations for a group is less than ,|
|or||a group has less than NVAR members.|
or one of the is not of full rank.
The accuracy is dependent on the accuracy of the computation of the
decomposition. See F08AEF (DGEQRF)
for further details.
The time taken will be approximately proportional to .
The data, taken from Aitchison and Dunsmore (1975)
, is concerned with the diagnosis of three ‘types’ of Cushing's syndrome. The variables are the logarithms of the urinary excretion rates (mg/24hr) of two steroid metabolites. Observations for a total of
patients are input and the statistics computed by G03DAF. The printed results show that there is evidence that the within-group variance-covariance matrices are not equal.
9.1 Program Text
Program Text (g03dafe.f90)
9.2 Program Data
Program Data (g03dafe.d)
9.3 Program Results
Program Results (g03dafe.r)