naginterfaces.library.mv.discrim
- naginterfaces.library.mv.discrim(x, isx, ing, ng, wt=None)
discrim computes a test statistic for the equality of within-group covariance matrices and also computes matrices for use in discriminant analysis.
For full information please refer to the NAG Library document for g03da
https://www.nag.com/numeric/nl/nagdoc_29/flhtml/g03/g03daf.html
- Parameters
- x: float, array-like, shape (n, m)
x[k-1, l-1] must contain the $k$th observation for the $l$th variable, for $k = 1, 2, \ldots, n$, for $l = 1, 2, \ldots, m$.
- isx: int, array-like, shape (m)
isx[l-1] indicates whether or not the $l$th variable in x is to be included in the variance-covariance matrices.
If isx[l-1] > 0 the $l$th variable is included, for $l = 1, 2, \ldots, m$; otherwise it is not referenced.
- ing: int, array-like, shape (n)
ing[k-1] indicates to which group the $k$th observation belongs, for $k = 1, 2, \ldots, n$.
- ng: int
The number of groups, $n_g$.
- wt: None or float, array-like, shape (n), optional
The elements of wt must contain the weights to be used in the analysis, and the effective number of observations for a group is the sum of the weights of the observations in that group. If wt[k-1] = 0.0 then the $k$th observation is excluded from the calculations.
If weights are not provided then wt must be set to None, and the effective number of observations for a group is the number of observations in that group.
- Returns
- nig: int, ndarray, shape (ng)
nig[j-1] contains the number of observations in the $j$th group, for $j = 1, 2, \ldots, n_g$.
- gmn: float, ndarray, shape (ng, nvar)
The $j$th row of gmn contains the means of the $p$ selected variables for the $j$th group, for $j = 1, 2, \ldots, n_g$. Here nvar $= p$ is the number of variables selected through isx.
- det: float, ndarray, shape (ng)
The logarithm of the determinants of the within-group variance-covariance matrices.
- gc: float, ndarray, shape ((ng+1)*nvar*(nvar+1)/2)
The first $p(p+1)/2$ elements of gc contain $R$ and the remaining $n_g$ blocks of $p(p+1)/2$ elements contain the $R_j$ matrices. All are stored in packed form by columns.
- stat: float
The likelihood-ratio test statistic, $G$.
- df: float
The degrees of freedom for the distribution of $G$.
- sig: float
The significance level for $G$.
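The packed layout of gc can be awkward to work with directly. The following is a minimal NumPy sketch of unpacking it, assuming that each matrix is upper triangular and that "packed form by columns" means the upper triangle is stored column by column. The helper names `unpack_upper_by_columns` and `split_gc` are hypothetical and are not part of naginterfaces.

```python
import numpy as np


def unpack_upper_by_columns(packed, p):
    """Expand p*(p+1)/2 packed values into a p-by-p upper triangular matrix.

    Assumes column-wise packing of the upper triangle:
    (1,1), (1,2), (2,2), (1,3), (2,3), (3,3), ...
    """
    r = np.zeros((p, p))
    k = 0
    for j in range(p):
        # Column j holds j + 1 stored entries (rows 0..j).
        r[: j + 1, j] = packed[k : k + j + 1]
        k += j + 1
    return r


def split_gc(gc, p, ng):
    """Return (pooled R, list of within-group R_j) from a packed gc array.

    Hypothetical helper: assumes the first block is the pooled matrix and
    the following ng blocks are the within-group matrices, in group order.
    """
    size = p * (p + 1) // 2
    blocks = np.asarray(gc, dtype=float)[: (ng + 1) * size].reshape(ng + 1, size)
    pooled = unpack_upper_by_columns(blocks[0], p)
    within = [unpack_upper_by_columns(b, p) for b in blocks[1:]]
    return pooled, within
```

With the matrices unpacked, products such as `r.T @ r` give back the corresponding variance-covariance matrix, provided the assumptions above about the storage scheme hold.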
- Raises
- NagValueError
- (errno )
On entry, .
Constraint: or .
- (errno )
On entry, and .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, .
Constraint: .
- (errno )
On entry, and .
Constraint: .
- (errno )
The number of observations for group is less than .
- (errno )
The effective number of observations for group is less than .
- (errno )
On entry, , and .
Constraint: .
- (errno )
On entry, and values of
Constraint: exactly elements of .
- (errno )
is not of full rank.
- (errno )
is not of full rank for .
- Notes
In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.
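Let a sample of $n$ observations on $p$ variables come from $n_g$ groups with $n_j$ observations in the $j$th group and $\sum_{j} n_j = n$. If the data is assumed to follow a multivariate Normal distribution with the variance-covariance matrix of the $j$th group $\Sigma_j$, then to test for equality of the variance-covariance matrices between groups, that is, $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_{n_g} = \Sigma$, the following likelihood-ratio test statistic, $G$, can be used:
$$G = C\left\{ (n - n_g)\log|S| - \sum_{j=1}^{n_g} (n_j - 1)\log|S_j| \right\},$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(n_g - 1)}\left( \sum_{j=1}^{n_g} \frac{1}{n_j - 1} - \frac{1}{n - n_g} \right),$$
and $S_j$ are the within-group variance-covariance matrices and $S$ is the pooled variance-covariance matrix given by
$$S = \frac{\sum_{j=1}^{n_g} (n_j - 1) S_j}{n - n_g}.$$
For large $n$, $G$ is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p+1)(n_g - 1)$ degrees of freedom; see Morrison (1967) for further comments. If weights are used, then $S$ and $S_j$ are the weighted pooled and within-group variance-covariance matrices and $n$ is the effective number of observations, that is, the sum of the weights.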
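As an illustration only, the statistic can be computed directly from the within-group covariance matrices. The sketch below assumes the usual form of this test (Box's M with the $\chi^2$ approximation): $G = C\{(n - n_g)\log|S| - \sum_j (n_j - 1)\log|S_j|\}$, with $\frac{1}{2}p(p+1)(n_g-1)$ degrees of freedom. It covers the unweighted case only, and the function name `box_m_statistic` is hypothetical. It is not how discrim itself performs the calculation.

```python
import numpy as np


def box_m_statistic(x, groups):
    """Return (G, degrees of freedom) for the unweighted case.

    x      : (n, p) array of observations
    groups : length-n array of group labels
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = x.shape
    ng = labels.size

    pooled = np.zeros((p, p))
    sum_logdet = 0.0  # sum over groups of (n_j - 1) * log|S_j|
    sum_inv = 0.0     # sum over groups of 1 / (n_j - 1)
    for g in labels:
        xg = x[groups == g]
        nj = xg.shape[0]
        d = xg - xg.mean(axis=0)
        sj = d.T @ d / (nj - 1)  # within-group covariance S_j
        pooled += (nj - 1) * sj
        sum_logdet += (nj - 1) * np.linalg.slogdet(sj)[1]
        sum_inv += 1.0 / (nj - 1)
    pooled /= n - ng             # pooled covariance S

    c = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (ng - 1)) * (
        sum_inv - 1.0 / (n - ng)
    )
    g_stat = c * ((n - ng) * np.linalg.slogdet(pooled)[1] - sum_logdet)
    df = 0.5 * p * (p + 1) * (ng - 1)
    return g_stat, df
```

When every group has the same sample covariance matrix the statistic is zero, and it grows as the within-group matrices diverge from the pooled one.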
Instead of calculating the within-group variance-covariance matrices and then computing their determinants in order to calculate the test statistic, discrim uses a $QR$ decomposition. The group means are subtracted from the data and then, for each group, a $QR$ decomposition is computed to give an upper triangular matrix $R_j^*$. This matrix can be scaled to give a matrix $R_j$ such that $S_j = R_j^{\mathrm{T}} R_j$. The pooled $R$ matrix is then computed from the $R_j$ matrices. The values of $|S|$ and the $|S_j|$ can then be calculated from the diagonal elements of $R$ and the $R_j$.
This approach means that the Mahalanobis squared distances for a vector observation $x$ can be computed as $z^{\mathrm{T}} z$, where $R_j^{\mathrm{T}} z = (x - \bar{x}_j)$, $\bar{x}_j$ being the vector of means of the $j$th group. These distances can be calculated by discrim_mahal(). The distances are used in discriminant analysis, and discrim_group() uses the results of discrim to perform several different types of discriminant analysis. The differences between the discriminant methods are, in part, due to whether or not the within-group variance-covariance matrices are equal.
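The triangular-factor approach can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the routine's actual implementation: it handles the unweighted case only, takes the scaling to be $R_j = R_j^* / \sqrt{n_j - 1}$ so that $S_j = R_j^{\mathrm{T}} R_j$, and forms the pooled factor by a second QR decomposition of the stacked, rescaled $R_j$. All function names are hypothetical.

```python
import numpy as np


def group_r_factor(xg):
    """Return (group mean, R_j) with S_j = R_j.T @ R_j for one group."""
    xg = np.asarray(xg, dtype=float)
    mean = xg.mean(axis=0)
    r_star = np.linalg.qr(xg - mean, mode="r")  # triangular factor of centred data
    return mean, r_star / np.sqrt(xg.shape[0] - 1)


def pooled_r_factor(r_list, n_list):
    """Return R with R.T @ R equal to the pooled covariance matrix."""
    stacked = np.vstack([np.sqrt(nj - 1) * rj for rj, nj in zip(r_list, n_list)])
    n, ng = sum(n_list), len(n_list)
    return np.linalg.qr(stacked, mode="r") / np.sqrt(n - ng)


def log_det_from_r(r):
    """log|S| from the diagonal of R, since |S| = prod(diag(R))**2."""
    return 2.0 * np.sum(np.log(np.abs(np.diag(r))))


def mahalanobis_sq(x, mean, r):
    """Squared Mahalanobis distance z.T @ z, where R.T z = x - mean."""
    z = np.linalg.solve(r.T, np.asarray(x, dtype=float) - mean)
    return float(z @ z)
```

Working with the triangular factors avoids forming and inverting the covariance matrices explicitly; the determinants come from the diagonals and the distances from a single triangular solve.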
- References
Aitchison, J and Dunsmore, I R, 1975, Statistical Prediction Analysis, Cambridge
Kendall, M G and Stuart, A, 1976, The Advanced Theory of Statistics (Volume 3), (3rd Edition), Griffin
Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press
Morrison, D F, 1967, Multivariate Statistical Methods, McGraw–Hill