NAG CL Interface
g08cgc (test_chisq)

1 Purpose

g08cgc computes the test statistic for the $\chi^2$ goodness-of-fit test for data with a chosen number of class intervals.

2 Specification

#include <nag.h>
void  g08cgc (Integer nclass, const Integer ifreq[], const double cint[], Nag_Distributions dist, const double par[], Integer npest, const double prob[], double *chisq, double *p, Integer *ndf, double eval[], double chisqi[], NagError *fail)
The function may be called by the names: g08cgc, nag_nonpar_test_chisq or nag_chi_sq_goodness_of_fit_test.

3 Description

The $\chi^2$ goodness-of-fit test performed by g08cgc is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$, and that the data have been grouped into $k$ classes,
$$x \le c_1, \quad c_{i-1} < x \le c_i, \quad i = 2, 3, \ldots, k-1, \quad x > c_{k-1},$$
then the $\chi^2$ goodness-of-fit test statistic is defined by:
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency of the $i$th class, and $E_i$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
$$E_i = p_i \times n,$$
where $p_i$ is the probability that $X$ lies in the $i$th class, that is
$$p_1 = P(X \le c_1), \quad p_i = P(c_{i-1} < X \le c_i), \quad i = 2, 3, \ldots, k-1, \quad p_k = P(X > c_{k-1}).$$
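These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
  • the Normal distribution with mean $\mu$ and variance $\sigma^2$;
  • the uniform distribution with boundaries $a$ and $b$;
  • the exponential distribution with parameter $\lambda$;
  • the $\chi^2$ distribution with a given number of degrees of freedom;
  • the gamma distribution with parameters $\alpha$ and $\beta$.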
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using g01aec.
g08cgc returns the $\chi^2$ test statistic, $X^2$, together with its degrees of freedom and the upper tail probability from the $\chi^2$ distribution associated with the test statistic. Note that the use of the $\chi^2$ distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
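The definitions above can be sketched in plain C for the case where the class probabilities are already known (the situation covered by dist=Nag_UserProb). This is only an illustration of the formulas for $E_i$ and $X^2$, not the library's implementation; the function name chisq_statistic and its argument names are hypothetical and do not belong to the NAG Library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the NAG Library.
 * Computes E_i = p_i * n and the per-class contributions
 * (O_i - E_i)^2 / E_i, and returns their sum X^2.
 *   obs[i]      observed frequency of class i+1 (must be >= 0)
 *   prob[i]     probability of class i+1 (assumed > 0, summing to 1)
 *   expected[i] receives E_{i+1}
 *   contrib[i]  receives the contribution of class i+1 to X^2
 */
double chisq_statistic(size_t k, const long obs[], const double prob[],
                       double expected[], double contrib[])
{
    assert(k >= 2);

    long n = 0;                      /* sample size = sum of the frequencies */
    for (size_t i = 0; i < k; ++i) {
        assert(obs[i] >= 0);
        n += obs[i];
    }
    assert(n > 0);                   /* an empty sample gives no statistic */

    double x2 = 0.0;
    for (size_t i = 0; i < k; ++i) {
        assert(prob[i] > 0.0);
        expected[i] = prob[i] * (double)n;
        double d = (double)obs[i] - expected[i];
        contrib[i] = d * d / expected[i];
        x2 += contrib[i];
    }
    return x2;
}
```

For instance, with five classes of observed frequencies 18, 22, 20, 25 and 15 and equal class probabilities of 0.2, every expected frequency is 20 and the statistic is 0.2 + 0.2 + 0 + 1.25 + 1.25 = 2.9.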

4 References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

5 Arguments

1: nclass Integer Input
On entry: the number of classes, $k$, into which the data is divided.
Constraint: nclass ≥ 2.
2: ifreq[nclass] const Integer Input
On entry: ifreq[i-1] must specify the frequency of the $i$th class, $O_i$, for i = 1, 2, …, k.
Constraint: ifreq[i-1] ≥ 0, for i = 1, 2, …, k.
3: cint[nclass-1] const double Input
On entry: cint[i-1] must specify the upper boundary value for the $i$th class, for i = 1, 2, …, k-1.
Constraints:
  • cint[0] < cint[1] < … < cint[nclass-2];
  • for the exponential, gamma and $\chi^2$ distributions, cint[0] ≥ 0.0.
4: dist Nag_Distributions Input
On entry: indicates for which distribution the test is to be carried out.
dist=Nag_Normal
The Normal distribution is used.
dist=Nag_Uniform
The uniform distribution is used.
dist=Nag_Exponential
The exponential distribution is used.
dist=Nag_ChiSquare
The $\chi^2$ distribution is used.
dist=Nag_Gamma
The gamma distribution is used.
dist=Nag_UserProb
You must supply the class probabilities in the array prob.
Constraint: dist=Nag_Normal, Nag_Uniform, Nag_Exponential, Nag_ChiSquare, Nag_Gamma or Nag_UserProb.
5: par[2] const double Input
On entry: par must contain the parameters of the distribution being tested. If you supply the probabilities (i.e., dist=Nag_UserProb) the array par is not referenced.
If a Normal distribution is used then par[0] and par[1] must contain the mean, $\mu$, and the variance, $\sigma^2$, respectively.
If a uniform distribution is used then par[0] and par[1] must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then par[0] must contain the parameter $\lambda$. par[1] is not used.
If a $\chi^2$ distribution is used then par[0] must contain the number of degrees of freedom. par[1] is not used.
If a gamma distribution is used then par[0] and par[1] must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
  • if dist=Nag_Normal, par[1] > 0.0;
  • if dist=Nag_Uniform, par[0] < par[1], par[0] ≤ cint[0] and par[1] ≥ cint[nclass-2];
  • if dist=Nag_Exponential, par[0] > 0.0;
  • if dist=Nag_ChiSquare, par[0] > 0.0;
  • if dist=Nag_Gamma, par[0] > 0.0 and par[1] > 0.0.
6: npest Integer Input
On entry: the number of parameters of the distribution that were estimated from the data.
Constraint: 0 ≤ npest < nclass - 1.
7: prob[nclass] const double Input
On entry: if you are supplying the probability distribution (i.e., dist=Nag_UserProb) then prob[i-1] must contain the probability that $X$ lies in the $i$th class.
If dist ≠ Nag_UserProb, prob is not referenced.
Constraint: if dist=Nag_UserProb, prob[i-1] > 0.0, for i = 1, 2, …, k, and $\sum_{i=1}^{k}$ prob[i-1] = 1.0.
8: chisq double * Output
On exit: the test statistic, $X^2$, for the $\chi^2$ goodness-of-fit test.
9: p double * Output
On exit: the upper tail probability from the $\chi^2$ distribution associated with the test statistic, $X^2$, and the number of degrees of freedom.
10: ndf Integer * Output
On exit: contains nclass - 1 - npest, the degrees of freedom associated with the test.
11: eval[nclass] double Output
On exit: eval[i-1] contains the expected frequency for the $i$th class, $E_i$, for i = 1, 2, …, k.
12: chisqi[nclass] double Output
On exit: chisqi[i-1] contains the contribution from the $i$th class to the test statistic, that is $(O_i - E_i)^2 / E_i$, for i = 1, 2, …, k.
13: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ARRAY_CONS
The contents of array prob are not valid.
Constraint: Sum of prob[i-1] = 1, for i = 1, 2, …, nclass, when dist=Nag_UserProb.
NE_ARRAY_INPUT
On entry, the values provided in par are invalid.
NE_BAD_PARAM
On entry, argument dist had an illegal value.
NE_G08CG_CLASS_VAL
This is a warning that expected values for certain classes are less than 1.0. This implies that one cannot be confident that the $\chi^2$ distribution is a good approximation to the distribution of the test statistic.
NE_G08CG_CONV
The solution obtained when calculating the probability for a certain class for the gamma or $\chi^2$ distribution did not converge in 600 iterations. The solution may be an adequate approximation.
NE_G08CG_FREQ
An expected frequency is equal to zero when the observed frequency is not.
NE_INT_2
On entry, npest = value, nclass = value.
Constraint: 0 ≤ npest < nclass - 1.
NE_INT_ARG_LT
On entry, nclass=value.
Constraint: nclass ≥ 2.
NE_INT_ARRAY_CONS
On entry, ifreq[value] = value.
Constraint: ifreq[i-1] ≥ 0, for i = 1, 2, …, nclass.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_STRICTLY_INCREASING
The sequence cint is not strictly increasing: cint[value] = value, cint[value-1] = value.
NE_REAL_ARRAY_CONS
On entry, prob[value] = value.
Constraint: prob[i-1] > 0, for i = 1, 2, …, nclass, when dist=Nag_UserProb.
NE_REAL_ARRAY_ELEM_CONS
On entry, cint[0] = value.
Constraint: cint[0] ≥ 0.0, if dist=Nag_Exponential, Nag_ChiSquare or Nag_Gamma.
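Several of the conditions above can be checked before calling the function. The following is a minimal sketch of such a pre-check for the integer and ordering constraints only. The enum check_status, its members and the function check_inputs are hypothetical names invented for this illustration; they are not NAG error codes, and the order in which the library itself tests its arguments is not stated in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; these are NOT NAG error codes. */
typedef enum {
    CHECK_OK = 0,
    CHECK_BAD_NCLASS,            /* nclass < 2 */
    CHECK_BAD_NPEST,             /* npest outside 0 <= npest < nclass - 1 */
    CHECK_BAD_IFREQ,             /* some ifreq[i] < 0 */
    CHECK_CINT_NOT_INCREASING    /* cint is not strictly increasing */
} check_status;

/* Hypothetical helper, NOT part of the NAG Library.
 * ifreq has nclass elements, cint has nclass - 1 elements. */
check_status check_inputs(int nclass, const int ifreq[], const double cint[],
                          int npest)
{
    if (nclass < 2)
        return CHECK_BAD_NCLASS;
    assert(ifreq != NULL && cint != NULL);

    if (npest < 0 || npest >= nclass - 1)
        return CHECK_BAD_NPEST;

    for (int i = 0; i < nclass; ++i)
        if (ifreq[i] < 0)
            return CHECK_BAD_IFREQ;

    /* Compare each interior boundary with its predecessor. */
    for (int i = 1; i < nclass - 1; ++i)
        if (!(cint[i - 1] < cint[i]))
            return CHECK_CINT_NOT_INCREASING;

    return CHECK_OK;
}
```

A caller could run this check on its data and only invoke g08cgc when CHECK_OK is returned; the distribution-specific constraints on par and prob would need similar, separate checks.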

7 Accuracy

The computations are believed to be stable.

8 Parallelism and Performance

g08cgc is not threaded in any implementation.

9 Further Comments

The time taken by g08cgc is dependent both on the distribution chosen and on the number of classes, $k$.

10 Example

The example program applies the $\chi^2$ goodness-of-fit test to test whether there is evidence to suggest that a sample of 100 observations generated by g05sqc does not arise from a uniform distribution $U(0,1)$. The class intervals are calculated such that the interval (0,1) is divided into five equal classes. The frequencies for each class are calculated using g01aec.
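The class set-up described here can be sketched without the library. The code below builds the interior boundaries of equal-width classes and counts observations per class using the class convention from Section 3 (class 1 is x ≤ cint[0], the last class is x > cint[nclass-2]). The function names equal_class_boundaries and count_classes are hypothetical; this is an illustration of the convention only and is not how g01aec or the example program is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the NAG Library.
 * Fills cint[0..k-2] with the interior boundaries that divide (a, b)
 * into k classes of equal width. */
void equal_class_boundaries(double a, double b, size_t k, double cint[])
{
    assert(k >= 2 && a < b);
    for (size_t i = 1; i < k; ++i)
        cint[i - 1] = a + (b - a) * (double)i / (double)k;
}

/* Hypothetical helper, NOT part of the NAG Library.
 * Counts the observations falling in each of the k classes:
 *   class 1:  x <= cint[0]
 *   class i:  cint[i-2] < x <= cint[i-1],  i = 2, ..., k-1
 *   class k:  x > cint[k-2]
 */
void count_classes(size_t n, const double x[], size_t k, const double cint[],
                   long freq[])
{
    assert(k >= 2);
    for (size_t i = 0; i < k; ++i)
        freq[i] = 0;

    for (size_t j = 0; j < n; ++j) {
        size_t c = 0;
        /* Advance past every boundary that x[j] strictly exceeds. */
        while (c < k - 1 && x[j] > cint[c])
            ++c;
        ++freq[c];
    }
}
```

For five classes on (0,1) the boundaries are 0.2, 0.4, 0.6 and 0.8, and under the uniform null hypothesis each class has probability 0.2, so a sample of 100 observations has an expected frequency of 20 per class.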

10.1 Program Text

Program Text (g08cgce.c)

10.2 Program Data

Program Data (g08cgce.d)

10.3 Program Results

Program Results (g08cgce.r)