nag_chi_sq_goodness_of_fit_test (g08cgc) (PDF version)
g08 Chapter Contents
g08 Chapter Introduction
NAG C Library Manual

NAG Library Function Document

nag_chi_sq_goodness_of_fit_test (g08cgc)

1  Purpose

nag_chi_sq_goodness_of_fit_test (g08cgc) computes the test statistic for the $\chi^2$ goodness-of-fit test for data with a chosen number of class intervals.

2  Specification

#include <nag.h>
#include <nagg08.h>
void nag_chi_sq_goodness_of_fit_test (Integer nclass, const Integer ifreq[],
    const double cint[], Nag_Distributions dist, const double par[],
    Integer npest, const double prob[], double *chisq, double *p,
    Integer *ndf, double eval[], double chisqi[], NagError *fail)

3  Description

The $\chi^2$ goodness-of-fit test performed by nag_chi_sq_goodness_of_fit_test (g08cgc) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$, and that the data have been grouped into $k$ classes,
$$x \le c_1, \qquad c_{i-1} < x \le c_i, \quad i = 2, 3, \ldots, k-1, \qquad x > c_{k-1},$$
then the $\chi^2$ goodness-of-fit test statistic is defined by:
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed frequency of the $i$th class, and $E_i$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
$$E_i = p_i \times n,$$
where $p_i$ is the probability that $X$ lies in the $i$th class, that is
$$p_1 = P(X \le c_1), \qquad p_i = P(c_{i-1} < X \le c_i), \quad i = 2, 3, \ldots, k-1, \qquad p_k = P(X > c_{k-1}).$$
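These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
  • Normal distribution with mean $\mu$ and variance $\sigma^2$;
  • uniform distribution with boundaries $a$ and $b$;
  • exponential distribution with argument $\lambda$;
  • $\chi^2$ distribution with a given number of degrees of freedom;
  • gamma distribution with arguments $\alpha$ and $\beta$.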
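As an illustration of how the class probabilities $p_i$ follow from a cumulative distribution function, the sketch below computes them for a uniform distribution with boundaries $a$ and $b$. The helper names `uniform_cdf` and `class_probs_uniform` are hypothetical and are not part of the NAG Library; the sketch only mirrors the definitions of $p_1$, $p_i$ and $p_k$ given above.

```c
#include <assert.h>
#include <stddef.h>

/* CDF of the uniform distribution with boundaries a and b,
 * clamped to the range [0, 1]. */
double uniform_cdf(double x, double a, double b)
{
    if (x <= a) return 0.0;
    if (x >= b) return 1.0;
    return (x - a) / (b - a);
}

/* Fill prob[0..k-1] with the class probabilities for k classes whose
 * upper boundaries are cint[0..k-2] (hypothetical helper). */
void class_probs_uniform(size_t k, const double cint[],
                         double a, double b, double prob[])
{
    assert(k >= 2 && a < b);
    double prev = 0.0;                  /* CDF at the previous boundary */
    for (size_t i = 0; i + 1 < k; ++i) {
        double cur = uniform_cdf(cint[i], a, b);
        prob[i] = cur - prev;           /* P(c_{i-1} < X <= c_i) */
        prev = cur;
    }
    prob[k - 1] = 1.0 - prev;           /* P(X > c_{k-1}) */
}
```

For the other distributions the same differencing applies, with `uniform_cdf` replaced by the appropriate cumulative distribution function.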
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_frequency_table (g01aec).
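To make the class convention concrete, the following sketch counts observations per class by hand. It is not nag_frequency_table (g01aec); `count_frequencies` is a hypothetical helper, and `long` stands in for the NAG `Integer` type as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Count observations per class (hypothetical helper, not g01aec).
 * Class 1: x <= cint[0]; class i: cint[i-2] < x <= cint[i-1];
 * class k: x > cint[k-2]. */
void count_frequencies(size_t n, const double x[], size_t k,
                       const double cint[], long ifreq[])
{
    assert(k >= 2);
    for (size_t i = 0; i < k; ++i)
        ifreq[i] = 0;
    for (size_t j = 0; j < n; ++j) {
        size_t i = 0;
        /* advance past every boundary strictly below x[j] */
        while (i + 1 < k && x[j] > cint[i])
            ++i;
        ++ifreq[i];
    }
}
```

An observation equal to a boundary value falls in the lower class, matching the inequalities $c_{i-1} < x \le c_i$ used in the class definition.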
nag_chi_sq_goodness_of_fit_test (g08cgc) returns the $\chi^2$ test statistic, $X^2$, together with its degrees of freedom and the upper tail probability from the $\chi^2$ distribution associated with the test statistic. Note that the use of the $\chi^2$ distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
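The definition of the statistic can be sketched directly in C. The helpers `chi_sq_statistic` and `chi_sq_ndf` below are hypothetical names, not NAG functions; they assume that every class probability is positive and that the sample is non-empty, and they do not compute the upper tail probability. The degrees of freedom are taken as the number of classes minus one, minus the number of estimated distribution arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Compute X^2 = sum_i (O_i - E_i)^2 / E_i with E_i = p_i * n.
 * Stores E_i in eval[] and each class contribution in chisqi[].
 * Hypothetical helper; assumes every prob[i] > 0 and n > 0. */
double chi_sq_statistic(size_t k, const long ifreq[], const double prob[],
                        double eval[], double chisqi[])
{
    assert(k >= 2);
    long n = 0;
    for (size_t i = 0; i < k; ++i)
        n += ifreq[i];                      /* sample size */
    assert(n > 0);

    double chisq = 0.0;
    for (size_t i = 0; i < k; ++i) {
        assert(prob[i] > 0.0);
        eval[i] = prob[i] * (double)n;      /* expected frequency */
        double d = (double)ifreq[i] - eval[i];
        chisqi[i] = d * d / eval[i];        /* class contribution */
        chisq += chisqi[i];
    }
    return chisq;
}

/* Degrees of freedom: classes minus one, minus estimated arguments. */
long chi_sq_ndf(long nclass, long npest)
{
    return nclass - 1 - npest;
}
```

Keeping the per-class contributions alongside the total makes it easy to see which classes drive a large value of $X^2$.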

4  References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

5  Arguments

1:     nclass   Integer   Input
On entry: the number of classes, $k$, into which the data is divided.
Constraint: nclass ≥ 2.
2:     ifreq[nclass]   const Integer   Input
On entry: ifreq[i-1] must specify the frequency of the $i$th class, $O_i$, for $i = 1, 2, \ldots, k$.
Constraint: ifreq[i-1] ≥ 0, for $i = 1, 2, \ldots, k$.
3:     cint[nclass-1]   const double   Input
On entry: cint[i-1] must specify the upper boundary value for the $i$th class, for $i = 1, 2, \ldots, k-1$.
Constraints:
  • cint[0] < cint[1] < ... < cint[nclass-2];
  • for the exponential, gamma and $\chi^2$ distributions, cint[0] ≥ 0.0.
4:     dist   Nag_Distributions   Input
On entry: indicates for which distribution the test is to be carried out.
  • dist=Nag_Normal: the Normal distribution is used.
  • dist=Nag_Uniform: the uniform distribution is used.
  • dist=Nag_Exponential: the exponential distribution is used.
  • dist=Nag_ChiSquare: the $\chi^2$ distribution is used.
  • dist=Nag_Gamma: the gamma distribution is used.
  • dist=Nag_UserProb: you must supply the class probabilities in the array prob.
Constraint: dist=Nag_Normal, Nag_Uniform, Nag_Exponential, Nag_ChiSquare, Nag_Gamma or Nag_UserProb.
5:     par[2]   const double   Input
On entry: par must contain the arguments of the distribution which is being tested. If you supply the probabilities (i.e., dist=Nag_UserProb) the array par is not referenced.
If a Normal distribution is used then par[0] and par[1] must contain the mean, $\mu$, and the variance, $\sigma^2$, respectively.
If a uniform distribution is used then par[0] and par[1] must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then par[0] must contain the argument $\lambda$. par[1] is not used.
If a $\chi^2$ distribution is used then par[0] must contain the number of degrees of freedom. par[1] is not used.
If a gamma distribution is used then par[0] and par[1] must contain the arguments $\alpha$ and $\beta$ respectively.
Constraints:
  • if dist=Nag_Normal, par[1] > 0.0;
  • if dist=Nag_Uniform, par[0] < par[1], par[0] ≤ cint[0] and par[1] ≥ cint[nclass-2];
  • if dist=Nag_Exponential, par[0] > 0.0;
  • if dist=Nag_ChiSquare, par[0] > 0.0;
  • if dist=Nag_Gamma, par[0] > 0.0 and par[1] > 0.0.
6:     npest   Integer   Input
On entry: the number of estimated arguments of the distribution.
Constraint: 0 ≤ npest < nclass - 1.
7:     prob[nclass]   const double   Input
On entry: if you are supplying the probability distribution (i.e., dist=Nag_UserProb) then prob[i-1] must contain the probability that $X$ lies in the $i$th class.
If dist≠Nag_UserProb, prob is not referenced.
Constraint: if dist=Nag_UserProb, prob[i-1] > 0.0, for $i = 1, 2, \ldots, k$, and $\sum_{i=1}^{k}$ prob[i-1] = 1.0.
8:     chisq   double *   Output
On exit: the test statistic, $X^2$, for the $\chi^2$ goodness-of-fit test.
9:     p   double *   Output
On exit: the upper tail probability from the $\chi^2$ distribution associated with the test statistic, $X^2$, and the number of degrees of freedom.
10:   ndf   Integer *   Output
On exit: contains nclass - 1 - npest, the degrees of freedom associated with the test.
11:   eval[nclass]   double   Output
On exit: eval[i-1] contains the expected frequency for the $i$th class, $E_i$, for $i = 1, 2, \ldots, k$.
12:   chisqi[nclass]   double   Output
On exit: chisqi[i-1] contains the contribution from the $i$th class to the test statistic, that is $(O_i - E_i)^2 / E_i$, for $i = 1, 2, \ldots, k$.
13:   fail   NagError *   Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
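The constraints on nclass, ifreq, cint, npest and prob listed above can be checked before the call. The sketch below is one way to do so; `inputs_valid` is a hypothetical helper, `long` stands in for the NAG `Integer` type, and the tolerance used when comparing the sum of the probabilities with 1.0 is an assumption rather than the value used by the library. The distribution-specific constraints on par and cint[0] are not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the inputs satisfy the documented constraints, else 0.
 * Hypothetical helper. Pass prob == NULL when the class probabilities
 * are not user-supplied. The sum tolerance (1e-10) is an assumption. */
int inputs_valid(long nclass, const long ifreq[], const double cint[],
                 long npest, const double prob[])
{
    assert(ifreq != NULL && cint != NULL);

    if (nclass < 2) return 0;                          /* nclass >= 2 */
    if (npest < 0 || npest >= nclass - 1) return 0;    /* 0 <= npest < nclass-1 */

    for (long i = 0; i < nclass; ++i)
        if (ifreq[i] < 0) return 0;                    /* ifreq[i] >= 0 */

    for (long i = 1; i < nclass - 1; ++i)
        if (!(cint[i - 1] < cint[i])) return 0;        /* strictly increasing */

    if (prob != NULL) {
        double sum = 0.0;
        for (long i = 0; i < nclass; ++i) {
            if (!(prob[i] > 0.0)) return 0;            /* prob[i] > 0 */
            sum += prob[i];
        }
        double diff = sum - 1.0;
        if (diff < 0.0) diff = -diff;
        if (diff > 1e-10) return 0;                    /* sum equals 1 */
    }
    return 1;
}
```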

6  Error Indicators and Warnings

The contents of array prob are not valid.
Constraint: $\sum_{i}$ prob[i-1] = 1, for $i = 1, 2, \ldots,$ nclass, when dist=Nag_UserProb.
On entry, the values provided in par are invalid.
On entry, argument dist had an illegal value.
This is a warning that expected values for certain classes are less than 1.0. This implies that one cannot be confident that the $\chi^2$ distribution is a good approximation to the distribution of the test statistic.
The solution obtained when calculating the probability for a certain class for the gamma or $\chi^2$ distribution did not converge in 600 iterations. The solution may be an adequate approximation.
An expected frequency is equal to zero when the observed frequency is not.
On entry, npest=value, nclass=value.
Constraint: 0 ≤ npest < nclass - 1.
On entry, nclass=value.
Constraint: nclass ≥ 2.
On entry, ifreq[value] = value.
Constraint: ifreq[i-1] ≥ 0, for $i = 1, 2, \ldots,$ nclass.
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
The sequence cint is not strictly increasing: cint[value] = value, cint[value-1] = value.
On entry, prob[value] = value.
Constraint: prob[i-1] > 0, for $i = 1, 2, \ldots,$ nclass, when dist=Nag_UserProb.
On entry, cint[0] = value.
Constraint: cint[0] ≥ 0.0, if dist=Nag_Exponential, Nag_ChiSquare or Nag_Gamma.
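Two of the conditions above concern the expected frequencies: classes whose expected value is less than 1.0, and a class whose expected frequency is zero while its observed frequency is not. As a rough sketch, they could be detected from the eval and ifreq arrays as follows. Both function names are hypothetical, and `long` stands in for the NAG `Integer` type.

```c
#include <assert.h>
#include <stddef.h>

/* Number of classes whose expected frequency is below 1.0. */
size_t count_small_expected(size_t k, const double eval[])
{
    assert(eval != NULL);
    size_t count = 0;
    for (size_t i = 0; i < k; ++i)
        if (eval[i] < 1.0)
            ++count;
    return count;
}

/* Return 1 if some class has a zero expected frequency but a
 * non-zero observed frequency, else 0. */
int zero_expected_with_observations(size_t k, const long ifreq[],
                                    const double eval[])
{
    assert(ifreq != NULL && eval != NULL);
    for (size_t i = 0; i < k; ++i)
        if (eval[i] == 0.0 && ifreq[i] != 0)
            return 1;
    return 0;
}
```

A non-zero count from the first function suggests merging adjacent classes before relying on the $\chi^2$ approximation.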

7  Accuracy

The computations are believed to be stable.

8  Further Comments

The time taken by nag_chi_sq_goodness_of_fit_test (g08cgc) is dependent both on the distribution chosen and on the number of classes, k .

9  Example

The example program applies the $\chi^2$ goodness-of-fit test to test whether there is evidence to suggest that a sample of 100 observations generated by nag_rand_uniform (g05sqc) does not arise from a uniform distribution $U(0,1)$. The class intervals are calculated such that the interval (0,1) is divided into five equal classes. The frequencies for each class are calculated using nag_frequency_table (g01aec).
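The class boundaries described here can be generated with a few lines of C. This is a minimal sketch and not the text of g08cgce.c; `equal_class_boundaries` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fill cint[0..k-2] with the interior boundaries that split the
 * interval (lo, hi) into k equal-width classes (hypothetical helper). */
void equal_class_boundaries(size_t k, double lo, double hi, double cint[])
{
    assert(k >= 2 && lo < hi);
    double width = (hi - lo) / (double)k;
    for (size_t i = 0; i + 1 < k; ++i)
        cint[i] = lo + width * (double)(i + 1);
}
```

With five classes on (0,1) the boundaries are approximately 0.2, 0.4, 0.6 and 0.8, so each class has probability 0.2 under $U(0,1)$ and an expected frequency of $0.2 \times 100 = 20$ for a sample of 100 observations.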

9.1  Program Text

Program Text (g08cgce.c)

9.2  Program Data

Program Data (g08cgce.d)

9.3  Program Results

Program Results (g08cgce.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2012