
# NAG Library Function Document: nag_chi_sq_goodness_of_fit_test (g08cgc)

## 1  Purpose

nag_chi_sq_goodness_of_fit_test (g08cgc) computes the test statistic for the ${\chi }^{2}$ goodness-of-fit test for data with a chosen number of class intervals.

## 2  Specification

 #include <nag.h>
 #include <nagg08.h>
 void nag_chi_sq_goodness_of_fit_test (Integer nclass, const Integer ifreq[], const double cint[], Nag_Distributions dist, const double par[], Integer npest, const double prob[], double *chisq, double *p, Integer *ndf, double eval[], double chisqi[], NagError *fail)

## 3  Description

The ${\chi }^{2}$ goodness-of-fit test performed by nag_chi_sq_goodness_of_fit_test (g08cgc) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$, and that the data have been grouped into $k$ classes,
 $x \le c_1, \quad c_{i-1} < x \le c_i \ \text{ for } i = 2, 3, \dots, k-1, \quad x > c_{k-1},$
then the ${\chi }^{2}$ goodness-of-fit test statistic is defined by:
 $X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
where ${O}_{i}$ is the observed frequency of the $i$th class, and ${E}_{i}$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
 $E_i = p_i \times n,$
where ${p}_{i}$ is the probability that $X$ lies in the $i$th class, that is
 $p_1 = P(X \le c_1), \quad p_i = P(c_{i-1} < X \le c_i) \ \text{ for } i = 2, 3, \dots, k-1, \quad p_k = P(X > c_{k-1}).$
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
• Normal distribution with mean $\mu$, variance ${\sigma }^{2}$;
• uniform distribution on the interval $\left[a,b\right]$;
• exponential distribution with probability density function $pdf=\lambda {e}^{-\lambda x}$;
• ${\chi }^{2}$ distribution with $f$ degrees of freedom; and
• gamma distribution with $pdf=\frac{{x}^{\alpha -1}{e}^{-x/\beta }}{\Gamma \left(\alpha \right){\beta }^{\alpha }}$.
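The class probabilities defined above are differences of the cumulative distribution function at consecutive class boundaries. The following is a minimal sketch of that step in plain C, not the library's implementation. The names `cdf_fn`, `uniform_cdf` and `class_probabilities` are hypothetical, and only the uniform case is written out; another distribution would supply its own CDF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: cumulative distribution function F(x; par). */
typedef double (*cdf_fn)(double x, const double par[]);

/* CDF of the uniform distribution on [a, b], with a = par[0], b = par[1]. */
double uniform_cdf(double x, const double par[])
{
    if (x <= par[0]) return 0.0;
    if (x >= par[1]) return 1.0;
    return (x - par[0]) / (par[1] - par[0]);
}

/* Fill p[0..k-1] from the k-1 class boundaries c[0..k-2]:
 *   p_1 = F(c_1), p_i = F(c_i) - F(c_{i-1}), p_k = 1 - F(c_{k-1}). */
void class_probabilities(size_t k, const double c[], cdf_fn cdf,
                         const double par[], double p[])
{
    assert(k >= 2);
    double prev = 0.0;                 /* F at the previous boundary */
    for (size_t i = 0; i + 1 < k; ++i) {
        double cur = cdf(c[i], par);
        p[i] = cur - prev;
        prev = cur;
    }
    p[k - 1] = 1.0 - prev;             /* open-ended upper class */
}
```

For boundaries 0.25, 0.5 and 0.75 on $U\left(0,1\right)$ this gives four classes, each with probability 0.25.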
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_frequency_table (g01aec).
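To show what the frequencies represent, here is a hand-rolled sketch that counts observations per class using the class definition given earlier in this section. It is a stand-in for illustration only: `tally_classes` is a hypothetical name, it is not nag_frequency_table (g01aec), and that function's handling of values lying exactly on a boundary may differ. The sketch assumes the boundaries are strictly increasing.

```c
#include <assert.h>
#include <stddef.h>

/* Count observations per class. Class 1 is x <= c[0], class i is
 * c[i-2] < x <= c[i-1], and class k is x > c[k-2].
 * Assumes c[0..k-2] is strictly increasing. */
void tally_classes(size_t n, const double x[], size_t k, const double c[],
                   long freq[])
{
    assert(k >= 2);
    for (size_t j = 0; j < k; ++j) freq[j] = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t j = 0;
        /* Advance to the first class whose upper boundary is >= x[i]. */
        while (j + 1 < k && x[i] > c[j]) ++j;
        ++freq[j];
    }
}
```

A linear scan keeps the sketch short; for many classes a binary search over the boundaries would be the natural replacement.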
nag_chi_sq_goodness_of_fit_test (g08cgc) returns the ${\chi }^{2}$ test statistic, ${X}^{2}$, together with its degrees of freedom and the upper tail probability from the ${\chi }^{2}$ distribution associated with the test statistic. Note that the use of the ${\chi }^{2}$ distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
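The statistic itself is a short computation once the class probabilities are known. The sketch below is plain C, with `long` standing in for the library's Integer type and the hypothetical name `chi_sq_statistic`. It takes $n$ as the sum of the observed frequencies and assumes every class probability is positive.

```c
#include <assert.h>
#include <stddef.h>

/* Returns X^2 = sum_i (O_i - E_i)^2 / E_i with E_i = p_i * n, where n is
 * the total of the observed frequencies. Also stores each E_i and each
 * class's contribution to X^2. Assumes every p[i] > 0. */
double chi_sq_statistic(size_t k, const long obs[], const double p[],
                        double expected[], double contrib[])
{
    assert(k >= 2);
    long n = 0;
    for (size_t i = 0; i < k; ++i) n += obs[i];

    double x2 = 0.0;
    for (size_t i = 0; i < k; ++i) {
        expected[i] = p[i] * (double)n;
        double d = (double)obs[i] - expected[i];
        contrib[i] = d * d / expected[i];
        x2 += contrib[i];
    }
    return x2;
}
```

With made-up counts 18, 22, 20, 25, 15 and five equiprobable classes, each expected frequency is 20 and the contributions 0.2, 0.2, 0, 1.25, 1.25 sum to ${X}^{2}=2.9$.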

## 4  References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

## 5  Arguments

1:     nclass – Integer, Input
On entry: the number of classes, $k$, into which the data are divided.
Constraint: ${\mathbf{nclass}}\ge 2$.
2:     ifreq[nclass] – const Integer, Input
On entry: ${\mathbf{ifreq}}\left[\mathit{i}-1\right]$ must specify the frequency of the $\mathit{i}$th class, ${O}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
Constraint: ${\mathbf{ifreq}}\left[\mathit{i}-1\right]\ge 0$, for $\mathit{i}=1,2,\dots ,k$.
3:     cint[${\mathbf{nclass}}-1$] – const double, Input
On entry: ${\mathbf{cint}}\left[\mathit{i}-1\right]$ must specify the upper boundary value for the $\mathit{i}$th class, for $\mathit{i}=1,2,\dots ,k-1$.
Constraints:
• ${\mathbf{cint}}\left[0\right]<{\mathbf{cint}}\left[1\right]<\cdots <{\mathbf{cint}}\left[{\mathbf{nclass}}-2\right]$;
• For the exponential, gamma and ${\chi }^{2}$ distributions ${\mathbf{cint}}\left[0\right]\ge 0.0$.
4:     dist – Nag_Distributions, Input
On entry: indicates for which distribution the test is to be carried out.
${\mathbf{dist}}=\mathrm{Nag_Normal}$
The Normal distribution is used.
${\mathbf{dist}}=\mathrm{Nag_Uniform}$
The uniform distribution is used.
${\mathbf{dist}}=\mathrm{Nag_Exponential}$
The exponential distribution is used.
${\mathbf{dist}}=\mathrm{Nag_ChiSquare}$
The ${\chi }^{2}$ distribution is used.
${\mathbf{dist}}=\mathrm{Nag_Gamma}$
The gamma distribution is used.
${\mathbf{dist}}=\mathrm{Nag_UserProb}$
You must supply the class probabilities in the array prob.
Constraint: ${\mathbf{dist}}=\mathrm{Nag_Normal}$, $\mathrm{Nag_Uniform}$, $\mathrm{Nag_Exponential}$, $\mathrm{Nag_ChiSquare}$, $\mathrm{Nag_Gamma}$ or $\mathrm{Nag_UserProb}$.
5:     par[$2$] – const double, Input
On entry: par must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., ${\mathbf{dist}}=\mathrm{Nag_UserProb}$) the array par is not referenced.
If a Normal distribution is used then ${\mathbf{par}}\left[0\right]$ and ${\mathbf{par}}\left[1\right]$ must contain the mean, $\mu$, and the variance, ${\sigma }^{2}$, respectively.
If a uniform distribution is used then ${\mathbf{par}}\left[0\right]$ and ${\mathbf{par}}\left[1\right]$ must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then ${\mathbf{par}}\left[0\right]$ must contain the parameter $\lambda$. ${\mathbf{par}}\left[1\right]$ is not used.
If a ${\chi }^{2}$ distribution is used then ${\mathbf{par}}\left[0\right]$ must contain the number of degrees of freedom. ${\mathbf{par}}\left[1\right]$ is not used.
If a gamma distribution is used then ${\mathbf{par}}\left[0\right]$ and ${\mathbf{par}}\left[1\right]$ must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
• if ${\mathbf{dist}}=\mathrm{Nag_Normal}$, ${\mathbf{par}}\left[1\right]>0.0$;
• if ${\mathbf{dist}}=\mathrm{Nag_Uniform}$, ${\mathbf{par}}\left[0\right]<{\mathbf{par}}\left[1\right]$, ${\mathbf{par}}\left[0\right]\le {\mathbf{cint}}\left[0\right]$ and ${\mathbf{par}}\left[1\right]\ge {\mathbf{cint}}\left[{\mathbf{nclass}}-2\right]$;
• if ${\mathbf{dist}}=\mathrm{Nag_Exponential}$, ${\mathbf{par}}\left[0\right]>0.0$;
• if ${\mathbf{dist}}=\mathrm{Nag_ChiSquare}$, ${\mathbf{par}}\left[0\right]>0.0$;
• if ${\mathbf{dist}}=\mathrm{Nag_Gamma}$, ${\mathbf{par}}\left[0\right]>0.0$ and ${\mathbf{par}}\left[1\right]>0.0$.
6:     npest – Integer, Input
On entry: the number of parameters of the distribution that have been estimated from the data.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
7:     prob[nclass] – const double, Input
On entry: if you are supplying the probability distribution (i.e., ${\mathbf{dist}}=\mathrm{Nag_UserProb}$) then ${\mathbf{prob}}\left[i-1\right]$ must contain the probability that $X$ lies in the $i$th class.
If ${\mathbf{dist}}\ne \mathrm{Nag_UserProb}$, prob is not referenced.
Constraint: if ${\mathbf{dist}}=\mathrm{Nag_UserProb}$, ${\mathbf{prob}}\left[\mathit{i}-1\right]>0.0$, for $\mathit{i}=1,2,\dots ,k$, and ${\sum }_{\mathit{i}=1}^{k}{\mathbf{prob}}\left[\mathit{i}-1\right]=1.0$.
8:     chisq – double *, Output
On exit: the test statistic, ${X}^{2}$, for the ${\chi }^{2}$ goodness-of-fit test.
9:     p – double *, Output
On exit: the upper tail probability from the ${\chi }^{2}$ distribution associated with the test statistic, ${X}^{2}$, and the number of degrees of freedom.
10:   ndf – Integer *, Output
On exit: contains $\left({\mathbf{nclass}}-1-{\mathbf{npest}}\right)$, the degrees of freedom associated with the test.
11:   eval[nclass] – double, Output
On exit: ${\mathbf{eval}}\left[\mathit{i}-1\right]$ contains the expected frequency for the $\mathit{i}$th class, ${E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
12:   chisqi[nclass] – double, Output
On exit: ${\mathbf{chisqi}}\left[\mathit{i}-1\right]$ contains the contribution from the $\mathit{i}$th class to the test statistic, that is ${\left({O}_{\mathit{i}}-{E}_{\mathit{i}}\right)}^{2}/{E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
13:   fail – NagError *, Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
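Several of the constraints listed for the arguments can be checked before calling the function. The following sketch is one way to do that in plain C, under stated assumptions: `long` stands in for Integer, the status codes and the name `check_inputs` are hypothetical (they are not NAG error codes), and the tolerance `tol` for the sum-to-one test is an assumption, since this document does not say how closely the probabilities must sum to 1.0.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; they are not NAG error codes. */
enum check_status {
    CHECK_OK = 0, BAD_NCLASS, BAD_IFREQ, BAD_CINT, BAD_NPEST, BAD_PROB
};

/* Pass prob == NULL when the class probabilities are not user-supplied. */
enum check_status check_inputs(long nclass, const long ifreq[],
                               const double cint[], long npest,
                               const double prob[], double tol)
{
    if (nclass < 2) return BAD_NCLASS;
    for (long i = 0; i < nclass; ++i)
        if (ifreq[i] < 0) return BAD_IFREQ;
    for (long i = 1; i < nclass - 1; ++i)
        if (!(cint[i - 1] < cint[i])) return BAD_CINT;  /* strictly increasing */
    if (npest < 0 || npest >= nclass - 1) return BAD_NPEST;
    if (prob != NULL) {
        double sum = 0.0;
        for (long i = 0; i < nclass; ++i) {
            if (!(prob[i] > 0.0)) return BAD_PROB;
            sum += prob[i];
        }
        if (sum < 1.0 - tol || sum > 1.0 + tol) return BAD_PROB;
    }
    return CHECK_OK;
}
```

The distribution-specific constraints on par and on the first boundary are left out to keep the sketch short.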

## 6  Error Indicators and Warnings

NE_ARRAY_CONS
The contents of array prob are not valid.
Constraint: Sum of ${\mathbf{prob}}\left[\mathit{i}-1\right]=1$, for $\mathit{i}=1,2,\dots ,{\mathbf{nclass}}$, when ${\mathbf{dist}}=\mathrm{Nag_UserProb}$.
NE_ARRAY_INPUT
On entry, the values provided in par are invalid.
NE_BAD_PARAM
On entry, argument dist had an illegal value.
NE_G08CG_CLASS_VAL
This is a warning that expected values for certain classes are less than 1.0. This implies that one cannot be confident that the ${\chi }^{2}$ distribution is a good approximation to the distribution of the test statistic.
NE_G08CG_CONV
The solution obtained when calculating the probability for a certain class for the gamma or ${\chi }^{2}$ distribution did not converge in 600 iterations. The solution may be an adequate approximation.
NE_G08CG_FREQ
An expected frequency is equal to zero when the observed frequency is not.
NE_INT_2
On entry, ${\mathbf{npest}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{nclass}}=⟨\mathit{\text{value}}⟩$.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
NE_INT_ARG_LT
On entry, ${\mathbf{nclass}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{nclass}}\ge 2$.
NE_INT_ARRAY_CONS
On entry, ${\mathbf{ifreq}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ifreq}}\left[\mathit{i}-1\right]\ge 0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nclass}}$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_STRICTLY_INCREASING
The sequence cint is not strictly increasing: ${\mathbf{cint}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{cint}}\left[⟨\mathit{\text{value}}⟩-1\right]=⟨\mathit{\text{value}}⟩$.
NE_REAL_ARRAY_CONS
On entry, ${\mathbf{prob}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{prob}}\left[\mathit{i}-1\right]>0$, for $\mathit{i}=1,2,\dots ,{\mathbf{nclass}}$, when ${\mathbf{dist}}=\mathrm{Nag_UserProb}$.
NE_REAL_ARRAY_ELEM_CONS
On entry, ${\mathbf{cint}}\left[0\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{cint}}\left[0\right]\ge 0.0$, if ${\mathbf{dist}}=\mathrm{Nag_Exponential}$, $\mathrm{Nag_ChiSquare}$ or $\mathrm{Nag_Gamma}$.
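The two conditions on expected frequencies described in this section (a zero expected frequency paired with a non-zero observed frequency, and expected values below 1.0) can be illustrated with a short check over the expected frequencies. This is a sketch only: the enum values and the name `check_expected` are hypothetical, and the order in which the library itself tests these conditions is not stated in this document.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; they are not NAG error codes. */
enum expected_status {
    EXPECTED_OK = 0, EXPECTED_SMALL, EXPECTED_ZERO_WITH_OBS
};

enum expected_status check_expected(size_t k, const long obs[],
                                    const double expected[])
{
    enum expected_status status = EXPECTED_OK;
    for (size_t i = 0; i < k; ++i) {
        /* (O - E)^2 / E is undefined when E is zero and O is not. */
        if (expected[i] == 0.0 && obs[i] != 0)
            return EXPECTED_ZERO_WITH_OBS;
        /* Small expected values make the chi-square approximation doubtful. */
        if (expected[i] < 1.0)
            status = EXPECTED_SMALL;
    }
    return status;
}
```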

## 7  Accuracy

The computations are believed to be stable.

## 8  Parallelism and Performance

Not applicable.

## 9  Further Comments

The time taken by nag_chi_sq_goodness_of_fit_test (g08cgc) is dependent both on the distribution chosen and on the number of classes, $k$.

## 10  Example

The example program applies the ${\chi }^{2}$ goodness-of-fit test to assess whether there is evidence that a sample of 100 observations generated by nag_rand_uniform (g05sqc) does not arise from a uniform distribution $U\left(0,1\right)$. The class intervals are calculated such that the interval (0,1) is divided into five equal classes. The frequencies for each class are calculated using nag_frequency_table (g01aec).

### 10.1  Program Text

Program Text (g08cgce.c)

### 10.2  Program Data

Program Data (g08cgce.d)

### 10.3  Program Results

Program Results (g08cgce.r)