# NAG Library Routine Document

## 1 Purpose

g08cgf computes the test statistic for the ${\chi }^{2}$ goodness-of-fit test for data with a chosen number of class intervals.

## 2 Specification

Fortran Interface
 Subroutine g08cgf ( nclass, ifreq, cb, dist, par, npest, prob, chisq, p, ndf, eval, chisqi, ifail)
 Integer, Intent (In) :: nclass, ifreq(nclass), npest
 Integer, Intent (Inout) :: ifail
 Integer, Intent (Out) :: ndf
 Real (Kind=nag_wp), Intent (In) :: cb(nclass-1), par(2), prob(nclass)
 Real (Kind=nag_wp), Intent (Out) :: chisq, p, eval(nclass), chisqi(nclass)
 Character (1), Intent (In) :: dist
C Header Interface
#include <nagmk26.h>
 void g08cgf_ (const Integer *nclass, const Integer ifreq[], const double cb[], const char *dist, const double par[], const Integer *npest, const double prob[], double *chisq, double *p, Integer *ndf, double eval[], double chisqi[], Integer *ifail, const Charlen length_dist)

## 3 Description

The ${\chi }^{2}$ goodness-of-fit test performed by g08cgf is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$, and that the data has been grouped into $k$ classes,
 $x\le c_{1},\quad c_{i-1}<x\le c_{i},\ \text{for } i=2,3,\dots ,k-1,\quad x>c_{k-1},$
then the ${\chi }^{2}$ goodness-of-fit test statistic is defined by
 $X^{2}=\sum_{i=1}^{k}\frac{{\left(O_{i}-E_{i}\right)}^{2}}{E_{i}},$
where ${O}_{i}$ is the observed frequency of the $i$th class, and ${E}_{i}$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
 $E_{i}=p_{i}\times n,$
where ${p}_{i}$ is the probability that $X$ lies in the $i$th class, that is
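 $p_{1}=P\left(X\le c_{1}\right),\quad p_{i}=P\left(c_{i-1}<X\le c_{i}\right),\ \text{for } i=2,3,\dots ,k-1,\quad p_{k}=P\left(X>c_{k-1}\right).$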
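As a minimal sketch of the two formulas above (not the NAG implementation), the expected frequencies and the statistic can be computed directly once the class probabilities are known. The function name `chi_square_statistic`, its argument names and the use of `long` for frequencies are illustrative assumptions; only the arithmetic follows the definitions of $E_i$ and $X^2$.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the NAG Library.
 * obs[i]    : observed frequency O_i of class i (0-based here)
 * prob[i]   : probability p_i that X lies in class i
 * eval[i]   : on return, expected frequency E_i = p_i * n
 * chisqi[i] : on return, contribution (O_i - E_i)^2 / E_i
 * Returns X^2, the sum of the contributions.
 * Assumes every prob[i] > 0 and a non-empty sample, so no E_i is zero. */
double chi_square_statistic(size_t k, const long obs[], const double prob[],
                            double eval[], double chisqi[])
{
    assert(k >= 2);

    long n = 0;                      /* sample size = total observed frequency */
    for (size_t i = 0; i < k; ++i) {
        assert(obs[i] >= 0);
        n += obs[i];
    }
    assert(n > 0);

    double x2 = 0.0;
    for (size_t i = 0; i < k; ++i) {
        assert(prob[i] > 0.0);
        eval[i] = prob[i] * (double)n;           /* E_i = p_i * n */
        double d = (double)obs[i] - eval[i];
        chisqi[i] = d * d / eval[i];             /* (O_i - E_i)^2 / E_i */
        x2 += chisqi[i];
    }
    return x2;
}
```

For five equiprobable classes and observed frequencies 18, 22, 20, 25, 15 (so $n=100$ and every $E_i=20$), the contributions are 0.2, 0.2, 0, 1.25 and 1.25, giving $X^2=2.9$.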
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this routine are:
• Normal distribution with mean $\mu$, variance ${\sigma }^{2}$;
• uniform distribution on the interval $\left[a,b\right]$;
• exponential distribution with probability density function (pdf) $\lambda {e}^{-\lambda x}$;
• ${\chi }^{2}$-distribution with $f$ degrees of freedom; and
• gamma distribution with $\mathrm{pdf}=\frac{{x}^{\alpha -1}{e}^{-x/\beta }}{\Gamma \left(\alpha \right){\beta }^{\alpha }}$.
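For the uniform case the class probabilities reduce to interval lengths divided by $b-a$. The sketch below illustrates this under the assumption that $a\le c_1$ and $c_{k-1}\le b$; the function name `uniform_class_probs` is hypothetical and the code is not taken from the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: class probabilities for a uniform distribution on [a, b].
 * cb[0..k-2] holds the upper class boundaries c_1 < ... < c_{k-1},
 * assumed to satisfy a <= c_1 and c_{k-1} <= b.
 * prob[0..k-1] receives p_1, ..., p_k. */
void uniform_class_probs(size_t k, const double cb[], double a, double b,
                         double prob[])
{
    assert(k >= 2);
    assert(a < b && a <= cb[0] && cb[k - 2] <= b);

    double width = b - a;
    prob[0] = (cb[0] - a) / width;                /* P(X <= c_1) */
    for (size_t i = 1; i + 1 < k; ++i) {
        assert(cb[i - 1] < cb[i]);
        prob[i] = (cb[i] - cb[i - 1]) / width;    /* P(c_{i-1} < X <= c_i) */
    }
    prob[k - 1] = (b - cb[k - 2]) / width;        /* P(X > c_{k-1}) */
}
```

With boundaries 0.2, 0.4, 0.6, 0.8 on $[0,1]$ every class receives probability 0.2, up to rounding.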
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using g01aef.
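To make the class definition concrete, the following sketch counts how many observations fall into each class for given boundaries. It is a plain illustration of the grouping rule stated in this section, not a description of how g01aef works; the function name `count_class_frequencies` is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: group n observations into k classes.
 * Class 1 holds x <= c_1, class i holds c_{i-1} < x <= c_i,
 * class k holds x > c_{k-1}. Arrays are 0-based here, so
 * cb[0..k-2] are the boundaries and ifreq[0..k-1] the counts. */
void count_class_frequencies(size_t n, const double x[], size_t k,
                             const double cb[], long ifreq[])
{
    assert(k >= 2);

    for (size_t j = 0; j < k; ++j)
        ifreq[j] = 0;

    for (size_t s = 0; s < n; ++s) {
        size_t j = 0;
        /* Advance past every boundary that x[s] strictly exceeds. */
        while (j < k - 1 && x[s] > cb[j])
            ++j;
        ++ifreq[j];
    }
}
```

A linear scan is used for clarity; for many classes a binary search over the boundaries would be the natural refinement.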
g08cgf returns the ${\chi }^{2}$ test statistic, ${X}^{2}$, together with its degrees of freedom and the upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic. Note that the use of the ${\chi }^{2}$-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.

## 5 Arguments

1:     $\mathbf{nclass}$ – Integer, Input
On entry: $k$, the number of classes into which the data is divided.
Constraint: ${\mathbf{nclass}}\ge 2$.
2:     $\mathbf{ifreq}\left({\mathbf{nclass}}\right)$ – Integer array, Input
On entry: ${\mathbf{ifreq}}\left(\mathit{i}\right)$ must specify the frequency of the $\mathit{i}$th class, ${O}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
Constraint: ${\mathbf{ifreq}}\left(\mathit{i}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,k$.
3:     $\mathbf{cb}\left({\mathbf{nclass}}-1\right)$ – Real (Kind=nag_wp) array, Input
On entry: ${\mathbf{cb}}\left(\mathit{i}\right)$ must specify the upper boundary value for the $\mathit{i}$th class, for $\mathit{i}=1,2,\dots ,k-1$.
Constraint: ${\mathbf{cb}}\left(1\right)<{\mathbf{cb}}\left(2\right)<\cdots <{\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$. For the exponential, gamma and ${\chi }^{2}$-distributions ${\mathbf{cb}}\left(1\right)\ge 0.0$.
4:     $\mathbf{dist}$ – Character(1), Input
On entry: indicates for which distribution the test is to be carried out.
${\mathbf{dist}}=\text{'N'}$
The Normal distribution is used.
${\mathbf{dist}}=\text{'U'}$
The uniform distribution is used.
${\mathbf{dist}}=\text{'E'}$
The exponential distribution is used.
${\mathbf{dist}}=\text{'C'}$
The ${\chi }^{2}$-distribution is used.
${\mathbf{dist}}=\text{'G'}$
The gamma distribution is used.
${\mathbf{dist}}=\text{'A'}$
You must supply the class probabilities in the array prob.
Constraint: ${\mathbf{dist}}=\text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
5:     $\mathbf{par}\left(2\right)$ – Real (Kind=nag_wp) array, Input
On entry: must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., ${\mathbf{dist}}=\text{'A'}$) the array par is not referenced.
If a Normal distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the mean, $\mu$, and the variance, ${\sigma }^{2}$, respectively.
If a uniform distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the parameter $\lambda$. ${\mathbf{par}}\left(2\right)$ is not used.
If a ${\chi }^{2}$-distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the number of degrees of freedom. ${\mathbf{par}}\left(2\right)$ is not used.
If a gamma distribution is used ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
• if ${\mathbf{dist}}=\text{'N'}$, ${\mathbf{par}}\left(2\right)>0.0$;
• if ${\mathbf{dist}}=\text{'U'}$, ${\mathbf{par}}\left(1\right)<{\mathbf{par}}\left(2\right)$ and ${\mathbf{par}}\left(1\right)\le {\mathbf{cb}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)\ge {\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$;
• if ${\mathbf{dist}}=\text{'E'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'C'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'G'}$, ${\mathbf{par}}\left(1\right)>0.0$ and ${\mathbf{par}}\left(2\right)>0.0$.
6:     $\mathbf{npest}$ – Integer, Input
On entry: the number of estimated parameters of the distribution.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
7:     $\mathbf{prob}\left({\mathbf{nclass}}\right)$ – Real (Kind=nag_wp) array, Input
On entry: if you are supplying the probability distribution (i.e., ${\mathbf{dist}}=\text{'A'}$) then ${\mathbf{prob}}\left(i\right)$ must contain the probability that $X$ lies in the $i$th class.
If ${\mathbf{dist}}\ne \text{'A'}$, prob is not referenced.
Constraint: if ${\mathbf{dist}}=\text{'A'}$, $\sum _{i=1}^{k}{\mathbf{prob}}\left(i\right)=1.0$, ${\mathbf{prob}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,k$.
8:     $\mathbf{chisq}$ – Real (Kind=nag_wp), Output
On exit: the test statistic, ${X}^{2}$, for the ${\chi }^{2}$ goodness-of-fit test.
9:     $\mathbf{p}$ – Real (Kind=nag_wp), Output
On exit: the upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic, ${X}^{2}$, and the number of degrees of freedom.
10:   $\mathbf{ndf}$ – Integer, Output
On exit: contains $\left({\mathbf{nclass}}-1-{\mathbf{npest}}\right)$, the degrees of freedom associated with the test.
11:   $\mathbf{eval}\left({\mathbf{nclass}}\right)$ – Real (Kind=nag_wp) array, Output
On exit: ${\mathbf{eval}}\left(\mathit{i}\right)$ contains the expected frequency for the $\mathit{i}$th class, ${E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
12:   $\mathbf{chisqi}\left({\mathbf{nclass}}\right)$ – Real (Kind=nag_wp) array, Output
On exit: ${\mathbf{chisqi}}\left(\mathit{i}\right)$ contains the contribution from the $\mathit{i}$th class to the test statistic, that is, ${\left({O}_{\mathit{i}}-{E}_{\mathit{i}}\right)}^{2}/{E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
13:   $\mathbf{ifail}$ – Integer, Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if ${\mathbf{ifail}}\ne {\mathbf{0}}$ on exit, the recommended value is $-1$. When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
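Several of the constraints listed for the arguments above can be checked before the routine is called. The sketch below is one possible pre-check, not library code: the name `check_inputs`, the 0-based arrays, the `long` frequency type and the tolerance `SUM_TOL` on the probability sum are all assumptions (the document does not state a tolerance). The return codes are chosen to match the ifail values documented for the corresponding constraints; the checks on dist and par are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tolerance for "sum of prob(i) = 1.0"; not specified by the document. */
#define SUM_TOL 1.0e-12

/* Hypothetical pre-check of some documented constraints.
 * Returns 0 if all checks pass, otherwise 1, 3, 4, 5 or 8.
 * prob may be NULL when the class probabilities are not user-supplied. */
int check_inputs(int nclass, const long ifreq[], const double cb[],
                 int npest, const double prob[])
{
    assert(ifreq != NULL && cb != NULL);   /* caller error, not bad data */

    if (nclass < 2)
        return 1;                          /* nclass >= 2 */
    if (npest < 0 || npest >= nclass - 1)
        return 3;                          /* 0 <= npest < nclass - 1 */
    for (int i = 0; i < nclass; ++i)
        if (ifreq[i] < 0)
            return 4;                      /* ifreq(i) >= 0 */
    for (int i = 1; i < nclass - 1; ++i)
        if (!(cb[i - 1] < cb[i]))
            return 5;                      /* boundaries strictly increasing */
    if (prob != NULL) {
        double sum = 0.0;
        for (int i = 0; i < nclass; ++i) {
            if (!(prob[i] > 0.0))
                return 8;                  /* prob(i) > 0.0 */
            sum += prob[i];
        }
        double diff = sum - 1.0;
        if (diff > SUM_TOL || diff < -SUM_TOL)
            return 8;                      /* probabilities sum to 1.0 */
    }
    return 0;
}
```

When the check returns 0, the degrees of freedom reported in ndf would be `nclass - 1 - npest`, for example 4 for five classes with no estimated parameters.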

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Note: g08cgf may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{nclass}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nclass}}\ge 2$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{dist}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{dist}}=\text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{npest}}=〈\mathit{\text{value}}〉$.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
${\mathbf{ifail}}=4$
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{ifreq}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ifreq}}\left(i\right)\ge 0$.
${\mathbf{ifail}}=5$
On entry, $i=〈\mathit{\text{value}}〉$, ${\mathbf{cb}}\left(i-1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{cb}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{cb}}\left(i-1\right)<{\mathbf{cb}}\left(i\right)$.
${\mathbf{ifail}}=6$
On entry, ${\mathbf{cb}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{cb}}\left(1\right)\ge 0.0$.
${\mathbf{ifail}}=7$
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: for the exponential distribution, ${\mathbf{par}}\left(1\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$.
Constraint: for the ${\chi }^{2}$ distribution, ${\mathbf{par}}\left(1\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the gamma distribution, ${\mathbf{par}}\left(1\right)>0.0$ and ${\mathbf{par}}\left(2\right)>0.0$.
On entry, ${\mathbf{par}}\left(1\right)=〈\mathit{\text{value}}〉$ and ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the uniform distribution, ${\mathbf{par}}\left(1\right)<{\mathbf{par}}\left(2\right)$, ${\mathbf{par}}\left(1\right)\le {\mathbf{cb}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)\ge {\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$.
On entry, ${\mathbf{par}}\left(2\right)=〈\mathit{\text{value}}〉$.
Constraint: for the Normal distribution, ${\mathbf{par}}\left(2\right)>0.0$.
${\mathbf{ifail}}=8$
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{prob}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{prob}}\left(i\right)>0.0$.
On entry, ${\sum }_{i}{\mathbf{prob}}\left(i\right)=〈\mathit{\text{value}}〉$.
Constraint: ${\sum }_{i}{\mathbf{prob}}\left(i\right)=1.0$.
${\mathbf{ifail}}=9$
An expected frequency equals zero, when the observed frequency was not.
${\mathbf{ifail}}=10$
At least one class has an expected frequency less than $1$. The ${\chi }^{2}$ distribution may not be a good approximation to the distribution of the test statistic.
${\mathbf{ifail}}=11$
The solution has failed to converge whilst computing the expected values. The returned solution may be an adequate approximation.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
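The conditions described for ${\mathbf{ifail}}=9$ and ${\mathbf{ifail}}=10$ depend only on the observed and expected frequencies, so they can be illustrated with a short check. This is a sketch under assumptions: the name `check_expected` is hypothetical, and giving the zero-expected-frequency condition priority over the small-expected-frequency warning is a choice made here, not something the document states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical post-check on the expected frequencies.
 * Returns 9 if some class has E_i equal to zero while O_i is non-zero,
 * otherwise 10 if at least one class has E_i < 1, otherwise 0. */
int check_expected(size_t k, const long obs[], const double eval[])
{
    assert(k >= 2);

    int warn = 0;
    for (size_t i = 0; i < k; ++i) {
        if (eval[i] == 0.0 && obs[i] != 0)
            return 9;          /* statistic undefined for this class */
        if (eval[i] < 1.0)
            warn = 10;         /* chi-square approximation may be poor */
    }
    return warn;
}
```

A non-zero result of 10 is a warning only: the statistic can still be computed, but the ${\chi }^{2}$ approximation to its distribution deserves less trust.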

## 7 Accuracy

The computations are believed to be stable.

## 8 Parallelism and Performance

g08cgf is not threaded in any implementation.

## 9 Further Comments

The time taken by g08cgf is dependent both on the distribution chosen and on the number of classes, $k$.

## 10 Example

This example applies the ${\chi }^{2}$ goodness-of-fit test to test whether there is evidence to suggest that a sample of $100$ randomly generated observations does not arise from a uniform distribution $U\left(0,1\right)$. The class intervals are calculated such that the interval $\left(0,1\right)$ is divided into five equal classes. The frequencies for each class are calculated using g01aef.
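Dividing an interval into equal classes, as described for this example, amounts to placing the $k-1$ boundaries at equal spacing. The following sketch shows that step only; the function name `equal_class_boundaries` is an assumption and the code is not taken from the example program g08cgfe.f90.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: upper boundaries c_1, ..., c_{k-1} that split
 * the interval (a, b) into k classes of equal width.
 * cb[0..k-2] receives the boundaries (0-based here). */
void equal_class_boundaries(double a, double b, size_t k, double cb[])
{
    assert(k >= 2 && a < b);

    for (size_t i = 0; i + 1 < k; ++i)
        cb[i] = a + (b - a) * (double)(i + 1) / (double)k;
}
```

For $\left(0,1\right)$ and five classes this yields boundaries at approximately 0.2, 0.4, 0.6 and 0.8, which would be passed as the four elements of cb.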

### 10.1 Program Text

Program Text (g08cgfe.f90)

### 10.2 Program Data

Program Data (g08cgfe.d)

### 10.3 Program Results

Program Results (g08cgfe.r)