# NAG Toolbox: nag_nonpar_test_chisq (g08cg)

## Purpose

nag_nonpar_test_chisq (g08cg) computes the test statistic for the ${\chi }^{2}$ goodness-of-fit test for data with a chosen number of class intervals.

## Syntax

[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)

## Description

The ${\chi }^{2}$ goodness-of-fit test performed by nag_nonpar_test_chisq (g08cg) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$, with the data grouped into $k$ classes,
 $x\le {c}_{1},\quad {c}_{i-1}<x\le {c}_{i},\; i=2,3,\dots ,k-1,\quad x>{c}_{k-1},$
then the ${\chi }^{2}$ goodness-of-fit test statistic is defined by
 ${X}^{2}=\sum _{i=1}^{k}\frac{{\left({O}_{i}-{E}_{i}\right)}^{2}}{{E}_{i}},$
where ${O}_{i}$ is the observed frequency of the $i$th class, and ${E}_{i}$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
 ${E}_{i}={p}_{i}\times n,$
where ${p}_{i}$ is the probability that $X$ lies in the $i$th class, that is
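 ${p}_{1}=P\left(X\le {c}_{1}\right),\quad {p}_{i}=P\left({c}_{i-1}<X\le {c}_{i}\right),\; i=2,3,\dots ,k-1,\quad {p}_{k}=P\left(X>{c}_{k-1}\right).$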
These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
• Normal distribution with mean $\mu$, variance ${\sigma }^{2}$;
• uniform distribution on the interval $\left[a,b\right]$;
• exponential distribution with probability density function (pdf), $\mathrm{pdf}=\lambda {e}^{-\lambda x}$;
• ${\chi }^{2}$-distribution with $f$ degrees of freedom; and
• gamma distribution with $\mathrm{pdf}=\frac{{x}^{\alpha -1}{e}^{-x/\beta }}{\Gamma \left(\alpha \right){\beta }^{\alpha }}$.
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_stat_frequency_table (g01ae).
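As an illustration of the grouping step, the following plain MATLAB sketch counts observations into right-closed classes ($x\le {c}_{1}$, ${c}_{i-1}<x\le {c}_{i}$, $x>{c}_{k-1}$) without calling any NAG function. The sample `x` and the boundaries `cb` are hypothetical values chosen for illustration, and the treatment of values lying exactly on a boundary is an assumption that may differ from nag_stat_frequency_table (g01ae).

```matlab
% Hypothetical sample and class boundaries (illustration only).
x  = [0.05 0.20 0.25 0.40 0.61 0.80 0.95 1.00];
cb = [0.2 0.4 0.6 0.8];          % k - 1 strictly increasing boundaries
k  = numel(cb) + 1;              % number of classes

% Count each observation into its class.
% An observation falls in class i when exactly i - 1 boundaries lie
% strictly below it, which makes every class closed on the right.
ifreq = zeros(k, 1);
for j = 1:numel(x)
    i = 1 + sum(x(j) > cb);
    ifreq(i) = ifreq(i) + 1;
end

disp(ifreq');                    % ifreq is [2 2 0 2 2]'
```

The counts can be converted with `int64(ifreq)` before being supplied as ifreq.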
nag_nonpar_test_chisq (g08cg) returns the ${\chi }^{2}$ test statistic, ${X}^{2}$, together with its degrees of freedom and the upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic. Note that the use of the ${\chi }^{2}$-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
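The quantities described above can be reproduced by hand for the uniform case. The sketch below is plain MATLAB and does not call the NAG function; it is a minimal illustration, not the routine's implementation. The observed frequencies `O` are assumed values, chosen so that the class contributions agree with those printed in the Example section, and the variable names are illustrative. The upper tail probability is obtained from the base MATLAB function `gammainc`.

```matlab
% Assumed observed frequencies for five classes on U(0,1).
O     = [12 31 23 11 23];
cb    = [0.2 0.4 0.6 0.8];       % class boundaries c_1, ..., c_{k-1}
a     = 0;  b = 1;               % parameters of the uniform distribution
npest = 0;                       % no parameters estimated from the data

n = sum(O);                      % sample size
k = numel(O);                    % number of classes

% Class probabilities from the uniform cdf evaluated at the boundaries.
F  = (cb - a) / (b - a);
pc = diff([0, F, 1]);            % p_1, ..., p_k

% Expected frequencies and test statistic.
E       = pc * n;                % E_i = p_i * n
contrib = (O - E).^2 ./ E;       % (O_i - E_i)^2 / E_i
X2      = sum(contrib);

% Degrees of freedom and upper tail probability of chi-squared(ndf).
ndf  = k - 1 - npest;
pval = gammainc(X2/2, ndf/2, 'upper');

fprintf('X2 = %.4f, ndf = %d, p = %.4f\n', X2, ndf, pval);
% prints: X2 = 14.2000, ndf = 4, p = 0.0067
```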

## References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{ifreq}\left({\mathbf{nclass}}\right)$ – int64/int32/nag_int array
${\mathbf{ifreq}}\left(\mathit{i}\right)$ must specify the frequency of the $\mathit{i}$th class, ${O}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
Constraint: ${\mathbf{ifreq}}\left(\mathit{i}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,k$.
2:     $\mathrm{cb}\left({\mathbf{nclass}}-1\right)$ – double array
${\mathbf{cb}}\left(\mathit{i}\right)$ must specify the upper boundary value for the $\mathit{i}$th class, for $\mathit{i}=1,2,\dots ,k-1$.
Constraint: ${\mathbf{cb}}\left(1\right)<{\mathbf{cb}}\left(2\right)<\cdots <{\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$. For the exponential, gamma and ${\chi }^{2}$-distributions ${\mathbf{cb}}\left(1\right)\ge 0.0$.
3:     $\mathrm{dist}$ – string (length ≥ 1)
Indicates for which distribution the test is to be carried out.
${\mathbf{dist}}=\text{'N'}$
The Normal distribution is used.
${\mathbf{dist}}=\text{'U'}$
The uniform distribution is used.
${\mathbf{dist}}=\text{'E'}$
The exponential distribution is used.
${\mathbf{dist}}=\text{'C'}$
The ${\chi }^{2}$-distribution is used.
${\mathbf{dist}}=\text{'G'}$
The gamma distribution is used.
${\mathbf{dist}}=\text{'A'}$
You must supply the class probabilities in the array prob.
Constraint: ${\mathbf{dist}}=\text{'N'}$, $\text{'U'}$, $\text{'E'}$, $\text{'C'}$, $\text{'G'}$ or $\text{'A'}$.
4:     $\mathrm{par}\left(2\right)$ – double array
Must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., ${\mathbf{dist}}=\text{'A'}$) the array par is not referenced.
If a Normal distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the mean, $\mu$, and the variance, ${\sigma }^{2}$, respectively.
If a uniform distribution is used then ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the parameter $\lambda$. ${\mathbf{par}}\left(2\right)$ is not used.
If a ${\chi }^{2}$-distribution is used then ${\mathbf{par}}\left(1\right)$ must contain the number of degrees of freedom. ${\mathbf{par}}\left(2\right)$ is not used.
If a gamma distribution is used ${\mathbf{par}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)$ must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
• if ${\mathbf{dist}}=\text{'N'}$, ${\mathbf{par}}\left(2\right)>0.0$;
• if ${\mathbf{dist}}=\text{'U'}$, ${\mathbf{par}}\left(1\right)<{\mathbf{par}}\left(2\right)$ and ${\mathbf{par}}\left(1\right)\le {\mathbf{cb}}\left(1\right)$ and ${\mathbf{par}}\left(2\right)\ge {\mathbf{cb}}\left({\mathbf{nclass}}-1\right)$;
• if ${\mathbf{dist}}=\text{'E'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'C'}$, ${\mathbf{par}}\left(1\right)>0.0$;
• if ${\mathbf{dist}}=\text{'G'}$, ${\mathbf{par}}\left(1\right)>0.0$ and ${\mathbf{par}}\left(2\right)>0.0$.
5:     $\mathrm{npest}$ – int64/int32/nag_int scalar
The number of estimated parameters of the distribution.
Constraint: $0\le {\mathbf{npest}}<{\mathbf{nclass}}-1$.
6:     $\mathrm{prob}\left({\mathbf{nclass}}\right)$ – double array
If you are supplying the probability distribution (i.e., ${\mathbf{dist}}=\text{'A'}$) then ${\mathbf{prob}}\left(i\right)$ must contain the probability that $X$ lies in the $i$th class.
If ${\mathbf{dist}}\ne \text{'A'}$, prob is not referenced.
Constraint: if ${\mathbf{dist}}=\text{'A'}$, $\sum _{i=1}^{k}{\mathbf{prob}}\left(i\right)=1.0$, ${\mathbf{prob}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,k$.
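To illustrate how the compulsory inputs fit together when you supply your own class probabilities (${\mathbf{dist}}=\text{'A'}$), the sketch below builds prob from a Normal distribution in plain MATLAB, checks the constraints listed above, and then calls the function using the documented syntax. The Normal parameters, boundaries and frequencies are hypothetical, the tolerance used for the sum check is an assumption, and npest is set to zero on the assumption that no parameters were estimated from the data. The call is guarded so that the script also runs where the NAG Toolbox is not installed.

```matlab
% Hypothetical Normal parameters and class boundaries (illustration only).
mu     = 0;
sigma2 = 4;                      % variance, as par(2) would hold for dist = 'N'
cb     = [-2; 0; 2];             % nclass - 1 strictly increasing boundaries

% Standard Normal cdf via the base MATLAB complementary error function.
Phi = @(z) 0.5 * erfc(-z / sqrt(2));
F   = Phi((cb - mu) / sqrt(sigma2));

% Class probabilities: P(X <= c_1), P(c_{i-1} < X <= c_i), P(X > c_{k-1}).
prob = diff([0; F; 1]);

% Hypothetical observed frequencies, one per class.
ifreq = int64([18; 30; 36; 16]);
npest = int64(0);
par   = zeros(2, 1);             % not referenced when dist = 'A'

% Check the documented constraints before calling.
nclass = numel(ifreq);
assert(nclass >= 2);
assert(numel(cb) == nclass - 1 && all(diff(cb) > 0));
assert(all(ifreq >= 0));
assert(npest >= 0 && npest < nclass - 1);
assert(all(prob > 0) && abs(sum(prob) - 1) < 1e-12);   % tolerance is assumed

% Call the NAG function only if it is available on the path.
if exist('g08cg', 'file')
    [chisq, p, ndf, eval, chisqi, ifail] = ...
        g08cg(ifreq, cb, 'A', par, npest, prob);
end
```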

### Optional Input Parameters

1:     $\mathrm{nclass}$ – int64/int32/nag_int scalar
$k$, the number of classes into which the data is divided.
Default: the dimension of the arrays ifreq, prob. (An error is raised if these dimensions are not equal.)
Constraint: ${\mathbf{nclass}}\ge 2$.

### Output Parameters

1:     $\mathrm{chisq}$ – double scalar
The test statistic, ${X}^{2}$, for the ${\chi }^{2}$ goodness-of-fit test.
2:     $\mathrm{p}$ – double scalar
The upper tail probability from the ${\chi }^{2}$-distribution associated with the test statistic, ${X}^{2}$, and the number of degrees of freedom.
3:     $\mathrm{ndf}$ – int64/int32/nag_int scalar
Contains $\left({\mathbf{nclass}}-1-{\mathbf{npest}}\right)$, the degrees of freedom associated with the test.
4:     $\mathrm{eval}\left({\mathbf{nclass}}\right)$ – double array
${\mathbf{eval}}\left(\mathit{i}\right)$ contains the expected frequency for the $\mathit{i}$th class, ${E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
5:     $\mathrm{chisqi}\left({\mathbf{nclass}}\right)$ – double array
${\mathbf{chisqi}}\left(\mathit{i}\right)$ contains the contribution from the $\mathit{i}$th class to the test statistic, that is, ${\left({O}_{\mathit{i}}-{E}_{\mathit{i}}\right)}^{2}/{E}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$.
6:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Note: nag_nonpar_test_chisq (g08cg) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=1$
 On entry, ${\mathbf{nclass}}<2$.
${\mathbf{ifail}}=2$
 On entry, dist is invalid.
${\mathbf{ifail}}=3$
 On entry, ${\mathbf{npest}}<0$, or ${\mathbf{npest}}\ge {\mathbf{nclass}}-1$.
${\mathbf{ifail}}=4$
 On entry, ${\mathbf{ifreq}}\left(\mathit{i}\right)<0.0$ for some $\mathit{i}$, for $\mathit{i}=1,2,\dots ,k$.
${\mathbf{ifail}}=5$
On entry, the elements of cb are not in ascending order. That is, ${\mathbf{cb}}\left(\mathit{i}\right)\le {\mathbf{cb}}\left(\mathit{i}-1\right)$ for some $\mathit{i}$, for $\mathit{i}=2,3,\dots ,k-1$.
${\mathbf{ifail}}=6$
On entry, ${\mathbf{dist}}=\text{'E'}$, $\text{'C'}$ or $\text{'G'}$ and ${\mathbf{cb}}\left(1\right)<0.0$. No negative class boundary values are valid for the exponential, gamma or ${\chi }^{2}$-distributions.
${\mathbf{ifail}}=7$
 On entry, the values provided in par are invalid.
${\mathbf{ifail}}=8$
 On entry, with ${\mathbf{dist}}=\text{'A'}$, ${\mathbf{prob}}\left(i\right)\le 0.0$ for some $i$, for $i=1,2,\dots ,k$, or $\sum _{i=1}^{k}{\mathbf{prob}}\left(i\right)\ne 1.0$.
${\mathbf{ifail}}=9$
An expected frequency is equal to zero when the observed frequency was not.
W  ${\mathbf{ifail}}=10$
This is a warning that expected values for certain classes are less than $1.0$. This implies that we cannot be confident that the ${\chi }^{2}$-distribution is a good approximation to the distribution of the test statistic.
W  ${\mathbf{ifail}}=11$
The solution obtained when calculating the probability for a certain class for the gamma or ${\chi }^{2}$-distribution did not converge in $600$ iterations. The solution may be an adequate approximation.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
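The conditions behind ${\mathbf{ifail}}=9$ and the warning ${\mathbf{ifail}}=10$ can be inspected directly from observed and expected frequencies. The following plain MATLAB sketch uses hypothetical values for `O` and `E` and illustrative variable names; it mirrors the descriptions above and is not the function's internal logic.

```matlab
% Hypothetical observed and expected frequencies (illustration only).
O = [3 0 5 2];
E = [2.5 0.0 6.8 0.7];

% Condition described for ifail = 9: an expected frequency of zero
% in a class whose observed frequency is not zero.
zeroExpected = any(E == 0 & O ~= 0);

% Classes behind the ifail = 10 warning: expected value below 1.0.
lowClasses = find(E < 1.0);

fprintf('Zero expected with non-zero observed: %d\n', zeroExpected);
fprintf('Classes with expected value below 1.0: %s\n', mat2str(lowClasses));
```

Here `zeroExpected` is false, because the class with zero expected frequency also has zero observed frequency, and `lowClasses` is `[2 4]`.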

## Accuracy

The computations are believed to be stable.

## Further Comments

The time taken by nag_nonpar_test_chisq (g08cg) depends both on the distribution chosen and on the number of classes, $k$.

## Example

This example applies the ${\chi }^{2}$ goodness-of-fit test to assess whether there is evidence that a sample of $100$ randomly generated observations does not arise from a uniform distribution $U\left(0,1\right)$. The class intervals are calculated such that the interval $\left(0,1\right)$ is divided into five equal classes. The frequencies for each class are calculated using nag_stat_frequency_table (g01ae).
```matlab
function g08cg_example

fprintf('g08cg example results\n\n');

x = [ 0.59 0.23 0.76 0.96 0.20 0.91 0.29 0.22 0.36 0.81 ...
0.91 0.80 0.17 0.82 0.07 0.74 0.15 0.91 0.26 0.98 ...
0.59 0.34 0.28 0.95 0.33 0.42 0.72 0.35 0.86 0.22 ...
0.15 0.39 0.32 0.82 0.13 0.48 0.46 0.74 0.99 0.26 ...
0.04 0.21 0.04 0.24 0.56 0.36 0.48 0.53 1.00 0.58 ...
0.50 0.41 0.03 0.38 0.89 0.40 0.66 0.79 0.34 0.94 ...
0.49 0.12 0.24 0.05 1.00 0.29 0.67 0.29 0.75 0.81 ...
0.45 0.21 0.51 0.68 0.78 0.20 0.23 0.57 0.25 0.48 ...
0.96 0.33 0.48 0.55 0.04 0.48 0.42 0.11 0.38 0.73 ...
0.91 0.45 0.59 0.97 0.27 0.27 0.25 0.99 0.99 0.80];

cb     = [0.2;     0.4;     0.6;     0.8;    1.0 ];
nclass = int64(5);

% Produce frequency table
[~, ifreq, ~, ~, ifail] = ...
g01ae( ...
nclass, x, 'cb', cb);

% Test parameters
dist   = 'Uniform';
npest  = int64(0);
par    = [0;  1];
prob   = zeros(nclass,1);

% Perform Chi^2 test
[chisq, p, ndf, eval, chisqi, ifail] = ...
g08cg( ...
ifreq, cb, dist, par, npest, prob, 'nclass', nclass);

fprintf('Chi-squared test statistic   = %10.4f\n', chisq);
fprintf('Degrees of freedom.          = %5d\n', ndf);
fprintf('Significance level           = %10.4f\n\n', p);
fprintf('The contributions to the test statistic are :-\n');
disp(chisqi');

```
```
g08cg example results

Chi-squared test statistic   =    14.2000
Degrees of freedom.          =     4
Significance level           =     0.0067

The contributions to the test statistic are :-
3.2000    6.0500    0.4500    4.0500    0.4500

```