NAG Toolbox: nag_nonpar_test_chisq (g08cg)

Purpose

nag_nonpar_test_chisq (g08cg) computes the test statistic for the $\chi^2$ goodness-of-fit test for data with a chosen number of class intervals.

Syntax

[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob, 'nclass', nclass)

Description

The $\chi^2$ goodness-of-fit test performed by nag_nonpar_test_chisq (g08cg) is used to test the null hypothesis that a random sample arises from a specified distribution against the alternative hypothesis that the sample does not arise from the specified distribution.
Given a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$, with the data grouped into $k$ classes,
$$x \le c_1, \qquad c_{i-1} < x \le c_i, \quad i = 2, 3, \ldots, k-1, \qquad x > c_{k-1},$$
the $\chi^2$ goodness-of-fit test statistic is defined by
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ is the observed frequency of the $i$th class, and $E_i$ is the expected frequency of the $i$th class.
The expected frequencies are computed as
$$E_i = p_i \times n,$$
where $p_i$ is the probability that $X$ lies in the $i$th class, that is
$$p_1 = P(X \le c_1), \qquad p_i = P(c_{i-1} < X \le c_i), \quad i = 2, 3, \ldots, k-1, \qquad p_k = P(X > c_{k-1}).$$
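These probabilities are either taken from a common probability distribution or are supplied by you. The available probability distributions within this function are:
  • the Normal distribution with mean $\mu$ and variance $\sigma^2$;
  • the uniform distribution on the interval $[a, b]$;
  • the exponential distribution with parameter $\lambda$;
  • the $\chi^2$-distribution with a given number of degrees of freedom;
  • the gamma distribution with parameters $\alpha$ and $\beta$.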
You must supply the frequencies and classes. Given a set of data and classes the frequencies may be calculated using nag_stat_frequency_table (g01ae).
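As a rough illustration of the grouping described above, the following base-MATLAB sketch counts the observations in each class for a given set of boundaries. It does not call nag_stat_frequency_table (g01ae); the sample x, the boundary values and the helper variable names are invented for the example.

```matlab
% Hypothetical sample and class boundaries (illustrative values only).
x  = [0.05; 0.20; 0.31; 0.47; 0.60; 0.62; 0.79; 0.93];
cb = [0.2; 0.4; 0.6; 0.8];            % upper boundaries of classes 1..k-1

edges  = [-Inf; cb; Inf];             % class i holds edges(i) < x <= edges(i+1)
k      = numel(cb) + 1;               % number of classes
counts = zeros(k, 1);
for i = 1:k
    counts(i) = sum(x > edges(i) & x <= edges(i+1));
end

ifreq = int64(counts);                % integer frequencies, one per class
disp(ifreq')
```

Each class is closed on the right, matching the definition of the classes given earlier, so an observation equal to a boundary is counted in the lower class.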
nag_nonpar_test_chisq (g08cg) returns the $\chi^2$ test statistic, $X^2$, together with its degrees of freedom and the upper tail probability from the $\chi^2$-distribution associated with the test statistic. Note that the use of the $\chi^2$-distribution as an approximation to the distribution of the test statistic improves as the expected values in each class increase.
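The definitions above can be worked through by hand in base MATLAB. The sketch below is an independent cross-check, not the routine's implementation: it uses the frequencies from the example on this page with equal class probabilities of 0.2 (uniform on [0, 1] with boundaries 0.2, 0.4, 0.6, 0.8), and obtains the upper tail probability from the regularized incomplete gamma function gammainc. The variable names are chosen for the illustration.

```matlab
% Observed frequencies and class probabilities (five equal classes).
O     = [26; 16; 22; 19; 17];
prob  = [0.2; 0.2; 0.2; 0.2; 0.2];
npest = 0;                               % no parameters estimated from the data

n       = sum(O);                        % sample size
E       = prob * n;                      % expected frequencies, E_i = p_i * n
contrib = (O - E).^2 ./ E;               % per-class contributions
X2      = sum(contrib);                  % test statistic
ndf     = numel(O) - 1 - npest;          % degrees of freedom

% Upper tail of the chi-square distribution with ndf degrees of freedom,
% written as the regularized upper incomplete gamma function Q(ndf/2, X2/2).
p = gammainc(X2/2, ndf/2, 'upper');

fprintf('X2 = %.4f, ndf = %d, p = %.4f\n', X2, ndf, p);
```

With these inputs the sketch gives a statistic of 3.3 on 4 degrees of freedom and a tail probability of about 0.5089, in agreement with the example output.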

Parameters

Compulsory Input Parameters

1:     ifreq(nclass) – int64, int32 or nag_int array
nclass, the dimension of the array, must satisfy the constraint nclass ≥ 2.
ifreq(i) must specify the frequency of the $i$th class, $O_i$, for $i = 1, 2, \ldots, k$.
Constraint: ifreq(i) ≥ 0, for $i = 1, 2, \ldots, k$.
2:     cb(nclass - 1) – double array
cb(i) must specify the upper boundary value for the $i$th class, for $i = 1, 2, \ldots, k-1$.
Constraint: cb(1) < cb(2) < … < cb(nclass - 1). For the exponential, gamma and $\chi^2$-distributions cb(1) ≥ 0.0.
3:     dist – string (length ≥ 1)
Indicates for which distribution the test is to be carried out.
dist = 'N'
The Normal distribution is used.
dist = 'U'
The uniform distribution is used.
dist = 'E'
The exponential distribution is used.
dist = 'C'
The $\chi^2$-distribution is used.
dist = 'G'
The gamma distribution is used.
dist = 'A'
You must supply the class probabilities in the array prob.
Constraint: dist = 'N', 'U', 'E', 'C', 'G' or 'A'.
4:     par(2) – double array
Must contain the parameters of the distribution which is being tested. If you supply the probabilities (i.e., dist = 'A') the array par is not referenced.
If a Normal distribution is used then par(1) and par(2) must contain the mean, $\mu$, and the variance, $\sigma^2$, respectively.
If a uniform distribution is used then par(1) and par(2) must contain the boundaries $a$ and $b$ respectively.
If an exponential distribution is used then par(1) must contain the parameter $\lambda$. par(2) is not used.
If a $\chi^2$-distribution is used then par(1) must contain the number of degrees of freedom. par(2) is not used.
If a gamma distribution is used then par(1) and par(2) must contain the parameters $\alpha$ and $\beta$ respectively.
Constraints:
  • if dist = 'N', par(2) > 0.0;
  • if dist = 'U', par(1) < par(2) and par(1) ≤ cb(1) and par(2) ≥ cb(nclass - 1);
  • if dist = 'E', par(1) > 0.0;
  • if dist = 'C', par(1) > 0.0;
  • if dist = 'G', par(1) > 0.0 and par(2) > 0.0.
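Because par(2) holds the variance rather than the standard deviation for the Normal case, it may help to see how class probabilities follow from par and cb. The sketch below is only an illustration in base MATLAB, with made-up values for par and cb; it is not a statement of how the routine evaluates the probabilities internally.

```matlab
% Hypothetical inputs: Normal with mean 0 and variance 4 (so sigma = 2).
par = [0; 4];                          % par(1) = mean, par(2) = variance
cb  = [-2; 0; 2];                      % upper boundaries of classes 1..k-1

mu    = par(1);
sigma = sqrt(par(2));                  % standard deviation from the variance

% Normal CDF at each boundary, written with erfc from base MATLAB.
F = 0.5 * erfc(-(cb - mu) ./ (sigma * sqrt(2)));

% p_1 = F(c_1), p_i = F(c_i) - F(c_{i-1}), p_k = 1 - F(c_{k-1})
p = diff([0; F; 1]);
disp(p')
```

Padding the CDF values with 0 and 1 before differencing handles the two open-ended classes in the same step as the interior ones.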
5:     npest – int64, int32 or nag_int scalar
The number of estimated parameters of the distribution.
Constraint: 0 ≤ npest < nclass - 1.
6:     prob(nclass) – double array
nclass, the dimension of the array, must satisfy the constraint nclass ≥ 2.
If you are supplying the probability distribution (i.e., dist = 'A') then prob(i) must contain the probability that $X$ lies in the $i$th class.
If dist ≠ 'A', prob is not referenced.
Constraint: if dist = 'A', $\sum_{i=1}^{k}$ prob(i) = 1.0 and prob(i) > 0.0, for $i = 1, 2, \ldots, k$.
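When dist = 'A', one way to prepare prob is to normalise a set of relative weights and check the documented constraint before making the call. The following sketch assumes hypothetical 9:3:3:1 weights; the tolerance used for the sum check is an assumption, since the tolerance applied by the routine is not stated here.

```matlab
% Hypothetical expected ratios for four classes (9:3:3:1).
w    = [9; 3; 3; 1];
prob = w / sum(w);                     % normalise so the probabilities sum to 1

% Checks mirroring the documented constraint for dist = 'A'.
allPositive = all(prob > 0.0);
sumsToOne   = abs(sum(prob) - 1.0) < 10 * eps;   % tolerance is an assumption

if ~(allPositive && sumsToOne)
    error('prob does not satisfy the constraints for dist = ''A''.');
end
disp(prob')
```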

Optional Input Parameters

1:     nclass – int64, int32 or nag_int scalar
Default: The dimension of the arrays ifreq, prob. (An error is raised if these dimensions are not equal.)
$k$, the number of classes into which the data is divided.
Constraint: nclass ≥ 2.

Input Parameters Omitted from the MATLAB Interface

None.

Output Parameters

1:     chisq – double scalar
The test statistic, $X^2$, for the $\chi^2$ goodness-of-fit test.
2:     p – double scalar
The upper tail probability from the $\chi^2$-distribution associated with the test statistic, $X^2$, and the number of degrees of freedom.
3:     ndf – int64, int32 or nag_int scalar
Contains (nclass - 1 - npest), the degrees of freedom associated with the test.
4:     eval(nclass) – double array
eval(i) contains the expected frequency for the $i$th class, $E_i$, for $i = 1, 2, \ldots, k$.
5:     chisqi(nclass) – double array
chisqi(i) contains the contribution from the $i$th class to the test statistic, that is, $(O_i - E_i)^2 / E_i$, for $i = 1, 2, \ldots, k$.
6:     ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).
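The outputs are linked by simple identities that can serve as a sanity check after a call. The sketch below uses the values printed in the example on this page rather than calling the routine. The expected frequencies are stored in a variable named evalv, a name chosen here because an output called eval hides MATLAB's built-in eval function in the calling workspace.

```matlab
% Values taken from the example output on this page.
ifreq  = [26; 16; 22; 19; 17];
evalv  = [20; 20; 20; 20; 20];         % the 'eval' output under another name
chisqi = [1.8; 0.8; 0.2; 0.05; 0.45];
chisq  = 3.3;
ndf    = 4;
npest  = 0;

sameTotal   = abs(sum(evalv) - sum(ifreq)) < 1e-10;    % expected frequencies sum to n
sameStat    = abs(sum(chisqi) - chisq) < 1e-10;        % contributions sum to X^2
sameContrib = all(abs((ifreq - evalv).^2 ./ evalv - chisqi) < 1e-10);
sameNdf     = (ndf == numel(ifreq) - 1 - npest);

[~, worst] = max(chisqi);              % index of the class contributing most
```

Inspecting chisqi in this way shows which classes drive a large statistic; here the first class contributes the most.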

Error Indicators and Warnings

Note: nag_nonpar_test_chisq (g08cg) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

  ifail = 1
On entry, nclass < 2.
  ifail = 2
On entry, dist is invalid.
  ifail = 3
On entry, npest < 0,
or npest ≥ nclass - 1.
  ifail = 4
On entry, ifreq(i) < 0.0 for some $i$, for $i = 1, 2, \ldots, k$.
  ifail = 5
On entry, the elements of cb are not in ascending order. That is, cb(i) ≤ cb(i - 1) for some $i$, for $i = 2, 3, \ldots, k-1$.
  ifail = 6
On entry, dist = 'E', 'C' or 'G' and cb(1) < 0.0. No negative class boundary values are valid for the exponential, gamma or $\chi^2$-distributions.
  ifail = 7
On entry, the values provided in par are invalid.
  ifail = 8
On entry, with dist = 'A', prob(i) ≤ 0.0 for some $i$, for $i = 1, 2, \ldots, k$,
or $\sum_{i=1}^{k}$ prob(i) ≠ 1.0.
  ifail = 9
An expected frequency is equal to zero when the observed frequency was not.
W ifail = 10
This is a warning that expected values for certain classes are less than 1.0. This implies that we cannot be confident that the $\chi^2$-distribution is a good approximation to the distribution of the test statistic.
W ifail = 11
The solution obtained when calculating the probability for a certain class for the gamma or $\chi^2$-distribution did not converge in 600 iterations. The solution may be an adequate approximation.
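The conditions behind ifail = 9 and the ifail = 10 warning depend only on the observed and expected frequencies, so they can be inspected in advance when the class probabilities are known. The following is a minimal sketch with invented frequencies and probabilities; it approximates the documented conditions and is not the routine's own check.

```matlab
% Hypothetical observed frequencies and class probabilities.
O    = [30; 15; 4; 1];
prob = [0.60; 0.30; 0.09; 0.01];

E = prob * sum(O);                       % expected frequencies, E_i = p_i * n

zeroExpected  = any(E == 0 & O ~= 0);    % condition described for ifail = 9
smallExpected = find(E < 1.0);           % classes behind the ifail = 10 warning

fprintf('Classes with expected frequency below 1.0: %s\n', mat2str(smallExpected'));
```

In this made-up case only the last class has an expected frequency below 1.0, so the $\chi^2$ approximation would be questionable for it.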

Accuracy

The computations are believed to be stable.

Further Comments

The time taken by nag_nonpar_test_chisq (g08cg) is dependent both on the distribution chosen and on the number of classes, $k$.

Example

function nag_nonpar_test_chisq_example
ifreq = [int64(26);16;22;19;17];
cb = [0.2;
     0.4;
     0.6;
     0.8];
dist = 'U';
par = [0;
     1];
npest = int64(0);
prob = zeros(5, 1);   % not referenced because dist = 'U'
[chisq, p, ndf, eval, chisqi, ifail] = nag_nonpar_test_chisq(ifreq, cb, dist, par, npest, prob)
 

chisq =

    3.3000


p =

    0.5089


ndf =

                    4


eval =

   20.0000
   20.0000
   20.0000
   20.0000
   20.0000


chisqi =

    1.8000
    0.8000
    0.2000
    0.0500
    0.4500


ifail =

                    0


function g08cg_example
% Same test called through the short name g08cg; the displayed results are
% identical to those shown above for nag_nonpar_test_chisq.
ifreq = [int64(26);16;22;19;17];
cb = [0.2;
     0.4;
     0.6;
     0.8];
dist = 'U';
par = [0;
     1];
npest = int64(0);
prob = zeros(5, 1);   % not referenced because dist = 'U'
[chisq, p, ndf, eval, chisqi, ifail] = g08cg(ifreq, cb, dist, par, npest, prob)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013