NAG Library Routine Document

g11aaf (chisq)

1
Purpose

g11aaf computes χ² statistics for a two-way contingency table. For a 2×2 table with a small number of observations, exact probabilities are computed.

2
Specification

Fortran Interface
Subroutine g11aaf ( nrow, ncol, nobs, ldnobs, expt, chist, prob, chi, g, df, ifail)
Integer, Intent (In):: nrow, ncol, nobs(ldnobs,ncol), ldnobs
Integer, Intent (Inout):: ifail
Real (Kind=nag_wp), Intent (Inout):: expt(ldnobs,ncol), chist(ldnobs,ncol)
Real (Kind=nag_wp), Intent (Out):: prob, chi, g, df
C Header Interface
#include <nagmk26.h>
void  g11aaf_ (const Integer *nrow, const Integer *ncol, const Integer nobs[], const Integer *ldnobs, double expt[], double chist[], double *prob, double *chi, double *g, double *df, Integer *ifail)

3
Description

For a set of n observations classified by two variables, with r and c levels respectively, a two-way table of frequencies with r rows and c columns can be computed.
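$$
\begin{array}{cccc|c}
n_{11} & n_{12} & \cdots & n_{1c} & n_{1.} \\
n_{21} & n_{22} & \cdots & n_{2c} & n_{2.} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r.} \\
\hline
n_{.1} & n_{.2} & \cdots & n_{.c} & n
\end{array}
$$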
To measure the association between the two classification variables, two statistics can be used: the Pearson χ² statistic, $\sum_{i=1}^{r}\sum_{j=1}^{c} (n_{ij}-f_{ij})^2 / f_{ij}$, and the likelihood ratio test statistic, $2\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij} \times \log(n_{ij}/f_{ij})$, where $f_{ij}$ are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies and are given by
$$
f_{ij} = n_{i.} n_{.j} / n.
$$
Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a χ²-distribution with $(c-1)(r-1)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies, $f_{ij}$, are not too small. For a discussion of this point see Everitt (1977). He concludes by saying, ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of 0.5 in the smallest cell’.
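The quantities described above can be sketched in a few lines of Python. This is an illustrative re-implementation of the formulas only, not the NAG routine: the function name contingency_stats is hypothetical, the table is passed as a list of rows, and it is assumed that every expected frequency is positive. Cells with a zero count are taken to contribute nothing to the likelihood ratio statistic, which is an assumption about the limiting value of n log n.

```python
import math

def contingency_stats(nobs):
    """Return (expt, chist, chi, g, df) for a two-way table of counts.

    Hypothetical helper mirroring the formulas in the Description;
    assumes every row and column total is positive.
    """
    r, c = len(nobs), len(nobs[0])
    row_tot = [sum(row) for row in nobs]
    col_tot = [sum(nobs[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_tot)
    # Expected frequencies under no association: f_ij = n_i. * n_.j / n
    expt = [[row_tot[i] * col_tot[j] / n for j in range(c)] for i in range(r)]
    # Per-cell Pearson contributions (n_ij - f_ij)^2 / f_ij
    chist = [[(nobs[i][j] - expt[i][j]) ** 2 / expt[i][j] for j in range(c)]
             for i in range(r)]
    chi = sum(sum(row) for row in chist)
    # Likelihood ratio statistic; zero counts are skipped (assumed to add 0)
    g = 2.0 * sum(nobs[i][j] * math.log(nobs[i][j] / expt[i][j])
                  for i in range(r) for j in range(c) if nobs[i][j] > 0)
    df = (r - 1) * (c - 1)
    return expt, chist, chi, g, df

# Brain tumour counts from the Example section of this document
table = [[23, 9, 6], [21, 4, 3], [34, 24, 17]]
expt, chist, chi, g, df = contingency_stats(table)
print(f"chi = {chi:.3f}, g = {g:.3f}, df = {df}")
```

The return values are named after the corresponding output arguments of g11aaf purely for readability; no claim is made that the routine computes them in this way.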
In the case of the 2×2 table, i.e., c=2 and r=2, the χ² approximation can be improved by using Yates' continuity correction factor. This decreases the absolute value of $(n_{ij}-f_{ij})$ by 1/2. For 2×2 tables with a small value of n, the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using g01blf. A two-tail probability is computed as $\min(1, 2p_u, 2p_l)$, where $p_u$ and $p_l$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
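As a rough illustration of the two 2×2 adjustments, the sketch below applies the continuity correction and evaluates the two-tail rule with Python's standard library. The function names and the sample table are hypothetical, and it is assumed here that both one-tail probabilities include the observed count in the first cell; g01blf itself is not called.

```python
import math

def yates_chi2(nobs):
    """Pearson chi-square for a 2x2 table with Yates' continuity correction."""
    row_tot = [sum(row) for row in nobs]
    col_tot = [nobs[0][j] + nobs[1][j] for j in range(2)]
    n = sum(row_tot)
    chi = 0.0
    for i in range(2):
        for j in range(2):
            f = row_tot[i] * col_tot[j] / n
            # Reduce |n_ij - f_ij| by 1/2 before squaring
            chi += (abs(nobs[i][j] - f) - 0.5) ** 2 / f
    return chi

def fisher_two_tail(nobs):
    """Two-tail probability min(1, 2*p_u, 2*p_l) for a 2x2 table."""
    a = nobs[0][0]
    r1 = nobs[0][0] + nobs[0][1]        # first row total
    c1 = nobs[0][0] + nobs[1][0]        # first column total
    n = r1 + nobs[1][0] + nobs[1][1]    # grand total
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)

    def pmf(k):
        # Hypergeometric probability of k in cell (1,1) with fixed margins
        return math.comb(c1, k) * math.comb(n - c1, r1 - k) / math.comb(n, r1)

    p_l = sum(pmf(k) for k in range(lo, a + 1))   # lower tail, includes a
    p_u = sum(pmf(k) for k in range(a, hi + 1))   # upper tail, includes a
    return min(1.0, 2.0 * p_u, 2.0 * p_l)

table = [[3, 1], [1, 3]]  # hypothetical 2x2 table, n = 8
print(yates_chi2(table))
print(fisher_two_tail(table))
```

For this hypothetical table every expected frequency is 2, so the corrected statistic is 4 × (1 − 0.5)² / 2, and the upper tail is 17/70, which gives a two-tail probability of 34/70.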

4
References

Everitt B S (1977) The Analysis of Contingency Tables Chapman and Hall
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin

5
Arguments

1:     nrow – Integer, Input
On entry: r, the number of rows in the contingency table.
Constraint: nrow ≥ 2.
2:     ncol – Integer, Input
On entry: c, the number of columns in the contingency table.
Constraint: ncol ≥ 2.
3:     nobs(ldnobs,ncol) – Integer array, Input
On entry: the contingency table. nobs(i,j) must contain $n_{ij}$, for i = 1,2,…,r and j = 1,2,…,c.
Constraint: nobs(i,j) ≥ 0, for i = 1,2,…,r and j = 1,2,…,c.
4:     ldnobs – Integer, Input
On entry: the first dimension of the arrays nobs, expt and chist as declared in the (sub)program from which g11aaf is called.
Constraint: ldnobs ≥ nrow.
5:     expt(ldnobs,ncol) – Real (Kind=nag_wp) array, Output
On exit: the table of expected values. expt(i,j) contains $f_{ij}$, for i = 1,2,…,r and j = 1,2,…,c.
6:     chist(ldnobs,ncol) – Real (Kind=nag_wp) array, Output
On exit: the table of χ² contributions. chist(i,j) contains $(n_{ij}-f_{ij})^2 / f_{ij}$, for i = 1,2,…,r and j = 1,2,…,c.
7:     prob – Real (Kind=nag_wp), Output
On exit: if c=2, r=2 and n ≤ 40 then prob contains the two-tail significance level for Fisher's exact test; otherwise prob contains the significance level from the Pearson χ² statistic.
8:     chi – Real (Kind=nag_wp), Output
On exit: the Pearson χ² statistic.
9:     g – Real (Kind=nag_wp), Output
On exit: the likelihood ratio test statistic.
10:   df – Real (Kind=nag_wp), Output
On exit: the degrees of freedom for the statistics.
11:   ifail – Integer, Input/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if ifail ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).
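The rule that decides what prob contains, and the χ² significance level itself, can be sketched as follows. Both function names are hypothetical and the code is not taken from the library: the upper-tail probability is evaluated for integer degrees of freedom with the standard recurrence Q(a+1, z) = Q(a, z) + z^a e^(−z) / Γ(a+1) for the regularized upper incomplete gamma function, started from the closed forms for 1 and 2 degrees of freedom. How g11aaf evaluates this probability internally is not stated in this document.

```python
import math

def prob_method(nrow, ncol, n):
    """Which test the 'prob' description refers to for a given table size."""
    return "fisher" if (nrow == 2 and ncol == 2 and n <= 40) else "pearson"

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = 0.5 * x
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    # Step a -> a + 1 (that is, df -> df + 2) until a = df / 2
    while 2 * a < df:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

print(prob_method(2, 2, 30))      # small 2x2 table
print(prob_method(3, 3, 141))     # larger table
print(chi2_upper_tail(3.841, 1))  # close to the familiar 5% level
```

The value 3.841 is the usual tabulated 5% critical value for one degree of freedom, so the last line should print a number very close to 0.05.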

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Note: g11aaf may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
ifail=1
On entry, nrow < 2,
or ncol < 2,
or ldnobs < nrow.
ifail=2
On entry, a value in nobs < 0, or all values in nobs are zero.
ifail=3
On entry, a 2×2 table has a row or column with both values 0.
ifail=4
At least one cell has expected frequency, $f_{ij}$, ≤ 0.5. The χ² approximation may be poor.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
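The input conditions behind ifail = 1 to 4 can be mirrored by a small checker such as the one below. It is a hypothetical sketch, not the routine's own logic: the name check_table is invented, the order in which the conditions are tested is an assumption, the ldnobs test is omitted because a Python list of rows has no leading dimension, and the warning threshold is taken to be an expected frequency of at most 0.5.

```python
def check_table(nobs):
    """Return an ifail-style code (0 to 4) for a table given as a list of rows."""
    nrow = len(nobs)
    ncol = len(nobs[0]) if nrow > 0 else 0
    # ifail = 1: too few rows or columns
    if nrow < 2 or ncol < 2:
        return 1
    # ifail = 2: a negative count, or every count zero
    cells = [v for row in nobs for v in row]
    if any(v < 0 for v in cells) or all(v == 0 for v in cells):
        return 2
    row_tot = [sum(row) for row in nobs]
    col_tot = [sum(row[j] for row in nobs) for j in range(ncol)]
    # ifail = 3: a 2x2 table with an all-zero row or column
    if nrow == 2 and ncol == 2 and (0 in row_tot or 0 in col_tot):
        return 3
    # ifail = 4 (warning): some expected frequency is 0.5 or less
    n = sum(row_tot)
    if any(rt * ct / n <= 0.5 for rt in row_tot for ct in col_tot):
        return 4
    return 0

print(check_table([[23, 9, 6], [21, 4, 3], [34, 24, 17]]))  # example data
print(check_table([[0, 0], [1, 2]]))                        # empty row in 2x2
```

Running such a check before the call makes it easier to tell genuine input errors (codes 1 to 3) apart from the warning case (code 4), where the routine's outputs may still be useful.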

7
Accuracy

For the accuracy of the probabilities for Fisher's exact test see g01blf.

8
Parallelism and Performance

g11aaf is not threaded in any implementation.

9
Further Comments

The routine g01aff allows for the automatic amalgamation of rows and columns. In most circumstances this is not recommended; see Everitt (1977).
Multidimensional contingency tables can be analysed using log-linear models fitted by g02gbf.

10
Example

The data below, taken from Everitt (1977), is from 141 patients with brain tumours. The row classification variable is the site of the tumour: frontal lobes, temporal lobes and other cerebral areas. The column classification variable is the type of tumour: benign, malignant and other cerebral tumours.
                        Benign   Malignant   Other   Total
Frontal lobes               23           9       6      38
Temporal lobes              21           4       3      28
Other cerebral areas        34          24      17      75
Total                       78          37      26     141
The data is read in and the statistics computed and printed.

10.1
Program Text

Program Text (g11aafe.f90)

10.2
Program Data

Program Data (g11aafe.d)

10.3
Program Results

Program Results (g11aafe.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017