NAG FL Interface
g01aff (contingency_table)
1
Purpose
g01aff performs the analysis of a two-way $r\times c$ contingency table or classification. If $r=c=2$, and the total number of objects classified is $40$ or fewer, then the probabilities for Fisher's exact test are computed. Otherwise, a test statistic is computed (with Yates' correction when $r=c=2$), which under the assumption of no association between the classifications has approximately a chi-square distribution with $\left(r-1\right)\times \left(c-1\right)$ degrees of freedom.
2
Specification
Fortran Interface
Subroutine g01aff (ldnob, ldpred, m, n, nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail)
Integer, Intent (In) :: ldnob, ldpred, m, n
Integer, Intent (Inout) :: nobs(ldnob,n), num, ifail
Integer, Intent (Out) :: npos, ndf, m1, n1
Real (Kind=nag_wp), Intent (Inout) :: pred(ldpred,n)
Real (Kind=nag_wp), Intent (Out) :: chis, p(21)

C Header Interface
#include <nag.h>
void g01aff_ (const Integer *ldnob, const Integer *ldpred, const Integer *m, const Integer *n, Integer nobs[], Integer *num, double pred[], double *chis, double p[], Integer *npos, Integer *ndf, Integer *m1, Integer *n1, Integer *ifail)

C++ Header Interface
#include <nag.h>
extern "C" {
void g01aff_ (const Integer &ldnob, const Integer &ldpred, const Integer &m, const Integer &n, Integer nobs[], Integer &num, double pred[], double &chis, double p[], Integer &npos, Integer &ndf, Integer &m1, Integer &n1, Integer &ifail)
}

The routine may be called by the names g01aff or nagf_stat_contingency_table.
3
Description
The data consist of the frequencies for the two-way classification, denoted by ${n}_{\mathit{i}\mathit{j}}$, for $\mathit{i}=1,2,\dots ,m$ and $\mathit{j}=1,2,\dots ,n$ with $m,n>1$.
A check is made to see whether any row or column of the matrix of frequencies consists entirely of zeros, and if so, the matrix of frequencies is reduced by omitting that row or column. Suppose the final size of the matrix is ${m}_{1}$ by ${n}_{1}$ (${m}_{1},{n}_{1}>1$), and let
 ${R}_{\mathit{i}}={\displaystyle \sum _{j=1}^{{n}_{1}}}{n}_{\mathit{i}j}$, the total frequency for the $\mathit{i}$th row, for $\mathit{i}=1,2,\dots ,{m}_{1}$,
 ${C}_{\mathit{j}}={\displaystyle \sum _{i=1}^{{m}_{1}}}{n}_{i\mathit{j}}$, the total frequency for the $\mathit{j}$th column, for $\mathit{j}=1,2,\dots ,{n}_{1}$, and
 $T={\displaystyle \sum _{i=1}^{{m}_{1}}}{R}_{i}={\displaystyle \sum _{j=1}^{{n}_{1}}}{C}_{j}$, the total frequency.
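The reduction and the marginal totals described above can be sketched in plain C. This is only an illustration, not the library's implementation: the function name `reduce_and_total`, the row-major layout and the `long` element type are all assumptions made for the example, and frequencies are assumed to be non-negative.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the NAG Library).
   Compacts an m-by-n row-major table in place by dropping all-zero rows
   and columns, then computes the row totals R, the column totals C and
   the total frequency T. The row stride of tab stays n after compaction.
   R needs at least m entries and C at least n entries.
   Assumes every frequency is non-negative. */
long reduce_and_total(long *tab, int m, int n, int *m1, int *n1,
                      long *R, long *C)
{
    int i, j, rows = 0, cols = 0;
    long T = 0;

    /* Drop all-zero rows: move each surviving row up to position rows. */
    for (i = 0; i < m; i++) {
        long sum = 0;
        for (j = 0; j < n; j++) sum += tab[i * n + j];
        if (sum > 0) {
            for (j = 0; j < n; j++) tab[rows * n + j] = tab[i * n + j];
            rows++;
        }
    }

    /* Drop all-zero columns among the surviving rows. */
    for (j = 0; j < n; j++) {
        long sum = 0;
        for (i = 0; i < rows; i++) sum += tab[i * n + j];
        if (sum > 0) {
            for (i = 0; i < rows; i++) tab[i * n + cols] = tab[i * n + j];
            cols++;
        }
    }

    /* Marginal totals R_i, C_j and the total frequency T. */
    for (i = 0; i < rows; i++) R[i] = 0;
    for (j = 0; j < cols; j++) C[j] = 0;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            long v = tab[i * n + j];
            R[i] += v;
            C[j] += v;
            T += v;
        }
    }

    *m1 = rows;
    *n1 = cols;
    return T;
}
```

For a $3\times 3$ table whose middle row and middle column are zero, the sketch leaves a $2\times 2$ table in the leading positions of each stored row and reports ${m}_{1}={n}_{1}=2$.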
There are two situations:

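(i) If ${m}_{1}>2$ and/or ${n}_{1}>2$, or ${m}_{1}={n}_{1}=2$ and $T>40$, then the matrix of expected frequencies, denoted by ${r}_{ij}$, for $i=1,2,\dots ,{m}_{1}$ and $j=1,2,\dots ,{n}_{1}$, and the test statistic, ${\chi}^{2}$, are computed, where
$${r}_{ij}=\frac{{R}_{i}{C}_{j}}{T}$$
and
$${\chi}^{2}=\sum _{i=1}^{{m}_{1}}\sum _{j=1}^{{n}_{1}}\frac{{\left(\left|{r}_{ij}-{n}_{ij}\right|-Y\right)}^{2}}{{r}_{ij}}$$
where
$$Y=\begin{cases}\frac{1}{2}& \text{if }{m}_{1}={n}_{1}=2\text{,}\\ 0& \text{otherwise}\end{cases}$$
is Yates' correction for continuity.
Under the assumption that there is no association between the two classifications, ${\chi}^{2}$ will have approximately a chi-square distribution with $\left({m}_{1}-1\right)\times \left({n}_{1}-1\right)$ degrees of freedom.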
An option exists which allows for further ‘shrinkage’ of the matrix of frequencies in the case where ${r}_{ij}<1$ for the ($i,j$)th cell. If this is the case, then row $i$ or column $j$ will be combined with the adjacent row or column with smaller total. Row $i$ is selected for combination if ${R}_{i}\times {m}_{1}\le {C}_{j}\times {n}_{1}$. This ‘shrinking’ process is continued until ${r}_{ij}\ge 1$ for all cells ($i,j$).

(ii) If ${m}_{1}={n}_{1}=2$ and $T\le 40$, the probabilities to enable Fisher's exact test to be made are computed.
The matrix of frequencies may be rearranged so that ${R}_{1}$ is the smallest marginal (i.e., column and row) total, and ${C}_{2}\ge {C}_{1}$. Under the assumption of no association between the classifications, the probability of obtaining $r$ entries in cell $\left(1,1\right)$ is computed where
$${P}_{r}=\frac{{R}_{1}!\,{R}_{2}!\,{C}_{1}!\,{C}_{2}!}{T!\,r!\,\left({R}_{1}-r\right)!\,\left({C}_{1}-r\right)!\,\left(T-{C}_{1}-{R}_{1}+r\right)!}\text{,}\quad r=0,1,\dots ,{R}_{1}\text{.}$$
The probability of obtaining the table of given frequencies is returned. A test of the assumption against some alternative may then be made by summing the relevant values of ${P}_{r}$.
4
References
None.
5
Arguments

1:
$\mathbf{ldnob}$ – Integer
Input

On entry: the first dimension of the array nobs as declared in the (sub)program from which g01aff is called.
Constraint: ${\mathbf{ldnob}}\ge {\mathbf{m}}$.

2:
$\mathbf{ldpred}$ – Integer
Input

On entry: the first dimension of the array pred as declared in the (sub)program from which g01aff is called.
Constraint: ${\mathbf{ldpred}}\ge {\mathbf{m}}$.

3:
$\mathbf{m}$ – Integer
Input

On entry: $m+1$, one more than the number of rows of the frequency matrix.
Constraint:
${\mathbf{m}}>2$.

4:
$\mathbf{n}$ – Integer
Input

On entry: $n+1$, one more than the number of columns of the frequency matrix.
Constraint:
${\mathbf{n}}>2$.

5:
$\mathbf{nobs}\left({\mathbf{ldnob}},{\mathbf{n}}\right)$ – Integer array
Input/Output

On entry: the elements ${\mathbf{nobs}}\left(\mathit{i},\mathit{j}\right)$, for $\mathit{i}=1,2,\dots ,m$ and $\mathit{j}=1,2,\dots ,n$, must contain the frequencies for the two-way classification. The $\left(m+1\right)$th row and the $\left(n+1\right)$th column of nobs need not be set.
On exit: contains the following information:
 ${\mathbf{nobs}}\left(\mathit{i},\mathit{j}\right)$, for $\mathit{i}=1,2,\dots ,{m}_{1}$ and $\mathit{j}=1,2,\dots ,{n}_{1}$, contain the frequencies for the two-way classification after ‘shrinkage’ has taken place (see Section 3).
 ${\mathbf{nobs}}\left(\mathit{i},n+1\right)$, for $\mathit{i}=1,2,\dots ,{m}_{1}$, contain the total frequencies in the remaining rows, ${R}_{i}$.
 ${\mathbf{nobs}}\left(m+1,\mathit{j}\right)$, for $\mathit{j}=1,2,\dots ,{n}_{1}$, contain the total frequencies in the remaining columns, ${C}_{j}$.
 ${\mathbf{nobs}}\left(m+1,n+1\right)$ contains the total frequency, $T$.
If any ‘shrinkage’ has occurred, all other cells contain no useful information.
Constraint: ${\mathbf{nobs}}\left(\mathit{i},\mathit{j}\right)\ge 0$, for $\mathit{i}=1,2,\dots ,{\mathbf{m}}-1$ and $\mathit{j}=1,2,\dots ,{\mathbf{n}}-1$.

6:
$\mathbf{num}$ – Integer
Input/Output

On entry: num determines whether automatic ‘shrinkage’ is required when any ${r}_{ij}<1$, as outlined in Section 3(i).
If ${\mathbf{num}}=1$, shrinkage is required; otherwise shrinkage is not required.
On exit: when Fisher's exact test for a $2\times 2$ classification is used, num contains the number of elements used in the array p; otherwise num is set to zero.

7:
$\mathbf{pred}\left({\mathbf{ldpred}},{\mathbf{n}}\right)$ – Real (Kind=nag_wp) array
Output

On exit: the elements ${\mathbf{pred}}\left(i,j\right)$, where $i=1,2,\dots ,{\mathbf{m1}}$ and $j=1,2,\dots ,{\mathbf{n1}}$, contain the expected frequencies, ${r}_{ij}$, corresponding to the observed frequencies ${\mathbf{nobs}}\left(i,j\right)$, except in the case when Fisher's exact test for a $2\times 2$ classification is to be used, when pred is not used. No other elements are utilized.

8:
$\mathbf{chis}$ – Real (Kind=nag_wp)
Output

On exit: the value of the test statistic, ${\chi}^{2}$, except when Fisher's exact test for a $2\times 2$ classification is used in which case it is unspecified.

9:
$\mathbf{p}\left(21\right)$ – Real (Kind=nag_wp) array
Output

p is used only when Fisher's exact test for a $2\times 2$ classification is to be used.
On exit: the first num elements contain the probabilities associated with the various possible frequency tables, ${P}_{\mathit{r}}$, for $\mathit{r}=0,1,\dots ,{R}_{1}$; the remainder are unspecified.

10:
$\mathbf{npos}$ – Integer
Output

npos is used only when Fisher's exact test for a $2\times 2$ classification is to be used.
On exit: ${\mathbf{p}}\left({\mathbf{npos}}\right)$ holds the probability associated with the given table of frequencies.

11:
$\mathbf{ndf}$ – Integer
Output

On exit: the value of ndf gives the number of degrees of freedom for the chi-square distribution, $\left({m}_{1}-1\right)\times \left({n}_{1}-1\right)$; when Fisher's exact test is used ${\mathbf{ndf}}=1$.

12:
$\mathbf{m1}$ – Integer
Output

On exit: the number of rows of the two-way classification, after any ‘shrinkage’, ${m}_{1}$.

13:
$\mathbf{n1}$ – Integer
Output

On exit: the number of columns of the two-way classification, after any ‘shrinkage’, ${n}_{1}$.

14:
$\mathbf{ifail}$ – Integer
Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 4 in the Introduction to the NAG Library FL Interface for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is $0$.
When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
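When calling from C, the array nobs must be laid out in Fortran (column-major) order, with room for the extra row and column that the routine fills with totals. The following sketch shows one way to prepare such an array from a row-major C table. The name `pack_nobs` is hypothetical, and plain `int` stands in for the `Integer` type, whose actual width is defined by `nag.h` and may differ.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the NAG Library).
   Copies a rows-by-cols row-major table into a newly allocated
   column-major array with leading dimension rows + 1 and cols + 1
   columns. The extra row and column are zeroed; the routine does not
   require them to be set on entry.
   On return *m = rows + 1 and *n = cols + 1, matching the meaning of
   the m and n arguments described above. The caller frees the result.
   ASSUMPTION: int is used in place of the Integer type from nag.h. */
int *pack_nobs(const int *table, int rows, int cols, int *m, int *n)
{
    int ld = rows + 1;
    int nc = cols + 1;
    int i, j;
    int *nobs = calloc((size_t)ld * (size_t)nc, sizeof *nobs);

    if (nobs == NULL) return NULL;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            /* C element [i + j*ld] is Fortran element nobs(i+1, j+1). */
            nobs[i + j * ld] = table[i * cols + j];
        }
    }
    *m = ld;
    *n = nc;
    return nobs;
}
```

The returned array could then be passed as nobs with ldnob equal to the returned m, which satisfies the constraint ${\mathbf{ldnob}}\ge {\mathbf{m}}$. An array pred of at least ldpred by n elements would be needed alongside it.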
6
Error Indicators and Warnings
If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry, ${\mathbf{m}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{m}}>2$.
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n}}>2$.
The number of rows or columns of nobs is less than $2$.
 ${\mathbf{ifail}}=2$

At least one frequency is negative, or all frequencies are zero.
 ${\mathbf{ifail}}=4$

On entry, ${\mathbf{ldnob}}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{m}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldnob}}\ge {\mathbf{m}}$.
On entry, ${\mathbf{ldpred}}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{m}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldpred}}\ge {\mathbf{m}}$.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
The method used is believed to be stable.
8
Parallelism and Performance
g01aff is not threaded in any implementation.
9
Further Comments
The time taken by g01aff will increase with m and n, except when Fisher's exact test is to be used, in which case it increases with the size of the marginal and total frequencies.
If, on exit, ${\mathbf{num}}>0$, or alternatively ndf is $1$ and ${\mathbf{nobs}}\left({\mathbf{m}},{\mathbf{n}}\right)\le 40$, the probabilities for use in Fisher's exact test for a $2\times 2$ classification will have been calculated, and not the test statistic with approximately a chi-square distribution.
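The exit condition above can be checked from C along the following lines. The function `fisher_was_used` is a hypothetical helper; it assumes nobs is stored column-major with leading dimension ldnob, so that the Fortran element ${\mathbf{nobs}}\left({\mathbf{m}},{\mathbf{n}}\right)$ sits at C index `(m-1) + (n-1)*ldnob`, and it uses `int` in place of the `Integer` type from `nag.h`.

```c
/* Hypothetical helper (not part of the NAG Library).
   Returns 1 if the exit values indicate that Fisher's exact
   probabilities were computed, and 0 if the chi-square statistic was
   computed instead. Either of the two documented conditions suffices:
     num > 0, or
     ndf == 1 and nobs(m,n) <= 40.
   ASSUMPTION: nobs is column-major with leading dimension ldnob, and
   int stands in for the Integer type. */
int fisher_was_used(int num, int ndf, const int *nobs,
                    int ldnob, int m, int n)
{
    /* Total frequency T, stored in the last row and column on exit. */
    int total = nobs[(m - 1) + (n - 1) * ldnob];

    return num > 0 || (ndf == 1 && total <= 40);
}
```

A caller would then read either the first num elements of p together with npos, or chis and ndf, depending on the result.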
10
Example
In the example program, NPROB determines the number of two-way classifications to be analysed. For each classification the frequencies are read, g01aff called, and information given on how much ‘shrinkage’ has taken place. If Fisher's exact test is to be used, the given frequencies and the array of probabilities associated with the possible frequency tables are printed. Otherwise, if the chi-square test is to be used, the given and expected frequencies, and the test statistic with its degrees of freedom are printed. In the example, there is one $2\times 3$ classification, with shrinkage not requested.
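Independently of the example program, the chi-square branch of Section 3(i) can be illustrated with a short C sketch that computes the expected frequencies ${r}_{ij}={R}_{i}{C}_{j}/T$ and the statistic with Yates' correction applied only to $2\times 2$ tables. The function name `chi_square`, the row-major layout and any data fed to it are assumptions for illustration and are not taken from the example program's text or data; no shrinkage is attempted.

```c
#include <math.h>

/* Hypothetical helper (not part of the NAG Library).
   For an m1-by-n1 row-major table of observed frequencies, writes the
   expected frequencies r_ij = R_i * C_j / T to pred (same layout) and
   returns
     chi^2 = sum over i,j of (|r_ij - n_ij| - Y)^2 / r_ij,
   where Y = 1/2 for a 2-by-2 table and 0 otherwise.
   The degrees of freedom (m1-1)*(n1-1) are stored in *ndf.
   Assumes no row or column is entirely zero, so every r_ij > 0. */
double chi_square(const long *nobs, int m1, int n1, double *pred, int *ndf)
{
    double T = 0.0, chis = 0.0;
    double Y = (m1 == 2 && n1 == 2) ? 0.5 : 0.0;
    int i, j, k;

    for (i = 0; i < m1 * n1; i++) T += (double)nobs[i];

    for (i = 0; i < m1; i++) {
        double Ri = 0.0;
        for (j = 0; j < n1; j++) Ri += (double)nobs[i * n1 + j];

        for (j = 0; j < n1; j++) {
            double Cj = 0.0, d;
            for (k = 0; k < m1; k++) Cj += (double)nobs[k * n1 + j];

            pred[i * n1 + j] = Ri * Cj / T;
            d = fabs(pred[i * n1 + j] - (double)nobs[i * n1 + j]) - Y;
            chis += d * d / pred[i * n1 + j];
        }
    }

    *ndf = (m1 - 1) * (n1 - 1);
    return chis;
}
```

For a made-up $2\times 3$ table with rows $(10,20,30)$ and $(20,20,20)$, the expected frequencies are $(15,20,25)$ in both rows, giving ${\chi}^{2}=16/3$ on $2$ degrees of freedom.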
10.1
Program Text
10.2
Program Data
10.3
Program Results