# NAG Toolbox: nag_stat_contingency_table (g01af)

## Purpose

nag_stat_contingency_table (g01af) performs the analysis of a two-way $r\times c$ contingency table or classification. If $r=c=2$ and the total number of objects classified is $40$ or fewer, then the probabilities for Fisher's exact test are computed. Otherwise, a test statistic is computed (with Yates' correction when $r=c=2$), which under the assumption of no association between the classifications has approximately a chi-square distribution with $(r-1)\times(c-1)$ degrees of freedom.

## Syntax

[nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail] = g01af(nobs, 'm', m, 'n', n, 'num', num)
[nobs, num, pred, chis, p, npos, ndf, m1, n1, ifail] = nag_stat_contingency_table(nobs, 'm', m, 'n', n, 'num', num)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: m has been made optional.
Mark 23: num is now optional (default 0).

## Description

The data consist of the frequencies for the two-way classification, denoted by $n_{ij}$, for $i=1,2,\dots,m$ and $j=1,2,\dots,n$, with $m,n>1$.
A check is made to see whether any row or column of the matrix of frequencies consists entirely of zeros, and if so, the matrix of frequencies is reduced by omitting that row or column. Suppose the final size of the matrix is $m_1$ by $n_1$ ($m_1,n_1>1$), and let
• $R_i=\sum_{j=1}^{n_1}n_{ij}$, the total frequency for the $i$th row, for $i=1,2,\dots,m_1$,
• $C_j=\sum_{i=1}^{m_1}n_{ij}$, the total frequency for the $j$th column, for $j=1,2,\dots,n_1$, and
• $T=\sum_{i=1}^{m_1}R_i=\sum_{j=1}^{n_1}C_j$, the total frequency.
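The reduction and the marginal totals defined above can be illustrated with a few lines of base MATLAB. This is only a sketch on a hypothetical table (the values in `freq` and all variable names are assumptions made for illustration); it does not call the NAG routine and is not a description of its internals.

```matlab
% Hypothetical 3-by-4 table with one empty row and one empty column
freq = [12 0 7 5;
         0 0 0 0;
         9 0 4 3];

% Omit rows and columns that consist entirely of zeros
keepRows = any(freq ~= 0, 2);
keepCols = any(freq ~= 0, 1);
reduced  = freq(keepRows, keepCols);

[m1, n1] = size(reduced);   % final size of the matrix
R = sum(reduced, 2);        % row totals R_i (column vector)
C = sum(reduced, 1);        % column totals C_j (row vector)
T = sum(R);                 % total frequency

fprintf('m1 = %d, n1 = %d, T = %d\n', m1, n1, T);
```

Here the empty second row and second column are dropped, leaving a 2-by-3 table whose row totals and column totals both sum to `T`.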
There are two situations:
(i) If $m_1>2$ and/or $n_1>2$, or $m_1=n_1=2$ and $T>40$, then the matrix of expected frequencies, denoted by $r_{ij}$, for $i=1,2,\dots,m_1$ and $j=1,2,\dots,n_1$, and the test statistic, $\chi^2$, are computed, where
$r_{ij}=R_iC_j/T,\quad i=1,2,\dots,m_1;\ j=1,2,\dots,n_1$
and
$\chi^2=\sum_{i=1}^{m_1}\sum_{j=1}^{n_1}\left[\left|r_{ij}-n_{ij}\right|-Y\right]^2/r_{ij},$
where
$Y=\begin{cases}\frac{1}{2} & \text{if } m_1=n_1=2\\ 0 & \text{otherwise}\end{cases}$
is Yates' correction for continuity.
Under the assumption that there is no association between the two classifications, $\chi^2$ will have approximately a chi-square distribution with $(m_1-1)\times(n_1-1)$ degrees of freedom.
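The formulas for situation (i) can be checked by hand in base MATLAB. The sketch below uses the observed frequencies from the Example section and evaluates the expected frequencies and the test statistic directly from the definitions. Variable names mirror the routine's outputs for readability, but the code is an independent illustration and not the routine's implementation.

```matlab
% Observed frequencies from the Example section (2-by-3 table)
nobs = [ 86  51 13;
        130 115 41];

[m1, n1] = size(nobs);
R = sum(nobs, 2);           % row totals R_i (column vector)
C = sum(nobs, 1);           % column totals C_j (row vector)
T = sum(R);                 % total frequency

pred = (R * C) / T;         % expected frequencies r_ij = R_i*C_j/T

% Yates' correction applies only to a 2-by-2 table
if m1 == 2 && n1 == 2
    Y = 0.5;
else
    Y = 0;
end

chis = sum(sum((abs(pred - nobs) - Y).^2 ./ pred));
ndf  = (m1 - 1) * (n1 - 1);

fprintf('chis = %.4f, ndf = %d\n', chis, ndf);
```

The values of `pred`, `chis` and `ndf` obtained this way should agree with those printed in the Example section to the displayed precision.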
An option exists which allows for further ‘shrinkage’ of the matrix of frequencies in the case where $r_{ij}<1$ for the $(i,j)$th cell. If this is the case, then row $i$ or column $j$ will be combined with the adjacent row or column with smaller total. Row $i$ is selected for combination if $R_i\times m_1\le C_j\times n_1$. This ‘shrinking’ process is continued until $r_{ij}\ge 1$ for all cells $(i,j)$.
(ii) If $m_1=n_1=2$ and $T\le 40$, the probabilities to enable Fisher's exact test to be made are computed.
The matrix of frequencies may be rearranged so that $R_1$ is the smallest marginal (i.e., column and row) total, and $C_2\ge C_1$. Under the assumption of no association between the classifications, the probability of obtaining $r$ entries in cell $(1,1)$ is computed where
$P_{r+1}=\frac{R_1!\,R_2!\,C_1!\,C_2!}{T!\,r!\,(R_1-r)!\,(C_1-r)!\,(T-C_1-R_1+r)!},\quad r=0,1,\dots,R_1.$
The probability of obtaining the table of given frequencies is returned. A test of the assumption against some alternative may then be made by summing the relevant values of $P_r$.
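As an illustration of situation (ii), the probabilities $P_{r+1}$ can be evaluated in base MATLAB with log-factorials (`gammaln`) to avoid overflow. The table below is hypothetical and is assumed to be already arranged so that $R_1$ is the smallest marginal total and $C_2\ge C_1$. The index `npos` and the one-sided sum are choices made for this sketch and are not taken from the routine's internals.

```matlab
% Hypothetical 2-by-2 table with T <= 40, already arranged so that R1 is
% the smallest marginal total and C2 >= C1
tbl = [3 1;
       5 9];

R = sum(tbl, 2);  C = sum(tbl, 1);  T = sum(R);
R1 = R(1);  R2 = R(2);  C1 = C(1);  C2 = C(2);

r  = 0:R1;                     % possible entries in cell (1,1)
lf = @(k) gammaln(k + 1);      % log(k!)

% log of P_{r+1} for every r, straight from the formula above
logP = lf(R1) + lf(R2) + lf(C1) + lf(C2) - lf(T) ...
     - lf(r) - lf(R1 - r) - lf(C1 - r) - lf(T - C1 - R1 + r);
P = exp(logP);                 % P(r+1) is the probability of r in cell (1,1)

npos = tbl(1,1) + 1;           % position of the observed table in P (sketch convention)
upperTail = sum(P(npos:end));  % one-sided sum over tables at least as extreme

fprintf('P(observed) = %.4f, one-sided sum = %.4f\n', P(npos), upperTail);
```

The elements of `P` sum to one, and which of them are "relevant" for the sum depends on the alternative hypothesis being tested.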

## References

None.

## Parameters

### Compulsory Input Parameters

1:     nobs(ldnob,n) – int64/int32/nag_int array
ldnob, the first dimension of the array, must satisfy the constraint $\mathit{ldnob}\ge\mathbf{m}$.
The elements $\mathbf{nobs}(i,j)$, for $i=1,2,\dots,m$ and $j=1,2,\dots,n$, must contain the frequencies for the two-way classification. The $(m+1)$th row and the $(n+1)$th column of nobs need not be set.
Constraint: $\mathbf{nobs}(i,j)\ge 0$, for $i=1,2,\dots,\mathbf{m}-1$ and $j=1,2,\dots,\mathbf{n}-1$.

### Optional Input Parameters

1:     m – int64/int32/nag_int scalar
Default: the first dimension of the array nobs.
$m+1$, one more than the number of rows of the frequency matrix.
Constraint: $\mathbf{m}>2$.
2:     n – int64/int32/nag_int scalar
Default: the second dimension of the array nobs.
$n+1$, one more than the number of columns of the frequency matrix.
Constraint: $\mathbf{n}>2$.
3:     num – int64/int32/nag_int scalar
num determines whether automatic ‘shrinkage’ is required when any $r_{ij}<1$, as outlined in situation (i) of Section [Description].
If $\mathbf{num}=1$, shrinkage is required; otherwise shrinkage is not required.
Default: $0$

### Input Parameters Omitted from the MATLAB Interface

ldnob ldpred

### Output Parameters

1:     nobs(ldnob,n) – int64/int32/nag_int array
$\mathit{ldnob}\ge\mathbf{m}$.
Contains the following information:
• $\mathbf{nobs}(i,j)$, for $i=1,2,\dots,m_1$ and $j=1,2,\dots,n_1$, contain the frequencies for the two-way classification after ‘shrinkage’ has taken place (see Section [Description]).
• $\mathbf{nobs}(i,n+1)$, for $i=1,2,\dots,m_1$, contain the total frequencies in the remaining rows, $R_i$.
• $\mathbf{nobs}(m+1,j)$, for $j=1,2,\dots,n_1$, contain the total frequencies in the remaining columns, $C_j$.
• $\mathbf{nobs}(m+1,n+1)$ contains the total frequency, $T$.
If any ‘shrinkage’ has occurred, then all other cells contain no useful information.
2:     num – int64/int32/nag_int scalar
When Fisher's exact test for a $2\times 2$ classification is used, num contains the number of elements used in the array p; otherwise num is set to zero.
3:     pred(ldpred,n) – double array
$\mathit{ldpred}\ge\mathbf{m}$.
The elements $\mathbf{pred}(i,j)$, for $i=1,2,\dots,\mathbf{m1}$ and $j=1,2,\dots,\mathbf{n1}$, contain the expected frequencies, $r_{ij}$, corresponding to the observed frequencies $\mathbf{nobs}(i,j)$, except when Fisher's exact test for a $2\times 2$ classification is to be used, in which case pred is not used. No other elements are utilized.
4:     chis – double scalar
The value of the test statistic, $\chi^2$, except when Fisher's exact test for a $2\times 2$ classification is used, in which case it is unspecified.
5:     p($21$) – double array
The first num elements contain the probabilities associated with the various possible frequency tables, $P_r$, for $r=0,1,\dots,R_1$; the remainder are unspecified.
6:     npos – int64/int32/nag_int scalar
$\mathbf{p}(\mathbf{npos})$ holds the probability associated with the given table of frequencies.
7:     ndf – int64/int32/nag_int scalar
The value of ndf gives the number of degrees of freedom for the chi-square distribution, $(m_1-1)\times(n_1-1)$; when Fisher's exact test is used $\mathbf{ndf}=1$.
8:     m1 – int64/int32/nag_int scalar
The number of rows of the two-way classification, after any ‘shrinkage’, $m_1$.
9:     n1 – int64/int32/nag_int scalar
The number of columns of the two-way classification, after any ‘shrinkage’, $n_1$.
10:   ifail – int64/int32/nag_int scalar
$\mathbf{ifail}=0$ unless the function detects an error (see Section [Error Indicators and Warnings]).
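The outputs listed above include the statistic chis and the degrees of freedom ndf, but no chi-square tail probability. One possible way to obtain it, shown here as a sketch with base MATLAB's `gammainc` (an assumption of this example, not something the routine does), uses the fact that the upper tail of a chi-square distribution with $k$ degrees of freedom at $x$ equals the regularized upper incomplete gamma function with shape $k/2$ evaluated at $x/2$. The input values are copied from the Example section.

```matlab
chis = 6.3522;   % test statistic, copied from the Example section
ndf  = 2;        % degrees of freedom, copied from the Example section

% Upper-tail probability of a chi-square variate with ndf degrees of
% freedom, via the regularized upper incomplete gamma function
pval = gammainc(chis / 2, ndf / 2, 'upper');

fprintf('upper-tail probability = %.4f\n', pval);
```

For `ndf = 2` the tail probability reduces to `exp(-chis/2)`, which gives a quick sanity check on the call.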

## Error Indicators and Warnings

Errors or warnings detected by the function:
$\mathbf{ifail}=1$
The number of rows or columns of nobs is less than $2$, possibly after shrinkage.
$\mathbf{ifail}=2$
At least one frequency is negative, or all frequencies are zero.
$\mathbf{ifail}=4$
On entry, $\mathit{ldpred}<\mathbf{m}$, or $\mathit{ldnob}<\mathbf{m}$.

## Accuracy

The method used is believed to be stable.

## Further Comments

The time taken by nag_stat_contingency_table (g01af) will increase with m and n, except when Fisher's exact test is to be used, in which case it increases with the size of the marginal and total frequencies.
If, on exit, $\mathbf{num}>0$, or alternatively ndf is $1$ and $\mathbf{nobs}(\mathbf{m},\mathbf{n})\le 40$, the probabilities for use in Fisher's exact test for a $2\times 2$ classification will have been calculated, and not the test statistic with approximately a chi-square distribution.

## Example

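```
function nag_stat_contingency_table_example
% Observed 2-by-3 table in the leading block of a 3-by-4 array. The last
% row and column need not be set on entry (zeros here); on exit they hold
% the row totals, column totals and total frequency.
nobs = [int64(86), 51, 13, 0; 130, 115, 41, 0; 0, 0, 0, 0];
[nobsOut, num, pred, chis, p, npos, ndf, m1, n1, ifail] = nag_stat_contingency_table(nobs)
```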
```

nobsOut =

                   86                   51                   13                  150
                  130                  115                   41                  286
                  216                  166                   54                  436

num =

0

pred =

   74.3119   57.1101   18.5780         0
  141.6881  108.8899   35.4220         0
         0         0         0         0

chis =

6.3522

p =

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

npos =

0

ndf =

2

m1 =

2

n1 =

3

ifail =

0

```
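```
function g01af_example
% Same data as above, called through the short name g01af. The last row
% and column need not be set on entry (zeros here); on exit they hold the
% row totals, column totals and total frequency.
nobs = [int64(86), 51, 13, 0; 130, 115, 41, 0; 0, 0, 0, 0];
[nobsOut, num, pred, chis, p, npos, ndf, m1, n1, ifail] = g01af(nobs)
```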
```

nobsOut =

                   86                   51                   13                  150
                  130                  115                   41                  286
                  216                  166                   54                  436

num =

0

pred =

   74.3119   57.1101   18.5780         0
  141.6881  108.8899   35.4220         0
         0         0         0         0

chis =

6.3522

p =

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

npos =

0

ndf =

2

m1 =

2

n1 =

3

ifail =

0

```