NAG Library Manual

# NAG Library Function Document: nag_chi_sq_2_way_table (g11aac)

## 1  Purpose

nag_chi_sq_2_way_table (g11aac) computes ${\chi }^{2}$ statistics for a two-way contingency table. For a $2×2$ table with a small number of observations (at most 40), exact probabilities are computed.

## 2  Specification

 #include <nag.h>
 #include <nagg11.h>
 void nag_chi_sq_2_way_table (Integer nrow, Integer ncol, const Integer nobst[], Integer tdt, double expt[], double chist[], double *prob, double *chi, double *g, double *df, NagError *fail)

## 3  Description

For a set of $n$ observations classified by two variables, with $r$ and $c$ levels respectively, a two-way table of frequencies with $r$ rows and $c$ columns can be computed.
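 $\begin{array}{ccccc} n_{11} & n_{12} & \cdots & n_{1c} & n_{1.} \\ n_{21} & n_{22} & \cdots & n_{2c} & n_{2.} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r.} \\ n_{.1} & n_{.2} & \cdots & n_{.c} & n \end{array}$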
To measure the association between the two classification variables, two statistics can be used:
The Pearson $\chi^{2}$ statistic $=\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{\left(n_{ij}-f_{ij}\right)^{2}}{f_{ij}}$,
and
the likelihood ratio test statistic $=2\sum_{i=1}^{r}\sum_{j=1}^{c}n_{ij}\times \log\left(n_{ij}/f_{ij}\right)$,
where $f_{ij}$ are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies, computed from the row totals $n_{i.}$, the column totals $n_{.j}$ and the total number of observations $n$:
 $f_{ij} = n_{i.} n_{.j} / n$
Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a ${\chi }^{2}$ distribution with $\left(c-1\right)\left(r-1\right)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies, ${f}_{ij}$, are not too small. For a discussion of this point see Everitt (1977). He concludes by saying, ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of $0.5$ in the smallest cell’.
In the case of the $2×2$ table, i.e., $c=2$ and $r=2$, the ${\chi }^{2}$ approximation can be improved by using Yates' continuity correction factor. This decreases the absolute value of $\left({n}_{ij}-{f}_{ij}\right)$ by $\frac{1}{2}$. For $2×2$ tables with a small value of $n$ ($n\le 40$) the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using nag_hypergeom_dist (g01blc). A two-tail probability is computed as $\min\left(1,2{p}_{u},2{p}_{l}\right)$, where ${p}_{u}$ and ${p}_{l}$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
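The two $2×2$ refinements described above can be sketched in C as follows. The function names `yates_chi2`, `fisher_two_tail` and `choose` are hypothetical, and the code does not call nag_hypergeom_dist (g01blc): it sums hypergeometric probabilities directly, which is only reasonable for small $n$. It assumes that ${p}_{l}$ and ${p}_{u}$ are $P\left(X\le {n}_{11}\right)$ and $P\left(X\ge {n}_{11}\right)$ with the margins held fixed, and that the table is not degenerate (no zero row or column total).

```c
/* Binomial coefficient C(n, k) as a double; adequate for small n. */
static double choose(int n, int k)
{
    double c = 1.0;
    int i;
    if (k < 0 || k > n)
        return 0.0;
    if (k > n - k)
        k = n - k;
    for (i = 1; i <= k; i++)
        c = c * (double)(n - k + i) / (double)i;
    return c;
}

/* Pearson statistic for the 2x2 table [[a, b], [c, d]] with each
 * |n_ij - f_ij| reduced by 1/2. No clamping at zero is applied. */
double yates_chi2(int a, int b, int c, int d)
{
    double obs[4] = { a, b, c, d };
    double rt[2] = { a + b, c + d };      /* row totals */
    double ct[2] = { a + c, b + d };      /* column totals */
    double n = rt[0] + rt[1];
    double chi = 0.0;
    int i, j;
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 2; j++) {
            double f = rt[i] * ct[j] / n;
            double dev = obs[2 * i + j] - f;
            if (dev < 0.0)
                dev = -dev;
            dev -= 0.5;
            chi += dev * dev / f;
        }
    }
    return chi;
}

/* Two-tail probability min(1, 2 p_u, 2 p_l) for the table [[a, b], [c, d]]. */
double fisher_two_tail(int a, int b, int c, int d)
{
    int r1 = a + b, c1 = a + c, n = a + b + c + d;
    int lo = (c1 - (n - r1) > 0) ? c1 - (n - r1) : 0;  /* smallest feasible n11 */
    int hi = (r1 < c1) ? r1 : c1;                      /* largest feasible n11 */
    double denom = choose(n, c1);
    double p_l = 0.0, p_u = 0.0, p;
    int k;
    for (k = lo; k <= hi; k++) {
        double pk = choose(r1, k) * choose(n - r1, c1 - k) / denom;
        if (k <= a)
            p_l += pk;
        if (k >= a)
            p_u += pk;
    }
    p = 2.0 * (p_u < p_l ? p_u : p_l);
    return p < 1.0 ? p : 1.0;
}
```

For the table with rows 3, 1 and 1, 3 every expected frequency is 2, so the corrected statistic is $4×{\left(0.5\right)}^{2}/2=0.5$, and the upper tail probability is $17/70$, giving a two-tail probability of $34/70$.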

## 4  References

Everitt B S (1977) The Analysis of Contingency Tables Chapman and Hall
Kendall M G and Stuart A (1979) The Advanced Theory of Statistics (3 Volumes) (4th Edition) Griffin

## 5  Arguments

1:    $\mathbf{nrow}$    Integer    Input
On entry: the number of rows in the contingency table, $r$.
Constraint: ${\mathbf{nrow}}\ge 2$.
2:    $\mathbf{ncol}$    Integer    Input
On entry: the number of columns in the contingency table, $c$.
Constraint: ${\mathbf{ncol}}\ge 2$.
3:    $\mathbf{nobst}\left[{\mathbf{nrow}}×{\mathbf{tdt}}\right]$    const Integer    Input
On entry: the contingency table, ${\mathbf{nobst}}\left[\left(i-1\right)×{\mathbf{tdt}}+j-1\right]$ must contain ${n}_{ij}$, for $i=1,2,\dots ,r$ and $j=1,2,\dots ,c$.
Constraint: ${\mathbf{nobst}}\left[\left(i-1\right)×{\mathbf{tdt}}+j-1\right]\ge 0$ for $i=1,2,\dots ,r$ and $j=1,2,\dots ,c$.
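4:    $\mathbf{tdt}$    Integer    Input
On entry: the stride separating matrix column elements in the arrays nobst, expt and chist; that is, element $\left(i,j\right)$ of each table is stored at offset $\left(i-1\right)×{\mathbf{tdt}}+j-1$, so successive rows start tdt elements apart.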
Constraint: ${\mathbf{tdt}}\ge {\mathbf{ncol}}$.
5:    $\mathbf{expt}\left[{\mathbf{nrow}}×{\mathbf{tdt}}\right]$    double    Output
On exit: the table of expected values, ${\mathbf{expt}}\left[\left(i-1\right)×{\mathbf{tdt}}+j-1\right]$ contains ${f}_{ij}$, for $i=1,2,\dots ,r$ and $j=1,2,\dots ,c$.
6:    $\mathbf{chist}\left[{\mathbf{nrow}}×{\mathbf{tdt}}\right]$    double    Output
On exit: the table of ${\chi }^{2}$ contributions, ${\mathbf{chist}}\left[\left(i-1\right)×{\mathbf{tdt}}+j-1\right]$ contains $\frac{{\left({n}_{ij}-{f}_{ij}\right)}^{2}}{{f}_{ij}}$, for $i=1,2,\dots ,r$ and $j=1,2,\dots ,c$.
7:    $\mathbf{prob}$    double *    Output
On exit: if $c=2$, $r=2$ and $n\le 40$, prob contains the two-tail significance level for Fisher's exact test; otherwise prob contains the significance level from the Pearson ${\chi }^{2}$ statistic.
8:    $\mathbf{chi}$    double *    Output
On exit: the Pearson ${\chi }^{2}$ statistic.
9:    $\mathbf{g}$    double *    Output
On exit: the likelihood ratio test statistic.
10:  $\mathbf{df}$    double *    Output
On exit: the degrees of freedom for the statistics.
11:  $\mathbf{fail}$    NagError *    Input/Output
The NAG error argument (see Section 2.7 in How to Use the NAG Library and its Documentation).
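The storage convention shared by nobst, expt and chist can be illustrated with a short C sketch. The helpers `cell_offset` and `column_total` are hypothetical names introduced here, and `int` stands in for the NAG `Integer` type; the offset formula is the one given in the argument descriptions above.

```c
/* Offset of cell (i, j), numbered from 1 as in the text, for stride tdt. */
int cell_offset(int i, int j, int tdt)
{
    return (i - 1) * tdt + (j - 1);
}

/* Total of column j (numbered from 1) over the nrow rows of the table. */
long column_total(int nrow, const int nobst[], int tdt, int j)
{
    long total = 0;
    int i;
    for (i = 1; i <= nrow; i++)
        total += nobst[cell_offset(i, j, tdt)];
    return total;
}
```

With tdt larger than ncol, each row is followed by unused padding elements: a 2 by 3 table stored with tdt = 4 occupies offsets 0 to 2 and 4 to 6, and offsets 3 and 7 are never read by these helpers.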

## 6  Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, ${\mathbf{tdt}}=〈\mathit{\text{value}}〉$ while ${\mathbf{ncol}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{tdt}}\ge {\mathbf{ncol}}$.
NE_2D_INT_ARR_ELEM
On entry, ${\mathbf{nobst}}\left[\left(〈\mathit{\text{value}}〉\right)×{\mathbf{tdt}}+〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$. All elements of this array must be $\ge 0$.
NE_2D_INT_ARR_ELEMS
On entry, all elements of the array nobst are 0. At least one element of this array must be $>0$.
NE_INT_ARG_LT
On entry, ${\mathbf{ncol}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ncol}}\ge 2$.
On entry, ${\mathbf{nrow}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nrow}}\ge 2$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_LOW_EXPECTED_FREQ
At least one cell has expected frequency $\le 0.5$. The chi-square approximation may be poor.
NE_TABLE_DEGENERATE
On entry, a 2 by 2 table has a row or column with both elements zero, i.e., the table is degenerate.
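A caller who wants to check a table before passing it on could mirror the input conditions listed above with a sketch like the following. The enumeration and the function `check_table` are hypothetical and are not NAG error codes, and the order in which the checks are made is an assumption. The low expected frequency warning is omitted because it depends on the fitted values rather than on the input alone.

```c
/* Hypothetical result codes for the illustration; not NAG identifiers. */
enum table_check {
    TABLE_OK = 0,
    TABLE_BAD_DIM,         /* nrow < 2 or ncol < 2 */
    TABLE_BAD_STRIDE,      /* tdt < ncol */
    TABLE_NEGATIVE_CELL,   /* some n_ij < 0 */
    TABLE_ALL_ZERO,        /* every n_ij is 0 */
    TABLE_DEGENERATE_2X2   /* 2 by 2 table with an all-zero row or column */
};

enum table_check check_table(int nrow, int ncol, const int nobst[], int tdt)
{
    long total = 0;
    int i, j;

    if (nrow < 2 || ncol < 2)
        return TABLE_BAD_DIM;
    if (tdt < ncol)
        return TABLE_BAD_STRIDE;

    for (i = 0; i < nrow; i++) {
        for (j = 0; j < ncol; j++) {
            int nij = nobst[i * tdt + j];
            if (nij < 0)
                return TABLE_NEGATIVE_CELL;
            total += nij;
        }
    }
    if (total == 0)
        return TABLE_ALL_ZERO;

    if (nrow == 2 && ncol == 2) {
        /* Counts are non-negative here, so a zero sum means both are zero. */
        if (nobst[0] + nobst[1] == 0 || nobst[tdt] + nobst[tdt + 1] == 0 ||
            nobst[0] + nobst[tdt] == 0 || nobst[1] + nobst[tdt + 1] == 0)
            return TABLE_DEGENERATE_2X2;
    }
    return TABLE_OK;
}
```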

## 7  Accuracy

For the accuracy of the probabilities for Fisher's exact test see nag_hypergeom_dist (g01blc).

## 8  Parallelism and Performance

nag_chi_sq_2_way_table (g11aac) is not threaded in any implementation.

## 9  Further Comments

Multidimensional contingency tables can be analysed using log-linear models fitted by nag_glm_binomial (g02gbc).

## 10  Example

The data below, taken from Everitt (1977), is from 141 patients with brain tumours. The row classification variable is the site of the tumour: frontal lobes, temporal lobes and other cerebral areas. The column classification variable is the type of tumour: benign, malignant and other cerebral tumours.
 $\begin{array}{rrr|r} 23 & 9 & 6 & 38 \\ 21 & 4 & 3 & 28 \\ 34 & 24 & 17 & 75 \\ \hline 78 & 37 & 26 & 141 \end{array}$
The data is read in and the statistics computed and printed.
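As a hand check on the quantities involved (worked from the table above using the formulas of Section 3, not copied from the program results), the degrees of freedom and the first cell's expected frequency and ${\chi }^{2}$ contribution are:
 $(r-1)(c-1) = (3-1)(3-1) = 4$
 $f_{11} = \frac{38 \times 78}{141} \approx 21.02, \qquad \frac{\left(n_{11}-f_{11}\right)^{2}}{f_{11}} = \frac{\left(23-21.02\right)^{2}}{21.02} \approx 0.186$
Summing the nine cell contributions computed in this way gives a Pearson ${\chi }^{2}$ statistic of approximately $7.84$.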

### 10.1  Program Text

Program Text (g11aace.c)

### 10.2  Program Data

Program Data (g11aace.d)

### 10.3  Program Results

Program Results (g11aace.r)
