# NAG Toolbox: nag_contab_chisq (g11aa)

## Purpose

nag_contab_chisq (g11aa) computes $\chi^2$ statistics for a two-way contingency table. For a $2 \times 2$ table with a small number of observations, exact probabilities are computed.

## Syntax

[expt, chist, prob, chi, g, df, ifail] = g11aa(nrow, nobs, 'ncol', ncol)
[expt, chist, prob, chi, g, df, ifail] = nag_contab_chisq(nrow, nobs, 'ncol', ncol)

## Description

For a set of $n$ observations classified by two variables, with $r$ and $c$ levels respectively, a two-way table of frequencies with $r$ rows and $c$ columns can be computed.
$$\begin{array}{ccccc}
n_{11} & n_{12} & \dots & n_{1c} & n_{1.} \\
n_{21} & n_{22} & \dots & n_{2c} & n_{2.} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
n_{r1} & n_{r2} & \dots & n_{rc} & n_{r.} \\
n_{.1} & n_{.2} & \dots & n_{.c} & n
\end{array}$$
Two statistics can be used to measure the association between the two classification variables: the Pearson $\chi^2$ statistic, $\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(n_{ij}-f_{ij})^2}{f_{ij}}$, and the likelihood ratio test statistic, $2\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}\log(n_{ij}/f_{ij})$, where $f_{ij}$ are the fitted values from the model that assumes the effects due to the classification variables are additive, i.e., there is no association. These values are the expected cell frequencies and are given by
$$f_{ij} = n_{i.}\, n_{.j} / n.$$
Under the hypothesis of no association between the two classification variables, both these statistics have, approximately, a $\chi^2$-distribution with $(c-1)(r-1)$ degrees of freedom. This distribution is arrived at under the assumption that the expected cell frequencies, $f_{ij}$, are not too small. For a discussion of this point see Everitt (1977). He concludes by saying, ‘... in the majority of cases the chi-square criterion may be used for tables with expectations in excess of $0.5$ in the smallest cell’.
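As a cross-check on the definitions above, the following minimal sketch recomputes the expected frequencies, both statistics, the degrees of freedom and the $\chi^2$ significance level in plain MATLAB, without calling the NAG routine. It uses the $3 \times 3$ table from the Example section. The variable names are illustrative only, and the tail probability is taken from the base function `gammainc` (the $\chi^2$ upper tail with `df` degrees of freedom at `x` is the regularized upper incomplete gamma function evaluated at `x/2` with shape `df/2`); this is not necessarily how the routine itself evaluates it.

```matlab
% Observed frequencies (table from the Example section)
nobs = [23 9 6; 21 4 3; 34 24 17];

[r, c] = size(nobs);
n      = sum(nobs(:));          % total number of observations
rowtot = sum(nobs, 2);          % row totals n_i.  (r-by-1)
coltot = sum(nobs, 1);          % column totals n_.j (1-by-c)

% Expected frequencies under no association: f_ij = n_i. * n_.j / n
f = (rowtot * coltot) / n;

% Pearson chi-square statistic and its cell contributions
contrib = (nobs - f).^2 ./ f;
chi     = sum(contrib(:));

% Likelihood ratio statistic; cells with n_ij = 0 are skipped (they add zero)
pos = nobs > 0;
g   = 2 * sum(nobs(pos) .* log(nobs(pos) ./ f(pos)));

% Degrees of freedom and upper-tail significance level
df   = (r - 1) * (c - 1);
prob = gammainc(chi / 2, df / 2, 'upper');

fprintf('chi = %.4f, g = %.4f, df = %d, prob = %.4f\n', chi, g, df, prob);
```

The printed values can be compared with `chi`, `g`, `df` and `prob` in the Example section, and `f` and `contrib` with `expt` and `chist`.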
In the case of the $2 \times 2$ table, i.e., $c=2$ and $r=2$, the $\chi^2$ approximation can be improved by using Yates' continuity correction factor. This decreases the absolute value of $(n_{ij}-f_{ij})$ by $\frac{1}{2}$. For $2 \times 2$ tables with a small value of $n$ the exact probabilities from Fisher's test are computed. These are based on the hypergeometric distribution and are computed using nag_stat_prob_hypergeom (g01bl). A two tail probability is computed as $\min(1, 2p_u, 2p_l)$, where $p_u$ and $p_l$ are the upper and lower one-tail probabilities from the hypergeometric distribution.
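The two $2 \times 2$ adjustments can be illustrated with a short plain-MATLAB sketch. The table below is hypothetical and not taken from the NAG documentation. The hypergeometric probabilities are evaluated directly from binomial coefficients with `nchoosek`, which is adequate only for small $n$; the routine itself uses nag_stat_prob_hypergeom (g01bl), so small numerical differences are possible. The one-tail probabilities are assumed here to be $P(X \le n_{11})$ and $P(X \ge n_{11})$ for the count $X$ in the first cell with all margins fixed.

```matlab
% Hypothetical 2-by-2 table (not taken from the NAG documentation)
t = [8 2; 1 5];

n  = sum(t(:));
r1 = sum(t(1, :));              % first row total
c1 = sum(t(:, 1));              % first column total

% Yates-corrected chi-square: reduce |n_ij - f_ij| by 1/2 before squaring
f       = (sum(t, 2) * sum(t, 1)) / n;
chi_yat = sum(sum((abs(t - f) - 0.5).^2 ./ f));

% Hypergeometric probabilities for the (1,1) cell with all margins fixed
kmin = max(0, r1 + c1 - n);
kmax = min(r1, c1);
k    = kmin:kmax;
p    = zeros(size(k));
for idx = 1:numel(k)
    p(idx) = nchoosek(r1, k(idx)) * nchoosek(n - r1, c1 - k(idx)) ...
             / nchoosek(n, c1);
end

% One-tail probabilities at the observed count, then the two-tail value
pl   = sum(p(k <= t(1, 1)));
pu   = sum(p(k >= t(1, 1)));
prob = min([1, 2 * pu, 2 * pl]);

fprintf('Yates chi = %.4f, pl = %.4f, pu = %.4f, prob = %.4f\n', ...
        chi_yat, pl, pu, prob);
```

Because the observed count is included in both tails, `pl + pu` exceeds 1 by the probability of the observed table, which is why the two-tail value is capped at 1.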

## References

Everitt B S (1977) The Analysis of Contingency Tables Chapman and Hall
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin

## Parameters

### Compulsory Input Parameters

1:     nrow – int64/int32/nag_int scalar
$r$, the number of rows in the contingency table.
Constraint: $\mathbf{nrow} \ge 2$.
2:     nobs(ldnobs,ncol) – int64/int32/nag_int array
ldnobs, the first dimension of the array, must satisfy the constraint $\mathit{ldnobs} \ge \mathbf{nrow}$.
The contingency table: $\mathbf{nobs}(i,j)$ must contain $n_{ij}$, for $i=1,2,\dots,r$ and $j=1,2,\dots,c$.
Constraint: $\mathbf{nobs}(i,j) \ge 0$, for $i=1,2,\dots,r$ and $j=1,2,\dots,c$.

### Optional Input Parameters

1:     ncol – int64/int32/nag_int scalar
Default: The second dimension of the array nobs.
$c$, the number of columns in the contingency table.
Constraint: $\mathbf{ncol} \ge 2$.

### Input Parameters Omitted from the MATLAB Interface

ldnobs

### Output Parameters

1:     expt(ldnobs,ncol) – double array
$\mathit{ldnobs} \ge \mathbf{nrow}$.
The table of expected values. $\mathbf{expt}(i,j)$ contains $f_{ij}$, for $i=1,2,\dots,r$ and $j=1,2,\dots,c$.
2:     chist(ldnobs,ncol) – double array
$\mathit{ldnobs} \ge \mathbf{nrow}$.
The table of $\chi^2$ contributions. $\mathbf{chist}(i,j)$ contains $\frac{(n_{ij}-f_{ij})^2}{f_{ij}}$, for $i=1,2,\dots,r$ and $j=1,2,\dots,c$.
3:     prob – double scalar
If $c=2$, $r=2$ and $n \le 40$ then prob contains the two tail significance level for Fisher's exact test, otherwise prob contains the significance level from the Pearson $\chi^2$ statistic.
4:     chi – double scalar
The Pearson $\chi^2$ statistic.
5:     g – double scalar
The likelihood ratio test statistic.
6:     df – double scalar
The degrees of freedom for the statistics.
7:     ifail – int64/int32/nag_int scalar
$\mathbf{ifail} = 0$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Note: nag_contab_chisq (g11aa) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$\mathbf{ifail} = 1$
 On entry, $\mathbf{nrow} < 2$, or $\mathbf{ncol} < 2$, or $\mathit{ldnobs} < \mathbf{nrow}$.
$\mathbf{ifail} = 2$
 On entry, a value in $\mathbf{nobs} < 0$, or all values in nobs are zero.
$\mathbf{ifail} = 3$
 On entry, a $2 \times 2$ table has a row or column with both values $0$.
W $\mathbf{ifail} = 4$
 At least one cell has expected frequency, $f_{ij}$, $\le 0.5$. The $\chi^2$ approximation may be poor.
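
The conditions listed above can be tested on a table before it is passed to the routine. The following plain-MATLAB sketch is only an illustration: the variable `code` and the order of the checks are assumptions, the table is hypothetical, and the `ldnobs` condition is left out because the sketch takes the dimensions from the array itself. It does not reproduce the routine's internal logic.

```matlab
% Hypothetical table used only to exercise the checks
nobs = [12 0; 5 0];

[nrow, ncol] = size(nobs);
code = 0;                                   % mirrors ifail = 0 (no problem found)

if nrow < 2 || ncol < 2
    code = 1;                               % fewer than two rows or columns
elseif any(nobs(:) < 0) || all(nobs(:) == 0)
    code = 2;                               % negative count, or all counts zero
elseif nrow == 2 && ncol == 2 && ...
       (any(sum(nobs, 1) == 0) || any(sum(nobs, 2) == 0))
    code = 3;                               % 2-by-2 table with a zero row or column
else
    % Expected frequencies f_ij = n_i. * n_.j / n
    f = (sum(nobs, 2) * sum(nobs, 1)) / sum(nobs(:));
    if any(f(:) <= 0.5)
        code = 4;                           % warning: small expected frequency
    end
end

fprintf('condition code = %d\n', code);
```

For this table the second column is entirely zero, so the sketch reports the condition documented for $\mathbf{ifail} = 3$.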

## Accuracy

For the accuracy of the probabilities for Fisher's exact test see nag_stat_prob_hypergeom (g01bl).

## Further Comments

The function nag_stat_contingency_table (g01af) allows for the automatic amalgamation of rows and columns. In most circumstances this is not recommended; see Everitt (1977).
Multidimensional contingency tables can be analysed using log-linear models fitted by nag_correg_glm_binomial (g02gb).

## Example

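```
function nag_contab_chisq_example
% 3-by-3 contingency table of observed frequencies
nrow = int64(3);
nobst = [int64(23),9,6; ...
         21,4,3; ...
         34,24,17];
% ncol is omitted and defaults to the second dimension of nobst
[expt, chist, prob, chi, g, df, ifail] = nag_contab_chisq(nrow, nobst)
```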
```

expt =

21.0213    9.9716    7.0071
15.4894    7.3475    5.1631
41.4894   19.6809   13.8298

chist =

0.1863    0.0947    0.1447
1.9605    1.5251    0.9063
1.3519    0.9479    0.7267

prob =

0.0975

chi =

7.8441

g =

8.0958

df =

4

ifail =

0

```
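```
function g11aa_example
% Same table as above, using the short function name
nrow = int64(3);
nobst = [int64(23),9,6; ...
         21,4,3; ...
         34,24,17];
% ncol is omitted and defaults to the second dimension of nobst
[expt, chist, prob, chi, g, df, ifail] = g11aa(nrow, nobst)
```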
```

expt =

21.0213    9.9716    7.0071
15.4894    7.3475    5.1631
41.4894   19.6809   13.8298

chist =

0.1863    0.0947    0.1447
1.9605    1.5251    0.9063
1.3519    0.9479    0.7267

prob =

0.0975

chi =

7.8441

g =

8.0958

df =

4

ifail =

0

```