
# NAG Library Function Document: nag_prob_2_sample_ks (g01ezc)

## 1  Purpose

nag_prob_2_sample_ks (g01ezc) returns the probability associated with the upper tail of the Kolmogorov–Smirnov two sample distribution.

## 2  Specification

 #include <nag.h>
 #include <nagg01.h>
 double nag_prob_2_sample_ks (Integer n1, Integer n2, double d, NagError *fail)

## 3  Description

Let ${F}_{{n}_{1}}\left(x\right)$ and ${G}_{{n}_{2}}\left(x\right)$ denote the empirical cumulative distribution functions for the two samples, where ${n}_{1}$ and ${n}_{2}$ are the sizes of the first and second samples respectively.
The function nag_prob_2_sample_ks (g01ezc) computes the upper tail probability for the Kolmogorov–Smirnov two sample two-sided test statistic ${D}_{{n}_{1},{n}_{2}}$, where
 ${D}_{{n}_{1},{n}_{2}}=\sup_{x}\left|{F}_{{n}_{1}}\left(x\right)-{G}_{{n}_{2}}\left(x\right)\right|.$
The probability is computed exactly if ${n}_{1}{n}_{2}\le 10000$ and $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\le 2500$, using a method given by Kim and Jennrich (1973). Otherwise, where $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)$ is at most $10\%$ of $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)$ and $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\le 80$, the Smirnov approximation is used. For all other cases the Kolmogorov approximation is used. Both approximations are discussed in Kim and Jennrich (1973).
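
The selection rule described above can be sketched in C as follows. This is only an illustration of the stated conditions, not the library's internal code: the names `ks2_method` and `ks2_choose_method` are hypothetical, `long` stands in for the NAG `Integer` type, and the first condition is read as the product ${n}_{1}{n}_{2}\le 10000$, in line with the timing remarks in Section 8.

```c
#include <assert.h>

/* Hypothetical labels for the three computation methods named in the text. */
typedef enum { KS2_EXACT, KS2_SMIRNOV, KS2_KOLMOGOROV } ks2_method;

/* Sketch of the documented rule for choosing a method from the sample sizes. */
ks2_method ks2_choose_method(long n1, long n2)
{
    assert(n1 >= 1 && n2 >= 1);

    long lo = n1 < n2 ? n1 : n2; /* min(n1, n2) */
    long hi = n1 < n2 ? n2 : n1; /* max(n1, n2) */

    /* Exact method: n1*n2 <= 10000 and max <= 2500.
       The product test is written as a division to avoid integer overflow. */
    if (hi <= 2500 && lo <= 10000 / hi)
        return KS2_EXACT;

    /* Smirnov approximation: min <= 80 and min is at most 10% of max. */
    if (lo <= 80 && 10 * lo <= hi)
        return KS2_SMIRNOV;

    /* Everything else: Kolmogorov approximation. */
    return KS2_KOLMOGOROV;
}
```

For instance, under these assumptions sample sizes of 50 and 3000 would fall to the Smirnov approximation, while two samples of 500 each would fall to the Kolmogorov approximation.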

## 4  References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jennrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion ${D}_{mn}\left(m<n\right)$ Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281

## 5  Arguments

1:     n1 – Integer Input
On entry: the number of observations in the first sample, ${n}_{1}$.
Constraint: ${\mathbf{n1}}\ge 1$.
2:     n2 – Integer Input
On entry: the number of observations in the second sample, ${n}_{2}$.
Constraint: ${\mathbf{n2}}\ge 1$.
3:     d – double Input
On entry: the test statistic ${D}_{{n}_{1},{n}_{2}}$, for the two sample Kolmogorov–Smirnov goodness-of-fit test, that is the maximum difference between the empirical cumulative distribution functions (CDFs) of the two samples.
Constraint: $0.0\le {\mathbf{d}}\le 1.0$.
4:     fail – NagError * Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
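
If the statistic is not already available, the value supplied as d can be obtained from the raw samples. The following is a minimal sketch in plain C, assuming the two samples are held in ordinary `double` arrays; the function name `ks2_statistic` is hypothetical and not part of the NAG Library. It sorts both arrays in place and walks through the pooled values, tracking the largest absolute gap between the two empirical CDFs.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double u = *(const double *)a, v = *(const double *)b;
    return (u > v) - (u < v);
}

/* Hypothetical helper: two-sided two-sample Kolmogorov-Smirnov statistic.
   Sorts x and y in place. */
double ks2_statistic(double *x, size_t n1, double *y, size_t n2)
{
    assert(n1 >= 1 && n2 >= 1);
    qsort(x, n1, sizeof *x, cmp_double);
    qsort(y, n2, sizeof *y, cmp_double);

    size_t i = 0, j = 0;
    double d = 0.0;
    while (i < n1 && j < n2) {
        /* Next distinct pooled value; step both ECDFs past it (handles ties). */
        double t = x[i] <= y[j] ? x[i] : y[j];
        while (i < n1 && x[i] <= t) ++i;
        while (j < n2 && y[j] <= t) ++j;

        /* i/n1 and j/n2 are the two empirical CDFs evaluated at t. */
        double diff = fabs((double)i / (double)n1 - (double)j / (double)n2);
        if (diff > d) d = diff;
    }
    /* Once one sample is exhausted the gap can only shrink, so d is final. */
    return d;
}
```

The result always lies in $\left[0,1\right]$, which matches the constraint on d.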

## 6  Error Indicators and Warnings

NE_CONVERGENCE
The Smirnov approximation used for large samples did not converge in $200$ iterations. The probability is set to $1.0$.
NE_INT
On entry, ${\mathbf{n1}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n2}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n1}}\ge 1$ and ${\mathbf{n2}}\ge 1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL
On entry, ${\mathbf{d}}<0.0$ or ${\mathbf{d}}>1.0$: ${\mathbf{d}}=〈\mathit{\text{value}}〉$.

## 7  Accuracy

The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.

## 8  Further Comments

The upper tail probability for the one-sided statistics, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$, can be approximated by halving the two-sided upper tail probability $p$ returned by nag_prob_2_sample_ks (g01ezc), that is, by $p/2$. This approximation is good for small probabilities (e.g., $p\le 0.10$) but becomes poor for larger probabilities.
The time taken by the function increases with ${n}_{1}$ and ${n}_{2}$, until ${n}_{1}{n}_{2}>10000$ or $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)>2500$. Beyond this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with ${n}_{1}$ and ${n}_{2}$.

## 9  Example

The following example reads in $10$ different sample sizes and values for the test statistic ${D}_{{n}_{1},{n}_{2}}$. The upper tail probability is computed and printed for each case.

### 9.1  Program Text

Program Text (g01ezce.c)

### 9.2  Program Data

Program Data (g01ezce.d)

### 9.3  Program Results

Program Results (g01ezce.r)