NAG FL Interface
g01ezf (prob_kolmogorov2)
1
Purpose
g01ezf returns the probability associated with the upper tail of the Kolmogorov–Smirnov two sample distribution.
2
Specification
Fortran Interface
Real (Kind=nag_wp) :: g01ezf
Integer, Intent (In) :: n1, n2
Integer, Intent (Inout) :: ifail
Real (Kind=nag_wp), Intent (In) :: d

C Header Interface
#include <nag.h>
double g01ezf_ (const Integer *n1, const Integer *n2, const double *d, Integer *ifail)

C++ Header Interface
#include <nag.h>
extern "C" {
double g01ezf_ (const Integer &n1, const Integer &n2, const double &d, Integer &ifail)
}

The routine may be called by the names g01ezf or nagf_stat_prob_kolmogorov2.
3
Description
Let ${F}_{{n}_{1}}\left(x\right)$ and ${G}_{{n}_{2}}\left(x\right)$ denote the empirical cumulative distribution functions for the two samples, where ${n}_{1}$ and ${n}_{2}$ are the sizes of the first and second samples respectively.
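The function g01ezf computes the upper tail probability for the Kolmogorov–Smirnov two sample two-sided test statistic ${D}_{{n}_{1},{n}_{2}}$, where
$${D}_{{n}_{1},{n}_{2}}=\underset{x}{\mathrm{max}}\phantom{\rule{0.125em}{0ex}}\left|{F}_{{n}_{1}}\left(x\right)-{G}_{{n}_{2}}\left(x\right)\right|.$$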
The probability is computed exactly if ${n}_{1}{n}_{2}\le 10000$ and $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\le 2500$ using a method given by Kim and Jenrich (1973). For the case where $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\le 10\%$ of the $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)$ and $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\le 80$ the Smirnov approximation is used. For all other cases the Kolmogorov approximation is used. These two approximations are discussed in Kim and Jenrich (1973).
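The selection rules above can be sketched as a small C function. This is an illustrative reading of the text, not NAG source code: the names `ks_method` and `ks_select_method` are hypothetical, and the first exact-case condition is taken to be the product ${n}_{1}{n}_{2}\le 10000$, which matches the timing remark later in this document.

```c
#include <assert.h>

/* Hypothetical labels for the three regimes; these are not NAG identifiers. */
typedef enum { KS_EXACT, KS_SMIRNOV, KS_KOLMOGOROV } ks_method;

/* Pick the regime from the two sample sizes, following the thresholds
   quoted in the Description. */
ks_method ks_select_method(long n1, long n2)
{
    assert(n1 >= 1 && n2 >= 1);

    long lo = n1 < n2 ? n1 : n2;   /* min(n1, n2) */
    long hi = n1 < n2 ? n2 : n1;   /* max(n1, n2) */

    /* Exact method: max(n1, n2) <= 2500 and n1 * n2 <= 10000 (assumed to be
       the product). Testing hi first keeps lo * hi small, so no overflow. */
    if (hi <= 2500 && lo * hi <= 10000)
        return KS_EXACT;

    /* Smirnov approximation: the smaller sample is at most 80 and at most
       10% of the larger one. */
    if (lo <= 80 && 10 * lo <= hi)
        return KS_SMIRNOV;

    /* Everything else: Kolmogorov approximation. */
    return KS_KOLMOGOROV;
}
```

Under these rules `ks_select_method(50, 1000)` gives `KS_SMIRNOV`, while `ks_select_method(101, 100)` falls through to `KS_KOLMOGOROV`.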
4
References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jenrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion ${D}_{mn}\left(m<n\right)$ Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Nonparametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
5
Arguments

1:
$\mathbf{n1}$ – Integer
Input

On entry: the number of observations in the first sample, ${n}_{1}$.
Constraint:
${\mathbf{n1}}\ge 1$.

2:
$\mathbf{n2}$ – Integer
Input

On entry: the number of observations in the second sample, ${n}_{2}$.
Constraint:
${\mathbf{n2}}\ge 1$.

3:
$\mathbf{d}$ – Real (Kind=nag_wp)
Input

On entry: the test statistic ${D}_{{n}_{1},{n}_{2}}$, for the two sample Kolmogorov–Smirnov goodness-of-fit test, that is the maximum difference between the empirical cumulative distribution functions (CDFs) of the two samples.
Constraint:
$0.0\le {\mathbf{d}}\le 1.0$.

4:
$\mathbf{ifail}$ – Integer
Input/Output

On entry: ifail must be set to $0$, $-1$ or $1$. If you are unfamiliar with this argument you should refer to Section 4 in the Introduction to the NAG Library FL Interface for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value $-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value $1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is $0$.
When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
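
Since g01ezf takes the statistic d as an input argument, it has to be computed beforehand. A minimal C sketch of the two-sided statistic, the largest absolute difference between the two empirical CDFs, is given below. The names `ks2_statistic` and `cmp_double` are hypothetical helpers introduced for illustration, not NAG routines, and the function sorts its inputs in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort on doubles (assumes no NaN values). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Two-sided two-sample Kolmogorov-Smirnov statistic:
   max over t of |F(t) - G(t)|, where F and G are the empirical CDFs
   of x (length n1) and y (length n2). Sorts x and y in place. */
double ks2_statistic(double *x, size_t n1, double *y, size_t n2)
{
    assert(n1 >= 1 && n2 >= 1);
    qsort(x, n1, sizeof *x, cmp_double);
    qsort(y, n2, sizeof *y, cmp_double);

    size_t i = 0, j = 0;
    double d = 0.0;
    while (i < n1 && j < n2) {
        /* Next jump point of either empirical CDF. */
        double t = x[i] <= y[j] ? x[i] : y[j];
        /* Step both CDFs past every observation equal to t (handles ties). */
        while (i < n1 && x[i] <= t) i++;
        while (j < n2 && y[j] <= t) j++;
        double diff = (double)i / (double)n1 - (double)j / (double)n2;
        if (diff < 0.0) diff = -diff;
        if (diff > d) d = diff;
    }
    /* Once one sample is exhausted the gap can only shrink, so d is final. */
    return d;
}
```

The returned value lies in $[0,1]$, so it satisfies the constraint on d.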
6
Error Indicators and Warnings
If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry, ${\mathbf{n1}}=\langle \mathit{\text{value}}\rangle$ and ${\mathbf{n2}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n1}}\ge 1$ and ${\mathbf{n2}}\ge 1$.
 ${\mathbf{ifail}}=2$

On entry, ${\mathbf{d}}<0.0$ or ${\mathbf{d}}>1.0$: ${\mathbf{d}}=\langle \mathit{\text{value}}\rangle$.
 ${\mathbf{ifail}}=3$

The Smirnov approximation used for large samples did not converge in $200$ iterations. The probability is set to $1.0$.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7
Accuracy
The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.
8
Parallelism and Performance
g01ezf is not threaded in any implementation.
9
Further Comments
The upper tail probability for the one-sided statistics, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$, can be approximated by halving the two-sided upper tail probability returned by g01ezf, that is $p/2$. This approximation to the upper tail probability for either ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$ is good for small probabilities (e.g., $p\le 0.10$) but becomes poor for larger probabilities.
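A rough way to see why halving works only for small probabilities is to compare the classical large-sample limits of the two tails. This is a sketch under that assumption and not a description of the formulas used inside the routine; the symbols $\lambda$ and $q$ are introduced here for illustration. Writing $\lambda =d\sqrt{{n}_{1}{n}_{2}/\left({n}_{1}+{n}_{2}\right)}$ and $q={e}^{-2{\lambda }^{2}}$ for the limiting one-sided tail, the limiting two-sided tail gives

$$\frac{p}{2}\approx \sum _{k=1}^{\infty }{\left(-1\right)}^{k-1}{e}^{-2{k}^{2}{\lambda }^{2}}=q-{q}^{4}+{q}^{9}-\cdots ,$$

so the relative error of $p/2$ as an estimate of $q$ is about ${q}^{3}$: roughly ${10}^{-4}$ when $q=0.05$, but about $20\%$ when $q=0.6$.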
The time taken by the routine increases with ${n}_{1}$ and ${n}_{2}$, until ${n}_{1}{n}_{2}>10000$ or $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({n}_{1},{n}_{2}\right)\ge 2500$. At this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with ${n}_{1}$ and ${n}_{2}$.
10
Example
The following example reads in $10$ different sample sizes and values for the test statistic ${D}_{{n}_{1},{n}_{2}}$. The upper tail probability is computed and printed for each case.