The routine may be called by the names g01ezf or nagf_stat_prob_kolmogorov2.
3 Description
Let ${F}_{{n}_{1}}\left(x\right)$ and ${G}_{{n}_{2}}\left(x\right)$ denote the empirical cumulative distribution functions for the two samples, where ${n}_{1}$ and ${n}_{2}$ are the sizes of the first and second samples respectively.
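The routine g01ezf computes the upper tail probability for the Kolmogorov–Smirnov two sample two-sided test statistic ${D}_{{n}_{1},{n}_{2}}$, where
$${D}_{{n}_{1},{n}_{2}}=\underset{x}{\mathrm{max}}\phantom{\rule{0.125em}{0ex}}\left|{F}_{{n}_{1}}\left(x\right)-{G}_{{n}_{2}}\left(x\right)\right|.$$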
The probability is computed exactly if ${n}_{1}{n}_{2}\le 10000$ and $\mathrm{max}\phantom{\rule{0.125em}{0ex}}({n}_{1},{n}_{2})\le 2500$, using a method given by Kim and Jenrich (1973). Otherwise, if $\mathrm{min}\phantom{\rule{0.125em}{0ex}}({n}_{1},{n}_{2})$ is at most $10\%$ of $\mathrm{max}\phantom{\rule{0.125em}{0ex}}({n}_{1},{n}_{2})$ and $\mathrm{min}\phantom{\rule{0.125em}{0ex}}({n}_{1},{n}_{2})\le 80$, the Smirnov approximation is used. In all other cases the Kolmogorov approximation is used. These two approximations are discussed in Kim and Jenrich (1973).
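As an illustration only, the following Python sketch computes the statistic from two samples and reproduces the branch selection described above. The function names `ks_two_sample_statistic` and `choose_method` are hypothetical and are not part of the NAG Library; the first threshold is read as the product ${n}_{1}{n}_{2}\le 10000$, consistent with Section 9, and the routine's internal logic may differ in detail.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Return D = max_t |F_n1(t) - G_n2(t)| for two non-empty samples."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    if n1 < 1 or n2 < 1:
        raise ValueError("both samples must contain at least one observation")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate their difference at every pooled observation.
    for t in xs + ys:
        f = bisect_right(xs, t) / n1  # F_n1(t): fraction of x values <= t
        g = bisect_right(ys, t) / n2  # G_n2(t): fraction of y values <= t
        d = max(d, abs(f - g))
    return d


def choose_method(n1, n2):
    """Name the branch implied by the thresholds quoted in the text.

    Assumption: the first threshold is the product n1 * n2 <= 10000.
    """
    lo, hi = min(n1, n2), max(n1, n2)
    if n1 * n2 <= 10000 and hi <= 2500:
        return "exact"
    # "min is at most 10% of max", written with integers to avoid rounding.
    if 10 * lo <= hi and lo <= 80:
        return "smirnov"
    return "kolmogorov"
```

The value returned by `ks_two_sample_statistic` is what would be supplied as the argument d, with the two sample sizes supplied as n1 and n2.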
4 References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jenrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion ${D}_{mn}(m<n)$ Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
5 Arguments
1: $\mathbf{n1}$ – Integer, Input
On entry: the number of observations in the first sample, ${n}_{1}$.
Constraint: ${\mathbf{n1}}\ge 1$.
2: $\mathbf{n2}$ – Integer, Input
On entry: the number of observations in the second sample, ${n}_{2}$.
Constraint: ${\mathbf{n2}}\ge 1$.
3: $\mathbf{d}$ – Real (Kind=nag_wp), Input
On entry: the test statistic ${D}_{{n}_{1},{n}_{2}}$ for the two sample Kolmogorov–Smirnov goodness-of-fit test, that is, the maximum absolute difference between the empirical cumulative distribution functions (CDFs) of the two samples.
Constraint: $0.0\le {\mathbf{d}}\le 1.0$.
4: $\mathbf{ifail}$ – Integer, Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{n1}}=\langle\mathit{\text{value}}\rangle$ and ${\mathbf{n2}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n1}}\ge 1$ and ${\mathbf{n2}}\ge 1$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{d}}<0.0$ or ${\mathbf{d}}>1.0$: ${\mathbf{d}}=\langle\mathit{\text{value}}\rangle$.
${\mathbf{ifail}}=3$
The Smirnov approximation used for large samples did not converge in $200$ iterations. The probability is set to $1.0$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
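For callers who wish to screen arguments before invoking the routine, the constraints behind ifail = 1 and ifail = 2 can be mirrored in a few lines. The Python helper below is a hypothetical sketch, not part of the NAG Library; it returns the ifail value that the corresponding constraint violation would produce according to the list above, and 0 otherwise.

```python
def check_arguments(n1, n2, d):
    """Return 0 if the arguments satisfy the documented constraints,
    otherwise the ifail value listed for the violated constraint."""
    if n1 < 1 or n2 < 1:
        return 1  # constraint n1 >= 1 and n2 >= 1 violated
    # Written as a chained comparison so that a NaN value of d is rejected.
    if not (0.0 <= d <= 1.0):
        return 2  # constraint 0.0 <= d <= 1.0 violated
    return 0
```

The sample-size check comes first, matching the order in which the error indicators are listed; whether the routine itself tests the constraints in this order is an assumption.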
7 Accuracy
The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.
8 Parallelism and Performance
g01ezf is not threaded in any implementation.
9 Further Comments
The upper tail probability for the one-sided statistics, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$, can be approximated by halving the two-sided upper tail probability $p$ returned by g01ezf, that is, by $p/2$. This approximation is good for small probabilities (e.g., $p\le 0.10$) but becomes poor for larger probabilities.
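The behaviour of the halving rule can be illustrated with the textbook limiting distributions. The Python sketch below uses the standard asymptotic forms $2\sum_{k\ge 1}{(-1)}^{k-1}{e}^{-2{k}^{2}{\lambda }^{2}}$ for the two-sided tail and ${e}^{-2{\lambda }^{2}}$ for the one-sided tail, with $\lambda =D\sqrt{{n}_{1}{n}_{2}/({n}_{1}+{n}_{2})}$. These are large-sample limits used here purely for illustration; they are not claimed to be the formulas implemented by g01ezf, and all function names are hypothetical.

```python
import math


def scaled_statistic(n1, n2, d):
    """lambda = D * sqrt(n1 * n2 / (n1 + n2))."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


def kolmogorov_two_sided(lam):
    """Limiting two-sided upper tail probability for scaled statistic lam."""
    if lam < 0.2:
        return 1.0  # the tail probability is 1 to within about 1e-12 here
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        if term < 1e-17:  # remaining terms no longer affect the sum
            break
        total += term if k % 2 == 1 else -term
        k += 1
    return min(1.0, max(0.0, 2.0 * total))


def smirnov_one_sided(lam):
    """Limiting one-sided upper tail probability for scaled statistic lam."""
    return math.exp(-2.0 * lam * lam)


if __name__ == "__main__":
    # Compare p/2 with the one-sided limit over a range of lam values.
    for lam in (0.5, 1.0, 1.5, 2.0):
        p = kolmogorov_two_sided(lam)
        print(lam, p, p / 2.0, smirnov_one_sided(lam))
```

In these limiting forms the leading term of the two-sided series is exactly twice the one-sided tail, so $p/2$ agrees closely when the higher-order terms are negligible (large $\lambda$, small $p$) and drifts away as $p$ grows.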
The time taken by the routine increases with ${n}_{1}$ and ${n}_{2}$, until ${n}_{1}{n}_{2}>10000$ or $\mathrm{max}\phantom{\rule{0.125em}{0ex}}({n}_{1},{n}_{2})\ge 2500$. At this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with ${n}_{1}$ and ${n}_{2}$.
10 Example
The following example reads in $10$ different sample sizes and values for the test statistic ${D}_{{n}_{1},{n}_{2}}$. The upper tail probability is computed and printed for each case.