# NAG FL Interface g01eyf (prob_kolmogorov1)

## 1 Purpose

g01eyf returns the upper tail probability associated with the one-sample Kolmogorov–Smirnov distribution.

## 2 Specification

Fortran Interface
 Function g01eyf ( n, d, ifail)
 Integer, Intent (In) :: n
 Integer, Intent (Inout) :: ifail
 Real (Kind=nag_wp) :: g01eyf
 Real (Kind=nag_wp), Intent (In) :: d
C Header Interface
 #include <nag.h>
 double g01eyf_ (const Integer *n, const double *d, Integer *ifail)
The routine may be called by the names g01eyf or nagf_stat_prob_kolmogorov1.

## 3 Description

Let ${S}_{n}\left(x\right)$ be the sample cumulative distribution function and ${F}_{0}\left(x\right)$ the hypothesised theoretical distribution function.
g01eyf returns the upper tail probability, $p$, associated with the one-sided Kolmogorov–Smirnov test statistic ${D}_{n}^{+}$ or ${D}_{n}^{-}$, where these one-sided statistics are defined as follows:
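 ${D}_{n}^{+}=\sup_{x}\left[{S}_{n}\left(x\right)-{F}_{0}\left(x\right)\right],\quad {D}_{n}^{-}=\sup_{x}\left[{F}_{0}\left(x\right)-{S}_{n}\left(x\right)\right].$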
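
To make the two definitions concrete, the following minimal Python sketch computes both one-sided statistics for a small sample. It assumes ${F}_{0}$ is continuous, so that each supremum is attained at a jump of ${S}_{n}$; the function names `one_sided_ks_statistics` and `uniform_cdf`, and the three sample values, are illustrative assumptions and not part of the NAG Library.

```python
from typing import Callable, Sequence, Tuple


def one_sided_ks_statistics(
    sample: Sequence[float], cdf: Callable[[float], float]
) -> Tuple[float, float]:
    """Return (D_n^+, D_n^-) for `sample` against the hypothesised `cdf`.

    Assumes `cdf` is continuous (hypothetical helper, not a NAG routine).
    """
    n = len(sample)
    if n < 1:
        raise ValueError("sample must contain at least one observation")
    # F_0 evaluated at the order statistics x_(1) <= ... <= x_(n).
    u = [cdf(x) for x in sorted(sample)]
    # S_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic
    # (0-based index i), so each supremum is found by checking the jumps.
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return d_plus, d_minus


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (illustrative F_0)."""
    return min(1.0, max(0.0, x))


# Illustrative sample; these values are made up for the sketch.
d_plus, d_minus = one_sided_ks_statistics([0.1, 0.4, 0.7], uniform_cdf)
print(d_plus, d_minus)
```

Either value could then be passed as the argument d, together with the sample size as n.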
If $n\le 100$, an exact method is used; for details, see Conover (1980). Otherwise, a large sample approximation derived by Smirnov is used; see Feller (1948), Kendall and Stuart (1973) or Smirnov (1948).
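
The switch between the two regimes can be sketched in Python as follows. This is not the routine's implementation: the exact branch uses the well-known Birnbaum–Tingey finite sum for $P\left({D}_{n}^{+}\ge d\right)$, and the large-sample branch uses only the leading-order limit ${e}^{-2n{d}^{2}}$. Whether g01eyf uses these exact forms, or a refined approximation, is an assumption here; only the threshold $n\le 100$ comes from the text above. All function names are hypothetical.

```python
import math


def exact_upper_tail(n: int, d: float) -> float:
    """P(D_n^+ >= d) from the Birnbaum-Tingey finite sum (assumed form)."""
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    for j in range(int(math.floor(n * (1.0 - d))) + 1):
        a = d + j / n
        # Clamp to guard against tiny negative values from rounding.
        b = max(0.0, 1.0 - d - j / n)
        total += math.comb(n, j) * a ** (j - 1) * b ** (n - j)
    return d * total


def smirnov_asymptotic_upper_tail(n: int, d: float) -> float:
    """Leading-order large-sample limit exp(-2 n d^2) (assumed form)."""
    return math.exp(-2.0 * n * d * d)


def upper_tail(n: int, d: float) -> float:
    """Pick the regime using the n <= 100 threshold quoted in the text."""
    if n <= 100:
        return exact_upper_tail(n, d)
    return smirnov_asymptotic_upper_tail(n, d)


print(upper_tail(2, 0.5))     # exact branch
print(upper_tail(200, 0.05))  # large-sample branch
```

For $n=1$ the finite sum reduces to $1-d$, which gives a quick sanity check on the exact branch.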

## 4 References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281

## 5 Arguments

1: $\mathbf{n}$ (Integer, Input)
On entry: $n$, the number of observations in the sample.
Constraint: ${\mathbf{n}}\ge 1$.
2: $\mathbf{d}$ (Real (Kind=nag_wp), Input)
On entry: contains the test statistic, ${D}_{n}^{+}$ or ${D}_{n}^{-}$.
Constraint: $0.0\le {\mathbf{d}}\le 1.0$.
3: $\mathbf{ifail}$ (Integer, Input/Output)
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n}}\ge 1$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{d}}<0.0$ or ${\mathbf{d}}>1.0$: ${\mathbf{d}}=\langle \mathit{\text{value}}\rangle$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
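
The two argument checks listed above can be mirrored on the caller's side before invoking the routine. The Python sketch below is purely illustrative: `check_arguments` is a hypothetical helper, and the order in which the routine itself tests n and d is an assumption.

```python
def check_arguments(n: int, d: float) -> int:
    """Return an ifail-style code: 0 if valid, 1 for bad n, 2 for bad d.

    Hypothetical helper; the order of the checks is an assumption.
    """
    if n < 1:
        return 1  # corresponds to ifail = 1: constraint n >= 1 violated
    if not (0.0 <= d <= 1.0):
        return 2  # corresponds to ifail = 2: d < 0.0 or d > 1.0 (NaN also lands here)
    return 0


print(check_arguments(10, 0.25))  # valid arguments
print(check_arguments(0, 0.25))   # invalid n
print(check_arguments(10, 1.5))   # invalid d
```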

## 7 Accuracy

The large sample distribution used as an approximation to the exact distribution should have a relative error of less than $2.5$% for most cases.

## 8 Parallelism and Performance

g01eyf is not threaded in any implementation.

## 9 Further Comments

The upper tail probability for the two-sided statistic, ${D}_{n}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({D}_{n}^{+},{D}_{n}^{-}\right)$, can be approximated by twice the probability returned by g01eyf, that is, $2p$. (If the probability from g01eyf is greater than $0.5$, the two-sided probability should be truncated to $1.0$.) This approximation to the tail probability for ${D}_{n}$ is good for small probabilities (e.g., $p\le 0.10$) but becomes very poor for larger probabilities.
The time taken by the routine increases with $n$, until $n>100$. At this point the approximation is used and the time decreases significantly. The time then increases again modestly with $n$.
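
The doubling rule for the two-sided statistic described above amounts to a one-line computation. A minimal Python sketch, where `two_sided_from_one_sided` is a hypothetical helper name and `p` stands for the value returned by g01eyf:

```python
def two_sided_from_one_sided(p: float) -> float:
    """Approximate the two-sided tail probability as 2p, truncated to 1.0."""
    return min(1.0, 2.0 * p)


print(two_sided_from_one_sided(0.03))  # small p: the approximation is good
print(two_sided_from_one_sided(0.70))  # p > 0.5: truncated to 1.0
```

As noted above, the result should only be relied on when $p$ is small (e.g., $p\le 0.10$).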

## 10 Example

The following example reads in $10$ different sample sizes and values for the test statistic ${D}_{n}$. The upper tail probability is computed and printed for each case.

### 10.1 Program Text

Program Text (g01eyfe.f90)

### 10.2 Program Data

Program Data (g01eyfe.d)

### 10.3 Program Results

Program Results (g01eyfe.r)