NAG Toolbox: nag_nonpar_test_ks_1sample_user (g08cc)
Purpose
nag_nonpar_test_ks_1sample_user (g08cc) performs the one-sample Kolmogorov–Smirnov distribution test, using a user-specified distribution.
Syntax
[d, z, p, sx, ifail] = g08cc(x, cdf, ntype, 'n', n)
[d, z, p, sx, ifail] = nag_nonpar_test_ks_1sample_user(x, cdf, ntype, 'n', n)
Description
The data consists of a single sample of $n$ observations, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$. Let ${S}_{n}\left({x}_{\left(i\right)}\right)$ and ${F}_{0}\left({x}_{\left(i\right)}\right)$ represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point ${x}_{\left(i\right)}$, where ${x}_{\left(i\right)}$ is the $i$th smallest sample observation.
The Kolmogorov–Smirnov test provides a test of the null hypothesis
${H}_{0}$: the data are a random sample of observations from a theoretical distribution specified by you (in
cdf) against one of the following alternative hypotheses.
(i) 
${H}_{1}$: the data cannot be considered to be a random sample from the specified null distribution. 
(ii) 
${H}_{2}$: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function ${S}_{n}\left(x\right)$ tended to exceed the corresponding values of the theoretical cumulative distribution function ${F}_{0}\left(x\right)$. 
(iii) 
${H}_{3}$: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function ${F}_{0}\left(x\right)$ tended to exceed the corresponding values of the sample cumulative distribution function ${S}_{n}\left(x\right)$. 
One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the argument
ntype in
Parameters).
For the alternative hypothesis
${H}_{1}$:
 ${D}_{n}$ – the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally ${D}_{n}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left\{{D}_{n}^{+},{D}_{n}^{-}\right\}$.
For the alternative hypothesis
${H}_{2}$:
 ${D}_{n}^{+}$ – the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally ${D}_{n}^{+}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left\{{S}_{n}\left({x}_{\left(i\right)}\right)-{F}_{0}\left({x}_{\left(i\right)}\right),0\right\}$.
For the alternative hypothesis
${H}_{3}$:
 ${D}_{n}^{-}$ – the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally ${D}_{n}^{-}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left\{{F}_{0}\left({x}_{\left(i\right)}\right)-{S}_{n}\left({x}_{\left(i-1\right)}\right),0\right\}$. This is only true for continuous distributions. See Further Comments for comments on discrete distributions.
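The three statistics defined above can be computed by hand for a small sample. The following is a minimal sketch in base MATLAB, not the routine's implementation: the sample, the handle name F0 and the choice of a $U(0,1)$ null distribution are illustrative assumptions, and it assumes a continuous null distribution with no tied observations, so that ${S}_{n}\left({x}_{\left(i\right)}\right)=i/n$.

```matlab
% Sketch: D_n^+, D_n^- and D_n for a continuous null cdf (assumes no ties).
% F0 is an illustrative handle (the U(0,1) cdf); it is not part of the NAG API.
x  = [0.40; 0.10; 0.70];          % small illustrative sample
F0 = @(t) min(max(t, 0), 1);      % U(0,1) cdf, vectorised

n  = numel(x);
sx = sort(x);                     % order statistics x_(1) <= ... <= x_(n)
Fx = F0(sx);                      % F_0 evaluated at each order statistic
i  = (1:n)';

dplus  = max([i/n - Fx; 0]);      % max{ S_n(x_(i)) - F_0(x_(i)), 0 }
dminus = max([Fx - (i-1)/n; 0]);  % max{ F_0(x_(i)) - S_n(x_(i-1)), 0 }
d      = max(dplus, dminus);      % two-sided statistic D_n

fprintf('D+ = %.4f, D- = %.4f, D = %.4f\n', dplus, dminus, d);
```

For this sample the sketch gives ${D}_{n}^{+}=0.3$, ${D}_{n}^{-}=0.1$ and hence ${D}_{n}=0.3$.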
The standardized statistic,
$Z=D\times \sqrt{n}$, is also computed, where
$D$ may be
${D}_{n},{D}_{n}^{+}$ or
${D}_{n}^{-}$ depending on the choice of the alternative hypothesis. This is the standardized value of
$D$ with no continuity correction applied and the distribution of
$Z$ converges asymptotically to a limiting distribution, first derived by
Kolmogorov (1933), and then tabulated by
Smirnov (1948). The asymptotic distributions for the one-sided statistics were obtained by
Smirnov (1933).
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If
$n\le 100$, an exact method given by
Conover (1980) is used. Note that the method used is only exact for continuous theoretical distributions and does not include Conover's modification for discrete distributions. This method computes the one-sided probabilities. The two-sided probabilities are estimated by doubling the one-sided probability. This is a good estimate for small
$p$, that is
$p\le 0.10$, but it becomes very poor for larger
$p$. If
$n>100$ then
$p$ is computed using the Kolmogorov–Smirnov limiting distributions; see
Feller (1948),
Kendall and Stuart (1973),
Kolmogorov (1933),
Smirnov (1933) and
Smirnov (1948).
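For illustration, the standardized statistic and the textbook forms of the limiting distributions can be sketched in base MATLAB. This is a sketch under stated assumptions rather than the routine's numerical method, which is not given here: the values of n and d are made up, the one-sided limit is taken as $\exp\left(-2{z}^{2}\right)$, and the two-sided limit as the Kolmogorov series $2\sum_{k\ge 1}{\left(-1\right)}^{k-1}\exp\left(-2{k}^{2}{z}^{2}\right)$, truncated at 100 terms.

```matlab
% Sketch: standardized statistic and limiting-distribution tail probabilities.
% n and d are illustrative values, not taken from the NAG example.
n = 150;
d = 0.09;
z = d * sqrt(n);                       % Z = D * sqrt(n), no continuity correction

% One-sided limit for D_n^+ or D_n^-: exp(-2 z^2)
p_one = exp(-2 * z^2);

% Two-sided limit for D_n: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2).
% 100 terms is an arbitrary truncation, adequate when z is not close to 0.
k = 1:100;
p_two = 2 * sum((-1).^(k-1) .* exp(-2 * k.^2 * z^2));
p_two = min(max(p_two, 0), 1);         % guard against rounding outside [0,1]

fprintf('Z = %.4f, one-sided p = %.4f, two-sided p = %.4f\n', z, p_one, p_two);
fprintf('doubled one-sided p = %.4f\n', 2 * p_one);
```

In this limiting form, doubling the one-sided value never falls below the two-sided series, since the doubled value is the first term of the alternating series.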
References
Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kolmogorov A N (1933) Sulla determinazione empirica di una legge di distribuzione Giornale dell' Istituto Italiano degli Attuari 4 83–91
Siegel S (1956) Nonparametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
Parameters
Compulsory Input Parameters
 1:
$\mathrm{x}\left({\mathbf{n}}\right)$ – double array

The sample observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
 2:
$\mathrm{cdf}$ – function handle or string containing name of m-file

cdf must return the value of the theoretical (null) cumulative distribution function for a given value of its argument.
[result] = cdf(x)
Input Parameters
 1:
$\mathrm{x}$ – double scalar

The argument for which
cdf must be evaluated.
Output Parameters
 1:
$\mathrm{result}$ – double scalar

The value of the theoretical (null) cumulative distribution function evaluated at
x.
Constraint:
${\mathbf{cdf}}$ must always return a value in the range
$\left[0.0,1.0\right]$ and
cdf must always satisfy the condition that
${\mathbf{cdf}}\left({x}_{1}\right)\le {\mathbf{cdf}}\left({x}_{2}\right)$ for any
${x}_{1}\le {x}_{2}$.
 3:
$\mathrm{ntype}$ – int64, int32 or nag_int scalar

The statistic to be calculated, i.e., the choice of alternative hypothesis.
 ${\mathbf{ntype}}=1$
 Computes ${D}_{n}$, to test ${H}_{0}$ against ${H}_{1}$.
 ${\mathbf{ntype}}=2$
 Computes ${D}_{n}^{+}$, to test ${H}_{0}$ against ${H}_{2}$.
 ${\mathbf{ntype}}=3$
 Computes ${D}_{n}^{-}$, to test ${H}_{0}$ against ${H}_{3}$.
Constraint:
${\mathbf{ntype}}=1$, $2$ or $3$.
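As an illustration of the compulsory inputs, the sketch below writes two candidate null distributions as function handles and spot-checks the constraint on cdf over a grid. The names ucdf and ncdf, the grid, and the use of the base MATLAB function erfc for the Normal distribution function are assumptions made for this sketch.

```matlab
% Sketch: two candidate null cdfs as function handles (illustrative names).
ucdf = @(x) min(max(x, 0), 1);           % U(0,1): 0 below 0, x on [0,1], 1 above 1

mu    = 0.75;                            % mean of the Normal null distribution
sigma = 0.5;                             % standard deviation
ncdf  = @(x) 0.5 * erfc(-(x - mu) / (sigma * sqrt(2)));   % Normal cdf via erfc

% Spot-check the constraint on a grid: values in [0,1] and non-decreasing.
t    = linspace(-2, 3, 501);
vals = arrayfun(ncdf, t);                % cdf is called with a scalar argument
ok_range    = all(vals >= 0 & vals <= 1);
ok_monotone = all(diff(vals) >= 0);

ntype = int64(1);                        % 1: D_n, 2: D_n^+, 3: D_n^-
fprintf('range ok: %d, monotone ok: %d\n', ok_range, ok_monotone);
```

Either handle could then be passed as the cdf argument, together with the sample x and ntype.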
Optional Input Parameters
 1:
$\mathrm{n}$ – int64, int32 or nag_int scalar

Default:
the dimension of the array
x.
$n$, the number of observations in the sample.
Constraint:
${\mathbf{n}}\ge 1$.
Output Parameters
 1:
$\mathrm{d}$ – double scalar

The Kolmogorov–Smirnov test statistic (
${D}_{n}$,
${D}_{n}^{+}$ or
${D}_{n}^{-}$ according to the value of
ntype).
 2:
$\mathrm{z}$ – double scalar

A standardized value, $Z$, of the test statistic, $D$, without the continuity correction applied.
 3:
$\mathrm{p}$ – double scalar

The probability,
$p$, associated with the observed value of
$D$, where
$D$ may be
${D}_{n}$,
${D}_{n}^{+}$ or
${D}_{n}^{-}$ depending on the value of
ntype (see
Description).
 4:
$\mathrm{sx}\left({\mathbf{n}}\right)$ – double array

The sample observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$, sorted in ascending order.
 5:
$\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see
Error Indicators and Warnings).
Error Indicators and Warnings
Errors or warnings detected by the function:
 ${\mathbf{ifail}}=1$

On entry,  ${\mathbf{n}}<1$. 
 ${\mathbf{ifail}}=2$

On entry,  ${\mathbf{ntype}}\ne 1$, $2$ or $3$. 
 ${\mathbf{ifail}}=3$

The supplied theoretical cumulative distribution function returns a value less than $0.0$ or greater than $1.0$, thereby violating the definition of the cumulative distribution function.
 ${\mathbf{ifail}}=4$

The supplied theoretical cumulative distribution function is not a non-decreasing function, thereby violating the definition of a cumulative distribution function, that is ${F}_{0}\left(x\right)>{F}_{0}\left(y\right)$ for some $x<y$.
 ${\mathbf{ifail}}=99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
 ${\mathbf{ifail}}=399$
Your licence key may have expired or may not have been installed correctly.
 ${\mathbf{ifail}}=999$
Dynamic memory allocation failed.
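The conditions behind ${\mathbf{ifail}}=1$ to $4$ can be checked in advance. The sketch below is only an illustration in base MATLAB: the variable precheck is a hypothetical status code, the sample and cdf handle are made up, and the cdf is inspected only at the sample points, which is an assumption about where violations would be noticed.

```matlab
% Sketch: pre-checks mirroring ifail = 1..4 before calling the routine.
x     = [0.30; 0.01; 0.90];           % illustrative sample
ntype = 1;
cdf   = @(t) min(max(t, 0), 1);       % illustrative U(0,1) cdf

sx   = sort(x);
vals = arrayfun(cdf, sx);             % cdf values at the sorted observations

precheck = 0;                         % hypothetical status code, 0 = no problem found
if numel(x) < 1
    precheck = 1;                     % n < 1
elseif ~ismember(ntype, [1 2 3])
    precheck = 2;                     % ntype is not 1, 2 or 3
elseif any(vals < 0 | vals > 1)
    precheck = 3;                     % cdf value outside [0,1]
elseif any(diff(vals) < 0)
    precheck = 4;                     % cdf decreases between sample points
end

% A deliberately invalid cdf for comparison: 1.5*t exceeds 1 at t = 0.90.
bad_vals     = arrayfun(@(t) 1.5 * t, sx);
out_of_range = any(bad_vals < 0 | bad_vals > 1);

fprintf('precheck = %d, invalid cdf detected = %d\n', precheck, out_of_range);
```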
Accuracy
For most cases the approximation for $p$ given when $n>100$ has a relative error of less than $0.01$. The two-sided probability is approximated by doubling the one-sided probability. This approximation is good only for small $p$, that is $p<0.10$, and is very poor for large $p$. The error is always on the conservative side.
Further Comments
The time taken by nag_nonpar_test_ks_1sample_user (g08cc) increases with $n$ until $n>100$ at which point it drops and then increases slowly.
For a discrete theoretical cumulative distribution function
${F}_{0}\left(x\right)$,
${D}_{n}^{-}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left\{{F}_{0}\left({x}_{\left(i\right)}\right)-{S}_{n}\left({x}_{\left(i\right)}\right),0\right\}$. Thus if you wish to provide a discrete distribution function, the following adjustment needs to be made:
 for ${D}_{n}^{+}$, return $F\left(x\right)$ at $x$ as usual;
 for ${D}_{n}^{-}$, return $F\left(x-d\right)$ at $x$, where $d$ is the discrete jump in the distribution. For example, $d=1$ for the Poisson or binomial distributions.
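The adjustment for a discrete distribution can be sketched as a pair of function handles. The example below assumes a Poisson null distribution with an illustrative mean lambda; the names pois_cdf, cdf_plus and cdf_minus are hypothetical, and the Poisson distribution function is summed directly in base MATLAB rather than taken from any library.

```matlab
% Sketch: shifted cdf handles for a discrete (Poisson) null distribution.
lambda = 2;                                   % illustrative Poisson mean

% Poisson cdf: F(x) = sum_{k=0}^{floor(x)} exp(-lambda) * lambda^k / k!
% For x < 0 the index range is empty and the sum is 0.
pois_cdf = @(x) sum(exp(-lambda) .* lambda.^(0:floor(x)) ./ factorial(0:floor(x)));

jump = 1;                                     % d = 1 for the Poisson distribution

cdf_plus  = @(x) pois_cdf(x);                 % for D_n^+ (ntype = 2): F(x) at x
cdf_minus = @(x) pois_cdf(x - jump);          % for D_n^- (ntype = 3): F(x - d) at x

fprintf('F(1) = %.4f, F(1 - d) = %.4f\n', cdf_plus(1), cdf_minus(1));
```

The handle matching the chosen value of ntype would then be supplied as the cdf argument.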
Example
The following example performs the one-sample Kolmogorov–Smirnov test to test whether a sample of $30$ observations arises firstly from a uniform distribution $U\left(0,1\right)$ or secondly from a Normal distribution with mean $0.75$ and standard deviation $0.5$. The two-sided test statistic, ${D}_{n}$, the standardized test statistic, $Z$, and the upper tail probability, $p$, are computed and then printed for each test.
function g08cc_example
% Test a sample of 30 observations against a Normal(0.75, 0.5) distribution.

fprintf('g08cc example results\n\n');

% Parameters of the null distribution, shared with cdf through globals.
% Note: the global named std shadows MATLAB's built-in std function here.
global xmean std;
xmean = 0.75;
std = 0.5;

% Sample observations
x = [0.01; 0.30; 0.20; 0.90; 1.20; 0.09; 1.30; 0.18; 0.90; 0.48;
     1.98; 0.03; 0.50; 0.07; 0.70; 0.60; 0.95; 1.00; 0.31; 1.45;
     1.04; 1.25; 0.15; 0.75; 0.85; 0.22; 1.56; 0.81; 0.57; 0.55];

ntype = int64(1);   % two-sided test: compute D_n

[d, z, p, sx, ifail] = g08cc( ...
                             x, @cdf, ntype);

fprintf('Test against normal distribution:\n');
fprintf(' mean = %7.2f\n', xmean);
fprintf(' standard deviation = %7.2f\n', std);
fprintf('\n\nTest statistic D = %8.4f\n', d);
fprintf('Z statistic = %8.4f\n', z);
fprintf('Tail probability = %8.4f\n', p);

function [result] = cdf(x)
  % Normal cdf at x, evaluated with the NAG routine s15ab.
  global xmean std;
  z = (x - xmean)/std;
  [result, ifail] = s15ab(z);
g08cc example results
Test against normal distribution:
mean = 0.75
standard deviation = 0.50
Test statistic D = 0.1439
Z statistic = 0.7882
Tail probability = 0.5262
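The example description also mentions a test against $U\left(0,1\right)$, which the listing above does not show. A minimal sketch of that call might look as follows; the handle ucdf is an illustrative assumption, the call is guarded so that the script still runs when the NAG Toolbox is not on the path, and no results are quoted because none are given here.

```matlab
% Sketch: the same sample tested against U(0,1) with the two-sided statistic.
x = [0.01; 0.30; 0.20; 0.90; 1.20; 0.09; 1.30; 0.18; 0.90; 0.48; ...
     1.98; 0.03; 0.50; 0.07; 0.70; 0.60; 0.95; 1.00; 0.31; 1.45; ...
     1.04; 1.25; 0.15; 0.75; 0.85; 0.22; 1.56; 0.81; 0.57; 0.55];

ucdf  = @(t) min(max(t, 0), 1);   % illustrative U(0,1) cdf handle
ntype = int64(1);                 % two-sided test: compute D_n

if exist('g08cc') ~= 0            % only call the routine if the NAG Toolbox is available
    [d, z, p, sx, ifail] = g08cc(x, ucdf, ntype);
    fprintf('Test against uniform distribution on (0,1):\n');
    fprintf('Test statistic D = %8.4f\n', d);
    fprintf('Z statistic = %8.4f\n', z);
    fprintf('Tail probability = %8.4f\n', p);
end
```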
© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015