NAG Toolbox: nag_nonpar_test_ks_1sample_user (g08cc)

Purpose

nag_nonpar_test_ks_1sample_user (g08cc) performs the one sample Kolmogorov–Smirnov distribution test, using a user-specified distribution.

Syntax

[d, z, p, sx, ifail] = g08cc(x, cdf, ntype, 'n', n)
[d, z, p, sx, ifail] = nag_nonpar_test_ks_1sample_user(x, cdf, ntype, 'n', n)

Description

The data consist of a single sample of $n$ observations, denoted by $x_1, x_2, \ldots, x_n$. Let $S_n(x_{(i)})$ and $F_0(x_{(i)})$ represent the sample cumulative distribution function and the theoretical (null) cumulative distribution function respectively at the point $x_{(i)}$, where $x_{(i)}$ is the $i$th smallest sample observation.
The Kolmogorov–Smirnov test provides a test of the null hypothesis $H_0$: the data are a random sample of observations from a theoretical distribution specified by you (in cdf) against one of the following alternative hypotheses.
(i) $H_1$: the data cannot be considered to be a random sample from the specified null distribution.
(ii) $H_2$: the data arise from a distribution which dominates the specified null distribution. In practical terms, this would be demonstrated if the values of the sample cumulative distribution function $S_n(x)$ tended to exceed the corresponding values of the theoretical cumulative distribution function $F_0(x)$.
(iii) $H_3$: the data arise from a distribution which is dominated by the specified null distribution. In practical terms, this would be demonstrated if the values of the theoretical cumulative distribution function $F_0(x)$ tended to exceed the corresponding values of the sample cumulative distribution function $S_n(x)$.
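One of the following test statistics is computed depending on the particular alternative hypothesis specified (see the description of the parameter ntype in Section [Parameters]).
For the alternative hypothesis $H_1$: $D_n$, the largest absolute deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally, $D_n = \max\{D_n^+, D_n^-\}$.
For the alternative hypothesis $H_2$: $D_n^+$, the largest positive deviation between the sample cumulative distribution function and the theoretical cumulative distribution function. Formally, $D_n^+ = \max\{S_n(x_{(i)}) - F_0(x_{(i)}), 0\}$.
For the alternative hypothesis $H_3$: $D_n^-$, the largest positive deviation between the theoretical cumulative distribution function and the sample cumulative distribution function. Formally, $D_n^- = \max\{F_0(x_{(i)}) - S_n(x_{(i-1)}), 0\}$. This is only true for continuous distributions.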
The standardized statistic, $Z = D \times \sqrt{n}$, is also computed, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$ depending on the choice of the alternative hypothesis. This is the standardized value of $D$ with no continuity correction applied, and the distribution of $Z$ converges asymptotically to a limiting distribution, first derived by Kolmogorov (1933), and then tabulated by Smirnov (1948). The asymptotic distributions for the one-sided statistics were obtained by Smirnov (1933).
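The statistics described above can be sketched directly in base MATLAB, without calling the NAG function. The sketch assumes the usual order-statistic forms $D_n^+ = \max_i\{i/n - F_0(x_{(i)}), 0\}$ and $D_n^- = \max_i\{F_0(x_{(i)}) - (i-1)/n, 0\}$ for a continuous null distribution, and uses the sample and the uniform null distribution on $[0, 2]$ from the Example section. The variable names dplus and dminus are illustrative only.

```matlab
% Sample from the Example section, stored as a column vector.
x = [0.01 0.30 0.20 0.90 1.20 0.09 1.30 0.18 0.90 0.48 ...
     1.98 0.03 0.50 0.07 0.70 0.60 0.95 1.00 0.31 1.45 ...
     1.04 1.25 0.15 0.75 0.85 0.22 1.56 0.81 0.57 0.55]';

n  = numel(x);
sx = sort(x);                    % order statistics x_(1) <= ... <= x_(n)
F0 = min(max(sx / 2, 0), 1);     % assumed null cdf: uniform on [0, 2]
i  = (1:n)';

dplus  = max([i / n - F0; 0]);         % D_n^+ : sample cdf above null cdf
dminus = max([F0 - (i - 1) / n; 0]);   % D_n^- : null cdf above sample cdf
d      = max(dplus, dminus);           % D_n   : two-sided statistic
z      = d * sqrt(n);                  % standardized statistic Z

fprintf('d = %.4f, z = %.4f\n', d, z);
```

For this sample the sketch gives d = 0.2800 and z = 1.5336, which agrees with the values of d and z printed in the Example section for ntype = 1.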
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If $n \le 100$, an exact method given by Conover (1980) is used. Note that the method used is only exact for continuous theoretical distributions and does not include Conover's modification for discrete distributions. This method computes the one-sided probabilities. The two-sided probabilities are estimated by doubling the one-sided probability. This is a good estimate for small $p$, that is $p \le 0.10$, but it becomes very poor for larger $p$. If $n > 100$ then $p$ is computed using the Kolmogorov–Smirnov limiting distributions; see Feller (1948), Kendall and Stuart (1973), Kolmogorov (1933), Smirnov (1933) and Smirnov (1948).
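As a rough illustration of the large-sample case ($n > 100$), the sketch below evaluates the classical limiting tail probabilities for a given standardized statistic: $\exp(-2z^2)$ for the one-sided statistics and $2\sum_{k \ge 1} (-1)^{k-1} \exp(-2k^2z^2)$ for the two-sided statistic. These are the standard Kolmogorov–Smirnov limiting forms; whether the function evaluates them in exactly this way is an assumption, and the truncation at 100 terms is an arbitrary choice.

```matlab
z = 1.5336;                      % standardized statistic, e.g. the output z
k = (1:100)';                    % series terms; 100 is an arbitrary truncation

% One-sided limiting tail probability (for D_n^+ or D_n^-).
p_one = exp(-2 * z^2);

% Two-sided limiting tail probability (for D_n), as an alternating series.
p_two = 2 * sum((-1).^(k - 1) .* exp(-2 * k.^2 * z^2));
p_two = min(max(p_two, 0), 1);   % guard against rounding outside [0, 1]

fprintf('one-sided: %.4f, two-sided: %.4f\n', p_one, p_two);
```

The value of z used here is taken from the Example section, where $n = 30$. For that sample size the function uses the exact method rather than the limiting distribution, so the result of this sketch need not match the p printed there.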

References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kolmogorov A N (1933) Sulla determinazione empirica di una legge di distribuzione Giornale dell' Istituto Italiano degli Attuari 4 83–91
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281

Parameters

Compulsory Input Parameters

1:     x(n) – double array
n, the dimension of the array, must satisfy the constraint $n \ge 1$.
The sample observations, $x_1, x_2, \ldots, x_n$.
2:     cdf – function handle or string containing name of m-file
cdf must return the value of the theoretical (null) cumulative distribution function for a given value of its argument.
[result] = cdf(x)

Input Parameters

1:     x – double scalar
The argument for which cdf must be evaluated.

Output Parameters

1:     result – double scalar
The result of the function.
Constraint: cdf must always return a value in the range $[0.0, 1.0]$ and must always satisfy the condition that cdf$(x_1) \le$ cdf$(x_2)$ for any $x_1 \le x_2$.
3:     ntype – int64, int32 or nag_int scalar
The statistic to be calculated, i.e., the choice of alternative hypothesis.
ntype = 1
Computes $D_n$, to test $H_0$ against $H_1$.
ntype = 2
Computes $D_n^+$, to test $H_0$ against $H_2$.
ntype = 3
Computes $D_n^-$, to test $H_0$ against $H_3$.
Constraint: ntype = 1, 2 or 3.
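The constraints on cdf listed above (values in $[0.0, 1.0]$ and nondecreasing) can be checked on a grid before the handle is passed to the function. The following is a minimal sketch in base MATLAB; the exponential null distribution, its mean mu, and the names cdf_exp, xs and vals are assumptions made for illustration and do not come from the NAG documentation.

```matlab
mu = 0.8;                                   % hypothetical mean of the null distribution

% Candidate null cdf: exponential with mean mu, clamped to 0 for x < 0.
cdf_exp = @(x) max(0, 1 - exp(-x / mu));

% Evaluate one scalar at a time, matching the scalar-in, scalar-out interface of cdf.
xs   = linspace(-1, 10, 1001);
vals = arrayfun(cdf_exp, xs);

in_range      = all(vals >= 0 & vals <= 1);   % every value lies in [0, 1]
nondecreasing = all(diff(vals) >= 0);         % cdf(x1) <= cdf(x2) whenever x1 <= x2

fprintf('in range: %d, nondecreasing: %d\n', in_range, nondecreasing);
```

A grid check of this kind cannot prove the constraints hold everywhere, but it catches common mistakes, such as a sign error or a missing clamp, before the function reports ifail = 3 or ifail = 4.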

Optional Input Parameters

1:     n – int64, int32 or nag_int scalar
Default: The dimension of the array x.
$n$, the number of observations in the sample.
Constraint: $n \ge 1$.

Input Parameters Omitted from the MATLAB Interface

None.

Output Parameters

1:     d – double scalar
The Kolmogorov–Smirnov test statistic ($D_n$, $D_n^+$ or $D_n^-$ according to the value of ntype).
2:     z – double scalar
A standardized value, $Z$, of the test statistic, $D$, without the continuity correction applied.
3:     p – double scalar
The probability, $p$, associated with the observed value of $D$, where $D$ may be $D_n$, $D_n^+$ or $D_n^-$ depending on the value of ntype (see Section [Description]).
4:     sx(n) – double array
The sample observations, $x_1, x_2, \ldots, x_n$, sorted in ascending order.
5:     ifail – int64, int32 or nag_int scalar
ifail = 0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1
On entry, $n < 1$.
  ifail = 2
On entry, ntype $\neq 1$, $2$ or $3$.
  ifail = 3
The supplied theoretical cumulative distribution function returns a value less than $0.0$ or greater than $1.0$, thereby violating the definition of the cumulative distribution function.
  ifail = 4
The supplied theoretical cumulative distribution function is not a nondecreasing function, thereby violating the definition of a cumulative distribution function, that is $F_0(x) > F_0(y)$ for some $x < y$.
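When the function is called from a script, it can be convenient to translate the returned ifail into a readable message. The helper below is a hypothetical sketch, not part of the NAG Toolbox: the name g08cc_ifail_message and the message wording are assumptions, and only the code values 0 to 4 come from the list above.

```matlab
function msg = g08cc_ifail_message(ifail)
% Hypothetical helper: map an ifail value returned by g08cc to a short description.
switch double(ifail)             % ifail may arrive as int32 or int64
    case 0
        msg = 'no error detected';
    case 1
        msg = 'n < 1';
    case 2
        msg = 'ntype is not 1, 2 or 3';
    case 3
        msg = 'cdf returned a value outside [0.0, 1.0]';
    case 4
        msg = 'cdf is not nondecreasing';
    otherwise
        msg = sprintf('unrecognized ifail value %d', double(ifail));
end
end
```

A caller might then write, for example, disp(g08cc_ifail_message(ifail)) immediately after the call to g08cc.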

Accuracy

For most cases the approximation for $p$ given when $n > 100$ has a relative error of less than $0.01$. The two-sided probability is approximated by doubling the one-sided probability. This is only good for small $p$, that is $p < 0.10$, but very poor for large $p$. The error is always on the conservative side.

Further Comments

The time taken by nag_nonpar_test_ks_1sample_user (g08cc) increases with $n$ until $n > 100$, at which point it drops and then increases slowly.
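For a discrete theoretical cumulative distribution function $F_0(x)$, $D_n^- = \max\{F_0(x_{(i)}) - S_n(x_{(i)}), 0\}$. Thus if you wish to provide a discrete distribution function the following adjustment needs to be made:
for $D_n^+$, return $F(x)$ at $x$ as described;
for $D_n^-$, return $F(x - d)$ at $x$, where $d$ is the discrete jump in the distribution. For example, $d = 1$ for the Poisson or binomial distributions.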

Example

function nag_nonpar_test_ks_1sample_user_example
x = [0.01;
     0.3;
     0.2;
     0.9;
     1.2;
     0.09;
     1.3;
     0.18;
     0.9;
     0.48;
     1.98;
     0.03;
     0.5;
     0.07;
     0.7;
     0.6;
     0.95;
     1;
     0.31;
     1.45;
     1.04;
     1.25;
     0.15;
     0.75;
     0.85;
     0.22;
     1.56;
     0.81;
     0.57;
     0.55];
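% ntype = 1 selects the two-sided statistic (H0 against H1).
ntype = int64(1);
[d, z, p, sx, ifail] = nag_nonpar_test_ks_1sample_user(x, @cdf, ntype)

% Null distribution: cumulative distribution function of the uniform distribution on [0, 2].
function [result] = cdf(x)

  if x < 0
    result = 0;
  elseif x > 2
    result = 1;
  else
    result = x/2;
  end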
 

d =

    0.2800


z =

    1.5336


p =

    0.0143


sx =

    0.0100
    0.0300
    0.0700
    0.0900
    0.1500
    0.1800
    0.2000
    0.2200
    0.3000
    0.3100
    0.4800
    0.5000
    0.5500
    0.5700
    0.6000
    0.7000
    0.7500
    0.8100
    0.8500
    0.9000
    0.9000
    0.9500
    1.0000
    1.0400
    1.2000
    1.2500
    1.3000
    1.4500
    1.5600
    1.9800


ifail =

                    0


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013