# NAG Toolbox: nag_nonpar_test_ks_2sample (g08cd)

## Purpose

nag_nonpar_test_ks_2sample (g08cd) performs the two sample Kolmogorov–Smirnov distribution test.

## Syntax

[d, z, p, sx, sy, ifail] = g08cd(x, y, ntype, 'n1', n1, 'n2', n2)
[d, z, p, sx, sy, ifail] = nag_nonpar_test_ks_2sample(x, y, ntype, 'n1', n1, 'n2', n2)

## Description

The data consists of two independent samples, one of size ${n}_{1}$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{{n}_{1}}$, and the other of size ${n}_{2}$, denoted by ${y}_{1},{y}_{2},\dots ,{y}_{{n}_{2}}$. Let $F\left(x\right)$ and $G\left(x\right)$ represent their respective, unknown, distribution functions. Also let ${S}_{1}\left(x\right)$ and ${S}_{2}\left(x\right)$ denote the values of the sample cumulative distribution functions at the point $x$ for the two samples respectively.
The Kolmogorov–Smirnov test provides a test of the null hypothesis ${H}_{0}$: $F\left(x\right)=G\left(x\right)$ against one of the following alternative hypotheses:
(i) ${H}_{1}$: $F\left(x\right)\ne G\left(x\right)$.
(ii) ${H}_{2}$: $F\left(x\right)>G\left(x\right)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be smaller than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of ${S}_{1}\left(x\right)$ tended to exceed the corresponding values of ${S}_{2}\left(x\right)$.
(iii) ${H}_{3}$: $F\left(x\right)<G\left(x\right)$. This alternative hypothesis is sometimes stated as, ‘The $x$'s tend to be larger than the $y$'s’, i.e., it would be demonstrated in practical terms if the values of ${S}_{2}\left(x\right)$ tended to exceed the corresponding values of ${S}_{1}\left(x\right)$.
One of the following test statistics is computed, depending on the alternative hypothesis specified (see the description of the parameter ntype in Section [Parameters]).
For the alternative hypothesis ${H}_{1}$:
• ${D}_{{n}_{1},{n}_{2}}$ – the largest absolute deviation between the two sample cumulative distribution functions.
For the alternative hypothesis ${H}_{2}$:
• ${D}_{{n}_{1},{n}_{2}}^{+}$ – the largest positive deviation between the sample cumulative distribution function of the first sample, ${S}_{1}\left(x\right)$, and the sample cumulative distribution function of the second sample, ${S}_{2}\left(x\right)$. Formally ${D}_{{n}_{1},{n}_{2}}^{+}=\mathrm{max}\left\{{S}_{1}\left(x\right)-{S}_{2}\left(x\right),0\right\}$.
For the alternative hypothesis ${H}_{3}$:
• ${D}_{{n}_{1},{n}_{2}}^{-}$ – the largest positive deviation between the sample cumulative distribution function of the second sample, ${S}_{2}\left(x\right)$, and the sample cumulative distribution function of the first sample, ${S}_{1}\left(x\right)$. Formally ${D}_{{n}_{1},{n}_{2}}^{-}=\mathrm{max}\left\{{S}_{2}\left(x\right)-{S}_{1}\left(x\right),0\right\}$.
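The three statistics defined above can be sketched in plain MATLAB, without the NAG Toolbox, by evaluating both sample cumulative distribution functions at every pooled observation. This is a minimal illustration on small made-up samples, not the algorithm used inside the routine; the variable names (`pts`, `S1`, `S2`, `dplus`, `dminus`) are chosen here for illustration only.

```matlab
% Illustrative only: small made-up samples, not the data from the Example section.
x = [0.1; 0.4; 0.7; 0.9];      % first sample,  n1 = 4
y = [0.5; 0.8; 1.2];           % second sample, n2 = 3

pts = sort([x; y]);            % the step functions only change at observed points

% Sample cumulative distribution functions S1 and S2 evaluated at each point.
S1 = arrayfun(@(t) mean(x <= t), pts);
S2 = arrayfun(@(t) mean(y <= t), pts);

dplus  = max(max(S1 - S2), 0); % D+ : largest amount by which S1 exceeds S2
dminus = max(max(S2 - S1), 0); % D- : largest amount by which S2 exceeds S1
d      = max(dplus, dminus);   % D  : largest absolute deviation

fprintf('D = %.4f, D+ = %.4f, D- = %.4f\n', d, dplus, dminus);
```

For these samples ${S}_{1}$ never falls below ${S}_{2}$, so the deviation in the ${H}_{3}$ direction is zero and the two-sided statistic coincides with the one for ${H}_{2}$.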
nag_nonpar_test_ks_2sample (g08cd) also returns the standardized statistic $Z=\sqrt{\frac{{n}_{1}+{n}_{2}}{{n}_{1}{n}_{2}}}\times D$, where $D$ may be ${D}_{{n}_{1},{n}_{2}}$, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$ depending on the choice of the alternative hypothesis. The distribution of this statistic converges asymptotically to a distribution given by Smirnov as ${n}_{1}$ and ${n}_{2}$ increase; see Feller (1948), Kendall and Stuart (1973), Kim and Jenrich (1973), Smirnov (1933) or Smirnov (1948).
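As a quick check of the formula for $Z$ as stated above, the following sketch plugs in the sample sizes and the value of d reported in the Example section (100 and 50 observations, d = 0.3600). It only reproduces the arithmetic and does not call the routine.

```matlab
n1 = 100;                              % size of the first sample in the Example
n2 = 50;                               % size of the second sample in the Example
d  = 0.36;                             % test statistic reported in the Example

z = sqrt((n1 + n2) / (n1 * n2)) * d;   % standardized statistic as defined above
fprintf('z = %.4f\n', z);              % prints z = 0.0624
```

The result agrees with the value of z printed in the Example output.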
The probability, under the null hypothesis, of obtaining a value of the test statistic as extreme as that observed, is computed. If $\mathrm{max}\left({n}_{1},{n}_{2}\right)\le 2500$ and ${n}_{1}{n}_{2}\le 10000$ then the exact method of Kim and Jenrich (1973) is used. Otherwise $p$ is computed using the approximations suggested by Kim and Jenrich (1973). Note that the method used is only exact for continuous theoretical distributions. This method computes the two-sided probability. The one-sided probabilities are estimated by halving the two-sided probability. This is a good estimate for small $p$, that is $p\le 0.10$, but it becomes very poor for larger $p$.
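The rule for choosing between the exact method and the approximations, and the halving used for one-sided probabilities, can be written out as a short sketch. The names `is_exact`, `p_two` and `p_one` are hypothetical, and the value of `p_two` is made up purely to show the arithmetic; nothing here is computed by the routine itself.

```matlab
% True when the conditions quoted above for the exact method both hold.
is_exact = @(n1, n2) (max(n1, n2) <= 2500) && (n1 * n2 <= 10000);

exact_small = is_exact(100, 50);    % 100*50 = 5000   -> exact method
exact_large = is_exact(200, 100);   % 200*100 = 20000 -> approximation

% One-sided probability estimated by halving a two-sided probability.
p_two = 0.04;                       % hypothetical two-sided probability
p_one = p_two / 2;                  % reasonable here, since p_two <= 0.10

fprintf('exact for (100,50): %d, exact for (200,100): %d\n', exact_small, exact_large);
fprintf('one-sided estimate: %.2f\n', p_one);
```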

## References

Conover W J (1980) Practical Nonparametric Statistics Wiley
Feller W (1948) On the Kolmogorov–Smirnov limit theorems for empirical distributions Ann. Math. Statist. 19 179–181
Kendall M G and Stuart A (1973) The Advanced Theory of Statistics (Volume 2) (3rd Edition) Griffin
Kim P J and Jenrich R I (1973) Tables of exact sampling distribution of the two sample Kolmogorov–Smirnov criterion ${D}_{mn}\left(m<n\right)$ Selected Tables in Mathematical Statistics 1 80–129 American Mathematical Society
Siegel S (1956) Non-parametric Statistics for the Behavioral Sciences McGraw–Hill
Smirnov N (1933) Estimate of deviation between empirical distribution functions in two independent samples Bull. Moscow Univ. 2(2) 3–16
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281

## Parameters

### Compulsory Input Parameters

1:     x(n1) – double array
n1, the dimension of the array, must satisfy the constraint ${\mathbf{n1}}\ge 1$.
The observations from the first sample, ${x}_{1},{x}_{2},\dots ,{x}_{{n}_{1}}$.
2:     y(n2) – double array
n2, the dimension of the array, must satisfy the constraint ${\mathbf{n2}}\ge 1$.
The observations from the second sample, ${y}_{1},{y}_{2},\dots ,{y}_{{n}_{2}}$.
3:     ntype – int64/int32/nag_int scalar
The statistic to be computed, i.e., the choice of alternative hypothesis.
${\mathbf{ntype}}=1$
Computes ${D}_{{n}_{1},{n}_{2}}$, to test against ${H}_{1}$.
${\mathbf{ntype}}=2$
Computes ${D}_{{n}_{1},{n}_{2}}^{+}$, to test against ${H}_{2}$.
${\mathbf{ntype}}=3$
Computes ${D}_{{n}_{1},{n}_{2}}^{-}$, to test against ${H}_{3}$.
Constraint: ${\mathbf{ntype}}=1$, $2$ or $3$.

### Optional Input Parameters

1:     n1 – int64/int32/nag_int scalar
Default: The dimension of the array x.
The number of observations in the first sample, ${n}_{1}$.
Constraint: ${\mathbf{n1}}\ge 1$.
2:     n2 – int64/int32/nag_int scalar
Default: The dimension of the array y.
The number of observations in the second sample, ${n}_{2}$.
Constraint: ${\mathbf{n2}}\ge 1$.


### Output Parameters

1:     d – double scalar
The Kolmogorov–Smirnov test statistic (${D}_{{n}_{1},{n}_{2}}$, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$ according to the value of ntype).
2:     z – double scalar
A standardized value, $Z$, of the test statistic, $D$, without any correction for continuity.
3:     p – double scalar
The tail probability associated with the observed value of $D$, where $D$ may be ${D}_{{n}_{1},{n}_{2}}$, ${D}_{{n}_{1},{n}_{2}}^{+}$ or ${D}_{{n}_{1},{n}_{2}}^{-}$ depending on the value of ntype (see Section [Description]).
4:     sx(n1) – double array
The observations from the first sample sorted in ascending order.
5:     sy(n2) – double array
The observations from the second sample sorted in ascending order.
6:     ifail – int64/int32/nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{n1}}<1$, or ${\mathbf{n2}}<1$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{ntype}}\ne 1$, $2$ or $3$.
${\mathbf{ifail}}=3$
The iterative procedure used in the approximation of the probability for large ${n}_{1}$ and ${n}_{2}$ did not converge. For the two-sided test, $p=1$ is returned. For the one-sided test, $p=0.5$ is returned.
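A caller may want to turn the ifail codes listed above into readable messages. The sketch below does this for a hypothetical ifail value; it does not call the routine, and the message wording and the variable name `msg` are illustrative assumptions rather than text produced by the Toolbox.

```matlab
ifail = int64(3);   % hypothetical value, standing in for the ifail output of g08cd

switch ifail
    case 0
        msg = 'No error detected.';
    case 1
        msg = 'On entry, n1 < 1 or n2 < 1.';
    case 2
        msg = 'On entry, ntype is not 1, 2 or 3.';
    case 3
        msg = 'Approximation did not converge; p is 1 (two-sided) or 0.5 (one-sided).';
    otherwise
        msg = sprintf('Unexpected ifail value %d.', ifail);
end

disp(msg);
```

With ifail equal to 3 the returned p carries no information about the data, so it should not be used as evidence for or against the null hypothesis.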

## Accuracy

The large sample distributions used as approximations to the exact distribution should have a relative error of less than 5% for most cases.

The time taken by nag_nonpar_test_ks_2sample (g08cd) increases with ${n}_{1}$ and ${n}_{2}$, until ${n}_{1}{n}_{2}>10000$ or $\mathrm{max}\left({n}_{1},{n}_{2}\right)\ge 2500$. At this point one of the approximations is used and the time decreases significantly. The time then increases again modestly with ${n}_{1}$ and ${n}_{2}$.

## Example

```
function nag_nonpar_test_ks_2sample_example
x = [1.5902480498365;
0.4514344715575777;
0.7425605404715657;
0.4500701410943835;
1.757489613962739;
0.09494659709066995;
0.3611426417219349;
0.8655310991929152;
0.07874046899432821;
1.15034115657093;
1.909225474130498;
1.248205642566939;
1.278136611206321;
1.376242882235565;
0.4024793749242044;
1.762011956845783;
0.6952326353936773;
0.9876414433367825;
0.6059391432983038;
0.2078601007360235;
1.564305035404043;
1.746505783532216;
0.1778870249628505;
0.1076281399262869;
0.07704165044712488;
0.04567587466526592;
0.8039773560869385;
1.241984667916896;
1.797786082814494;
0.852819977142019;
1.747622472821148;
1.912044530456294;
0.4706479344188908;
1.243486362938038;
1.279342351641843;
0.8903608734478126;
1.727202408903865;
0.2581867825828414;
0.9415224497978657;
1.74941093213823;
1.20905525205517;
0.9257769700555145;
0.9774905558012059;
0.8557640721625349;
0.2648216307181972;
0.2305151298211215;
1.214582160870085;
0.0008087845523660937;
0.04826634252569287;
1.422324123840062;
1.64461234171838;
1.242902747480194;
1.859348219383043;
0.8759548065432361;
0.9770887783336585;
1.077045946574884;
0.07040286404180876;
0.725208635893021;
1.129721212172624;
0.7794878837807858;
1.236501744194477;
0.8327862848080944;
1.932002357334167;
0.01096554652191538;
1.430490703604279;
0.9004992406397488;
0.8208346310843287;
0.7213215653615261;
1.032756476884525;
0.3378646599255028;
1.975473898973496;
1.307202749307935;
1.022550018984354;
0.5703968003457041;
1.174687331604432;
1.691927534708742;
0.09038851332572448;
0.08328616335992274;
0.3974243748119088;
0.4000804015026145;
1.734779923254609;
0.072131066247919;
1.330366567063721;
0.08900990107193633;
0.6128580701836494;
1.25978558572682;
0.3681357787827774;
1.766612133578424;
0.5447167419613813;
0.1468692016200379;
0.08565369904623682;
1.73695078139111;
1.502452715832918;
1.023663015528883;
1.548981938796881;
0.06591087486198742;
1.089678519368579;
0.7863284085754246;
0.4083539838920379;
0.1336746295053808];
y = [0.3959382482212902;
2.189059814171972;
0.5309091506354565;
1.398213120735397;
1.226993173102978;
2.092465206711672;
1.376152675038049;
0.2635476162603962;
0.6826139872981392;
1.725437958005721;
1.218220941115947;
0.2907559334793868;
0.2888496459940526;
2.237322438311175;
1.335206853771786;
1.482575912633977;
1.511554656117706;
1.49444503365132;
1.647783675961666;
1.093415807808477;
1.929516829926758;
1.771633171736251;
1.362424985201225;
0.7600483503138998;
1.474848032784191;
1.579250523709051;
1.100106550316349;
0.9988674396189126;
1.974550189113563;
0.6728895918936394;
0.6568563195351251;
1.038967939288667;
1.420212415024878;
2.156198816080763;
1.879016986416847;
1.409577351753394;
1.953857139342458;
0.3220659541919986;
2.244466171121614;
1.738164359951565;
0.4534069003591212;
0.8715218615975232;
1.596006863571959;
1.501743503682336;
2.099525595470502;
0.8538189862774034;
0.9554670591088673;
1.668907067970278;
1.517236797575171;
1.478170841410264];
ntype = int64(1);
[d, z, p, sx, sy, ifail] = nag_nonpar_test_ks_2sample(x, y, ntype)
```
```

d =

0.3600

z =

0.0624

p =

2.8438e-04

sx =

0.0008
0.0110
0.0457
0.0483
0.0659
0.0704
0.0721
0.0770
0.0787
0.0833
0.0857
0.0890
0.0904
0.0949
0.1076
0.1337
0.1469
0.1779
0.2079
0.2305
0.2582
0.2648
0.3379
0.3611
0.3681
0.3974
0.4001
0.4025
0.4084
0.4501
0.4514
0.4706
0.5447
0.5704
0.6059
0.6129
0.6952
0.7213
0.7252
0.7426
0.7795
0.7863
0.8040
0.8208
0.8328
0.8528
0.8558
0.8655
0.8760
0.8904
0.9005
0.9258
0.9415
0.9771
0.9775
0.9876
1.0226
1.0237
1.0328
1.0770
1.0897
1.1297
1.1503
1.1747
1.2091
1.2146
1.2365
1.2420
1.2429
1.2435
1.2482
1.2598
1.2781
1.2793
1.3072
1.3304
1.3762
1.4223
1.4305
1.5025
1.5490
1.5643
1.5902
1.6446
1.6919
1.7272
1.7348
1.7370
1.7465
1.7476
1.7494
1.7575
1.7620
1.7666
1.7978
1.8593
1.9092
1.9120
1.9320
1.9755

sy =

0.2635
0.2888
0.2908
0.3221
0.3959
0.4534
0.5309
0.6569
0.6729
0.6826
0.7600
0.8538
0.8715
0.9555
0.9989
1.0390
1.0934
1.1001
1.2182
1.2270
1.3352
1.3624
1.3762
1.3982
1.4096
1.4202
1.4748
1.4782
1.4826
1.4944
1.5017
1.5116
1.5172
1.5793
1.5960
1.6478
1.6689
1.7254
1.7382
1.7716
1.8790
1.9295
1.9539
1.9746
2.0925
2.0995
2.1562
2.1891
2.2373
2.2445

ifail =

0

```
```
function g08cd_example
x = [1.5902480498365;
0.4514344715575777;
0.7425605404715657;
0.4500701410943835;
1.757489613962739;
0.09494659709066995;
0.3611426417219349;
0.8655310991929152;
0.07874046899432821;
1.15034115657093;
1.909225474130498;
1.248205642566939;
1.278136611206321;
1.376242882235565;
0.4024793749242044;
1.762011956845783;
0.6952326353936773;
0.9876414433367825;
0.6059391432983038;
0.2078601007360235;
1.564305035404043;
1.746505783532216;
0.1778870249628505;
0.1076281399262869;
0.07704165044712488;
0.04567587466526592;
0.8039773560869385;
1.241984667916896;
1.797786082814494;
0.852819977142019;
1.747622472821148;
1.912044530456294;
0.4706479344188908;
1.243486362938038;
1.279342351641843;
0.8903608734478126;
1.727202408903865;
0.2581867825828414;
0.9415224497978657;
1.74941093213823;
1.20905525205517;
0.9257769700555145;
0.9774905558012059;
0.8557640721625349;
0.2648216307181972;
0.2305151298211215;
1.214582160870085;
0.0008087845523660937;
0.04826634252569287;
1.422324123840062;
1.64461234171838;
1.242902747480194;
1.859348219383043;
0.8759548065432361;
0.9770887783336585;
1.077045946574884;
0.07040286404180876;
0.725208635893021;
1.129721212172624;
0.7794878837807858;
1.236501744194477;
0.8327862848080944;
1.932002357334167;
0.01096554652191538;
1.430490703604279;
0.9004992406397488;
0.8208346310843287;
0.7213215653615261;
1.032756476884525;
0.3378646599255028;
1.975473898973496;
1.307202749307935;
1.022550018984354;
0.5703968003457041;
1.174687331604432;
1.691927534708742;
0.09038851332572448;
0.08328616335992274;
0.3974243748119088;
0.4000804015026145;
1.734779923254609;
0.072131066247919;
1.330366567063721;
0.08900990107193633;
0.6128580701836494;
1.25978558572682;
0.3681357787827774;
1.766612133578424;
0.5447167419613813;
0.1468692016200379;
0.08565369904623682;
1.73695078139111;
1.502452715832918;
1.023663015528883;
1.548981938796881;
0.06591087486198742;
1.089678519368579;
0.7863284085754246;
0.4083539838920379;
0.1336746295053808];
y = [0.3959382482212902;
2.189059814171972;
0.5309091506354565;
1.398213120735397;
1.226993173102978;
2.092465206711672;
1.376152675038049;
0.2635476162603962;
0.6826139872981392;
1.725437958005721;
1.218220941115947;
0.2907559334793868;
0.2888496459940526;
2.237322438311175;
1.335206853771786;
1.482575912633977;
1.511554656117706;
1.49444503365132;
1.647783675961666;
1.093415807808477;
1.929516829926758;
1.771633171736251;
1.362424985201225;
0.7600483503138998;
1.474848032784191;
1.579250523709051;
1.100106550316349;
0.9988674396189126;
1.974550189113563;
0.6728895918936394;
0.6568563195351251;
1.038967939288667;
1.420212415024878;
2.156198816080763;
1.879016986416847;
1.409577351753394;
1.953857139342458;
0.3220659541919986;
2.244466171121614;
1.738164359951565;
0.4534069003591212;
0.8715218615975232;
1.596006863571959;
1.501743503682336;
2.099525595470502;
0.8538189862774034;
0.9554670591088673;
1.668907067970278;
1.517236797575171;
1.478170841410264];
ntype = int64(1);
[d, z, p, sx, sy, ifail] = g08cd(x, y, ntype)
```
```

d =

0.3600

z =

0.0624

p =

2.8438e-04

sx =

0.0008
0.0110
0.0457
0.0483
0.0659
0.0704
0.0721
0.0770
0.0787
0.0833
0.0857
0.0890
0.0904
0.0949
0.1076
0.1337
0.1469
0.1779
0.2079
0.2305
0.2582
0.2648
0.3379
0.3611
0.3681
0.3974
0.4001
0.4025
0.4084
0.4501
0.4514
0.4706
0.5447
0.5704
0.6059
0.6129
0.6952
0.7213
0.7252
0.7426
0.7795
0.7863
0.8040
0.8208
0.8328
0.8528
0.8558
0.8655
0.8760
0.8904
0.9005
0.9258
0.9415
0.9771
0.9775
0.9876
1.0226
1.0237
1.0328
1.0770
1.0897
1.1297
1.1503
1.1747
1.2091
1.2146
1.2365
1.2420
1.2429
1.2435
1.2482
1.2598
1.2781
1.2793
1.3072
1.3304
1.3762
1.4223
1.4305
1.5025
1.5490
1.5643
1.5902
1.6446
1.6919
1.7272
1.7348
1.7370
1.7465
1.7476
1.7494
1.7575
1.7620
1.7666
1.7978
1.8593
1.9092
1.9120
1.9320
1.9755

sy =

0.2635
0.2888
0.2908
0.3221
0.3959
0.4534
0.5309
0.6569
0.6729
0.6826
0.7600
0.8538
0.8715
0.9555
0.9989
1.0390
1.0934
1.1001
1.2182
1.2270
1.3352
1.3624
1.3762
1.3982
1.4096
1.4202
1.4748
1.4782
1.4826
1.4944
1.5017
1.5116
1.5172
1.5793
1.5960
1.6478
1.6689
1.7254
1.7382
1.7716
1.8790
1.9295
1.9539
1.9746
2.0925
2.0995
2.1562
2.1891
2.2373
2.2445

ifail =

0

```