Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_univar_robust_1var_ci (g07ea)

## Purpose

nag_univar_robust_1var_ci (g07ea) computes a rank based (nonparametric) estimate and confidence interval for the location parameter of a single population.

## Syntax

[theta, thetal, thetau, estcl, wlower, wupper, ifail] = g07ea(method, x, clevel, 'n', n)
[theta, thetal, thetau, estcl, wlower, wupper, ifail] = nag_univar_robust_1var_ci(method, x, clevel, 'n', n)

## Description

Consider a vector of independent observations, x = (x1,x2,,xn)T$x={\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)}^{\mathrm{T}}$ with unknown common symmetric density f(xiθ)$f\left({x}_{i}-\theta \right)$. nag_univar_robust_1var_ci (g07ea) computes the Hodges–Lehmann location estimator (see Lehmann (1975)) of the centre of symmetry θ$\theta$, together with an associated confidence interval. The Hodges–Lehmann estimate is defined as
 θ̂ = median {(xi + xj)/2,1 ≤ i ≤ j ≤ n} . $θ^=median {xi+xj2,1≤i≤j≤n} .$
Let m = (n(n + 1)) / 2$m=\left(n\left(n+1\right)\right)/2$ and let ak${a}_{\mathit{k}}$, for k = 1,2,,m$\mathit{k}=1,2,\dots ,m$ denote the m$m$ ordered averages (xi + xj) / 2$\left({x}_{i}+{x}_{j}\right)/2$ for 1ijn$1\le i\le j\le n$. Then
• if m$m$ is odd, θ̂ = ak$\stackrel{^}{\theta }={a}_{k}$ where k = (m + 1) / 2$k=\left(m+1\right)/2$;
• if m$m$ is even, θ̂ = (ak + ak + 1) / 2$\stackrel{^}{\theta }=\left({a}_{k}+{a}_{k+1}\right)/2$ where k = m / 2$k=m/2$.
This estimator arises from inverting the one-sample Wilcoxon signed-rank test statistic, W(xθ0)$W\left(x-{\theta }_{0}\right)$, for testing the hypothesis that θ = θ0$\theta ={\theta }_{0}$. Effectively W(xθ0)$W\left(x-{\theta }_{0}\right)$ is a monotonically decreasing step function of θ0${\theta }_{0}$ with
 mean ​(W) = μ = (n(n + 1))/4, var(W) = σ2 = (n(n + 1)(2n + 1))/24.
$mean ​(W)=μ= n(n+1)4, var(W)=σ2= n(n+1)(2n+1)24.$
The estimate θ̂$\stackrel{^}{\theta }$ is the solution to the equation W(xθ̂) = μ$W\left(x-\stackrel{^}{\theta }\right)=\mu$; two methods are available for solving this equation. These methods avoid the computation of all the ordered averages ak${a}_{k}$; this is because for large n$n$ both the storage requirements and the computation time would be excessive.
The first is an exact method based on a set partitioning procedure on the set of all ordered averages (xi + xj) / 2$\left({x}_{i}+{x}_{j}\right)/2$ for ij$i\le j$. This is based on the algorithm proposed by Monahan (1984).
The second is an iterative algorithm, based on the Illinois method which is a modification of the regula falsi method, see McKean and Ryan (1977). This algorithm has proved suitable for the function W(xθ0)$W\left(x-{\theta }_{0}\right)$ which is asymptotically linear as a function of θ0${\theta }_{0}$.
The confidence interval limits are also based on the inversion of the Wilcoxon test statistic.
Given a desired percentage for the confidence interval, 1α$1-\alpha$, expressed as a proportion between 0$0$ and 1$1$, initial estimates for the lower and upper confidence limits of the Wilcoxon statistic are found from
 Wl = μ − 0.5 + (σΦ − 1(α / 2)) $Wl=μ-0.5+(σΦ-1(α/2))$
and
 Wu = μ + 0.5 + (σΦ − 1(1 − α / 2)), $Wu=μ+ 0.5+(σ Φ-1(1-α /2)),$
where Φ1${\Phi }^{-1}$ is the inverse cumulative Normal distribution function.
Wl${W}_{l}$ and Wu${W}_{u}$ are rounded to the nearest integer values. These estimates are then refined using an exact method if n80$n\le 80$, and a Normal approximation otherwise, to find Wl${W}_{l}$ and Wu${W}_{u}$ satisfying
 P(W ≤ Wl) ≤ α / 2 P(W ≤ Wl + 1) > α / 2
$P(W≤Wl)≤α/2 P(W≤Wl+1)>α/2$
and
 P(W ≥ Wu) ≤ α / 2 P(W ≥ Wu − 1) > α / 2.
$P(W≥Wu)≤α /2 P(W≥Wu- 1)>α /2.$
Let Wu = mk${W}_{u}=m-k$; then θl = ak + 1${\theta }_{l}={a}_{k+1}$. This is the largest value θl${\theta }_{l}$ such that W(xθl) = Wu$W\left(x-{\theta }_{l}\right)={W}_{u}$.
Let Wl = k${W}_{l}=k$; then θu = amk${\theta }_{u}={a}_{m-k}$. This is the smallest value θu${\theta }_{u}$ such that W(xθu) = Wl$W\left(x-{\theta }_{u}\right)={W}_{l}$.
As in the case of θ̂$\stackrel{^}{\theta }$, these equations may be solved using either the exact or the iterative methods to find the values θl${\theta }_{l}$ and θu${\theta }_{u}$.
Then (θl,θu)$\left({\theta }_{l},{\theta }_{u}\right)$ is the confidence interval for θ$\theta$. The confidence interval is thus defined by those values of θ0${\theta }_{0}$ such that the null hypothesis, θ = θ0$\theta ={\theta }_{0}$, is not rejected by the Wilcoxon signed-rank test at the (100 × α)%$\left(100×\alpha \right)%$ level.

## References

Lehmann E L (1975) Nonparametrics: Statistical Methods Based on Ranks Holden–Day
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne
McKean J W and Ryan T A (1977) Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem ACM Trans. Math. Software 10 183–185
Monahan J F (1984) Algorithm 616: Fast computation of the Hodges–Lehman location estimator ACM Trans. Math. Software 10 265–270

## Parameters

### Compulsory Input Parameters

1:     method – string (length ≥ 1)
Specifies the method to be used.
method = 'E'${\mathbf{method}}=\text{'E'}$
The exact algorithm is used.
method = 'A'${\mathbf{method}}=\text{'A'}$
The iterative algorithm is used.
Constraint: method = 'E'${\mathbf{method}}=\text{'E'}$ or 'A'$\text{'A'}$.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n2${\mathbf{n}}\ge 2$.
The sample observations, xi${x}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
3:     clevel – double scalar
The confidence interval desired.
For example, for a 95%$95%$ confidence interval set clevel = 0.95${\mathbf{clevel}}=0.95$.
Constraint: 0.0 < clevel < 1.0$0.0<{\mathbf{clevel}}<1.0$.

### Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the array x.
n$n$, the sample size.
Constraint: n2${\mathbf{n}}\ge 2$.

wrk iwrk

### Output Parameters

1:     theta – double scalar
The estimate of the location, θ̂$\stackrel{^}{\theta }$.
2:     thetal – double scalar
The estimate of the lower limit of the confidence interval, θl${\theta }_{l}$.
3:     thetau – double scalar
The estimate of the upper limit of the confidence interval, θu${\theta }_{u}$.
4:     estcl – double scalar
An estimate of the actual percentage confidence of the interval found, as a proportion between (0.0,1.0)$\left(0.0,1.0\right)$.
5:     wlower – double scalar
The upper value of the Wilcoxon test statistic, Wu${W}_{u}$, corresponding to the lower limit of the confidence interval.
6:     wupper – double scalar
The lower value of the Wilcoxon test statistic, Wl${W}_{l}$, corresponding to the upper limit of the confidence interval.
7:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:
ifail = 1${\mathbf{ifail}}=1$
 On entry, method ≠ 'E'${\mathbf{method}}\ne \text{'E'}$ or 'A'$\text{'A'}$, or n < 2${\mathbf{n}}<2$, or clevel ≤ 0.0${\mathbf{clevel}}\le 0.0$, or clevel ≥ 1.0${\mathbf{clevel}}\ge 1.0$.
ifail = 2${\mathbf{ifail}}=2$
There is not enough information to compute a confidence interval since the whole sample consists of identical values.
ifail = 3${\mathbf{ifail}}=3$
For at least one of the estimates θ̂$\stackrel{^}{\theta }$, θl${\theta }_{l}$ and θu${\theta }_{u}$, the underlying iterative algorithm (when method = 'A'${\mathbf{method}}=\text{'A'}$) failed to converge. This is an unlikely exit but the estimate should still be a reasonable approximation.

## Accuracy

nag_univar_robust_1var_ci (g07ea) should produce results accurate to five significant figures in the width of the confidence interval; that is the error for any one of the three estimates should be less than 0.00001 × (thetauthetal)$0.00001×\left({\mathbf{thetau}}-{\mathbf{thetal}}\right)$.

The time taken increases with the sample size n$n$.

## Example

```function nag_univar_robust_1var_ci_example
method = 'Exact';
x = [-0.23;
0.35;
-0.77;
0.35;
0.27;
-0.72;
0.08;
-0.4;
-0.76;
0.45;
0.73;
0.74;
0.83;
-0.87;
0.21;
0.29;
-0.91;
-0.04;
0.82;
-0.38;
-0.31;
0.24;
-0.47;
-0.68;
-0.77;
-0.86;
-0.59;
0.73;
0.39;
-0.44;
0.63;
-0.22;
-0.07;
-0.43;
-0.21;
-0.31;
0.64;
-1;
-0.86;
-0.73];
clevel = 0.95;
[theta, thetal, thetau, estcl, wlower, wupper, ifail] = ...
nag_univar_robust_1var_ci(method, x, clevel)
```
```

theta =

-0.1300

thetal =

-0.3300

thetau =

0.0350

estcl =

0.9514

wlower =

556

wupper =

264

ifail =

0

```
```function g07ea_example
method = 'Exact';
x = [-0.23;
0.35;
-0.77;
0.35;
0.27;
-0.72;
0.08;
-0.4;
-0.76;
0.45;
0.73;
0.74;
0.83;
-0.87;
0.21;
0.29;
-0.91;
-0.04;
0.82;
-0.38;
-0.31;
0.24;
-0.47;
-0.68;
-0.77;
-0.86;
-0.59;
0.73;
0.39;
-0.44;
0.63;
-0.22;
-0.07;
-0.43;
-0.21;
-0.31;
0.64;
-1;
-0.86;
-0.73];
clevel = 0.95;
[theta, thetal, thetau, estcl, wlower, wupper, ifail] = g07ea(method, x, clevel)
```
```

theta =

-0.1300

thetal =

-0.3300

thetau =

0.0350

estcl =

0.9514

wlower =

556

wupper =

264

ifail =

0

```

Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013