Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_univar_robust_1var_ci (g07ea)

## Purpose

nag_univar_robust_1var_ci (g07ea) computes a rank based (nonparametric) estimate and confidence interval for the location argument of a single population.

## Syntax

[theta, thetal, thetau, estcl, wlower, wupper, ifail] = g07ea(method, x, clevel, 'n', n)
[theta, thetal, thetau, estcl, wlower, wupper, ifail] = nag_univar_robust_1var_ci(method, x, clevel, 'n', n)

## Description

Consider a vector of independent observations, $x={\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)}^{\mathrm{T}}$ with unknown common symmetric density $f\left({x}_{i}-\theta \right)$. nag_univar_robust_1var_ci (g07ea) computes the Hodges–Lehmann location estimator (see Lehmann (1975)) of the centre of symmetry $\theta$, together with an associated confidence interval. The Hodges–Lehmann estimate is defined as
 $θ^=median xi+xj2,1≤i≤j≤n .$
Let $m=\left(n\left(n+1\right)\right)/2$ and let ${a}_{\mathit{k}}$, for $\mathit{k}=1,2,\dots ,m$ denote the $m$ ordered averages $\left({x}_{i}+{x}_{j}\right)/2$ for $1\le i\le j\le n$. Then
• if $m$ is odd, $\stackrel{^}{\theta }={a}_{k}$ where $k=\left(m+1\right)/2$;
• if $m$ is even, $\stackrel{^}{\theta }=\left({a}_{k}+{a}_{k+1}\right)/2$ where $k=m/2$.
This estimator arises from inverting the one-sample Wilcoxon signed-rank test statistic, $W\left(x-{\theta }_{0}\right)$, for testing the hypothesis that $\theta ={\theta }_{0}$. Effectively $W\left(x-{\theta }_{0}\right)$ is a monotonically decreasing step function of ${\theta }_{0}$ with
 $mean ​W=μ= nn+14, varW=σ2= nn+12n+124.$
The estimate $\stackrel{^}{\theta }$ is the solution to the equation $W\left(x-\stackrel{^}{\theta }\right)=\mu$; two methods are available for solving this equation. These methods avoid the computation of all the ordered averages ${a}_{k}$; this is because for large $n$ both the storage requirements and the computation time would be excessive.
The first is an exact method based on a set partitioning procedure on the set of all ordered averages $\left({x}_{i}+{x}_{j}\right)/2$ for $i\le j$. This is based on the algorithm proposed by Monahan (1984).
The second is an iterative algorithm, based on the Illinois method which is a modification of the regula falsi method, see McKean and Ryan (1977). This algorithm has proved suitable for the function $W\left(x-{\theta }_{0}\right)$ which is asymptotically linear as a function of ${\theta }_{0}$.
The confidence interval limits are also based on the inversion of the Wilcoxon test statistic.
Given a desired percentage for the confidence interval, $1-\alpha$, expressed as a proportion between $0$ and $1$, initial estimates for the lower and upper confidence limits of the Wilcoxon statistic are found from
 $Wl=μ-0.5+σΦ-1α/2$
and
 $Wu=μ+ 0.5+σ Φ-11-α /2,$
where ${\Phi }^{-1}$ is the inverse cumulative Normal distribution function.
${W}_{l}$ and ${W}_{u}$ are rounded to the nearest integer values. These estimates are then refined using an exact method if $n\le 80$, and a Normal approximation otherwise, to find ${W}_{l}$ and ${W}_{u}$ satisfying
 $PW≤Wl≤α/2 PW≤Wl+1>α/2$
and
 $PW≥Wu≤α /2 PW≥Wu- 1>α /2.$
Let ${W}_{u}=m-k$; then ${\theta }_{l}={a}_{k+1}$. This is the largest value ${\theta }_{l}$ such that $W\left(x-{\theta }_{l}\right)={W}_{u}$.
Let ${W}_{l}=k$; then ${\theta }_{u}={a}_{m-k}$. This is the smallest value ${\theta }_{u}$ such that $W\left(x-{\theta }_{u}\right)={W}_{l}$.
As in the case of $\stackrel{^}{\theta }$, these equations may be solved using either the exact or the iterative methods to find the values ${\theta }_{l}$ and ${\theta }_{u}$.
Then $\left({\theta }_{l},{\theta }_{u}\right)$ is the confidence interval for $\theta$. The confidence interval is thus defined by those values of ${\theta }_{0}$ such that the null hypothesis, $\theta ={\theta }_{0}$, is not rejected by the Wilcoxon signed-rank test at the $\left(100×\alpha \right)%$ level.

## References

Lehmann E L (1975) Nonparametrics: Statistical Methods Based on Ranks Holden–Day
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne
McKean J W and Ryan T A (1977) Algorithm 516: An algorithm for obtaining confidence intervals and point estimates based on ranks in the two-sample location problem ACM Trans. Math. Software 10 183–185
Monahan J F (1984) Algorithm 616: Fast computation of the Hodges–Lehman location estimator ACM Trans. Math. Software 10 265–270

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{method}$ – string (length ≥ 1)
Specifies the method to be used.
${\mathbf{method}}=\text{'E'}$
The exact algorithm is used.
${\mathbf{method}}=\text{'A'}$
The iterative algorithm is used.
Constraint: ${\mathbf{method}}=\text{'E'}$ or $\text{'A'}$.
2:     $\mathrm{x}\left({\mathbf{n}}\right)$ – double array
The sample observations, ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
3:     $\mathrm{clevel}$ – double scalar
The confidence interval desired.
For example, for a $95%$ confidence interval set ${\mathbf{clevel}}=0.95$.
Constraint: $0.0<{\mathbf{clevel}}<1.0$.

### Optional Input Parameters

1:     $\mathrm{n}$int64int32nag_int scalar
Default: the dimension of the array x.
$n$, the sample size.
Constraint: ${\mathbf{n}}\ge 2$.

### Output Parameters

1:     $\mathrm{theta}$ – double scalar
The estimate of the location, $\stackrel{^}{\theta }$.
2:     $\mathrm{thetal}$ – double scalar
The estimate of the lower limit of the confidence interval, ${\theta }_{l}$.
3:     $\mathrm{thetau}$ – double scalar
The estimate of the upper limit of the confidence interval, ${\theta }_{u}$.
4:     $\mathrm{estcl}$ – double scalar
An estimate of the actual percentage confidence of the interval found, as a proportion between $\left(0.0,1.0\right)$.
5:     $\mathrm{wlower}$ – double scalar
The upper value of the Wilcoxon test statistic, ${W}_{u}$, corresponding to the lower limit of the confidence interval.
6:     $\mathrm{wupper}$ – double scalar
The lower value of the Wilcoxon test statistic, ${W}_{l}$, corresponding to the upper limit of the confidence interval.
7:     $\mathrm{ifail}$int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{method}}\ne \text{'E'}$ or $\text{'A'}$, or ${\mathbf{n}}<2$, or ${\mathbf{clevel}}\le 0.0$, or ${\mathbf{clevel}}\ge 1.0$.
${\mathbf{ifail}}=2$
There is not enough information to compute a confidence interval since the whole sample consists of identical values.
${\mathbf{ifail}}=3$
For at least one of the estimates $\stackrel{^}{\theta }$, ${\theta }_{l}$ and ${\theta }_{u}$, the underlying iterative algorithm (when ${\mathbf{method}}=\text{'A'}$) failed to converge. This is an unlikely exit but the estimate should still be a reasonable approximation.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

nag_univar_robust_1var_ci (g07ea) should produce results accurate to five significant figures in the width of the confidence interval; that is the error for any one of the three estimates should be less than $0.00001×\left({\mathbf{thetau}}-{\mathbf{thetal}}\right)$.

The time taken increases with the sample size $n$.

## Example

The following program calculates a 95% confidence interval for $\theta$, a measure of symmetry of the sample of $50$ observations.
```function g07ea_example

fprintf('g07ea example results\n\n');

x = [-0.23;  0.35; -0.77;  0.35;  0.27; -0.72;  0.08; -0.40; -0.76;  0.45;
0.73;  0.74;  0.83; -0.87;  0.21;  0.29; -0.91; -0.04;  0.82; -0.38;
-0.31;  0.24; -0.47; -0.68; -0.77; -0.86; -0.59;  0.73;  0.39; -0.44;
0.63; -0.22; -0.07; -0.43; -0.21; -0.31;  0.64; -1.00; -0.86; -0.73];

method = 'Exact';
clevel = 0.95;

[theta, thetal, thetau, estcl, wlower, wupper, ifail] = ...
g07ea(method, x, clevel);

fprintf(' Location estimator     Confidence Interval\n\n');
fprintf('%13.4f%12s%7.4f,%7.4f )\n\n', theta, '(', thetal, thetau);
fprintf(' Corresponding Wilcoxon statistics\n\n');
fprintf('  Lower : %8.2f\n', wlower);
fprintf('  Upper : %8.2f\n', wupper);

```
```g07ea example results

Location estimator     Confidence Interval

-0.1300           (-0.3300, 0.0350 )

Corresponding Wilcoxon statistics

Lower :   556.00
Upper :   264.00
```