Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_univar_robust_1var_mestim (g07db)

## Purpose

nag_univar_robust_1var_mestim (g07db) computes an M$M$-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

## Syntax

[theta, sigma, rs, nit, wrk, ifail] = g07db(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)
[theta, sigma, rs, nit, wrk, ifail] = nag_univar_robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)

## Description

The data consists of a sample of size n$n$, denoted by x1,x2,,xn${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable X$X$.
The xi${x}_{i}$ are assumed to be independent with an unknown distribution function of the form
 F((xi − θ) / σ) $F((xi-θ)/σ)$
where θ$\theta$ is a location parameter, and σ$\sigma$ is a scale parameter. M$M$-estimators of θ$\theta$ and σ$\sigma$ are given by the solution to the following system of equations:
 n ∑ ψ((xi − θ̂) / σ̂) = 0 i = 1
$∑i=1nψ ( (xi-θ^) /σ^)=0$
(1)
 n ∑ χ((xi − θ̂) / σ̂) = (n − 1)β i = 1
$∑i=1nχ ( (xi-θ^) /σ^)=(n-1)β$
(2)
where ψ$\psi$ and χ$\chi$ are given functions, and β$\beta$ is a constant, such that σ̂$\stackrel{^}{\sigma }$ is an unbiased estimator when xi${x}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$ has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for θ̂$\stackrel{^}{\theta }$ using an assigned value of σ = σc$\sigma ={\sigma }_{c}$.
The values of ψ ((xiθ̂)/(σ̂)) σ̂$\psi \left(\frac{{x}_{i}-\stackrel{^}{\theta }}{\stackrel{^}{\sigma }}\right)\stackrel{^}{\sigma }$ are known as the Winsorized residuals.
The following functions are available for ψ$\psi$ and χ$\chi$ in nag_univar_robust_1var_mestim (g07db).
(a) Null Weights
 ψ(t) = t(t2)/2$\psi \left(t\right)=t\phantom{\frac{{t}^{2}}{2}}$ χ(t) = (t2)/2 $\chi \left(t\right)=\frac{{t}^{2}}{2}$
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
 ψ(t) = max ( − c,min (c,t)) (t2)/2 $\psi \left(t\right)=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(-c,\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(c,t\right)\right)\phantom{\frac{{t}^{2}}{2}}$ χ(t) = (‖t‖2)/2 ‖t‖ ≤ d$\chi \left(t\right)=\frac{{‖t‖}^{2}}{2}‖t‖\le d$ χ(t) = (d2)/2 ‖t‖ > d$\chi \left(t\right)=\frac{{d}^{2}}{2}‖t‖>d$
(c) Hampel's Piecewise Linear Function
 ψh1,h2,h3(t) = − ψh1,h2,h3( − t)${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=-{\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(-t\right)$ ψh1,h2,h3(t) = t(|t|2)/2${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=t\phantom{\frac{{|t|}^{2}}{2}}$ 0 ≤ t ≤ h1(|t|2)/2$0\le t\le {h}_{1}\phantom{\frac{{|t|}^{2}}{2}}$ χ(t) = (|t|2)/2 |t| ≤ d$\chi \left(t\right)=\frac{{|t|}^{2}}{2}|t|\le d$ ψh1,h2,h3(t) = h1${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)={h}_{1}$ h1 ≤ t ≤ h2${h}_{1}\le t\le {h}_{2}$ ψh1,h2,h3(t) = h1(h3 − t) / (h3 − h2)(|t|2)/2${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)={h}_{1}\left({h}_{3}-t\right)/\left({h}_{3}-{h}_{2}\right)\phantom{\frac{{|t|}^{2}}{2}}$ h2 ≤ t ≤ h3(|t|2)/2${h}_{2}\le t\le {h}_{3}\phantom{\frac{{|t|}^{2}}{2}}$ χ(t) = (d2)/2 |t| > d$\chi \left(t\right)=\frac{{d}^{2}}{2}|t|>d$ ψh1,h2,h3(t) = 0${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=0$ t > h3$t>{h}_{3}$
(d) Andrew's Sine Wave Function
 ψ(t) = sint(d2)/2$\psi \left(t\right)=\mathrm{sin}t\phantom{\frac{{d}^{2}}{2}}$ − π ≤ t ≤ π(d2)/2$-\pi \le t\le \pi \phantom{\frac{{d}^{2}}{2}}$ χ(t) = (|t|2)/2 |t| ≤ d$\chi \left(t\right)=\frac{{|t|}^{2}}{2}|t|\le d$ ψ(t) = 0(d2)/2$\psi \left(t\right)=0\phantom{\frac{{d}^{2}}{2}}$ otherwise (d2)/2$\phantom{\frac{{d}^{2}}{2}}$ χ(t) = (d2)/2 |t| > d$\chi \left(t\right)=\frac{{d}^{2}}{2}|t|>d$
(e) Tukey's Bi-weight
 ψ(t) = t(1 − t2)2(|t|2)/2$\psi \left(t\right)=t{\left(1-{t}^{2}\right)}^{2}\phantom{\frac{{|t|}^{2}}{2}}$ |t| ≤ 1(|t|2)/2$|t|\le 1\phantom{\frac{{|t|}^{2}}{2}}$ χ(t) = (|t|2)/2 |t| ≤ d$\chi \left(t\right)=\frac{{|t|}^{2}}{2}|t|\le d$ ψ(t) = t(1 − t2)2 = 0(|t|2)/2$\psi \left(t\right)=t{\left(1-{t}^{2}\right)}^{2}=0\phantom{\frac{{|t|}^{2}}{2}}$ otherwise (|t|2)/2$\phantom{\frac{{|t|}^{2}}{2}}$ χ(t) = (d2)/2 |t| > d$\chi \left(t\right)=\frac{{d}^{2}}{2}|t|>d$
where c$c$, h1${h}_{1}$, h2${h}_{2}$, h3${h}_{3}$ and d$d$ are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
σ̂k =
 sqrt(1/(β(n − 1)) (n ) ∑ χ((xi − θ̂k − 1)/(σ̂k − 1))i = 1 σ̂k − 12)
$σ^k=1β(n-1) (∑i=1nχ (xi-θ^k-1σ^k-1) ) σ^k-12$
and
 n θ̂k = θ̂k − 1 + 1/n ∑ ψ((xi − θ̂k − 1)/(σ̂k))σ̂k i = 1
$θ^k=θ^k- 1+1n∑i= 1nψ (xi-θ^k- 1σ^k) σ^k$
or
 σ̂k = σc,   if  σ  is fixed. $σ^k=σc, if σ is fixed.$
The initial values for θ̂$\stackrel{^}{\theta }$ and σ̂$\stackrel{^}{\sigma }$ may either be user-supplied or calculated within nag_univar_robust_1var_mestim (g07db) as the sample median and an estimate of σ$\sigma$ based on the median absolute deviation respectively.
nag_univar_robust_1var_mestim (g07db) is based upon function LYHALG within the ROBETH library, see Marazzi (1987).

## References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## Parameters

### Compulsory Input Parameters

1:     isigma – int64int32nag_int scalar
The value assigned to isigma determines whether σ̂$\stackrel{^}{\sigma }$ is to be simultaneously estimated.
isigma = 0${\mathbf{isigma}}=0$
The estimation of σ̂$\stackrel{^}{\sigma }$ is bypassed and sigma is set equal to σc${\sigma }_{c}$.
isigma = 1${\mathbf{isigma}}=1$
σ̂$\stackrel{^}{\sigma }$ is estimated simultaneously.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n > 1${\mathbf{n}}>1$.
The vector of observations, x1,x2,,xn${x}_{1},{x}_{2},\dots ,{x}_{n}$.
3:     ipsi – int64int32nag_int scalar
Which ψ$\psi$ function is to be used.
ipsi = 0${\mathbf{ipsi}}=0$
ψ(t) = t$\psi \left(t\right)=t$.
ipsi = 1${\mathbf{ipsi}}=1$
Huber's function.
ipsi = 2${\mathbf{ipsi}}=2$
Hampel's piecewise linear function.
ipsi = 3${\mathbf{ipsi}}=3$
Andrew's sine wave,
ipsi = 4${\mathbf{ipsi}}=4$
Tukey's bi-weight.
4:     c – double scalar
If ipsi = 1${\mathbf{ipsi}}=1$, c must specify the parameter, c$c$, of Huber's ψ$\psi$ function. c is not referenced if ipsi1${\mathbf{ipsi}}\ne 1$.
Constraint: if ipsi = 1${\mathbf{ipsi}}=1$, c > 0.0${\mathbf{c}}>0.0$.
5:     h1 – double scalar
6:     h2 – double scalar
7:     h3 – double scalar
If ipsi = 2${\mathbf{ipsi}}=2$, h1, h2 and h3 must specify the parameters, h1${h}_{1}$, h2${h}_{2}$, and h3${h}_{3}$, of Hampel's piecewise linear ψ$\psi$ function. h1, h2 and h3 are not referenced if ipsi2${\mathbf{ipsi}}\ne 2$.
Constraint: 0h1h2h3$0\le {\mathbf{h1}}\le {\mathbf{h2}}\le {\mathbf{h3}}$ and h3 > 0.0${\mathbf{h3}}>0.0$ if ipsi = 2${\mathbf{ipsi}}=2$.
8:     dchi – double scalar
d$d$, the parameter of the χ$\chi$ function. dchi is not referenced if ipsi = 0${\mathbf{ipsi}}=0$.
Constraint: if ipsi0${\mathbf{ipsi}}\ne 0$, dchi > 0.0${\mathbf{dchi}}>0.0$.
9:     theta – double scalar
If sigma > 0${\mathbf{sigma}}>0$ then theta must be set to the required starting value of the estimation of the location parameter θ̂$\stackrel{^}{\theta }$. A reasonable initial value for θ̂$\stackrel{^}{\theta }$ will often be the sample mean or median.
10:   sigma – double scalar
The role of sigma depends on the value assigned to isigma, as follows:
• if isigma = 1${\mathbf{isigma}}=1$, sigma must be assigned a value which determines the values of the starting points for the calculations of θ̂$\stackrel{^}{\theta }$ and σ̂$\stackrel{^}{\sigma }$. If sigma0.0${\mathbf{sigma}}\le 0.0$ then nag_univar_robust_1var_mestim (g07db) will determine the starting points of θ̂$\stackrel{^}{\theta }$ and σ̂$\stackrel{^}{\sigma }$. Otherwise the value assigned to sigma will be taken as the starting point for σ̂$\stackrel{^}{\sigma }$, and theta must be assigned a value before entry, see above;
• if isigma = 0${\mathbf{isigma}}=0$, sigma must be assigned a value which determines the value of σc${\sigma }_{c}$, which is held fixed during the iterations, and the starting value for the calculation of θ̂$\stackrel{^}{\theta }$. If sigma0${\mathbf{sigma}}\le 0$, then nag_univar_robust_1var_mestim (g07db) will determine the value of σc${\sigma }_{c}$ as the median absolute deviation adjusted to reduce bias (see nag_univar_robust_1var_median (g07da)) and the starting point for θ̂$\stackrel{^}{\theta }$. Otherwise, the value assigned to sigma will be taken as the value of σc${\sigma }_{c}$ and theta must be assigned a relevant value before entry, see above.
11:   tol – double scalar
The relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than tol × max (1.0,σk1)${\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1.0,{\sigma }_{k-1}\right)$.
Constraint: tol > 0.0${\mathbf{tol}}>0.0$.

### Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the array x.
n$n$, the number of observations.
Constraint: n > 1${\mathbf{n}}>1$.
2:     maxit – int64int32nag_int scalar
The maximum number of iterations that should be used during the estimation.
Default: 50$50$
Constraint: maxit > 0${\mathbf{maxit}}>0$.

None.

### Output Parameters

1:     theta – double scalar
The M$M$-estimate of the location parameter, θ̂$\stackrel{^}{\theta }$.
2:     sigma – double scalar
Contains the M$M$-estimate of the scale parameter, σ̂$\stackrel{^}{\sigma }$, if isigma was assigned the value 1$1$ on entry, otherwise sigma will contain the initial fixed value σc${\sigma }_{c}$.
3:     rs(n) – double array
The Winsorized residuals.
4:     nit – int64int32nag_int scalar
The number of iterations that were used during the estimation.
5:     wrk(n) – double array
If sigma0.0${\mathbf{sigma}}\le 0.0$ on entry, wrk will contain the n$n$ observations in ascending order.
6:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:
ifail = 1${\mathbf{ifail}}=1$
 On entry, n ≤ 1${\mathbf{n}}\le 1$, or maxit ≤ 0${\mathbf{maxit}}\le 0$, or tol ≤ 0.0${\mathbf{tol}}\le 0.0$, or isigma ≠ 0${\mathbf{isigma}}\ne 0$ or 1$1$, or ipsi < 0${\mathbf{ipsi}}<0$, or ipsi > 4${\mathbf{ipsi}}>4$.
ifail = 2${\mathbf{ifail}}=2$
 On entry, c ≤ 0.0${\mathbf{c}}\le 0.0$ and ipsi = 1${\mathbf{ipsi}}=1$, or h1 < 0.0${\mathbf{h1}}<0.0$ and ipsi = 2${\mathbf{ipsi}}=2$, or h1 = h2 = h3 = 0.0${\mathbf{h1}}={\mathbf{h2}}={\mathbf{h3}}=0.0$ and ipsi = 2${\mathbf{ipsi}}=2$, or h1 > h2${\mathbf{h1}}>{\mathbf{h2}}$ and ipsi = 2${\mathbf{ipsi}}=2$, or h1 > h3${\mathbf{h1}}>{\mathbf{h3}}$ and ipsi = 2${\mathbf{ipsi}}=2$, or h2 > h3${\mathbf{h2}}>{\mathbf{h3}}$ and ipsi = 2${\mathbf{ipsi}}=2$, or dchi ≤ 0.0${\mathbf{dchi}}\le 0.0$ and ipsi ≠ 0${\mathbf{ipsi}}\ne 0$.
ifail = 3${\mathbf{ifail}}=3$
 On entry, all elements of the input array x are equal.
ifail = 4${\mathbf{ifail}}=4$
sigma, the current estimate of σ$\sigma$, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of sigma.
ifail = 5${\mathbf{ifail}}=5$
The number of iterations required exceeds maxit.
ifail = 6${\mathbf{ifail}}=6$
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the isigma = 0${\mathbf{isigma}}=0$ option with a redescending ψ$\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of σ$\sigma$ is too small, then the standardized residuals (xiθ̂k)/(σc) $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{{\sigma }_{c}}$, will be large and all the residuals may fall into the region for which ψ(t) = 0$\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of σc${\sigma }_{c}$ or with isigma = 1${\mathbf{isigma}}=1$.

## Accuracy

On successful exit the accuracy of the results is related to the value of tol, see Section [Parameters].

When you supply the initial values, care has to be taken over the choice of the initial value of σ$\sigma$. If too small a value of σ$\sigma$ is chosen then initial values of the standardized residuals (xiθ̂k)/σ $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{\sigma }$ will be large. If the redescending ψ$\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).

## Example

```function nag_univar_robust_1var_mestim_example
isigma = int64(1);
x = [13;
11;
16;
5;
3;
18;
9;
8;
6;
27;
7];
ipsi = int64(2);
c = 0;
h1 = 1.5;
h2 = 3;
h3 = 4.5;
dchi = 1.5;
theta = 0;
sigma = -1;
tol = 0.0001;
[thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
nag_univar_robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol)
```
```

thetaOut =

10.5487

sigmaOut =

6.3247

rs =

2.4513
0.4513
5.4513
-5.5487
-7.5487
7.4513
-1.5487
-2.5487
-4.5487
16.4513
-3.5487

nit =

8

wrk =

3
5
6
7
8
9
11
13
16
18
27

ifail =

0

```
```function g07db_example
isigma = int64(1);
x = [13;
11;
16;
5;
3;
18;
9;
8;
6;
27;
7];
ipsi = int64(2);
c = 0;
h1 = 1.5;
h2 = 3;
h3 = 4.5;
dchi = 1.5;
theta = 0;
sigma = -1;
tol = 0.0001;
[thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
g07db(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol)
```
```

thetaOut =

10.5487

sigmaOut =

6.3247

rs =

2.4513
0.4513
5.4513
-5.5487
-7.5487
7.4513
-1.5487
-2.5487
-4.5487
16.4513
-3.5487

nit =

8

wrk =

3
5
6
7
8
9
11
13
16
18
27

ifail =

0

```