g07 Chapter Contents
g07 Chapter Introduction
NAG C Library Manual

# NAG Library Function Documentnag_robust_m_estim_1var (g07dbc)

## 1  Purpose

nag_robust_m_estim_1var (g07dbc) computes an $M$-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

## 2  Specification

 #include #include
 void nag_robust_m_estim_1var (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double *theta, double *sigma, Integer maxit, double tol, double rs[], Integer *nit, double sorted_x[], NagError *fail)

## 3  Description

The data consists of a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$.
The ${x}_{i}$ are assumed to be independent with an unknown distribution function of the form
 $F x i - θ / σ$
where $\theta$ is a location argument, and $\sigma$ is a scale argument. $M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
 $∑ i=1 n ψ x i - θ ^ / σ ^ = 0$ (1)
 $∑ i=1 n χ x i - θ ^ / σ ^ = n-1 β$ (2)
where $\psi$ and $\chi$ are given functions, and $\beta$ is a constant, such that $\stackrel{^}{\sigma }$ is an unbiased estimator when ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\stackrel{^}{\theta }$ using an assigned value of $\sigma ={\sigma }_{c}$.
The values of $\psi \left(\frac{{x}_{i}-\stackrel{^}{\theta }}{\stackrel{^}{\sigma }}\right)\stackrel{^}{\sigma }$ are known as the Winsorized residuals.
The following functions are available for $\psi$ and $\chi$ in nag_robust_m_estim_1var (g07dbc):
(a) Null Weights
 $\psi \left(t\right)$ $\text{}=t$ $\chi \left(t\right)=\frac{{t}^{2}}{2}$
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
 $\psi \left(t\right)$ $\text{}=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(-c,\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(c,t\right)\right)$ $\chi \left(t\right)=\frac{{‖t‖}^{2}}{2}‖t‖\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}‖t‖>d$
(c) Hampel's Piecewise Linear Function
 ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)$ $\text{}=-{\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(-t\right)$ $\text{}=t$ $0\le t\le {h}_{1}$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}={h}_{1}$ ${h}_{1}\le t\le {h}_{2}$ $\text{}={h}_{1}\left({h}_{3}-t\right)/\left({h}_{3}-{h}_{2}\right)$ ${h}_{2}\le t\le {h}_{3}$ $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$ $\text{}=0$ $t>{h}_{3}$
(d) Andrew's Sine Wave Function
 $\psi \left(t\right)$ $\text{}=\mathrm{sin}t$ $-\pi \le t\le \pi$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}=0$ otherwise $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$
(e) Tukey's Bi-weight
 $\psi \left(t\right)$ $\text{}=t{\left(1-{t}^{2}\right)}^{2}$ $\left|t\right|\le 1$ $\chi \left(t\right)=\frac{{\left|t\right|}^{2}}{2}\left|t\right|\le d$ $\text{}=0$ otherwise $\chi \left(t\right)=\frac{{d}^{2}}{2}\left|t\right|>d$
where $c$, ${h}_{1}$, ${h}_{2}$, ${h}_{3}$ and $d$ are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
 $σ ^ k = 1 β n-1 ∑ i=1 n χ x i - θ ^ k-1 σ ^ k-1 σ ^ k-1 2$
and
 $θ ^ k = θ ^ k-1 + 1 n ∑ i=1 n ψ x i - θ ^ k-1 σ ^ k σ ^ k$
or
 $σ ^ k = σ c , if ​ σ ​ is fixed.$
The initial values for $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$ may either be user-supplied or calculated within nag_robust_m_estim_1var (g07dbc) as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.
nag_robust_m_estim_1var (g07dbc) is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

## 4  References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## 5  Arguments

1:     sigma_estNag_SigmaSimulEstInput
On entry: the value assigned to sigma_est determines whether $\stackrel{^}{\sigma }$ is to be simultaneously estimated.
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$
The estimation of $\stackrel{^}{\sigma }$ is bypassed and sigma is set equal to ${\sigma }_{c}$;
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$
$\stackrel{^}{\sigma }$ is estimated simultaneously.
Constraint: ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ or $\mathrm{Nag_SigmaSimul}$.
2:     nIntegerInput
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}>1$.
3:     x[n]const doubleInput
On entry: the vector of observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
4:     psifunNag_PsiFunInput
On entry: which $\psi$ function is to be used.
${\mathbf{psifun}}=\mathrm{Nag_Lsq}$
 $ψ t = t .$
${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$
Huber's function.
${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$
Hampel's piecewise linear function.
${\mathbf{psifun}}=\mathrm{Nag_AndrewFun}$
Andrew's sine wave.
${\mathbf{psifun}}=\mathrm{Nag_TukeyFun}$
Tukey's bi-weight.
Constraint: ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$, $\mathrm{Nag_HuberFun}$, $\mathrm{Nag_HampelFun}$, $\mathrm{Nag_AndrewFun}$ or $\mathrm{Nag_TukeyFun}$.
5:     cdoubleInput
On entry: must specify the argument, $c$, of Huber's $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$. c is not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HuberFun}$.
Constraint: ${\mathbf{c}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
6:     h1doubleInput
7:     h2doubleInput
8:     h3doubleInput
On entry: h1, h2, and h3 must specify the arguments ${h}_{1}$, ${h}_{2}$, and ${h}_{3}$, of Hampel's piecewise linear $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$. h1, h2, and h3 are not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HampelFun}$.
Constraint: $0\le {\mathbf{h1}}\le {\mathbf{h2}}\le {\mathbf{h3}}$ and ${\mathbf{h3}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
9:     dchidoubleInput
On entry: the argument, $d$, of the $\chi$ function. dchi is not referenced if ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$.
Constraint: ${\mathbf{dchi}}>0.0$ if ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
On entry: if ${\mathbf{sigma}}>0$ then theta must be set to the required starting value of the estimation of the location argument $\stackrel{^}{\theta }$. A reasonable initial value for $\stackrel{^}{\theta }$ will often be the sample mean or median.
On exit: the $M$-estimate of the location argument, $\stackrel{^}{\theta }$.
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. If ${\mathbf{sigma}}\le 0.0$ then nag_robust_m_estim_1var (g07dbc) will determine the starting points of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. Otherwise the value assigned to sigma will be taken as the starting point for $\stackrel{^}{\sigma }$, and theta must be assigned a value before entry, see above.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$:
On entry: sigma must be assigned a value which determines the value of ${\sigma }_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\stackrel{^}{\theta }$. If ${\mathbf{sigma}}\le 0$, then nag_robust_m_estim_1var (g07dbc) will determine the value of ${\sigma }_{c}$ as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for $\stackrel{^}{\theta }$. Otherwise, the value assigned to sigma will be taken as the value of ${\sigma }_{c}$ and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the $M$ – estimate of the scale argument, $\stackrel{^}{\sigma }$, if ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$ on entry, otherwise sigma will contain the initial fixed value ${\sigma }_{c}$.
12:   maxitIntegerInput
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: p ${\mathbf{maxit}}=50$.
Constraint: ${\mathbf{maxit}}>0$.
13:   toldoubleInput
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than ${\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1.0,{\sigma }_{k-1}\right)$.
Constraint: ${\mathbf{tol}}>0.0$.
14:   rs[n]doubleOutput
On exit: the Winsorized residuals.
15:   nitInteger *Output
On exit: the number of iterations that were used during the estimation.
16:   sorted_x[n]doubleOutput
On exit: if ${\mathbf{sigma}}\le 0.0$ on entry, sorted_x will contain the $n$ observations in ascending order.
17:   failNagError *Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h2}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$ and ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h2}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_3_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{h2}}=〈\mathit{\text{value}}〉$, ${\mathbf{h3}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}={\mathbf{h2}}$=${\mathbf{h3}}\ne 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_ALL_ELEMENTS_EQUAL
On entry, all the values in the array x must not be equal.
On entry, argument psifun had an illegal value.
On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO
The estimated value of sigma was $\le 0.0$ during an iteration.
NE_INT_ARG_LE
On entry, ${\mathbf{maxit}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{maxit}}>0$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}>1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE
On entry, tol must not be less than or equal to 0.0: ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
NE_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{c}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{c}}>0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
On entry, ${\mathbf{dchi}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{dchi}}>0.0$, ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
On entry, ${\mathbf{h1}}=〈\mathit{\text{value}}〉$, ${\mathbf{psifun}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{h1}}\ge 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_TOO_MANY
Too many iterations ($〈\mathit{\text{value}}〉$ ).
NE_WINS_RES_ZERO
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of $\sigma$ is too small, then the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{{\sigma }_{c}}$, will be large and all the residuals may fall into the region for which $\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of ${\sigma }_{c}$ or with ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$.

## 7  Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value of $\sigma$ is chosen then initial values of the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{\sigma }$ will be large. If the redescending $\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

## 9  Example

The following program reads in a set of data consisting of eleven observations of a variable $X$.
For this example, Hampels's Piecewise Linear Function is used (${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$), values for ${h}_{1}$, ${h}_{2}$ and ${h}_{3}$ along with $d$ for the $\chi$ function, being read from the data file.
Using the following starting values various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
 (a) nag_robust_m_estim_1var (g07dbc) determines the starting values, $\sigma$ is estimated simultaneously. (b) You supply the starting values, $\sigma$ is estimated simultaneously. (c) nag_robust_m_estim_1var (g07dbc) determines the starting values, $\sigma$ is fixed. (d) You supply the starting values, $\sigma$ is fixed.

### 9.1  Program Text

Program Text (g07dbce.c)

### 9.2  Program Data

Program Data (g07dbce.d)

### 9.3  Program Results

Program Results (g07dbce.r)