# NAG CL Interfaceg07dbc (robust_​1var_​mestim)

Settings help

CL Name Style:

## 1Purpose

g07dbc computes an $M$-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

## 2Specification

 #include
 void g07dbc (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double *theta, double *sigma, Integer maxit, double tol, double rs[], Integer *nit, double sorted_x[], NagError *fail)
The function may be called by the names: g07dbc, nag_univar_robust_1var_mestim or nag_robust_m_estim_1var.

## 3Description

The data consists of a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$.
The ${x}_{i}$ are assumed to be independent with an unknown distribution function of the form
 $F (( x i -θ)/σ)$
where $\theta$ is a location argument, and $\sigma$ is a scale argument. $M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
 $∑ i=1 n ψ (( x i - θ ^)/ σ ^) = 0$ (1)
 $∑ i=1 n χ (( x i - θ ^)/ σ ^) = (n-1) β$ (2)
where $\psi$ and $\chi$ are given functions, and $\beta$ is a constant, such that $\stackrel{^}{\sigma }$ is an unbiased estimator when ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\stackrel{^}{\theta }$ using an assigned value of $\sigma ={\sigma }_{c}$.
The values of $\psi \left(\frac{{x}_{i}-\stackrel{^}{\theta }}{\stackrel{^}{\sigma }}\right)\stackrel{^}{\sigma }$ are known as the Winsorized residuals.
The following functions are available for $\psi$ and $\chi$ in g07dbc:
1. (a)Null Weights
 $\psi \left(t\right)=t$ $\chi \left(t\right)=\frac{{t}^{2}}{2}$
Use of these null functions leads to the mean and standard deviation of the data.
2. (b)Huber's Function
 $\psi \left(t\right)=\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(-c,\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left(c,t\right)\right)$ $\chi \left(t\right)=\frac{{|t|}^{2}}{2}$ $|t|\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}$ $|t|>d$
3. (c)Hampel's Piecewise Linear Function
 ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=-{\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(-t\right)$ ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=t$ $0\le t\le {h}_{1}$ ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)={h}_{1}$ ${h}_{1}\le t\le {h}_{2}$ ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)={h}_{1}\left({h}_{3}-t\right)/\left({h}_{3}-{h}_{2}\right)$ ${h}_{2}\le t\le {h}_{3}$ ${\psi }_{{h}_{1},{h}_{2},{h}_{3}}\left(t\right)=0$ $t>{h}_{3}$ $\chi \left(t\right)=\frac{{|t|}^{2}}{2}$ $|t|\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}$ $|t|>d$
4. (d)Andrew's Sine Wave Function
 $\psi \left(t\right)=\mathrm{sin}t$ $-\pi \le t\le \pi$ $\psi \left(t\right)=0$ otherwise $\chi \left(t\right)=\frac{{|t|}^{2}}{2}$ $|t|\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}$ $|t|>d$
5. (e)Tukey's Bi-weight
 $\psi \left(t\right)=t{\left(1-{t}^{2}\right)}^{2}$ $|t|\le 1$ $\psi \left(t\right)=0$ otherwise $\chi \left(t\right)=\frac{{|t|}^{2}}{2}$ $|t|\le d$ $\chi \left(t\right)=\frac{{d}^{2}}{2}$ $|t|>d$
where $c$, ${h}_{1}$, ${h}_{2}$, ${h}_{3}$ and $d$ are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
 $σ ^ k = 1 β (n-1) ( ∑ i=1 n χ( x i - θ ^ k-1 σ ^ k-1 )) σ ^ k-1 2$
and
 $θ ^ k = θ ^ k-1 + 1 n ∑ i=1 n ψ ( x i - θ ^ k-1 σ ^ k ) σ ^ k$
or
 $σ ^ k = σ c , if ​ σ ​ is fixed.$
The initial values for $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$ may either be user-supplied or calculated within g07dbc as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.
g07dbc is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

## 4References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## 5Arguments

1: $\mathbf{sigma_est}$Nag_SigmaSimulEst Input
On entry: the value assigned to sigma_est determines whether $\stackrel{^}{\sigma }$ is to be simultaneously estimated.
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$
The estimation of $\stackrel{^}{\sigma }$ is bypassed and sigma is set equal to ${\sigma }_{c}$;
${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$
$\stackrel{^}{\sigma }$ is estimated simultaneously.
Constraint: ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ or $\mathrm{Nag_SigmaSimul}$.
2: $\mathbf{n}$Integer Input
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}>1$.
3: $\mathbf{x}\left[{\mathbf{n}}\right]$const double Input
On entry: the vector of observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
4: $\mathbf{psifun}$Nag_PsiFun Input
On entry: which $\psi$ function is to be used.
${\mathbf{psifun}}=\mathrm{Nag_Lsq}$
 $ψ (t) = t .$
${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$
Huber's function.
${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$
Hampel's piecewise linear function.
${\mathbf{psifun}}=\mathrm{Nag_AndrewFun}$
Andrew's sine wave.
${\mathbf{psifun}}=\mathrm{Nag_TukeyFun}$
Tukey's bi-weight.
Constraint: ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$, $\mathrm{Nag_HuberFun}$, $\mathrm{Nag_HampelFun}$, $\mathrm{Nag_AndrewFun}$ or $\mathrm{Nag_TukeyFun}$.
5: $\mathbf{c}$double Input
On entry: must specify the argument, $c$, of Huber's $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$. c is not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HuberFun}$.
Constraint: ${\mathbf{c}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
6: $\mathbf{h1}$double Input
7: $\mathbf{h2}$double Input
8: $\mathbf{h3}$double Input
On entry: h1, h2, and h3 must specify the arguments ${h}_{1}$, ${h}_{2}$, and ${h}_{3}$, of Hampel's piecewise linear $\psi$ function, if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$. h1, h2, and h3 are not referenced if ${\mathbf{psifun}}\ne \mathrm{Nag_HampelFun}$.
Constraint: $0\le {\mathbf{h1}}\le {\mathbf{h2}}\le {\mathbf{h3}}$ and ${\mathbf{h3}}>0.0$ if ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
9: $\mathbf{dchi}$double Input
On entry: the argument, $d$, of the $\chi$ function. dchi is not referenced if ${\mathbf{psifun}}=\mathrm{Nag_Lsq}$.
Constraint: ${\mathbf{dchi}}>0.0$ if ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
10: $\mathbf{theta}$double * Input/Output
On entry: if ${\mathbf{sigma}}>0$ then theta must be set to the required starting value of the estimation of the location argument $\stackrel{^}{\theta }$. A reasonable initial value for $\stackrel{^}{\theta }$ will often be the sample mean or median.
On exit: the $M$-estimate of the location argument, $\stackrel{^}{\theta }$.
11: $\mathbf{sigma}$double * Input/Output
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. If ${\mathbf{sigma}}\le 0.0$ then g07dbc will determine the starting points of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. Otherwise the value assigned to sigma will be taken as the starting point for $\stackrel{^}{\sigma }$, and theta must be assigned a value before entry, see above.
If ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$:
On entry: sigma must be assigned a value which determines the value of ${\sigma }_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\stackrel{^}{\theta }$. If ${\mathbf{sigma}}\le 0$, then g07dbc will determine the value of ${\sigma }_{c}$ as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for $\stackrel{^}{\theta }$. Otherwise, the value assigned to sigma will be taken as the value of ${\sigma }_{c}$ and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the $M$ – estimate of the scale argument, $\stackrel{^}{\sigma }$, if ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$ on entry, otherwise sigma will contain the initial fixed value ${\sigma }_{c}$.
12: $\mathbf{maxit}$Integer Input
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: p ${\mathbf{maxit}}=50$.
Constraint: ${\mathbf{maxit}}>0$.
13: $\mathbf{tol}$double Input
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than ${\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1.0,{\sigma }_{k-1}\right)$.
Constraint: ${\mathbf{tol}}>0.0$.
14: $\mathbf{rs}\left[{\mathbf{n}}\right]$double Output
On exit: the Winsorized residuals.
15: $\mathbf{nit}$Integer * Output
On exit: the number of iterations that were used during the estimation.
16: $\mathbf{sorted_x}\left[{\mathbf{n}}\right]$double Output
On exit: if ${\mathbf{sigma}}\le 0.0$ on entry, sorted_x will contain the $n$ observations in ascending order.
17: $\mathbf{fail}$NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

## 6Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{h2}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h2}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h1}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{h3}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{h1}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
On entry, ${\mathbf{h2}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{h3}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{h2}}\le {\mathbf{h3}}$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_3_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{h1}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{h2}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{h3}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{h1}}={\mathbf{h2}}$=${\mathbf{h3}}\ne 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_ALL_ELEMENTS_EQUAL
On entry, all the values in the array x must not be equal.
On entry, argument psifun had an illegal value.
On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO
The estimated value of sigma was $\le 0.0$ during an iteration.
NE_INT_ARG_LE
On entry, ${\mathbf{maxit}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{maxit}}>0$.
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}>1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE
On entry, tol must not be less than or equal to 0.0: ${\mathbf{tol}}=⟨\mathit{\text{value}}⟩$.
NE_REAL_ENUM_ARG_CONS
On entry, ${\mathbf{c}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{c}}>0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HuberFun}$.
On entry, ${\mathbf{dchi}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{dchi}}>0.0$, ${\mathbf{psifun}}\ne \mathrm{Nag_Lsq}$.
On entry, ${\mathbf{h1}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{psifun}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{h1}}\ge 0.0$, ${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$.
NE_TOO_MANY
Too many iterations ($⟨\mathit{\text{value}}⟩$ ).
NE_WINS_RES_ZERO
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaBypas}$ option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of $\sigma$ is too small, then the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{{\sigma }_{c}}$, will be large and all the residuals may fall into the region for which $\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of ${\sigma }_{c}$ or with ${\mathbf{sigma_est}}=\mathrm{Nag_SigmaSimul}$.

## 7Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

## 8Parallelism and Performance

g07dbc is not threaded in any implementation.

When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value of $\sigma$ is chosen then initial values of the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{\sigma }$ will be large. If the redescending $\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

## 10Example

The following program reads in a set of data consisting of eleven observations of a variable $X$.
For this example, Hampels's Piecewise Linear Function is used (${\mathbf{psifun}}=\mathrm{Nag_HampelFun}$), values for ${h}_{1}$, ${h}_{2}$ and ${h}_{3}$ along with $d$ for the $\chi$ function, being read from the data file.
Using the following starting values various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
1. (a)g07dbc determines the starting values, $\sigma$ is estimated simultaneously.
2. (b)You supply the starting values, $\sigma$ is estimated simultaneously.
3. (c)g07dbc determines the starting values, $\sigma$ is fixed.
4. (d)You supply the starting values, $\sigma$ is fixed.

### 10.1Program Text

Program Text (g07dbce.c)

### 10.2Program Data

Program Data (g07dbce.d)

### 10.3Program Results

Program Results (g07dbce.r)