NAG CL Interface
g07dbc (robust_​1var_​mestim)

Settings help

CL Name Style:


1 Purpose

g07dbc computes an M -estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

2 Specification

#include <nag.h>
void  g07dbc (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double *theta, double *sigma, Integer maxit, double tol, double rs[], Integer *nit, double sorted_x[], NagError *fail)
The function may be called by the names: g07dbc, nag_univar_robust_1var_mestim or nag_robust_m_estim_1var.

3 Description

The data consists of a sample of size n , denoted by x 1 , x 2 , , x n , drawn from a random variable X .
The x i are assumed to be independent with an unknown distribution function of the form
F (( x i -θ)/σ)  
where θ is a location argument, and σ is a scale argument. M -estimators of θ and σ are given by the solution to the following system of equations:
i=1 n ψ (( x i - θ ^)/ σ ^) = 0 (1)
i=1 n χ (( x i - θ ^)/ σ ^) = (n-1) β (2)
where ψ and χ are given functions, and β is a constant, such that σ ^ is an unbiased estimator when x i , for i=1,2,,n, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for θ ^ using an assigned value of σ = σ c .
The values of ψ ( x i - θ ^ σ ^ ) σ ^ are known as the Winsorized residuals.
The following functions are available for ψ and χ in g07dbc:
  1. (a)Null Weights
    ψ(t)=t
    χ(t) = t2 2
    Use of these null functions leads to the mean and standard deviation of the data.
  2. (b)Huber's Function
    ψ(t) = max(-c,min(c,t))
    χ(t) = |t|2 2 |t|d
    χ(t) = d2 2 |t|>d
  3. (c)Hampel's Piecewise Linear Function
    ψ h1 , h2 , h3 (t) = - ψ h1 , h2 , h3 (-t)
    ψ h1 , h2 , h3 (t) = t 0 t h1
    ψ h1 , h2 , h3 (t) = h1 h1 t h2
    ψ h1, h2, h3 (t) = h1 (h3-t) / (h3-h2) h2 t h3
    ψ h1 , h2 , h3 (t) = 0 t>h3
    χ(t) = |t|2 2 |t|d
    χ(t) = d2 2 |t|>d
  4. (d)Andrew's Sine Wave Function
    ψ(t) = sint -π t π
    ψ(t)=0 otherwise
    χ(t) = |t|22 |t|d
    χ(t) = d22 |t|>d
  5. (e)Tukey's Bi-weight
    ψ(t) = t (1-t2) 2 |t|1
    ψ(t)=0 otherwise
    χ(t) = |t|22 |t|d
    χ(t) = d22 |t|>d
where c , h 1 , h 2 , h 3 and d are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
σ ^ k = 1 β (n-1) ( i=1 n χ( x i - θ ^ k-1 σ ^ k-1 )) σ ^ k-1 2  
and
θ ^ k = θ ^ k-1 + 1 n i=1 n ψ ( x i - θ ^ k-1 σ ^ k ) σ ^ k  
or
σ ^ k = σ c , if ​ σ ​ is fixed.  
The initial values for θ ^ and σ ^ may either be user-supplied or calculated within g07dbc as the sample median and an estimate of σ based on the median absolute deviation respectively.
g07dbc is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

4 References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

5 Arguments

1: sigma_est Nag_SigmaSimulEst Input
On entry: the value assigned to sigma_est determines whether σ ^ is to be simultaneously estimated.
sigma_est=Nag_SigmaBypas
The estimation of σ ^ is bypassed and sigma is set equal to σ c ;
sigma_est=Nag_SigmaSimul
σ ^ is estimated simultaneously.
Constraint: sigma_est=Nag_SigmaBypas or Nag_SigmaSimul.
2: n Integer Input
On entry: the number of observations, n .
Constraint: n>1 .
3: x[n] const double Input
On entry: the vector of observations, x 1 , x 2 , , x n .
4: psifun Nag_PsiFun Input
On entry: which ψ function is to be used.
psifun=Nag_Lsq
ψ (t) = t .  
psifun=Nag_HuberFun
Huber's function.
psifun=Nag_HampelFun
Hampel's piecewise linear function.
psifun=Nag_AndrewFun
Andrew's sine wave.
psifun=Nag_TukeyFun
Tukey's bi-weight.
Constraint: psifun=Nag_Lsq, Nag_HuberFun, Nag_HampelFun, Nag_AndrewFun or Nag_TukeyFun.
5: c double Input
On entry: must specify the argument, c , of Huber's ψ function, if psifun=Nag_HuberFun. c is not referenced if psifunNag_HuberFun.
Constraint: c>0.0 if psifun=Nag_HuberFun.
6: h1 double Input
7: h2 double Input
8: h3 double Input
On entry: h1, h2, and h3 must specify the arguments h 1 , h 2 , and h 3 , of Hampel's piecewise linear ψ function, if psifun=Nag_HampelFun. h1, h2, and h3 are not referenced if psifunNag_HampelFun.
Constraint: 0 h1 h2 h3 and h3>0.0 if psifun=Nag_HampelFun.
9: dchi double Input
On entry: the argument, d , of the χ function. dchi is not referenced if psifun=Nag_Lsq.
Constraint: dchi>0.0 if psifunNag_Lsq.
10: theta double * Input/Output
On entry: if sigma>0 then theta must be set to the required starting value of the estimation of the location argument θ ^ . A reasonable initial value for θ ^ will often be the sample mean or median.
On exit: the M -estimate of the location argument, θ ^ .
11: sigma double * Input/Output
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If sigma_est=Nag_SigmaSimul:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of θ ^ and σ ^ . If sigma0.0 then g07dbc will determine the starting points of θ ^ and σ ^ . Otherwise the value assigned to sigma will be taken as the starting point for σ ^ , and theta must be assigned a value before entry, see above.
If sigma_est=Nag_SigmaBypas:
On entry: sigma must be assigned a value which determines the value of σ c , which is held fixed during the iterations, and the starting value for the calculation of θ ^ . If sigma0 , then g07dbc will determine the value of σ c as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for θ ^ . Otherwise, the value assigned to sigma will be taken as the value of σ c and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the M – estimate of the scale argument, σ ^ , if sigma_est=Nag_SigmaSimul on entry, otherwise sigma will contain the initial fixed value σ c .
12: maxit Integer Input
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: p maxit=50 .
Constraint: maxit>0 .
13: tol double Input
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than tol × max(1.0, σ k-1 ) .
Constraint: tol>0.0 .
14: rs[n] double Output
On exit: the Winsorized residuals.
15: nit Integer * Output
On exit: the number of iterations that were used during the estimation.
16: sorted_x[n] double Output
On exit: if sigma0.0 on entry, sorted_x will contain the n observations in ascending order.
17: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS
On entry, h1=value , h2=value and psifun=value . These arguments must satisfy h1h2 , psifun=Nag_HampelFun.
On entry, h1=value , h3=value and psifun=value . These arguments must satisfy h1h3 , psifun=Nag_HampelFun.
On entry, h2=value , h3=value and psifun=value . These arguments must satisfy h2h3 , psifun=Nag_HampelFun.
NE_3_REAL_ENUM_ARG_CONS
On entry, h1=value , h2=value , h3=value , psifun=value . These arguments must satisfy h1=h2 = h30.0 , psifun=Nag_HampelFun.
NE_ALL_ELEMENTS_EQUAL
On entry, all the values in the array x must not be equal.
NE_BAD_PARAM
On entry, argument psifun had an illegal value.
On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO
The estimated value of sigma was 0.0 during an iteration.
NE_INT_ARG_LE
On entry, maxit=value.
Constraint: maxit>0.
On entry, n=value.
Constraint: n>1.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE
On entry, tol must not be less than or equal to 0.0: tol=value .
NE_REAL_ENUM_ARG_CONS
On entry, c=value , psifun=value . These arguments must satisfy c>0.0 , psifun=Nag_HuberFun.
On entry, dchi=value , psifun=value . These arguments must satisfy dchi>0.0 , psifunNag_Lsq.
On entry, h1=value , psifun=value . These arguments must satisfy h10.0 , psifun=Nag_HampelFun.
NE_TOO_MANY
Too many iterations (value ).
NE_WINS_RES_ZERO
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the sigma_est=Nag_SigmaBypas option with a redescending ψ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of σ is too small, then the standardized residuals x i - θ ^ k σ c , will be large and all the residuals may fall into the region for which ψ (t) = 0 . This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of σ c or with sigma_est=Nag_SigmaSimul.

7 Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

8 Parallelism and Performance

g07dbc is not threaded in any implementation.

9 Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of σ . If too small a value of σ is chosen then initial values of the standardized residuals x i - θ ^ k σ will be large. If the redescending ψ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

10 Example

The following program reads in a set of data consisting of eleven observations of a variable X .
For this example, Hampels's Piecewise Linear Function is used (psifun=Nag_HampelFun), values for h 1 , h 2 and h 3 along with d for the χ function, being read from the data file.
Using the following starting values various estimates of θ and σ are calculated and printed along with the number of iterations used:
  1. (a)g07dbc determines the starting values, σ is estimated simultaneously.
  2. (b)You supply the starting values, σ is estimated simultaneously.
  3. (c)g07dbc determines the starting values, σ is fixed.
  4. (d)You supply the starting values, σ is fixed.

10.1 Program Text

Program Text (g07dbce.c)

10.2 Program Data

Program Data (g07dbce.d)

10.3 Program Results

Program Results (g07dbce.r)