hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_univar_robust_1var_mestim (g07db)

Purpose

nag_univar_robust_1var_mestim (g07db) computes an MM-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

Syntax

[theta, sigma, rs, nit, wrk, ifail] = g07db(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)
[theta, sigma, rs, nit, wrk, ifail] = nag_univar_robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol, 'n', n, 'maxit', maxit)

Description

The data consists of a sample of size nn, denoted by x1,x2,,xnx1,x2,,xn, drawn from a random variable XX.
The xixi are assumed to be independent with an unknown distribution function of the form
F((xiθ) / σ)
F((xi-θ)/σ)
where θθ is a location parameter, and σσ is a scale parameter. MM-estimators of θθ and σσ are given by the solution to the following system of equations:
n
ψ((xiθ̂) / σ̂) = 0
i = 1
i=1nψ ( (xi-θ^) /σ^)=0
(1)
n
χ((xiθ̂) / σ̂) = (n1)β
i = 1
i=1nχ ( (xi-θ^) /σ^)=(n-1)β
(2)
where ψψ and χχ are given functions, and ββ is a constant, such that σ̂σ^ is an unbiased estimator when xixi, for i = 1,2,,ni=1,2,,n has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for θ̂θ^ using an assigned value of σ = σcσ=σc.
The values of ψ ((xiθ̂)/(σ̂)) σ̂ψ ( xi-θ^σ^) σ^ are known as the Winsorized residuals.
The following functions are available for ψψ and χχ in nag_univar_robust_1var_mestim (g07db).
(a) Null Weights
ψ(t) = t(t2)/2ψ(t)=t t22   χ(t) = (t2)/2 χ(t)= t22
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
ψ(t) = max (c,min (c,t)) (t2)/2 ψ(t) = max(-c,min(c,t)) t22   χ(t) = (t2)/2 tdχ(t)= t22 td
 
    χ(t) = (d2)/2 t > dχ(t)= d22 t>d
(c) Hampel's Piecewise Linear Function
ψh1,h2,h3(t) = ψh1,h2,h3(t)ψh1,h2,h3(t)=-ψh1,h2,h3(-t)    
 
ψh1,h2,h3(t) = t(|t|2)/2ψh1,h2,h3(t)=t |t|22 0th1(|t|2)/20th1 |t|22 χ(t) = (|t|2)/2 |t|dχ(t)= |t|22 |t|d
 
ψh1,h2,h3(t) = h1ψh1,h2,h3(t)=h1 h1th2h1th2  
 
ψh1,h2,h3(t) = h1(h3t) / (h3h2)(|t|2)/2ψh1,h2,h3(t)=h1(h3-t)/(h3-h2) |t|22 h2th3(|t|2)/2h2th3 |t|22 χ(t) = (d2)/2 |t| > dχ(t)= d22 |t|>d
 
ψh1,h2,h3(t) = 0ψh1,h2,h3(t)=0 t > h3t>h3  
(d) Andrew's Sine Wave Function
ψ(t) = sint(d2)/2ψ(t)=sint d22 πtπ(d2)/2-πtπ d22 χ(t) = (|t|2)/2 |t|dχ(t)= |t|22 |t|d
 
ψ(t) = 0(d2)/2ψ(t)=0 d22 otherwise (d2)/2 d22 χ(t) = (d2)/2 |t| > dχ(t)= d22 |t|>d
(e) Tukey's Bi-weight
ψ(t) = t(1t2)2(|t|2)/2ψ(t)=t (1-t2) 2 |t|22 |t|1(|t|2)/2|t|1 |t|22 χ(t) = (|t|2)/2 |t|dχ(t)= |t|22 |t|d
 
ψ(t) = t(1t2)2 = 0(|t|2)/2ψ(t)=t (1-t2) 2=0 |t|22 otherwise (|t|2)/2 |t|22 χ(t) = (d2)/2 |t| > dχ(t)= d22 |t|>d
where cc, h1h1, h2h2, h3h3 and dd are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
σ̂k =
sqrt(1/(β(n − 1))(n ) ∑ χ((xi − θ̂k − 1)/(σ̂k − 1))i = 1 σ̂k − 12)
σ^k=1β(n-1) (i=1nχ (xi-θ^k-1σ^k-1) ) σ^k-12
and
n
θ̂k = θ̂k 1 + 1/nψ((xiθ̂k 1)/(σ̂k))σ̂k
i = 1
θ^k=θ^k- 1+1ni= 1nψ (xi-θ^k- 1σ^k) σ^k
or
σ̂k = σc,   if  σ  is fixed.
σ^k=σc,   if  σ  is fixed.
The initial values for θ̂θ^ and σ̂σ^ may either be user-supplied or calculated within nag_univar_robust_1var_mestim (g07db) as the sample median and an estimate of σσ based on the median absolute deviation respectively.
nag_univar_robust_1var_mestim (g07db) is based upon function LYHALG within the ROBETH library, see Marazzi (1987).

References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1:     isigma – int64int32nag_int scalar
The value assigned to isigma determines whether σ̂σ^ is to be simultaneously estimated.
isigma = 0isigma=0
The estimation of σ̂σ^ is bypassed and sigma is set equal to σcσc.
isigma = 1isigma=1
σ̂σ^ is estimated simultaneously.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n > 1n>1.
The vector of observations, x1,x2,,xnx1,x2,,xn.
3:     ipsi – int64int32nag_int scalar
Which ψψ function is to be used.
ipsi = 0ipsi=0
ψ(t) = tψ(t)=t.
ipsi = 1ipsi=1
Huber's function.
ipsi = 2ipsi=2
Hampel's piecewise linear function.
ipsi = 3ipsi=3
Andrew's sine wave,
ipsi = 4ipsi=4
Tukey's bi-weight.
4:     c – double scalar
If ipsi = 1ipsi=1, c must specify the parameter, cc, of Huber's ψψ function. c is not referenced if ipsi1ipsi1.
Constraint: if ipsi = 1ipsi=1, c > 0.0c>0.0.
5:     h1 – double scalar
6:     h2 – double scalar
7:     h3 – double scalar
If ipsi = 2ipsi=2, h1, h2 and h3 must specify the parameters, h1h1, h2h2, and h3h3, of Hampel's piecewise linear ψψ function. h1, h2 and h3 are not referenced if ipsi2ipsi2.
Constraint: 0h1h2h30h1h2h3 and h3 > 0.0h3>0.0 if ipsi = 2ipsi=2.
8:     dchi – double scalar
dd, the parameter of the χχ function. dchi is not referenced if ipsi = 0ipsi=0.
Constraint: if ipsi0ipsi0, dchi > 0.0dchi>0.0.
9:     theta – double scalar
If sigma > 0sigma>0 then theta must be set to the required starting value of the estimation of the location parameter θ̂θ^. A reasonable initial value for θ̂θ^ will often be the sample mean or median.
10:   sigma – double scalar
The role of sigma depends on the value assigned to isigma, as follows:
  • if isigma = 1isigma=1, sigma must be assigned a value which determines the values of the starting points for the calculations of θ̂θ^ and σ̂σ^. If sigma0.0sigma0.0 then nag_univar_robust_1var_mestim (g07db) will determine the starting points of θ̂θ^ and σ̂σ^. Otherwise the value assigned to sigma will be taken as the starting point for σ̂σ^, and theta must be assigned a value before entry, see above;
  • if isigma = 0isigma=0, sigma must be assigned a value which determines the value of σcσc, which is held fixed during the iterations, and the starting value for the calculation of θ̂θ^. If sigma0sigma0, then nag_univar_robust_1var_mestim (g07db) will determine the value of σcσc as the median absolute deviation adjusted to reduce bias (see nag_univar_robust_1var_median (g07da)) and the starting point for θ̂θ^. Otherwise, the value assigned to sigma will be taken as the value of σcσc and theta must be assigned a relevant value before entry, see above.
11:   tol – double scalar
The relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than tol × max (1.0,σk1)tol×max(1.0,σk-1).
Constraint: tol > 0.0tol>0.0.

Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the array x.
nn, the number of observations.
Constraint: n > 1n>1.
2:     maxit – int64int32nag_int scalar
The maximum number of iterations that should be used during the estimation.
Default: 5050
Constraint: maxit > 0maxit>0.

Input Parameters Omitted from the MATLAB Interface

None.

Output Parameters

1:     theta – double scalar
The MM-estimate of the location parameter, θ̂θ^.
2:     sigma – double scalar
Contains the MM-estimate of the scale parameter, σ̂σ^, if isigma was assigned the value 11 on entry, otherwise sigma will contain the initial fixed value σcσc.
3:     rs(n) – double array
The Winsorized residuals.
4:     nit – int64int32nag_int scalar
The number of iterations that were used during the estimation.
5:     wrk(n) – double array
If sigma0.0sigma0.0 on entry, wrk will contain the nn observations in ascending order.
6:     ifail – int64int32nag_int scalar
ifail = 0ifail=0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1ifail=1
On entry,n1n1,
ormaxit0maxit0,
ortol0.0tol0.0,
orisigma0isigma0 or 11,
oripsi < 0ipsi<0,
oripsi > 4ipsi>4.
  ifail = 2ifail=2
On entry,c0.0c0.0 and ipsi = 1ipsi=1,
orh1 < 0.0h1<0.0 and ipsi = 2ipsi=2,
orh1 = h2 = h3 = 0.0h1=h2=h3=0.0 and ipsi = 2ipsi=2,
orh1 > h2h1>h2 and ipsi = 2ipsi=2,
orh1 > h3h1>h3 and ipsi = 2ipsi=2,
orh2 > h3h2>h3 and ipsi = 2ipsi=2,
ordchi0.0dchi0.0 and ipsi0ipsi0.
  ifail = 3ifail=3
On entry,all elements of the input array x are equal.
  ifail = 4ifail=4
sigma, the current estimate of σσ, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of sigma.
  ifail = 5ifail=5
The number of iterations required exceeds maxit.
  ifail = 6ifail=6
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the isigma = 0isigma=0 option with a redescending ψψ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of σσ is too small, then the standardized residuals (xiθ̂k)/(σc) xi-θ^kσc , will be large and all the residuals may fall into the region for which ψ(t) = 0ψ(t)=0. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of σcσc or with isigma = 1isigma=1.

Accuracy

On successful exit the accuracy of the results is related to the value of tol, see Section [Parameters].

Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of σσ. If too small a value of σσ is chosen then initial values of the standardized residuals (xiθ̂k)/σ xi-θ^kσ  will be large. If the redescending ψψ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).

Example

function nag_univar_robust_1var_mestim_example
isigma = int64(1);
x = [13;
     11;
     16;
     5;
     3;
     18;
     9;
     8;
     6;
     27;
     7];
ipsi = int64(2);
c = 0;
h1 = 1.5;
h2 = 3;
h3 = 4.5;
dchi = 1.5;
theta = 0;
sigma = -1;
tol = 0.0001;
[thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
    nag_univar_robust_1var_mestim(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol)
 

thetaOut =

   10.5487


sigmaOut =

    6.3247


rs =

    2.4513
    0.4513
    5.4513
   -5.5487
   -7.5487
    7.4513
   -1.5487
   -2.5487
   -4.5487
   16.4513
   -3.5487


nit =

                    8


wrk =

     3
     5
     6
     7
     8
     9
    11
    13
    16
    18
    27


ifail =

                    0


function g07db_example
isigma = int64(1);
x = [13;
     11;
     16;
     5;
     3;
     18;
     9;
     8;
     6;
     27;
     7];
ipsi = int64(2);
c = 0;
h1 = 1.5;
h2 = 3;
h3 = 4.5;
dchi = 1.5;
theta = 0;
sigma = -1;
tol = 0.0001;
[thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
    g07db(isigma, x, ipsi, c, h1, h2, h3, dchi, theta, sigma, tol)
 

thetaOut =

   10.5487


sigmaOut =

    6.3247


rs =

    2.4513
    0.4513
    5.4513
   -5.5487
   -7.5487
    7.4513
   -1.5487
   -2.5487
   -4.5487
   16.4513
   -3.5487


nit =

                    8


wrk =

     3
     5
     6
     7
     8
     9
    11
    13
    16
    18
    27


ifail =

                    0



PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013