NAG Library Routine Document
G07DBF
1 Purpose
G07DBF computes an M-estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.
2 Specification
SUBROUTINE G07DBF (ISIGMA, N, X, IPSI, C, H1, H2, H3, DCHI, THETA, SIGMA, MAXIT, TOL, RS, NIT, WRK, IFAIL)
INTEGER             ISIGMA, N, IPSI, MAXIT, NIT, IFAIL
REAL (KIND=nag_wp)  X(N), C, H1, H2, H3, DCHI, THETA, SIGMA, TOL, RS(N), WRK(N)
3 Description
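The data consists of a sample of size $n$, denoted by $x_1, x_2, \ldots, x_n$, drawn from a random variable $X$.
The $x_i$ are assumed to be independent with an unknown distribution function of the form
$$F\left(\frac{x_i - \theta}{\sigma}\right)$$
where $\theta$ is a location parameter, and $\sigma$ is a scale parameter.
M-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations:
$$\sum_{i=1}^{n} \psi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right) = 0 \tag{1}$$
$$\sum_{i=1}^{n} \chi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right) = (n-1)\beta \tag{2}$$
where $\psi$ and $\chi$ are given functions, and $\beta$ is a constant, such that $\hat{\sigma}$ is an unbiased estimator when $x_i$, for $i = 1, 2, \ldots, n$, has a Normal distribution. Optionally, the second equation can be omitted and the first equation is solved for $\hat{\theta}$ using an assigned value of $\sigma = \sigma_c$.
The values of $\psi\left(\frac{x_i - \hat{\theta}}{\hat{\sigma}}\right)\hat{\sigma}$ are known as the Winsorized residuals.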
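The following functions are available for $\psi$ and $\chi$ in G07DBF.
(a) Null Weights
$$\psi(t) = t, \qquad \chi(t) = \frac{t^2}{2}$$
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
$$\psi(t) = \max(-c, \min(c, t)), \qquad \chi(t) = \begin{cases} |t|^2/2, & |t| \le d \\ d^2/2, & |t| > d \end{cases}$$
(c) Hampel's Piecewise Linear Function
$$\psi_{h_1,h_2,h_3}(t) = -\psi_{h_1,h_2,h_3}(-t)$$
$$\psi_{h_1,h_2,h_3}(t) = \begin{cases} t, & 0 \le t \le h_1 \\ h_1, & h_1 \le t \le h_2 \\ h_1 (h_3 - t)/(h_3 - h_2), & h_2 \le t \le h_3 \\ 0, & t > h_3 \end{cases} \qquad \chi(t) = \begin{cases} |t|^2/2, & |t| \le d \\ d^2/2, & |t| > d \end{cases}$$
(d) Andrew's Sine Wave Function
$$\psi(t) = \begin{cases} \sin t, & -\pi \le t \le \pi \\ 0, & \text{otherwise} \end{cases} \qquad \chi(t) = \begin{cases} |t|^2/2, & |t| \le d \\ d^2/2, & |t| > d \end{cases}$$
(e) Tukey's Bi-weight
$$\psi(t) = \begin{cases} t (1 - t^2)^2, & |t| \le 1 \\ 0, & \text{otherwise} \end{cases} \qquad \chi(t) = \begin{cases} |t|^2/2, & |t| \le d \\ d^2/2, & |t| > d \end{cases}$$
where $c$, $h_1$, $h_2$, $h_3$ and $d$ are constants.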
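As an illustration only, the five $\psi$ functions and the $\chi$ functions listed above can be written as a short Python sketch. The names `psi` and `chi` and their keyword arguments are hypothetical and are not part of the G07DBF interface; the `ipsi` codes simply mirror the IPSI values described in Section 5.

```python
import math


def psi(t, ipsi, c=None, h1=None, h2=None, h3=None):
    """Evaluate the psi function selected by ipsi (0..4) at t (illustrative sketch)."""
    if ipsi == 0:                       # (a) null weights
        return t
    if ipsi == 1:                       # (b) Huber
        return max(-c, min(c, t))
    if ipsi == 2:                       # (c) Hampel, odd function: psi(t) = -psi(-t)
        a = abs(t)
        if a <= h1:
            v = a
        elif a <= h2:
            v = h1
        elif a <= h3:
            v = h1 * (h3 - a) / (h3 - h2)
        else:
            v = 0.0
        return math.copysign(v, t)
    if ipsi == 3:                       # (d) Andrew's sine wave
        return math.sin(t) if abs(t) <= math.pi else 0.0
    if ipsi == 4:                       # (e) Tukey's bi-weight
        return t * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0
    raise ValueError("ipsi must be 0, 1, 2, 3 or 4")


def chi(t, ipsi, d=None):
    """Evaluate chi: t^2/2 for null weights, otherwise t^2/2 capped at d^2/2."""
    if ipsi == 0:
        return 0.5 * t * t
    return 0.5 * min(abs(t), d) ** 2
```

With these helpers, a Winsorized residual for one observation `xi` under Huber's function would be `psi((xi - theta) / sigma, 1, c=1.5) * sigma`, where the value 1.5 is an arbitrary illustrative choice.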
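Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
$$\hat{\sigma}_k = \sqrt{\frac{1}{\beta (n-1)} \left( \sum_{i=1}^{n} \chi\left(\frac{x_i - \hat{\theta}_{k-1}}{\hat{\sigma}_{k-1}}\right) \right) \hat{\sigma}_{k-1}^2}$$
and
$$\hat{\theta}_k = \hat{\theta}_{k-1} + \frac{1}{n} \sum_{i=1}^{n} \psi\left(\frac{x_i - \hat{\theta}_{k-1}}{\hat{\sigma}_k}\right) \hat{\sigma}_k$$
or
$$\hat{\sigma}_k = \sigma_c, \quad \text{if } \sigma \text{ is fixed.}$$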
The initial values for $\hat{\theta}$ and $\hat{\sigma}$ may either be user-supplied or calculated within G07DBF as the sample median and an estimate of $\sigma$ based on the median absolute deviation, respectively.
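A rough Python sketch of such default starting values is given below. It assumes that the median absolute deviation is rescaled by the 0.75 quantile of the standard Normal distribution, which is a common convention; the exact adjustment applied inside the routine is the one documented for G07DAF. The function name `starting_values` is hypothetical.

```python
from statistics import NormalDist, median


def starting_values(x):
    """Return (theta0, sigma0): the sample median and a rescaled MAD (illustrative)."""
    theta0 = median(x)
    mad = median([abs(xi - theta0) for xi in x])
    # Assumption: dividing by the Normal 0.75 quantile (about 0.6745) makes the
    # MAD estimate sigma for Normally distributed data.
    sigma0 = mad / NormalDist().inv_cdf(0.75)
    return theta0, sigma0
```

For example, `starting_values([1, 2, 3, 4, 100])` gives a location of 3 and a scale close to 1.48, so the single large value has little effect on either starting point.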
G07DBF is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).
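The iterative procedure described in this section can be sketched in Python as follows. This is not the LYHALG code: it is a minimal illustration using Huber's $\psi$ and $\chi$ only, and it makes two assumptions that are marked in the comments. The first is the form of the update steps; the second is that $\beta$ is taken as the expected value of $\chi(Z)$ for a standard Normal $Z$. All function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def huber_psi(t, c):
    return max(-c, min(c, t))


def huber_chi(t, d):
    return 0.5 * min(abs(t), d) ** 2


def beta_for_chi(d):
    """Assumption: beta = E[chi(Z)] for Z ~ N(0, 1), evaluated in closed form."""
    nd = NormalDist()
    return nd.cdf(d) - 0.5 - d * nd.pdf(d) + d * d * (1.0 - nd.cdf(d))


def m_estimate(x, c, d, theta, sigma, fixed_sigma=False, maxit=50, tol=1e-8):
    """Iterate for (theta, sigma); returns (theta, sigma, iterations used)."""
    n = len(x)
    beta = beta_for_chi(d)
    for nit in range(1, maxit + 1):
        theta_old, sigma_old = theta, sigma
        if not fixed_sigma:
            # Assumed scale step: sigma_k^2 = sum(chi(r_i)) * sigma_{k-1}^2 / (beta * (n - 1))
            s = sum(huber_chi((xi - theta_old) / sigma_old, d) for xi in x)
            sigma = math.sqrt(s / (beta * (n - 1))) * sigma_old
            if sigma <= 0.0:
                raise ValueError("scale estimate is zero or negative")
        # Assumed location step: theta_k = theta_{k-1} + (1/n) * sum(psi(r_i) * sigma_k)
        step = sum(huber_psi((xi - theta_old) / sigma, c) for xi in x)
        theta = theta_old + sigma * step / n
        # Convergence test modelled on the TOL description in Section 5.
        bound = tol * max(1.0, sigma_old)
        if abs(theta - theta_old) < bound and abs(sigma - sigma_old) < bound:
            return theta, sigma, nit
    raise RuntimeError("no convergence within maxit iterations")
```

One way to check the sketch is to choose very large `c` and `d`, in which case no observation is truncated and the result should reduce to the sample mean and the sample standard deviation, in line with the remark on null weights above.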
4 References
Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne
5 Parameters
- 1: ISIGMA – INTEGER, Input
On entry: the value assigned to ISIGMA determines whether $\hat{\sigma}$ is to be simultaneously estimated.
- ISIGMA=0: the estimation of $\hat{\sigma}$ is bypassed and SIGMA is set equal to $\sigma_c$.
- ISIGMA=1: $\hat{\sigma}$ is estimated simultaneously.
- 2: N – INTEGER, Input
On entry: $n$, the number of observations.
Constraint: N>1.
- 3: X(N) – REAL (KIND=nag_wp) array, Input
On entry: the vector of observations, $x_1, x_2, \ldots, x_n$.
- 4: IPSI – INTEGER, Input
On entry: which $\psi$ function is to be used.
- IPSI=0: $\psi(t) = t$.
- IPSI=1: Huber's function.
- IPSI=2: Hampel's piecewise linear function.
- IPSI=3: Andrew's sine wave.
- IPSI=4: Tukey's bi-weight.
- 5: C – REAL (KIND=nag_wp), Input
On entry: if IPSI=1, C must specify the parameter, $c$, of Huber's $\psi$ function. C is not referenced if IPSI≠1.
Constraint: if IPSI=1, C>0.0.
- 6: H1 – REAL (KIND=nag_wp), Input
- 7: H2 – REAL (KIND=nag_wp), Input
- 8: H3 – REAL (KIND=nag_wp), Input
On entry: if IPSI=2, H1, H2 and H3 must specify the parameters, $h_1$, $h_2$ and $h_3$, of Hampel's piecewise linear $\psi$ function. H1, H2 and H3 are not referenced if IPSI≠2.
Constraint: if IPSI=2, 0.0≤H1≤H2≤H3 and H3>0.0.
- 9: DCHI – REAL (KIND=nag_wp), Input
On entry: $d$, the parameter of the $\chi$ function. DCHI is not referenced if IPSI=0.
Constraint: if IPSI≠0, DCHI>0.0.
- 10: THETA – REAL (KIND=nag_wp), Input/Output
On entry: if SIGMA>0.0, THETA must be set to the required starting value for the estimation of the location parameter $\hat{\theta}$. A reasonable initial value for $\hat{\theta}$ will often be the sample mean or median.
On exit: the M-estimate of the location parameter, $\hat{\theta}$.
- 11: SIGMA – REAL (KIND=nag_wp), Input/Output
On entry: the role of SIGMA depends on the value assigned to ISIGMA, as follows:
- if ISIGMA=1, SIGMA must be assigned a value which determines the starting points for the calculation of $\hat{\theta}$ and $\hat{\sigma}$. If SIGMA≤0.0, G07DBF will determine the starting points of $\hat{\theta}$ and $\hat{\sigma}$. Otherwise, the value assigned to SIGMA will be taken as the starting point for $\hat{\sigma}$, and THETA must be assigned a value before entry, see above;
- if ISIGMA=0, SIGMA must be assigned a value which determines the value of $\sigma_c$, which is held fixed during the iterations, and the starting value for the calculation of $\hat{\theta}$. If SIGMA≤0.0, G07DBF will determine the value of $\sigma_c$ as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for $\hat{\theta}$. Otherwise, the value assigned to SIGMA will be taken as the value of $\sigma_c$, and THETA must be assigned a relevant value before entry, see above.
On exit: the M-estimate of the scale parameter, $\hat{\sigma}$, if ISIGMA was assigned the value 1 on entry; otherwise SIGMA will contain the initial fixed value $\sigma_c$.
- 12: MAXIT – INTEGER, Input
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: MAXIT=50.
Constraint: MAXIT>0.
- 13: TOL – REAL (KIND=nag_wp), Input
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for THETA and SIGMA are less than TOL × max(1.0, $\sigma_{k-1}$).
Constraint: TOL>0.0.
- 14: RS(N) – REAL (KIND=nag_wp) array, Output
On exit: the Winsorized residuals.
- 15: NIT – INTEGER, Output
On exit: the number of iterations that were used during the estimation.
- 16: WRK(N) – REAL (KIND=nag_wp) array, Output
On exit: if SIGMA≤0.0 on entry, WRK will contain the $n$ observations in ascending order.
- 17: IFAIL – INTEGER, Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0.
When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
- IFAIL=1
On entry, N≤1,
or MAXIT≤0,
or TOL≤0.0,
or ISIGMA≠0 or 1,
or IPSI<0,
or IPSI>4.
- IFAIL=2
On entry, C≤0.0 and IPSI=1,
or H1<0.0 and IPSI=2,
or H1=H2=H3=0.0 and IPSI=2,
or H1>H2 and IPSI=2,
or H1>H3 and IPSI=2,
or H2>H3 and IPSI=2,
or DCHI≤0.0 and IPSI≠0.
- IFAIL=3
On entry, all elements of the input array X are equal.
- IFAIL=4
SIGMA, the current estimate of $\sigma$, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of SIGMA.
- IFAIL=5
The number of iterations required exceeds MAXIT.
- IFAIL=6
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ISIGMA=0 option with a redescending $\psi$ function, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight.
If the given value of $\sigma$ is too small, then the standardized residuals $(x_i - \hat{\theta}_k)/\sigma_c$ will be large and all the residuals may fall into the region for which $\psi(t) = 0$. This may incorrectly terminate the iterations, thus making THETA and SIGMA invalid.
Re-enter the routine with a larger value of $\sigma_c$ or with ISIGMA=1.
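The situation behind IFAIL=6 can be reproduced with a few lines of Python. The sketch below is illustrative only: the data values, the location estimate and both scale values are invented, and the helper names are hypothetical. It evaluates the Winsorized residuals under Tukey's bi-weight for a fixed scale that is too small and for a larger one.

```python
def tukey_psi(t):
    """Tukey's bi-weight: t * (1 - t^2)^2 for |t| <= 1, zero otherwise."""
    return t * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0


def winsorized_residuals(x, theta, sigma_c):
    """psi((x_i - theta) / sigma_c) * sigma_c for each observation."""
    return [tukey_psi((xi - theta) / sigma_c) * sigma_c for xi in x]


x = [10.2, 11.1, 12.4, 13.0, 14.6]    # invented observations
theta = 12.4                          # invented current location estimate

# Scale too small: every non-central standardized residual exceeds 1 in magnitude.
too_small = winsorized_residuals(x, theta, 0.5)
# Larger scale: the standardized residuals fall inside the region where psi is non-zero.
larger = winsorized_residuals(x, theta, 5.0)
```

Here every entry of `too_small` is zero, so the location update receives no information from the data, whereas `larger` contains non-zero entries and the iteration can proceed.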
7 Accuracy
On successful exit the accuracy of the results is related to the value of TOL, see Section 5.
8 Further Comments
When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value of $\sigma$ is chosen, then the initial values of the standardized residuals $(x_i - \hat{\theta}_k)/\sigma$ will be large. If the redescending $\psi$ functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category, then a false solution may be returned, see page 152 of Hampel et al. (1986).
9 Example
The following program reads in a set of data consisting of eleven observations of a variable $X$.
For this example, Hampel's Piecewise Linear Function is used (IPSI=2), with values for $h_1$, $h_2$ and $h_3$, along with $d$ for the $\chi$ function, being read from the data file.
Using the following starting values, various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
(a) G07DBF determines the starting values; $\sigma$ is estimated simultaneously.
(b) You must supply the starting values; $\sigma$ is estimated simultaneously.
(c) G07DBF determines the starting values; $\sigma$ is fixed.
(d) You must supply the starting values; $\sigma$ is fixed.
9.1 Program Text
Program Text (g07dbfe.f90)
9.2 Program Data
Program Data (g07dbfe.d)
9.3 Program Results
Program Results (g07dbfe.r)