NAG Library Routine Document
g07dcf (robust_1var_mestim_wgt)
1
Purpose
g07dcf computes an $M$-estimate of location with (optional) simultaneous estimation of scale, where you provide the weight functions.
2
Specification
Fortran Interface
Subroutine g07dcf (chi, psi, isigma, n, x, beta, theta, sigma, maxit, tol, rs, nit, wrk, ifail)
Integer, Intent (In) :: isigma, n, maxit
Integer, Intent (Inout) :: ifail
Integer, Intent (Out) :: nit
Real (Kind=nag_wp), External :: chi, psi
Real (Kind=nag_wp), Intent (In) :: x(n), beta, tol
Real (Kind=nag_wp), Intent (Inout) :: theta, sigma
Real (Kind=nag_wp), Intent (Out) :: rs(n), wrk(n)

C Header Interface
#include <nagmk26.h>
void 
g07dcf_ ( double (NAG_CALL *chi)(const double *t), double (NAG_CALL *psi)(const double *t), const Integer *isigma, const Integer *n, const double x[], const double *beta, double *theta, double *sigma, const Integer *maxit, const double *tol, double rs[], Integer *nit, double wrk[], Integer *ifail) 

3
Description
The data consists of a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$.
The ${x}_{i}$ are assumed to be independent with an unknown distribution function of the form
$$F\left(\frac{{x}_{i}-\theta }{\sigma }\right),$$
where $\theta $ is a location argument, and $\sigma $ is a scale argument.
$M$-estimators of $\theta $ and $\sigma $ are given by the solution to the following system of equations:
$$\sum_{i=1}^{n}\psi \left(\frac{{x}_{i}-\hat{\theta}}{\hat{\sigma}}\right)=0$$
$$\sum_{i=1}^{n}\chi \left(\frac{{x}_{i}-\hat{\theta}}{\hat{\sigma}}\right)=\left(n-1\right)\beta $$
where $\psi $ and $\chi $ are user-supplied weight functions, and $\beta $ is a constant. Optionally the second equation can be omitted and the first equation is solved for $\hat{\theta}$ using an assigned value of $\sigma ={\sigma}_{c}$.
The constant $\beta $ should be chosen so that $\hat{\sigma}$ is an unbiased estimator when ${x}_{i}$, for $i=1,2,\dots ,n$, has a Normal distribution. To achieve this the value of $\beta $ is calculated as:
$$\beta =E\left(\chi \right)={\int }_{-\infty }^{\infty }\chi \left(z\right)\frac{1}{\sqrt{2\pi }}\exp\left(\frac{-{z}^{2}}{2}\right)dz$$
The values of $\psi \left(\frac{{x}_{i}-\hat{\theta}}{\hat{\sigma}}\right)\hat{\sigma}$ are known as the Winsorized residuals.
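The equations are solved by a simple iterative procedure, suggested by Huber:
$${\hat{\sigma}}_{k}=\sqrt{\frac{1}{\beta \left(n-1\right)}\left(\sum_{i=1}^{n}\chi \left(\frac{{x}_{i}-{\hat{\theta}}_{k-1}}{{\hat{\sigma}}_{k-1}}\right)\right){\hat{\sigma}}_{k-1}^{2}}$$
and
$${\hat{\theta}}_{k}={\hat{\theta}}_{k-1}+\frac{1}{n}\sum_{i=1}^{n}\psi \left(\frac{{x}_{i}-{\hat{\theta}}_{k-1}}{{\hat{\sigma}}_{k}}\right){\hat{\sigma}}_{k}$$
or
$${\hat{\sigma}}_{k}={\sigma}_{c}$$
if $\sigma $ is fixed.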
The initial values for
$\hat{\theta}$ and
$\hat{\sigma}$ may be user-supplied or calculated within
g07dbf as the sample median and an estimate of
$\sigma $ based on the median absolute deviation respectively.
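Starting values of this kind can be sketched in C as follows. The helper `starting_values` is hypothetical, and dividing the median absolute deviation by $0.6745$ (the upper quartile of the standard Normal distribution) is the conventional consistency adjustment, assumed here rather than taken from this document.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison for qsort on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Median of an array that is already sorted in ascending order. */
static double median_sorted(const double s[], int n)
{
    return n % 2 ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
}

/* Hypothetical helper, not a NAG routine.
   On return wrk holds the observations in ascending order, *theta the
   sample median and *sigma the adjusted median absolute deviation.
   Returns 0 on success, -1 if temporary storage cannot be allocated. */
int starting_values(int n, const double x[], double wrk[],
                    double *theta, double *sigma)
{
    double *dev;
    int i;

    assert(n > 1);
    memcpy(wrk, x, (size_t)n * sizeof *wrk);
    qsort(wrk, (size_t)n, sizeof *wrk, cmp_double);
    *theta = median_sorted(wrk, n);

    dev = malloc((size_t)n * sizeof *dev);
    if (dev == NULL) return -1;
    for (i = 0; i < n; ++i) dev[i] = fabs(x[i] - *theta);
    qsort(dev, (size_t)n, sizeof *dev, cmp_double);
    /* Assumed adjustment: MAD divided by the N(0,1) upper quartile. */
    *sigma = median_sorted(dev, n) / 0.6744897501960817;
    free(dev);
    return 0;
}
```

If all observations are equal the resulting scale is zero, so a caller would need to reject that case before iterating.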
g07dcf is based upon subroutine LYHALG within the ROBETH library, see
Marazzi (1987).
4
References
Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne
5
Arguments
 1: $\mathbf{chi}$ – real (Kind=nag_wp) Function, supplied by the user. External Procedure

chi must return the value of the weight function
$\chi $ for a given value of its argument. The value of
$\chi $ must be non-negative.
The specification of
chi is:
Fortran Interface
Real (Kind=nag_wp) :: chi
Real (Kind=nag_wp), Intent (In) :: t

C Header Interface
#include <nagmk26.h>
double 
chi (const double *t) 

 1: $\mathbf{t}$ – Real (Kind=nag_wp) Input

On entry: the argument for which
chi must be evaluated.
chi must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which
g07dcf is called. Arguments denoted as
Input must
not be changed by this procedure.
Note: chi should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by
g07dcf. If your code inadvertently
does return any NaNs or infinities,
g07dcf is likely to produce unexpected results.
 2: $\mathbf{psi}$ – real (Kind=nag_wp) Function, supplied by the user. External Procedure

psi must return the value of the weight function
$\psi $ for a given value of its argument.
The specification of
psi is:
Fortran Interface
Real (Kind=nag_wp) :: psi
Real (Kind=nag_wp), Intent (In) :: t

C Header Interface
#include <nagmk26.h>
double 
psi (const double *t) 

 1: $\mathbf{t}$ – Real (Kind=nag_wp) Input

On entry: the argument for which
psi must be evaluated.
psi must either be a module subprogram USEd by, or declared as EXTERNAL in, the (sub)program from which
g07dcf is called. Arguments denoted as
Input must
not be changed by this procedure.
Note: psi should not return floating-point NaN (Not a Number) or infinity values, since these are not handled by
g07dcf. If your code inadvertently
does return any NaNs or infinities,
g07dcf is likely to produce unexpected results.
 3: $\mathbf{isigma}$ – Integer Input

On entry: the value assigned to
isigma determines whether
$\hat{\sigma}$ is to be simultaneously estimated.
 ${\mathbf{isigma}}=0$
 The estimation of $\hat{\sigma}$ is bypassed and sigma is set equal to ${\sigma}_{c}$.
 ${\mathbf{isigma}}=1$
 $\hat{\sigma}$ is estimated simultaneously.
 4: $\mathbf{n}$ – Integer Input

On entry: $n$, the number of observations.
Constraint:
${\mathbf{n}}>1$.
 5: $\mathbf{x}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input

On entry: the vector of observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
 6: $\mathbf{beta}$ – Real (Kind=nag_wp) Input

On entry: the value of the constant
$\beta $ of the chosen
chi function.
Constraint:
${\mathbf{beta}}>0.0$.
 7: $\mathbf{theta}$ – Real (Kind=nag_wp) Input/Output

On entry: if
${\mathbf{sigma}}>0$,
theta must be set to the required starting value of the estimate of the location argument
$\hat{\theta}$. A reasonable initial value for
$\hat{\theta}$ will often be the sample mean or median.
On exit: the $M$-estimate of the location argument $\hat{\theta}$.
 8: $\mathbf{sigma}$ – Real (Kind=nag_wp) Input/Output

On entry: the role of
sigma depends on the value assigned to
isigma as follows.
If
${\mathbf{isigma}}=1$,
sigma must be assigned a value which determines the values of the starting points for the calculation of
$\hat{\theta}$ and
$\hat{\sigma}$. If
${\mathbf{sigma}}\le 0.0$,
g07dcf will determine the starting points of
$\hat{\theta}$ and
$\hat{\sigma}$. Otherwise, the value assigned to
sigma will be taken as the starting point for
$\hat{\sigma}$, and
theta must be assigned a relevant value before entry, see above.
If
${\mathbf{isigma}}=0$,
sigma must be assigned a value which determines the values of
${\sigma}_{c}$, which is held fixed during the iterations, and the starting value for the calculation of
$\hat{\theta}$. If
${\mathbf{sigma}}\le 0$,
g07dcf will determine the value of
${\sigma}_{c}$ as the median absolute deviation adjusted to reduce bias (see
g07daf) and the starting point for
$\theta $. Otherwise, the value assigned to
sigma will be taken as the value of
${\sigma}_{c}$ and
theta must be assigned a relevant value before entry, see above.
On exit: the
$M$-estimate of the scale argument
$\hat{\sigma}$, if
isigma was assigned the value
$1$ on entry, otherwise
sigma will contain the initial fixed value
${\sigma}_{c}$.
 9: $\mathbf{maxit}$ – Integer Input

On entry: the maximum number of iterations that should be used during the estimation.
Suggested value:
${\mathbf{maxit}}=50$.
Constraint:
${\mathbf{maxit}}>0$.
 10: $\mathbf{tol}$ – Real (Kind=nag_wp) Input

On entry: the relative precision for the final estimates. Convergence is assumed when the increments for
theta, and
sigma are less than
${\mathbf{tol}}\times \mathrm{max}\,\left(1.0,{\sigma}_{k-1}\right)$.
Constraint:
${\mathbf{tol}}>0.0$.
 11: $\mathbf{rs}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output

On exit: the Winsorized residuals.
 12: $\mathbf{nit}$ – Integer Output

On exit: the number of iterations that were used during the estimation.
 13: $\mathbf{wrk}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output

On exit: if
${\mathbf{sigma}}\le 0.0$ on entry,
wrk will contain the
$n$ observations in ascending order.
 14: $\mathbf{ifail}$ – Integer Input/Output

On entry:
ifail must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this argument you should refer to
Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $-\mathbf{1}\text{ or }\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$-1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry,  ${\mathbf{n}}\le 1$, 
or  ${\mathbf{maxit}}\le 0$, 
or  ${\mathbf{tol}}\le 0.0$, 
or  ${\mathbf{isigma}}\ne 0$ or $1$. 
 ${\mathbf{ifail}}=2$

On entry,  ${\mathbf{beta}}\le 0.0$. 
 ${\mathbf{ifail}}=3$

On entry,  all elements of the input array x are equal. 
 ${\mathbf{ifail}}=4$

sigma, the current estimate of
$\sigma $, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of
sigma.
 ${\mathbf{ifail}}=5$

The number of iterations required exceeds
maxit.
 ${\mathbf{ifail}}=6$

On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ${\mathbf{isigma}}=0$ option with a redescending $\psi $ function, i.e., $\psi =0$ if $\left|t\right|>\tau $, for some positive constant $\tau $.
If the given value of
$\sigma $ is too small, the standardized residuals
$\frac{{x}_{i}-{\hat{\theta}}_{k}}{{\sigma}_{c}}$, will be large and all the residuals may fall into the region for which
$\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making
theta and
sigma invalid.
Re-enter the routine with a larger value of ${\sigma}_{c}$ or with ${\mathbf{isigma}}=1$.
 ${\mathbf{ifail}}=7$

The value returned by the
chi function is negative.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 3.9 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See
Section 3.8 in How to Use the NAG Library and its Documentation for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See
Section 3.7 in How to Use the NAG Library and its Documentation for further information.
7
Accuracy
On successful exit the accuracy of the results is related to the value of
tol, see
Section 5.
8
Parallelism and Performance
g07dcf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g07dcf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9
Further Comments
Standard forms of the functions
$\psi $ and
$\chi $ are given in
Hampel et al. (1986),
Huber (1981) and
Marazzi (1987).
g07dbf calculates
$M$-estimates using some standard forms for
$\psi $ and
$\chi $.
When you supply the initial values, care has to be taken over the choice of the initial value of
$\sigma $. If too small a value is chosen then initial values of the standardized residuals
$\frac{{x}_{i}-{\hat{\theta}}_{k}}{\sigma}$ will be large. If the redescending
$\psi $ functions are used, i.e.,
$\psi =0$ if
$\left|t\right|>\tau $, for some positive constant
$\tau $, then these large values are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of
Hampel et al. (1986).
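The effect can be reproduced with a few lines of C. The sketch below counts how many Winsorized residuals are nonzero for a given location and scale, using a Hampel-type redescending $\psi $; the function names and the breakpoints 1.5, 3.0 and 4.5 are assumptions chosen for illustration. When the count is zero the location equation is satisfied trivially, whatever the value of $\theta $.

```c
#include <assert.h>
#include <math.h>

/* Illustrative breakpoints for a Hampel-type psi (assumed values). */
static const double H1 = 1.5, H2 = 3.0, H3 = 4.5;

/* Redescending psi: linear, then constant, then sloping back to zero;
   exactly zero for |t| >= H3. */
double hampel_psi(const double *t)
{
    double a = fabs(*t), v;

    if (a >= H3) return 0.0;
    if (a <= H1) v = a;
    else if (a <= H2) v = H1;
    else v = H1 * (H3 - a) / (H3 - H2);
    return *t < 0.0 ? -v : v;
}

/* Hypothetical diagnostic: number of observations whose Winsorized
   residual psi((x_i - theta)/sigma) * sigma is nonzero. */
int count_nonzero_residuals(double (*psi)(const double *), int n,
                            const double x[], double theta, double sigma)
{
    int i, count = 0;

    assert(n > 0 && sigma > 0.0);
    for (i = 0; i < n; ++i) {
        double t = (x[i] - theta) / sigma;
        if (psi(&t) * sigma != 0.0) ++count;
    }
    return count;
}
```

For the observations 10, 12, 14, 16 with a trial location of 11, a scale of 0.1 puts every standardized residual beyond the last breakpoint and the count is zero, whereas a scale of 2 keeps all four observations in play.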
10
Example
The following program reads in a set of data consisting of eleven observations of a variable $X$.
The
psi and
chi functions used are Hampel's piecewise linear function and Huber's
chi function respectively.
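For orientation, weight functions of these two kinds could be written in C along the following lines, with the argument passed by pointer as in the C Header Interface. The breakpoints `H1`, `H2`, `H3` and the cutoff `DCHI` are illustrative assumptions and are not taken from the example program; the `NAG_CALL` qualifier from the header is omitted so that the sketch compiles on its own.

```c
#include <assert.h>
#include <math.h>

/* Illustrative constants (assumed, not read from g07dcfe.f90). */
static const double H1 = 1.5, H2 = 3.0, H3 = 4.5; /* Hampel breakpoints */
static const double DCHI = 1.5;                   /* Huber chi cutoff   */

/* Hampel's piecewise linear psi:
     t                              for |t| <= H1
     sign(t) * H1                   for H1 < |t| <= H2
     sign(t) * H1*(H3-|t|)/(H3-H2)  for H2 < |t| <  H3
     0                              for |t| >= H3 */
double hampel_psi(const double *t)
{
    double a = fabs(*t), v;

    /* The breakpoints must be ordered for the pieces to join up. */
    assert(0.0 < H1 && H1 <= H2 && H2 < H3);
    if (a >= H3) return 0.0;
    if (a <= H1) v = a;
    else if (a <= H2) v = H1;
    else v = H1 * (H3 - a) / (H3 - H2);
    return *t < 0.0 ? -v : v;
}

/* Huber's chi: t^2/2 for |t| < DCHI, DCHI^2/2 otherwise.
   The value is never negative, as the routine requires. */
double huber_chi(const double *t)
{
    double a = fabs(*t);
    double ps = a < DCHI ? a : DCHI;
    return 0.5 * ps * ps;
}
```

Both functions return finite values for any finite argument, which matches the requirement that chi and psi do not return NaN or infinity.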
Using the following starting values various estimates of
$\theta $ and
$\sigma $ are calculated and printed along with the number of iterations used:
(a) g07dcf determines the starting values; $\sigma $ is estimated simultaneously.
(b) You supply the starting values; $\sigma $ is estimated simultaneously.
(c) g07dcf determines the starting values; $\sigma $ is fixed.
(d) You supply the starting values; $\sigma $ is fixed.
10.1
Program Text
Program Text (g07dcfe.f90)
10.2
Program Data
Program Data (g07dcfe.d)
10.3
Program Results
Program Results (g07dcfe.r)