Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_univar_robust_1var_mestim_wgt (g07dc)

## Purpose

nag_univar_robust_1var_mestim_wgt (g07dc) computes an $M$-estimate of location with (optional) simultaneous estimation of scale, where you provide the weight functions.

## Syntax

[theta, sigma, rs, nit, wrk, ifail] = g07dc(chi, psi, isigma, x, beta, theta, sigma, tol, 'n', n, 'maxit', maxit)
[theta, sigma, rs, nit, wrk, ifail] = nag_univar_robust_1var_mestim_wgt(chi, psi, isigma, x, beta, theta, sigma, tol, 'n', n, 'maxit', maxit)

## Description

The data consists of a sample of size $n$, denoted by ${x}_{1},{x}_{2},\dots ,{x}_{n}$, drawn from a random variable $X$.
The ${x}_{i}$ are assumed to be independent with an unknown distribution function of the form,
 $Fxi-θ/σ$
where $\theta$ is a location argument, and $\sigma$ is a scale argument. $M$-estimators of $\theta$ and $\sigma$ are given by the solution to the following system of equations;
 $∑i=1nψxi-θ^/σ^ = 0 ∑i=1nχxi-θ^/σ^ = n-1β$
where $\psi$ and $\chi$ are user-supplied weight functions, and $\beta$ is a constant. Optionally the second equation can be omitted and the first equation is solved for $\stackrel{^}{\theta }$ using an assigned value of $\sigma ={\sigma }_{c}$.
The constant $\beta$ should be chosen so that $\stackrel{^}{\sigma }$ is an unbiased estimator when ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$ has a Normal distribution. To achieve this the value of $\beta$ is calculated as:
 $β=Eχ=∫-∞∞χz12πexp-z22dz$
The values of $\psi \left(\frac{{x}_{i}-\stackrel{^}{\theta }}{\stackrel{^}{\sigma }}\right)\stackrel{^}{\sigma }$ are known as the Winsorized residuals.
The equations are solved by a simple iterative procedure, suggested by Huber:
 $σ^k=1βn-1 ∑i=1nχ xi-θ^k-1σ^k-1 σ^k-12$
and
 $θ^k=θ^k- 1+1n ∑i= 1nψ xi-θ^k- 1σ^k σ^k$
or
 $σ^k=σc$
if $\sigma$ is fixed.
The initial values for $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$ may be user-supplied or calculated within nag_univar_robust_1var_mestim (g07db) as the sample median and an estimate of $\sigma$ based on the median absolute deviation respectively.
nag_univar_robust_1var_mestim_wgt (g07dc) is based upon function LYHALG within the ROBETH library, see Marazzi (1987).

## References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{chi}$ – function handle or string containing name of m-file
chi must return the value of the weight function $\chi$ for a given value of its argument. The value of $\chi$ must be non-negative.
[result] = chi(t)

Input Parameters

1:     $\mathrm{t}$ – double scalar
The argument for which chi must be evaluated.

Output Parameters

1:     $\mathrm{result}$ – double scalar
The value of the weight function $\varphi$ evaluated at t.
2:     $\mathrm{psi}$ – function handle or string containing name of m-file
psi must return the value of the weight function $\psi$ for a given value of its argument.
[result] = psi(t)

Input Parameters

1:     $\mathrm{t}$ – double scalar
The argument for which psi must be evaluated.

Output Parameters

1:     $\mathrm{result}$ – double scalar
The value of the weight function $\psi$ evaluated at t.
3:     $\mathrm{isigma}$int64int32nag_int scalar
The value assigned to isigma determines whether $\stackrel{^}{\sigma }$ is to be simultaneously estimated.
${\mathbf{isigma}}=0$
The estimation of $\stackrel{^}{\sigma }$ is bypassed and sigma is set equal to ${\sigma }_{c}$.
${\mathbf{isigma}}=1$
$\stackrel{^}{\sigma }$ is estimated simultaneously.
4:     $\mathrm{x}\left({\mathbf{n}}\right)$ – double array
The vector of observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$.
5:     $\mathrm{beta}$ – double scalar
The value of the constant $\beta$ of the chosen chi function.
Constraint: ${\mathbf{beta}}>0.0$.
6:     $\mathrm{theta}$ – double scalar
If ${\mathbf{sigma}}>0$, then theta must be set to the required starting value of the estimate of the location argument $\stackrel{^}{\theta }$. A reasonable initial value for $\stackrel{^}{\theta }$ will often be the sample mean or median.
7:     $\mathrm{sigma}$ – double scalar
The role of sigma depends on the value assigned to isigma as follows.
If ${\mathbf{isigma}}=1$, sigma must be assigned a value which determines the values of the starting points for the calculation of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. If ${\mathbf{sigma}}\le 0.0$, then nag_univar_robust_1var_mestim_wgt (g07dc) will determine the starting points of $\stackrel{^}{\theta }$ and $\stackrel{^}{\sigma }$. Otherwise, the value assigned to sigma will be taken as the starting point for $\stackrel{^}{\sigma }$, and theta must be assigned a relevant value before entry, see above.
If ${\mathbf{isigma}}=0$, sigma must be assigned a value which determines the values of ${\sigma }_{c}$, which is held fixed during the iterations, and the starting value for the calculation of $\stackrel{^}{\theta }$. If ${\mathbf{sigma}}\le 0$, then nag_univar_robust_1var_mestim_wgt (g07dc) will determine the value of ${\sigma }_{c}$ as the median absolute deviation adjusted to reduce bias (see nag_univar_robust_1var_median (g07da)) and the starting point for $\theta$. Otherwise, the value assigned to sigma will be taken as the value of ${\sigma }_{c}$ and theta must be assigned a relevant value before entry, see above.
8:     $\mathrm{tol}$ – double scalar
The relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than ${\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1.0,{\sigma }_{k-1}\right)$.
Constraint: ${\mathbf{tol}}>0.0$.

### Optional Input Parameters

1:     $\mathrm{n}$int64int32nag_int scalar
Default: the dimension of the array x.
$n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
2:     $\mathrm{maxit}$int64int32nag_int scalar
Default: $50$
The maximum number of iterations that should be used during the estimation.
Constraint: ${\mathbf{maxit}}>0$.

### Output Parameters

1:     $\mathrm{theta}$ – double scalar
The $M$-estimate of the location argument $\stackrel{^}{\theta }$.
2:     $\mathrm{sigma}$ – double scalar
The $M$-estimate of the scale argument $\stackrel{^}{\sigma }$, if isigma was assigned the value $1$ on entry, otherwise sigma will contain the initial fixed value ${\sigma }_{c}$.
3:     $\mathrm{rs}\left({\mathbf{n}}\right)$ – double array
The Winsorized residuals.
4:     $\mathrm{nit}$int64int32nag_int scalar
The number of iterations that were used during the estimation.
5:     $\mathrm{wrk}\left({\mathbf{n}}\right)$ – double array
If ${\mathbf{sigma}}\le 0.0$ on entry, wrk will contain the $n$ observations in ascending order.
6:     $\mathrm{ifail}$int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}\le 1$, or ${\mathbf{maxit}}\le 0$, or ${\mathbf{tol}}\le 0.0$, or ${\mathbf{isigma}}\ne 0$ or $1$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{beta}}\le 0.0$.
${\mathbf{ifail}}=3$
 On entry, all elements of the input array x are equal.
${\mathbf{ifail}}=4$
sigma, the current estimate of $\sigma$, is zero or negative. This error exit is very unlikely, although it may be caused by too large an initial value of sigma.
${\mathbf{ifail}}=5$
The number of iterations required exceeds maxit.
${\mathbf{ifail}}=6$
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the ${\mathbf{isigma}}=0$ option with a redescending $\psi$ function, i.e., $\psi =0$ if $\left|t\right|>\tau$, for some positive constant $\tau$.
If the given value of $\sigma$ is too small, then the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{{\sigma }_{c}}$, will be large and all the residuals may fall into the region for which $\psi \left(t\right)=0$. This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of ${\sigma }_{c}$ or with ${\mathbf{isigma}}=1$.
${\mathbf{ifail}}=7$
The value returned by the chi function is negative.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

On successful exit the accuracy of the results is related to the value of tol, see Arguments.

Standard forms of the functions $\psi$ and $\chi$ are given in Hampel et al. (1986), Huber (1981) and Marazzi (1987). nag_univar_robust_1var_mestim (g07db) calculates $M$-estimates using some standard forms for $\psi$ and $\chi$.
When you supply the initial values, care has to be taken over the choice of the initial value of $\sigma$. If too small a value is chosen then initial values of the standardized residuals $\frac{{x}_{i}-{\stackrel{^}{\theta }}_{k}}{\sigma }$ will be large. If the redescending $\psi$ functions are used, i.e., $\psi =0$ if $\left|t\right|>\tau$, for some positive constant $\tau$, then these large values are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see page 152 of Hampel et al. (1986).

## Example

The following program reads in a set of data consisting of eleven observations of a variable $X$.
The psi and chi functions used are Hampel's Piecewise Linear Function and Hubers chi function respectively.
Using the following starting values various estimates of $\theta$ and $\sigma$ are calculated and printed along with the number of iterations used:
 (a) nag_univar_robust_1var_mestim_wgt (g07dc) determined the starting values, $\sigma$ is estimated simultaneously. (b) You must supply the starting values, $\sigma$ is estimated simultaneously. (c) nag_univar_robust_1var_mestim_wgt (g07dc) determined the starting values, $\sigma$ is fixed. (d) You must supply the starting values, $\sigma$ is fixed.
```function g07dc_example

fprintf('g07dc example results\n\n');

global dchi h1 h2 h3;

dchi = 1.5;
h1   = 1.5;
h2   = 3.0;
h3   = 4.5;

x = [13; 11; 16;  5;  3; 18;  9;  8;  6; 27;  7];

% Controll parameter
beta = 0.3892326;
tol  = 0.0001;

% Loop over combinations of isigma sigma and theta
isigma = int64([ 1  1  0  0]);
sigma  =         [-1  7 -1  7];
theta  =         [ 0  2  0  2];

fprintf('           Input parameters     Output parameters\n');
fprintf(' isigma   sigma   theta   tol    sigma  theta\n');

for j = 1:numel(theta)

fprintf('%3d   %8.4f%8.4f%8.4f', isigma(j), sigma(j), theta(j), tol);

[thetaOut, sigmaOut, rs, nit, wrk, ifail] = ...
g07dc( ...
@chi, @psi, isigma(j), x, beta, theta(j), sigma(j), tol);

fprintf(' %8.4f%8.4f\n', sigmaOut, thetaOut);

end

function [result] = chi(t)
% Hubers Chi function
global dchi;

ps = min(dchi, abs(t));
result = ps*ps/2;

function [result] = psi(t)
% Hampels piecewise linear function
global h1 h2 h3;

if abs(t) < h3
if abs(t) < h2
result=min(h1, abs(t));
else
result=h1*(h3-abs(t))/(h3-h2);
end
if t < 0
result = -result;
end
else
result=0;
end
```
```g07dc example results

Input parameters     Output parameters
isigma   sigma   theta   tol    sigma  theta
1    -1.0000  0.0000  0.0001   6.3247 10.5487
1     7.0000  2.0000  0.0001   6.3249 10.5487
0    -1.0000  0.0000  0.0001   5.9304 10.4896
0     7.0000  2.0000  0.0001   7.0000 10.6500
```