Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_correg_robustm_corr_huber (g02hk)

## Purpose

nag_correg_robustm_corr_huber (g02hk) computes a robust estimate of the covariance matrix for an expected fraction of gross errors.

## Syntax

[covar, theta, nit, ifail] = g02hk(x, eps, 'n', n, 'm', m, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
[covar, theta, nit, ifail] = nag_correg_robustm_corr_huber(x, eps, 'n', n, 'm', m, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: nitmon and tol were made optional At Mark 22: n was made optional

## Description

For a set of $n$ observations on $m$ variables in a matrix $X$, a robust estimate of the covariance matrix, $C$, and a robust estimate of location, $\theta$, are given by
 $C=τ2ATA-1,$
where ${\tau }^{2}$ is a correction factor and $A$ is a lower triangular matrix found as the solution to the following equations:
 $zi=Axi-θ,$
 $1n ∑i= 1nwzi2zi=0,$
and
 $1n∑i=1nuzi2zi ziT -I=0,$
 where ${x}_{i}$ is a vector of length $m$ containing the elements of the $i$th row of x, ${z}_{i}$ is a vector of length $m$, $I$ is the identity matrix and $0$ is the zero matrix, and $w$ and $u$ are suitable functions.
nag_correg_robustm_corr_huber (g02hk) uses weight functions:
 $ut= aut2, if ​tbu2$
and
 $wt= 1, if ​t≤cw wt= cwt, if ​t>cw$
for constants ${a}_{u}$, ${b}_{u}$ and ${c}_{w}$.
These functions solve a minimax problem considered by Huber (see Huber (1981)). The values of ${a}_{u}$, ${b}_{u}$ and ${c}_{w}$ are calculated from the expected fraction of gross errors, $\epsilon$ (see Huber (1981) and Marazzi (1987)). The expected fraction of gross errors is the estimated proportion of outliers in the sample.
In order to make the estimate asymptotically unbiased under a Normal model a correction factor, ${\tau }^{2}$, is calculated, (see Huber (1981) and Marazzi (1987)).
The matrix $C$ is calculated using nag_correg_robustm_corr_user_deriv (g02hl). Initial estimates of ${\theta }_{j}$, for $j=1,2,\dots ,m$, are given by the median of the $j$th column of $X$ and the initial value of $A$ is based on the median absolute deviation (see Marazzi (1987)). nag_correg_robustm_corr_huber (g02hk) is based on routines in ROBETH; see Marazzi (1987).

## References

Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
2:     $\mathrm{eps}$ – double scalar
$\epsilon$, the expected fraction of gross errors expected in the sample.
Constraint: $0.0\le {\mathbf{eps}}<1.0$.

### Optional Input Parameters

1:     $\mathrm{n}$int64int32nag_int scalar
Default: the first dimension of the array x.
$n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
2:     $\mathrm{m}$int64int32nag_int scalar
Default: the second dimension of the array x.
$m$, the number of columns of the matrix $X$, i.e., number of independent variables.
Constraint: $1\le {\mathbf{m}}\le {\mathbf{n}}$.
3:     $\mathrm{maxit}$int64int32nag_int scalar
Default: $150$.
The maximum number of iterations that will be used during the calculation of the covariance matrix.
Constraint: ${\mathbf{maxit}}>0$.
4:     $\mathrm{nitmon}$int64int32nag_int scalar
Default: $0$
Indicates the amount of information on the iteration that is printed.
${\mathbf{nitmon}}>0$
The value of $A$, $\theta$ and $\delta$ (see Accuracy) will be printed at the first and every nitmon iterations.
${\mathbf{nitmon}}\le 0$
No iteration monitoring is printed.
When printing occurs the output is directed to the current advisory message unit (see nag_file_set_unit_advisory (x04ab)).
5:     $\mathrm{tol}$ – double scalar
Default: $5e-5$
The relative precision for the final estimates of the covariance matrix.
Constraint: ${\mathbf{tol}}>0.0$.

### Output Parameters

1:     $\mathrm{covar}\left({\mathbf{m}}×\left({\mathbf{m}}+1\right)/2\right)$ – double array
A robust estimate of the covariance matrix, $C$. The upper triangular part of the matrix $C$ is stored packed by columns. ${C}_{ij}$ is returned in ${\mathbf{covar}}\left(\left(j×\left(j-1\right)/2+i\right)\right)$, $i\le j$.
2:     $\mathrm{theta}\left({\mathbf{m}}\right)$ – double array
The robust estimate of the location arguments ${\theta }_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,m$.
3:     $\mathrm{nit}$int64int32nag_int scalar
The number of iterations performed.
4:     $\mathrm{ifail}$int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}\le 1$, or ${\mathbf{m}}<1$, or ${\mathbf{n}}<{\mathbf{m}}$, or $\mathit{ldx}<{\mathbf{n}}$, or ${\mathbf{eps}}<0.0$, or ${\mathbf{eps}}\ge 1.0$, or ${\mathbf{tol}}\le 0.0$, or ${\mathbf{maxit}}\le 0$.
${\mathbf{ifail}}=2$
 On entry, a variable has a constant value, i.e., all elements in a column of $X$ are identical.
${\mathbf{ifail}}=3$
The iterative procedure to find $C$ has failed to converge in maxit iterations.
${\mathbf{ifail}}=4$
The iterative procedure to find $C$ has become unstable. This may happen if the value of eps is too large for the sample.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Arguments. At an iteration let
 (i) $d1=\text{}$ the maximum value of the absolute relative change in $A$ (ii) $d2=\text{}$ the maximum absolute change in $u\left({‖{z}_{i}‖}_{2}\right)$ (iii) $d3=\text{}$ the maximum absolute relative change in ${\theta }_{j}$
and let $\delta =\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(d1,d2,d3\right)$. Then the iterative procedure is assumed to have converged when $\delta <{\mathbf{tol}}$

The existence of $A$, and hence $C$, will depend upon the function $u$ (see Marazzi (1987)); also if $X$ is not of full rank a value of $A$ will not be found. If the columns of $X$ are almost linearly related, then convergence will be slow.

## Example

A sample of $10$ observations on three variables is read in and the robust estimate of the covariance matrix is computed assuming 10% gross errors are to be expected. The robust covariance is then printed.
```function g02hk_example

fprintf('g02hk example results\n\n');

x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15.0;
2.7, 8.1,  7.7;
6.1, 3.0, 21.9;
5.3, 2.2, 13.9];
epsilon = 0.1;

% Compute robust estimate of variance / covariance matrix
[covar, theta, nit, ifail] = g02hk( ...
x, epsilon);

fprintf(' iterations to convergence = %4d\n\n', nit);
mtitle = 'Covariance matrix';
n = int64(size(x,2));
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
uplo, diag, n, covar, mtitle);
fprintf('\n');
disp('Theta');
disp(theta);

```
```g02hk example results

iterations to convergence =   23

Covariance matrix
1          2          3
1      3.4611    -3.6806     4.6818
2                 5.3477    -6.6445
3                           14.4389

Theta
5.8178
3.6813
15.0369

```