where	$x_{i}$ is a vector of length $m$ containing the elements of the $i$ th row of x,
	$z_{i}$ is a vector of length $m$ ,
	$I$ is the identity matrix and $0$ is the zero matrix,
and	$w$ and $u$ are suitable functions.

nag_correg_robustm_corr_huber (g02hk) uses weight functions:

$u(t) = \begin{cases} \frac{a_{u}}{t^{2}}, & \text{if } t < a_{u}^{2} \\ 1, & \text{if } a_{u}^{2} \leq t \leq b_{u}^{2} \\ \frac{b_{u}}{t^{2}}, & \text{if } t > b_{u}^{2} \end{cases}$

and

$w(t) = \begin{cases} 1, & \text{if } t \leq c_{w} \\ \frac{c_{w}}{t}, & \text{if } t > c_{w} \end{cases}$

for constants $a_{u}$, $b_{u}$ and $c_{w}$.
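The two weight functions can be evaluated directly from the definitions above. The following is a minimal sketch only: the constant values are hypothetical (g02hk derives the real ones from eps), and the variable names are chosen for illustration.

```matlab
% Hypothetical constants for illustration only; g02hk derives the real
% values of a_u, b_u and c_w from eps.
a_u = 0.5;
b_u = 2.0;
c_w = 1.5;

t = [0.1, 1.0, 9.0];          % sample arguments (all nonzero)

% u(t): 1 on [a_u^2, b_u^2], a_u/t^2 below that interval, b_u/t^2 above it
u_vals = ones(size(t));
below  = t < a_u^2;
above  = t > b_u^2;
u_vals(below) = a_u ./ t(below).^2;
u_vals(above) = b_u ./ t(above).^2;

% w(t): 1 up to c_w, then c_w/t
w_vals = ones(size(t));
large  = t > c_w;
w_vals(large) = c_w ./ t(large);

% Rows: t, u(t), w(t)
disp([t; u_vals; w_vals]);
```

Logical indexing is used instead of multiplying by masks so that a branch that does not apply is never evaluated.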

These functions solve a minimax problem considered by Huber (see Huber (1981)). The values of $a_{u}$, $b_{u}$ and $c_{w}$ are calculated from the expected fraction of gross errors, $ε$ (see Huber (1981) and Marazzi (1987)). The expected fraction of gross errors is the estimated proportion of outliers in the sample.

In order to make the estimate asymptotically unbiased under a Normal model, a correction factor, $τ^{2}$, is calculated (see Huber (1981) and Marazzi (1987)).

The matrix $C$ is calculated using nag_correg_robustm_corr_user_deriv (g02hl). Initial estimates of $θ_{j}$, for $j = 1, 2, \dots, m$, are given by the median of the $j$th column of $X$, and the initial value of $A$ is based on the median absolute deviation (see Marazzi (1987)). nag_correg_robustm_corr_huber (g02hk) is based on routines in ROBETH; see Marazzi (1987).
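The starting quantities described above can be reproduced outside the routine for inspection. This sketch computes the column medians and the per-column median absolute deviations of the example data used later in this document; the names theta0, dev and mad0 are illustrative, and how g02hk turns the median absolute deviation into the initial $A$ is not specified here, so that step is not attempted.

```matlab
% Example data from this document: 10 observations on 3 variables
x = [3.4, 6.9, 12.2;
     6.4, 2.5, 15.1;
     4.9, 5.5, 14.2;
     7.3, 1.9, 18.2;
     8.8, 3.6, 11.7;
     8.4, 1.3, 17.9;
     5.3, 3.1, 15.0;
     2.7, 8.1,  7.7;
     6.1, 3.0, 21.9;
     5.3, 2.2, 13.9];

% Median of each column: the initial estimate of theta_j
theta0 = median(x, 1);

% Median absolute deviation of each column about its median
dev  = abs(bsxfun(@minus, x, theta0));
mad0 = median(dev, 1);

disp(theta0);
disp(mad0);
```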

References

Huber P J (1981) Robust Statistics Wiley

Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1: $x (ldx, m)$ – double array: ldx, the first dimension of the array, must satisfy the constraint $ldx \geq n$ .
$x (i, j)$ must contain the $i$ th observation for the $j$ th variable, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$ .
2: $eps$ – double scalar: $ε$ , the expected fraction of gross errors in the sample.

Constraint: $0.0 \leq eps < 1.0$ .

Optional Input Parameters

1: $n$ – int64 / int32 / nag_int scalar

Default: the first dimension of the array x.

$n$ , the number of observations.

Constraint: $n > 1$ .

2: $m$ – int64 / int32 / nag_int scalar

Default: the second dimension of the array x.

$m$ , the number of columns of the matrix $X$ , i.e., the number of independent variables.

Constraint: $1 \leq m \leq n$ .

3: $maxit$ – int64 / int32 / nag_int scalar

Default: $150$

The maximum number of iterations that will be used during the calculation of the covariance matrix.

Constraint: $maxit > 0$ .

4: $nitmon$ – int64 / int32 / nag_int scalar

Default: $0$

Indicates the amount of information on the iteration that is printed.

$nitmon > 0$: The value of $A$ , $θ$ and $δ$ (see Accuracy) will be printed at the first and every nitmon iterations.
$nitmon \leq 0$: No iteration monitoring is printed.

When printing occurs the output is directed to the current advisory message unit (see nag_file_set_unit_advisory (x04ab)).

5: $tol$ – double scalar

Default: $5e-5$

The relative precision for the final estimates of the covariance matrix.

Constraint: $tol > 0.0$ .

Output Parameters

1: $covar (m \times (m + 1) / 2)$ – double array: A robust estimate of the covariance matrix, $C$ . The upper triangular part of the matrix $C$ is stored packed by columns. $C_{i j}$ is returned in $covar ((j \times (j - 1) / 2 + i))$ , $i \leq j$ .
2: $theta (m)$ – double array: The robust estimate of the location arguments $θ_{j}$ , for $j = 1, 2, \dots, m$ .
3: $nit$ – int64 / int32 / nag_int scalar: The number of iterations performed.
4: $ifail$ – int64 / int32 / nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).
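The packed storage used for covar can be expanded into a full symmetric matrix by applying the index formula $covar (j \times (j - 1) / 2 + i)$ , $i \leq j$ . The sketch below does this for the values printed in the example at the end of this document; the loop is an illustration, not part of the toolbox.

```matlab
m = 3;

% Upper triangle packed by columns: C11, C12, C22, C13, C23, C33
% (values taken from the example results in this document)
covar = [3.4611; -3.6806; 5.3477; 4.6818; -6.6445; 14.4389];

C = zeros(m);
for j = 1:m
    for i = 1:j
        C(i, j) = covar(j*(j - 1)/2 + i);   % element (i,j) with i <= j
        C(j, i) = C(i, j);                  % mirror into the lower triangle
    end
end

disp(C);
```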

Error Indicators and Warnings

Errors or warnings detected by the function:

$ifail = 1$

On entry,	$n \leq 1$ ,
or	$m < 1$ ,
or	$n < m$ ,
or	$ldx < n$ ,
or	$eps < 0.0$ ,
or	$eps \geq 1.0$ ,
or	$tol \leq 0.0$ ,
or	$maxit \leq 0$ .

$ifail = 2$: On entry, a variable has a constant value, i.e., all elements in a column of $X$ are identical.

$ifail = 3$: The iterative procedure to find $C$ has failed to converge in maxit iterations.

$ifail = 4$: The iterative procedure to find $C$ has become unstable. This may happen if the value of eps is too large for the sample.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.
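The conditions behind $ifail = 1$ and $ifail = 2$ can be checked before calling the routine. The following is a minimal sketch of such a pre-check on hypothetical data; the variable names (eps_in, bad_args, const_col) are illustrative and the check is not part of the toolbox. When n is left at its default, ldx equals n, so the $ldx < n$ condition is not tested.

```matlab
% Hypothetical inputs for illustration
x      = [1.0, 2.0;
          3.0, 5.0;
          4.0, 1.0];
eps_in = 0.1;      % expected fraction of gross errors
tol    = 5e-5;
maxit  = 150;

[n, m] = size(x);

% Conditions listed under ifail = 1
bad_args = (n <= 1) || (m < 1) || (n < m) || ...
           (eps_in < 0) || (eps_in >= 1) || ...
           (tol <= 0) || (maxit <= 0);

% Condition listed under ifail = 2: a column whose elements are all identical
const_col = all(bsxfun(@eq, x, x(1, :)), 1);

if bad_args
    disp('argument constraints violated (would give ifail = 1)');
elseif any(const_col)
    fprintf('constant column(s): %s (would give ifail = 2)\n', ...
            mat2str(find(const_col)));
else
    disp('inputs pass the documented checks');
end
```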

Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Parameters. At an iteration let

(i)	$d1 =$ the maximum value of the absolute relative change in $A$
(ii)	$d2 =$ the maximum absolute change in $u ({‖z_{i}‖}_{2})$
(iii)	$d3 =$ the maximum absolute relative change in $θ_{j}$

and let $δ = \max (d1, d2, d3)$ . Then the iterative procedure is assumed to have converged when $δ < tol$ .
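The stopping rule above can be illustrated with a few lines of MATLAB. This is a sketch under stated assumptions: the iterates and the values of d1 and d2 are hypothetical, and the relative change in $θ_{j}$ is assumed to be scaled by the previous iterate, which this document does not specify.

```matlab
tol = 5e-5;                           % default tolerance

% Hypothetical successive estimates of theta
theta_old = [5.8177; 3.6814; 15.0370];
theta_new = [5.8178; 3.6813; 15.0369];

% d3: maximum absolute relative change in theta_j
% (scaling by the previous iterate is an assumption)
d3 = max(abs(theta_new - theta_old) ./ abs(theta_old));

% d1 and d2 are given hypothetical values here rather than computed
d1 = 2.0e-5;                          % change in A
d2 = 1.0e-5;                          % change in u(||z_i||_2)

delta     = max([d1, d2, d3]);
converged = delta < tol;

fprintf('delta = %.3e, converged = %d\n', delta, converged);
```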

Further Comments

The existence of $A$ , and hence $C$ , will depend upon the function $u$ (see Marazzi (1987)); also, if $X$ is not of full rank a value of $A$ will not be found. If the columns of $X$ are almost linearly related, then convergence will be slow.
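A quick diagnostic for the rank and near-collinearity remarks above is to inspect the rank and condition number of the data matrix before calling the routine. The sketch below uses two small hypothetical matrices and the standard MATLAB functions rank and cond. Whether g02hk's requirement applies to $X$ as supplied or to a centred version is not stated here, so treat this only as a rough screen.

```matlab
% Hypothetical data: x_ok has independent columns,
% x_bad has a third column equal to the sum of the first two
x_ok  = [1 0 0;
         0 1 0;
         0 0 1;
         1 1 1];
x_bad = [x_ok(:, 1), x_ok(:, 2), x_ok(:, 1) + x_ok(:, 2)];

full_ok  = rank(x_ok)  == size(x_ok, 2);    % true: full column rank
full_bad = rank(x_bad) == size(x_bad, 2);   % false: rank deficient

% A large condition number suggests nearly linearly related columns
cond_ok = cond(x_ok);

fprintf('x_ok full rank: %d, x_bad full rank: %d, cond(x_ok) = %.4g\n', ...
        full_ok, full_bad, cond_ok);
```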

Example

A sample of $10$ observations on three variables is read in and the robust estimate of the covariance matrix is computed assuming 10% gross errors are to be expected. The robust covariance is then printed.


function g02hk_example


fprintf('g02hk example results\n\n');

x = [3.4, 6.9, 12.2;
     6.4, 2.5, 15.1;
     4.9, 5.5, 14.2;
     7.3, 1.9, 18.2;
     8.8, 3.6, 11.7;
     8.4, 1.3, 17.9;
     5.3, 3.1, 15.0;
     2.7, 8.1,  7.7;
     6.1, 3.0, 21.9;
     5.3, 2.2, 13.9];
epsilon = 0.1;

% Compute robust estimate of variance / covariance matrix
[covar, theta, nit, ifail] = g02hk( ...
                                    x, epsilon);

fprintf(' iterations to convergence = %4d\n\n', nit);
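mtitle = 'Covariance matrix';
m = int64(size(x,2));   % order of the covariance matrix (number of variables)
uplo   = 'Upper';
diag   = 'Non-unit';
% Print the packed upper triangle returned in covar
[ifail] = x04cc( ...
                 uplo, diag, m, covar, mtitle);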
fprintf('\n');
disp('Theta');
disp(theta);

g02hk example results

 iterations to convergence =   23

 Covariance matrix
             1          2          3
 1      3.4611    -3.6806     4.6818
 2                 5.3477    -6.6445
 3                           14.4389

Theta
    5.8178
    3.6813
   15.0369
