nag_correg_robustm_corr_user (g02hm) computes a robust estimate of the covariance matrix for user-supplied weight functions. The derivatives of the weight functions are not required.

Syntax

[user, covar, a, wt, theta, nit, ifail] = g02hm(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)

[user, covar, a, wt, theta, nit, ifail] = nag_correg_robustm_corr_user(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)

Note: the interface to this routine has changed since earlier releases of the toolbox:

At Mark 23:	nitmon and tol were made optional
At Mark 22:	n was made optional

Description

For a set of

n

observations on

m

variables in a matrix

X

, a robust estimate of the covariance matrix,

C

, and a robust estimate of location,

θ

, are given by

C = τ^{2} {(A^{T} A)}^{- 1},

where

τ^{2}

is a correction factor and

A

is a lower triangular matrix found as the solution to the following equations.

z_{i} = A (x_{i} - θ)

\frac{1}{n} \sum_{i = 1}^{n} w ({‖z_{i}‖}_{2}) z_{i} = 0

and

\frac{1}{n} \sum_{i = 1}^{n} u ({‖z_{i}‖}_{2}) z_{i} z_{i}^{T} - v ({‖z_{i}‖}_{2}) I = 0,

where	$x_{i}$ is a vector of length $m$ containing the elements of the $i$ th row of $X$ ,
	$z_{i}$ is a vector of length $m$ ,
	$I$ is the identity matrix and $0$ is the zero matrix.
and	$w$ and $u$ are suitable functions.

nag_correg_robustm_corr_user (g02hm) covers two situations:

(i)	$v (t) = 1$ for all $t$ ,
(ii)	$v (t) = u (t)$ .

The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about

θ

using weights

{wt}_{i} = u (‖z_{i}‖)

. In case (i) a divisor of

n

is used and in case (ii) a divisor of

\sum_{i = 1}^{n} {wt}_{i}

is used. If

w (.) = \sqrt{u (.)}

, then the robust covariance matrix can be calculated by scaling each row of

X

\sqrt{{wt}_{i}}

and calculating an unweighted covariance matrix about

θ

In order to make the estimate asymptotically unbiased under a Normal model a correction factor,

τ^{2}

, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)).

nag_correg_robustm_corr_user (g02hm) finds

A

using the iterative procedure as given by Huber; see Huber (1981).

A_{k} = (S_{k} + I) A_{k - 1}

and

θ_{j_{k}} = \frac{b_{j}}{D_{1}} + θ_{j_{k - 1}},

where

S_{k} = (s_{j l})

, for

j = 1, 2, \dots, m

and

l = 1, 2, \dots, m

is a lower triangular matrix such that

s_{j l} = \{\begin{cases} - \min [\max (h_{j l} / D_{2}, - BL), BL], & j > l \\ - \min [\max (\frac{1}{2} (h_{j j} / D_{2} - 1), - BD), BD], & j = l \end{cases},

where

$D_{1} = \sum_{i = 1}^{n} w ({‖z_{i}‖}_{2})$
$D_{2} = \sum_{i = 1}^{n} u ({‖z_{i}‖}_{2})$
$h_{j l} = \sum_{i = 1}^{n} u ({‖z_{i}‖}_{2}) z_{i j} z_{i l}$ , for $j \geq l$
$b_{j} = \sum_{i = 1}^{n} w ({‖z_{i}‖}_{2}) (x_{i j} - b_{j})$

and

BD

and

BL

are suitable bounds.

The value of

τ

may be chosen so that

C

is unbiased if the observations are from a given distribution.

nag_correg_robustm_corr_user (g02hm) is based on routines in ROBETH; see Marazzi (1987).

References

Huber P J (1981) Robust Statistics Wiley

Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1: $ucv$ – function handle or string containing name of m-file

ucv must return the values of the functions

u

and

w

for a given value of its argument.

[user, u, w] = ucv(t, user)

Input Parameters

1: $t$ – double scalar: The argument for which the functions $u$ and $w$ must be evaluated.
2: $user$ – Any MATLAB object: ucv is called from nag_correg_robustm_corr_user (g02hm) with the object supplied to nag_correg_robustm_corr_user (g02hm).

Output Parameters

1: $user$ – Any MATLAB object
2: $u$ – double scalar: The value of the $u$ function at the point t.
3: $w$ – double scalar: The value of the $w$ function at the point t.

2: $indm$ – int64int32nag_int scalar

Indicates which form of the function

v

will be used.

$indm = 1$: $v = 1$ .
$indm \neq 1$: $v = u$ .

3: $x (ldx, m)$ – double array

ldx, the first dimension of the array, must satisfy the constraint

ldx \geq n

x (i, j)

must contain the

i

th observation on the

j

th variable, for

i = 1, 2, \dots, n

and

j = 1, 2, \dots, m

4: $a (m \times (m + 1) / 2)$ – double array

An initial estimate of the lower triangular real matrix

A

. Only the lower triangular elements must be given and these should be stored row-wise in the array.

The diagonal elements must be

\neq 0

, and in practice will usually be

> 0

. If the magnitudes of the columns of

X

are of the same order, the identity matrix will often provide a suitable initial value for

A

. If the columns of

X

are of different magnitudes, the diagonal elements of the initial value of

A

should be approximately inversely proportional to the magnitude of the columns of

X

Constraint:

a (j \times (j - 1) / 2 + j) \neq 0.0

, for

j = 1, 2, \dots, m

5: $theta (m)$ – double array

An initial estimate of the location argument,

θ_{j}

, for

j = 1, 2, \dots, m

In many cases an initial estimate of

θ_{j} = 0

, for

j = 1, 2, \dots, m

, will be adequate. Alternatively medians may be used as given by nag_univar_robust_1var_median (g07da).

Optional Input Parameters

1: $user$ – Any MATLAB object

user is not used by nag_correg_robustm_corr_user (g02hm), but is passed to ucv. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.

2: $n$ – int64int32nag_int scalar

Default: the first dimension of the array x.

n

, the number of observations.

Constraint:

n > 1

3: $m$ – int64int32nag_int scalar

Default: the dimension of the array theta and the second dimension of the array x. (An error is raised if these dimensions are not equal.)

m

, the number of columns of the matrix

X

, i.e., number of independent variables.

Constraint:

1 \leq m \leq n

4: $bl$ – double scalar

Default:

0.9

The magnitude of the bound for the off-diagonal elements of

S_{k}

B L

Constraint:

bl > 0.0

5: $bd$ – double scalar

Default:

0.9

The magnitude of the bound for the diagonal elements of

S_{k}

B D

Constraint:

bd > 0.0

6: $maxit$ – int64int32nag_int scalar

Default:

150

The maximum number of iterations that will be used during the calculation of

A

Constraint:

maxit > 0

7: $nitmon$ – int64int32nag_int scalar

Default:

0

Indicates the amount of information on the iteration that is printed.

$nitmon > 0$: The value of $A$ , $θ$ and $δ$ (see Accuracy) will be printed at the first and every nitmon iterations.
$nitmon \leq 0$: No iteration monitoring is printed.

When printing occurs the output is directed to the current advisory message channel (See nag_file_set_unit_advisory (x04ab).)

8: $tol$ – double scalar

Default:

5e-5

The relative precision for the final estimate of the covariance matrix. Iteration will stop when maximum

δ

(see Accuracy) is less than tol.

Constraint:

tol > 0.0

Output Parameters

1: $user$ – Any MATLAB object
2: $covar (m \times (m + 1) / 2)$ – double array: A robust estimate of the covariance matrix, $C$ . The upper triangular part of the matrix $C$ is stored packed by columns (lower triangular stored by rows), that is $C_{i j}$ is returned in $covar (j \times (j - 1) / 2 + i)$ , $i \leq j$ .
3: $a (m \times (m + 1) / 2)$ – double array: The lower triangular elements of the inverse of the matrix $A$ , stored row-wise.
4: $wt (n)$ – double array: $wt (i)$ contains the weights, ${wt}_{i} = u ({‖z_{i}‖}_{2})$ , for $i = 1, 2, \dots, n$ .
5: $theta (m)$ – double array: Contains the robust estimate of the location argument, $θ_{j}$ , for $j = 1, 2, \dots, m$ .
6: $nit$ – int64int32nag_int scalar: The number of iterations performed.
7: $ifail$ – int64int32nag_int scalar: $ifail = 0$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

$ifail = 1$

On entry,	$n \leq 1$ ,
or	$m < 1$ ,
or	$n < m$ ,
or	$ldx < n$ .

$ifail = 2$

On entry,	$tol \leq 0.0$ ,
or	$maxit \leq 0$ ,
or	diagonal element of $a = 0.0$ ,
or	$bl \leq 0.0$ ,
or	$bd \leq 0.0$ .

$ifail = 3$: A column of x has a constant value.

$ifail = 4$: Value of u or w returned by $ucv < 0$ .

$ifail = 5$: The function has failed to converge in maxit iterations.

W $ifail = 6$: Either the sum $D_{1}$ or the sum $D_{2}$ is zero. This may be caused by the functions $u$ or $w$ being too strict for the current estimate of $A$ (or $C$ ). You should either try a larger initial estimate of $A$ or make the $u$ and $w$ functions less strict.

$ifail = - 99$: An unexpected error has been triggered by this routine. Please contact NAG.

$ifail = - 399$: Your licence key may have expired or may not have been installed correctly.

$ifail = - 999$: Dynamic memory allocation failed.

Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Arguments. At an iteration let

(i)	$d 1 =$ the maximum value of $\|s_{j l}\|$
(ii)	$d 2 =$ the maximum absolute change in $w t (i)$
(iii)	$d 3 =$ the maximum absolute relative change in $θ_{j}$

and let

δ = \max (d 1, d 2, d 3)

. Then the iterative procedure is assumed to have converged when

δ < tol

Further Comments

The existence of

A

will depend upon the function

u

(see Marazzi (1987)); also if

X

is not of full rank a value of

A

will not be found. If the columns of

X

are almost linearly related, then convergence will be slow.

If derivatives of the

u

and

w

functions are available then the method used in nag_correg_robustm_corr_user_deriv (g02hl) will usually give much faster convergence.

Example

A sample of

10

observations on three variables is read in along with initial values for

A

and

θ

and argument values for the

u

and

w

functions,

c_{u}

and

c_{w}

. The covariance matrix computed by nag_correg_robustm_corr_user (g02hm) is printed along with the robust estimate of

θ

ucv computes the Huber's weight functions:

\begin{array}{l} u (t) = 1, & if t \leq c_{u}^{2} \\ u (t) = \frac{c_{u}}{t^{2}}, & if t > c_{u}^{2} \end{array}

and

\begin{array}{l} w (t) = 1, & if t \leq c_{w} \\ w (t) = \frac{c_{w}}{t}, & if t > c_{w} . \end{array}

Open in the MATLAB editor: g02hm_example

function g02hm_example


fprintf('g02hm example results\n\n');

indm = int64(1);
x = [3.4, 6.9, 12.2;
     6.4, 2.5, 15.1;
     4.9, 5.5, 14.2;
     7.3, 1.9, 18.2;
     8.8, 3.6, 11.7;
     8.4, 1.3, 17.9;
     5.3, 3.1, 15.0;
     2.7, 8.1,  7.7;
     6.1, 3.0, 21.9;
     5.3, 2.2, 13.9];

[n,m] = size(x);
indm = int64(1);

% Initial values
a = [1;
     0;     1;
     0;     0;     1];
theta = zeros(m,1);

user = [4, 2];
[user, covar, a, wt, theta, nit, ifail] = ...
    g02hm( ...
           @ucv, indm, x, a, theta, 'user', user);

fprintf(' iterations to convergence = %4d\n\n', nit);
mtitle = 'Robust covariance matrix';
n = int64(size(x,2));
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
                 uplo, diag, n, covar, mtitle);
fprintf('\n');
disp('Robust estimates of theta');
disp(theta);



function [userp, u, w] = ucv(t, userp)
  cu = userp(1);
  u = 1.0;
  if (t ~= 0)
    t2 = t*t;
    if (t2 > cu)
      u = cu/t2;
    end
  end
  % w function and derivative
  cw = userp(2);
  if (t > cw)
    w = cw/t;
  else
    w = 1.0;
  end

g02hm example results

 iterations to convergence =   34

 Robust covariance matrix
             1          2          3
 1      3.2779    -3.6918     4.7391
 2                 5.2841    -6.4087
 3                           11.8373

Robust estimates of theta
    5.6998
    3.8636
   14.7036

PDF version (NAG web site, 64-bit version, 64-bit version)

Chapter Contents

Chapter Introduction

NAG Toolbox