Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_correg_robustm_corr_user (g02hm)

Purpose

nag_correg_robustm_corr_user (g02hm) computes a robust estimate of the covariance matrix for user-supplied weight functions. The derivatives of the weight functions are not required.

Syntax

[user, covar, a, wt, theta, nit, ifail] = g02hm(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
[user, covar, a, wt, theta, nit, ifail] = nag_correg_robustm_corr_user(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: nitmon and tol were made optional At Mark 22: n was made optional

Description

For a set of $n$ observations on $m$ variables in a matrix $X$, a robust estimate of the covariance matrix, $C$, and a robust estimate of location, $\theta$, are given by
 $C=τ2ATA-1,$
where ${\tau }^{2}$ is a correction factor and $A$ is a lower triangular matrix found as the solution to the following equations.
 $zi=Axi-θ$
 $1n ∑i= 1nwzi2zi=0$
and
 $1n∑i=1nuzi2zi ziT -vzi2I=0,$
 where ${x}_{i}$ is a vector of length $m$ containing the elements of the $i$th row of $X$, ${z}_{i}$ is a vector of length $m$, $I$ is the identity matrix and $0$ is the zero matrix. and $w$ and $u$ are suitable functions.
nag_correg_robustm_corr_user (g02hm) covers two situations:
 (i) $v\left(t\right)=1$ for all $t$, (ii) $v\left(t\right)=u\left(t\right)$.
The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about $\theta$ using weights ${\mathit{wt}}_{i}=u\left(‖{z}_{i}‖\right)$. In case (i) a divisor of $n$ is used and in case (ii) a divisor of $\sum _{i=1}^{n}{\mathit{wt}}_{i}$ is used. If $w\left(.\right)=\sqrt{u\left(.\right)}$, then the robust covariance matrix can be calculated by scaling each row of $X$ by $\sqrt{{\mathit{wt}}_{i}}$ and calculating an unweighted covariance matrix about $\theta$.
In order to make the estimate asymptotically unbiased under a Normal model a correction factor, ${\tau }^{2}$, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)).
nag_correg_robustm_corr_user (g02hm) finds $A$ using the iterative procedure as given by Huber; see Huber (1981).
 $Ak=Sk+IAk-1$
and
 $θjk=bjD1+θjk- 1,$
where ${S}_{k}=\left({s}_{jl}\right)$, for $\mathit{j}=1,2,\dots ,m$ and $\mathit{l}=1,2,\dots ,m$ is a lower triangular matrix such that
 $sjl= -minmaxhjl/D2,-BL,BL, j>l -minmax12hjj/D2-1,-BD,BD, j=l ,$
where
• ${D}_{1}=\sum _{i=1}^{n}w\left({‖{z}_{i}‖}_{2}\right)$
• ${D}_{2}=\sum _{i=1}^{n}u\left({‖{z}_{i}‖}_{2}\right)$
• ${h}_{jl}=\sum _{i=1}^{n}u\left({‖{z}_{i}‖}_{2}\right){z}_{ij}{z}_{il}$, for $j\ge l$
• ${b}_{j}=\sum _{i=1}^{n}w\left({‖{z}_{i}‖}_{2}\right)\left({x}_{ij}-{b}_{j}\right)$
and $\mathit{BD}$ and $\mathit{BL}$ are suitable bounds.
The value of $\tau$ may be chosen so that $C$ is unbiased if the observations are from a given distribution.
nag_correg_robustm_corr_user (g02hm) is based on routines in ROBETH; see Marazzi (1987).

References

Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

Parameters

Compulsory Input Parameters

1:     $\mathrm{ucv}$ – function handle or string containing name of m-file
ucv must return the values of the functions $u$ and $w$ for a given value of its argument.
[user, u, w] = ucv(t, user)

Input Parameters

1:     $\mathrm{t}$ – double scalar
The argument for which the functions $u$ and $w$ must be evaluated.
2:     $\mathrm{user}$ – Any MATLAB object
ucv is called from nag_correg_robustm_corr_user (g02hm) with the object supplied to nag_correg_robustm_corr_user (g02hm).

Output Parameters

1:     $\mathrm{user}$ – Any MATLAB object
2:     $\mathrm{u}$ – double scalar
The value of the $u$ function at the point t.
3:     $\mathrm{w}$ – double scalar
The value of the $w$ function at the point t.
2:     $\mathrm{indm}$int64int32nag_int scalar
Indicates which form of the function $v$ will be used.
${\mathbf{indm}}=1$
$v=1$.
${\mathbf{indm}}\ne 1$
$v=u$.
3:     $\mathrm{x}\left(\mathit{ldx},{\mathbf{m}}\right)$ – double array
ldx, the first dimension of the array, must satisfy the constraint $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
4:     $\mathrm{a}\left({\mathbf{m}}×\left({\mathbf{m}}+1\right)/2\right)$ – double array
An initial estimate of the lower triangular real matrix $A$. Only the lower triangular elements must be given and these should be stored row-wise in the array.
The diagonal elements must be $\text{}\ne 0$, and in practice will usually be $\text{}>0$. If the magnitudes of the columns of $X$ are of the same order, the identity matrix will often provide a suitable initial value for $A$. If the columns of $X$ are of different magnitudes, the diagonal elements of the initial value of $A$ should be approximately inversely proportional to the magnitude of the columns of $X$.
Constraint: ${\mathbf{a}}\left(\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{j}\right)\ne 0.0$, for $\mathit{j}=1,2,\dots ,m$.
5:     $\mathrm{theta}\left({\mathbf{m}}\right)$ – double array
An initial estimate of the location argument, ${\theta }_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,m$.
In many cases an initial estimate of ${\theta }_{\mathit{j}}=0$, for $\mathit{j}=1,2,\dots ,m$, will be adequate. Alternatively medians may be used as given by nag_univar_robust_1var_median (g07da).

Optional Input Parameters

1:     $\mathrm{user}$ – Any MATLAB object
user is not used by nag_correg_robustm_corr_user (g02hm), but is passed to ucv. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.
2:     $\mathrm{n}$int64int32nag_int scalar
Default: the first dimension of the array x.
$n$, the number of observations.
Constraint: ${\mathbf{n}}>1$.
3:     $\mathrm{m}$int64int32nag_int scalar
Default: the dimension of the array theta and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
$m$, the number of columns of the matrix $X$, i.e., number of independent variables.
Constraint: $1\le {\mathbf{m}}\le {\mathbf{n}}$.
4:     $\mathrm{bl}$ – double scalar
Default: $0.9$
The magnitude of the bound for the off-diagonal elements of ${S}_{k}$, $BL$.
Constraint: ${\mathbf{bl}}>0.0$.
5:     $\mathrm{bd}$ – double scalar
Default: $0.9$
The magnitude of the bound for the diagonal elements of ${S}_{k}$, $BD$.
Constraint: ${\mathbf{bd}}>0.0$.
6:     $\mathrm{maxit}$int64int32nag_int scalar
Default: $150$
The maximum number of iterations that will be used during the calculation of $A$.
Constraint: ${\mathbf{maxit}}>0$.
7:     $\mathrm{nitmon}$int64int32nag_int scalar
Default: $0$
Indicates the amount of information on the iteration that is printed.
${\mathbf{nitmon}}>0$
The value of $A$, $\theta$ and $\delta$ (see Accuracy) will be printed at the first and every nitmon iterations.
${\mathbf{nitmon}}\le 0$
No iteration monitoring is printed.
When printing occurs the output is directed to the current advisory message channel (See nag_file_set_unit_advisory (x04ab).)
8:     $\mathrm{tol}$ – double scalar
Default: $5e-5$
The relative precision for the final estimate of the covariance matrix. Iteration will stop when maximum $\delta$ (see Accuracy) is less than tol.
Constraint: ${\mathbf{tol}}>0.0$.

Output Parameters

1:     $\mathrm{user}$ – Any MATLAB object
2:     $\mathrm{covar}\left({\mathbf{m}}×\left({\mathbf{m}}+1\right)/2\right)$ – double array
A robust estimate of the covariance matrix, $C$. The upper triangular part of the matrix $C$ is stored packed by columns (lower triangular stored by rows), that is ${C}_{ij}$ is returned in ${\mathbf{covar}}\left(j×\left(j-1\right)/2+i\right)$, $i\le j$.
3:     $\mathrm{a}\left({\mathbf{m}}×\left({\mathbf{m}}+1\right)/2\right)$ – double array
The lower triangular elements of the inverse of the matrix $A$, stored row-wise.
4:     $\mathrm{wt}\left({\mathbf{n}}\right)$ – double array
${\mathbf{wt}}\left(\mathit{i}\right)$ contains the weights, ${\mathit{wt}}_{\mathit{i}}=u\left({‖{z}_{\mathit{i}}‖}_{2}\right)$, for $\mathit{i}=1,2,\dots ,n$.
5:     $\mathrm{theta}\left({\mathbf{m}}\right)$ – double array
Contains the robust estimate of the location argument, ${\theta }_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,m$.
6:     $\mathrm{nit}$int64int32nag_int scalar
The number of iterations performed.
7:     $\mathrm{ifail}$int64int32nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}\le 1$, or ${\mathbf{m}}<1$, or ${\mathbf{n}}<{\mathbf{m}}$, or $\mathit{ldx}<{\mathbf{n}}$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{tol}}\le 0.0$, or ${\mathbf{maxit}}\le 0$, or diagonal element of ${\mathbf{a}}=0.0$, or ${\mathbf{bl}}\le 0.0$, or ${\mathbf{bd}}\le 0.0$.
${\mathbf{ifail}}=3$
A column of x has a constant value.
${\mathbf{ifail}}=4$
Value of u or w returned by ${\mathbf{ucv}}<0$.
${\mathbf{ifail}}=5$
The function has failed to converge in maxit iterations.
W  ${\mathbf{ifail}}=6$
Either the sum ${D}_{1}$ or the sum ${D}_{2}$ is zero. This may be caused by the functions $u$ or $w$ being too strict for the current estimate of $A$ (or $C$). You should either try a larger initial estimate of $A$ or make the $u$ and $w$ functions less strict.
${\mathbf{ifail}}=-99$
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Arguments. At an iteration let
 (i) $d1=\text{}$ the maximum value of $\left|{s}_{jl}\right|$ (ii) $d2=\text{}$ the maximum absolute change in $wt\left(i\right)$ (iii) $d3=\text{}$ the maximum absolute relative change in ${\theta }_{j}$
and let $\delta =\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(d1,d2,d3\right)$. Then the iterative procedure is assumed to have converged when $\delta <{\mathbf{tol}}$

The existence of $A$ will depend upon the function $u$ (see Marazzi (1987)); also if $X$ is not of full rank a value of $A$ will not be found. If the columns of $X$ are almost linearly related, then convergence will be slow.
If derivatives of the $u$ and $w$ functions are available then the method used in nag_correg_robustm_corr_user_deriv (g02hl) will usually give much faster convergence.

Example

A sample of $10$ observations on three variables is read in along with initial values for $A$ and $\theta$ and argument values for the $u$ and $w$ functions, ${c}_{u}$ and ${c}_{w}$. The covariance matrix computed by nag_correg_robustm_corr_user (g02hm) is printed along with the robust estimate of $\theta$.
ucv computes the Huber's weight functions:
 $ut=1, if t≤cu2 ut= cut2, if t>cu2$
and
 $wt= 1, if t≤cw wt= cwt, if t>cw.$
```function g02hm_example

fprintf('g02hm example results\n\n');

indm = int64(1);
x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15.0;
2.7, 8.1,  7.7;
6.1, 3.0, 21.9;
5.3, 2.2, 13.9];

[n,m] = size(x);
indm = int64(1);

% Initial values
a = [1;
0;     1;
0;     0;     1];
theta = zeros(m,1);

user = [4, 2];
[user, covar, a, wt, theta, nit, ifail] = ...
g02hm( ...
@ucv, indm, x, a, theta, 'user', user);

fprintf(' iterations to convergence = %4d\n\n', nit);
mtitle = 'Robust covariance matrix';
n = int64(size(x,2));
uplo   = 'Upper';
diag   = 'Non-unit';
[ifail] = x04cc( ...
uplo, diag, n, covar, mtitle);
fprintf('\n');
disp('Robust estimates of theta');
disp(theta);

function [userp, u, w] = ucv(t, userp)
cu = userp(1);
u = 1.0;
if (t ~= 0)
t2 = t*t;
if (t2 > cu)
u = cu/t2;
end
end
% w function and derivative
cw = userp(2);
if (t > cw)
w = cw/t;
else
w = 1.0;
end
```
```g02hm example results

iterations to convergence =   34

Robust covariance matrix
1          2          3
1      3.2779    -3.6918     4.7391
2                 5.2841    -6.4087
3                           11.8373

Robust estimates of theta
5.6998
3.8636
14.7036

```