Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)

# NAG Toolbox: nag_correg_robustm_corr_user (g02hm)

## Purpose

nag_correg_robustm_corr_user (g02hm) computes a robust estimate of the covariance matrix for user-supplied weight functions. The derivatives of the weight functions are not required.

## Syntax

[user, cov, a, wt, theta, nit, ifail] = g02hm(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
[user, cov, a, wt, theta, nit, ifail] = nag_correg_robustm_corr_user(ucv, indm, x, a, theta, 'user', user, 'n', n, 'm', m, 'bl', bl, 'bd', bd, 'maxit', maxit, 'nitmon', nitmon, 'tol', tol)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 22: n has been made optional
Mark 23: nitmon, tol now optional
.

## Description

For a set of n$n$ observations on m$m$ variables in a matrix X$X$, a robust estimate of the covariance matrix, C$C$, and a robust estimate of location, θ$\theta$, are given by
 C = τ2(ATA) − 1, $C=τ2 (ATA) -1,$
where τ2${\tau }^{2}$ is a correction factor and A$A$ is a lower triangular matrix found as the solution to the following equations.
 zi = A(xi − θ) $zi=A (xi-θ)$
 n 1/n ∑ w(‖zi‖2)zi = 0 i = 1
$1n ∑i= 1nw ( ‖zi‖ 2) zi=0$
and
 n 1/n ∑ u(‖zi‖2)ziziT − v(‖zi‖2)I = 0, i = 1
$1n∑i=1nu ( ‖zi‖ 2) zi ziT -v ( ‖zi‖ 2) I=0,$
 where xi${x}_{i}$ is a vector of length m$m$ containing the elements of the i$i$th row of X$X$, zi${z}_{i}$ is a vector of length m$m$, I$I$ is the identity matrix and 0$0$ is the zero matrix. and w$w$ and u$u$ are suitable functions.
nag_correg_robustm_corr_user (g02hm) covers two situations:
 (i) v(t) = 1$v\left(t\right)=1$ for all t$t$, (ii) v(t) = u(t)$v\left(t\right)=u\left(t\right)$.
The robust covariance matrix may be calculated from a weighted sum of squares and cross-products matrix about θ$\theta$ using weights wti = u(zi)${\mathit{wt}}_{i}=u\left(‖{z}_{i}‖\right)$. In case (i) a divisor of n$n$ is used and in case (ii) a divisor of i = 1nwti$\sum _{i=1}^{n}{\mathit{wt}}_{i}$ is used. If w( . ) = sqrt(u( . ))$w\left(.\right)=\sqrt{u\left(.\right)}$, then the robust covariance matrix can be calculated by scaling each row of X$X$ by sqrt(wti)$\sqrt{{\mathit{wt}}_{i}}$ and calculating an unweighted covariance matrix about θ$\theta$.
In order to make the estimate asymptotically unbiased under a Normal model a correction factor, τ2${\tau }^{2}$, is needed. The value of the correction factor will depend on the functions employed (see Huber (1981) and Marazzi (1987)).
nag_correg_robustm_corr_user (g02hm) finds A$A$ using the iterative procedure as given by Huber; see Huber (1981).
 Ak = (Sk + I)Ak − 1 $Ak= (Sk+I) Ak-1$
and
 θjk = (bj)/(D1) + θjk − 1, $θjk=bjD1+θjk- 1,$
where Sk = (sjl)${S}_{k}=\left({s}_{jl}\right)$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$ and l = 1,2,,m$\mathit{l}=1,2,\dots ,m$ is a lower triangular matrix such that
sjl =
 { − min [max (hjl / D2, − BL),BL], j > l − min [max ((1/2)(hjj / D2 − 1), − BD),BD], j = l
,
$sjl= { -min [max (hjl/D2,-BL) ,BL] , j>l -min [max (12 (hjj/D2-1) ,-BD) ,BD] , j=l ,$
where
• D1 = i = 1nw(zi2)${D}_{1}=\sum _{i=1}^{n}w\left({‖{z}_{i}‖}_{2}\right)$
• D2 = i = 1nu(zi2)${D}_{2}=\sum _{i=1}^{n}u\left({‖{z}_{i}‖}_{2}\right)$
• hjl = i = 1nu(zi2)zijzil${h}_{jl}=\sum _{i=1}^{n}u\left({‖{z}_{i}‖}_{2}\right){z}_{ij}{z}_{il}$, for jl$j\ge l$
• bj = i = 1nw(zi2)(xijbj)${b}_{j}=\sum _{i=1}^{n}w\left({‖{z}_{i}‖}_{2}\right)\left({x}_{ij}-{b}_{j}\right)$
and BD$\mathit{BD}$ and BL$\mathit{BL}$ are suitable bounds.
The value of τ$\tau$ may be chosen so that C$C$ is unbiased if the observations are from a given distribution.
nag_correg_robustm_corr_user (g02hm) is based on routines in ROBETH; see Marazzi (1987).

## References

Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Weights for bounded influence regression in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 3 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

## Parameters

### Compulsory Input Parameters

1:     ucv – function handle or string containing name of m-file
ucv must return the values of the functions u$u$ and w$w$ for a given value of its argument.
[user, u, w] = ucv(t, user)

Input Parameters

1:     t – double scalar
The argument for which the functions u$u$ and w$w$ must be evaluated.
2:     user – Any MATLAB object
ucv is called from nag_correg_robustm_corr_user (g02hm) with the object supplied to nag_correg_robustm_corr_user (g02hm).

Output Parameters

1:     user – Any MATLAB object
2:     u – double scalar
The value of the u$u$ function at the point t.
3:     w – double scalar
The value of the w$w$ function at the point t.
2:     indm – int64int32nag_int scalar
Indicates which form of the function v$v$ will be used.
indm = 1${\mathbf{indm}}=1$
v = 1$v=1$.
indm1${\mathbf{indm}}\ne 1$
v = u$v=u$.
3:     x(ldx,m) – double array
ldx, the first dimension of the array, must satisfy the constraint ldxn$\mathit{ldx}\ge {\mathbf{n}}$.
x(i,j)${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the i$\mathit{i}$th observation on the j$\mathit{j}$th variable, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$ and j = 1,2,,m$\mathit{j}=1,2,\dots ,m$.
4:     a(m × (m + 1) / 2${\mathbf{m}}×\left({\mathbf{m}}+1\right)/2$) – double array
An initial estimate of the lower triangular real matrix A$A$. Only the lower triangular elements must be given and these should be stored row-wise in the array.
The diagonal elements must be 0$\text{}\ne 0$, and in practice will usually be > 0$\text{}>0$. If the magnitudes of the columns of X$X$ are of the same order, the identity matrix will often provide a suitable initial value for A$A$. If the columns of X$X$ are of different magnitudes, the diagonal elements of the initial value of A$A$ should be approximately inversely proportional to the magnitude of the columns of X$X$.
Constraint: a(j × (j1) / 2 + j)0.0${\mathbf{a}}\left(\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{j}\right)\ne 0.0$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$.
5:     theta(m) – double array
m, the dimension of the array, must satisfy the constraint 1mn$1\le {\mathbf{m}}\le {\mathbf{n}}$.
An initial estimate of the location parameter, θj${\theta }_{\mathit{j}}$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$.
In many cases an initial estimate of θj = 0${\theta }_{\mathit{j}}=0$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$, will be adequate. Alternatively medians may be used as given by nag_univar_robust_1var_median (g07da).

### Optional Input Parameters

1:     user – Any MATLAB object
user is not used by nag_correg_robustm_corr_user (g02hm), but is passed to ucv. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.
2:     n – int64int32nag_int scalar
Default: The first dimension of the array x.
n$n$, the number of observations.
Constraint: n > 1${\mathbf{n}}>1$.
3:     m – int64int32nag_int scalar
Default: The dimension of the array theta and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
m$m$, the number of columns of the matrix X$X$, i.e., number of independent variables.
Constraint: 1mn$1\le {\mathbf{m}}\le {\mathbf{n}}$.
4:     bl – double scalar
The magnitude of the bound for the off-diagonal elements of Sk${S}_{k}$, BL$BL$.
Default: 0.9$0.9$
Constraint: bl > 0.0${\mathbf{bl}}>0.0$.
5:     bd – double scalar
The magnitude of the bound for the diagonal elements of Sk${S}_{k}$, BD$BD$.
Default: 0.9$0.9$
Constraint: bd > 0.0${\mathbf{bd}}>0.0$.
6:     maxit – int64int32nag_int scalar
The maximum number of iterations that will be used during the calculation of A$A$.
Default: 150$150$
Constraint: maxit > 0${\mathbf{maxit}}>0$.
7:     nitmon – int64int32nag_int scalar
Indicates the amount of information on the iteration that is printed.
nitmon > 0${\mathbf{nitmon}}>0$
The value of A$A$, θ$\theta$ and δ$\delta$ (see Section [Accuracy]) will be printed at the first and every nitmon iterations.
nitmon0${\mathbf{nitmon}}\le 0$
No iteration monitoring is printed.
When printing occurs the output is directed to the current advisory message channel (See nag_file_set_unit_advisory (x04ab).)
Default: 0$0$
8:     tol – double scalar
The relative precision for the final estimate of the covariance matrix. Iteration will stop when maximum δ$\delta$ (see Section [Accuracy]) is less than tol.
Default: 5e-5$5e-5$
Constraint: tol > 0.0${\mathbf{tol}}>0.0$.

ruser ldx wk

### Output Parameters

1:     user – Any MATLAB object
2:     cov(m × (m + 1) / 2${\mathbf{m}}×\left({\mathbf{m}}+1\right)/2$) – double array
A robust estimate of the covariance matrix, C$C$. The upper triangular part of the matrix C$C$ is stored packed by columns (lower triangular stored by rows), that is Cij${C}_{ij}$ is returned in cov(j × (j1) / 2 + i)${\mathbf{cov}}\left(j×\left(j-1\right)/2+i\right)$, ij$i\le j$.
3:     a(m × (m + 1) / 2${\mathbf{m}}×\left({\mathbf{m}}+1\right)/2$) – double array
The lower triangular elements of the inverse of the matrix A$A$, stored row-wise.
4:     wt(n) – double array
wt(i)${\mathbf{wt}}\left(\mathit{i}\right)$ contains the weights, wti = u(zi2)${\mathit{wt}}_{\mathit{i}}=u\left({‖{z}_{\mathit{i}}‖}_{2}\right)$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
5:     theta(m) – double array
Contains the robust estimate of the location parameter, θj${\theta }_{\mathit{j}}$, for j = 1,2,,m$\mathit{j}=1,2,\dots ,m$.
6:     nit – int64int32nag_int scalar
The number of iterations performed.
7:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

ifail = 1${\mathbf{ifail}}=1$
 On entry, n ≤ 1${\mathbf{n}}\le 1$, or m < 1${\mathbf{m}}<1$, or n < m${\mathbf{n}}<{\mathbf{m}}$, or ldx < n$\mathit{ldx}<{\mathbf{n}}$.
ifail = 2${\mathbf{ifail}}=2$
 On entry, tol ≤ 0.0${\mathbf{tol}}\le 0.0$, or maxit ≤ 0${\mathbf{maxit}}\le 0$, or diagonal element of a = 0.0${\mathbf{a}}=0.0$, or bl ≤ 0.0${\mathbf{bl}}\le 0.0$, or bd ≤ 0.0${\mathbf{bd}}\le 0.0$.
ifail = 3${\mathbf{ifail}}=3$
A column of x has a constant value.
ifail = 4${\mathbf{ifail}}=4$
Value of u or w returned by ucv < 0${\mathbf{ucv}}<0$.
ifail = 5${\mathbf{ifail}}=5$
The function has failed to converge in maxit iterations.
W ifail = 6${\mathbf{ifail}}=6$
Either the sum D1${D}_{1}$ or the sum D2${D}_{2}$ is zero. This may be caused by the functions u$u$ or w$w$ being too strict for the current estimate of A$A$ (or C$C$). You should either try a larger initial estimate of A$A$ or make the u$u$ and w$w$ functions less strict.

## Accuracy

On successful exit the accuracy of the results is related to the value of tol; see Section [Parameters]. At an iteration let
 (i) d1 = $d1=\text{}$ the maximum value of |sjl|$|{s}_{jl}|$ (ii) d2 = $d2=\text{}$ the maximum absolute change in wt(i)$wt\left(i\right)$ (iii) d3 = $d3=\text{}$ the maximum absolute relative change in θj${\theta }_{j}$
and let δ = max (d1,d2,d3)$\delta =\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(d1,d2,d3\right)$. Then the iterative procedure is assumed to have converged when δ < tol$\delta <{\mathbf{tol}}$

## Further Comments

The existence of A$A$ will depend upon the function u$u$ (see Marazzi (1987)); also if X$X$ is not of full rank a value of A$A$ will not be found. If the columns of X$X$ are almost linearly related, then convergence will be slow.
If derivatives of the u$u$ and w$w$ functions are available then the method used in nag_correg_robustm_corr_user_deriv (g02hl) will usually give much faster convergence.

## Example

```function nag_correg_robustm_corr_user_example
indm = int64(1);
x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15;
2.7, 8.1, 7.7;
6.1, 3, 21.9;
5.3, 2.2, 13.9];
a = [1;
0;
1;
0;
0;
1];
theta = [0;
0;
0];
user = [4, 2];
[user, covar, aOut, wt, thetaOut, nit, ifail] = ...
nag_correg_robustm_corr_user(@ucv, indm, x, a, theta, 'user', user)

function [userp, u, w] = ucv(t, userp)
cu = userp(1);
u = 1.0;
if (t ~= 0)
t2 = t*t;
if (t2 > cu)
u = cu/t2;
end
end
% w function and derivative
cw = userp(2);
if (t > cw)
w = cw/t;
else
w = 1.0;
end
```
```

user =

4     2

covar =

3.2779
-3.6918
5.2841
4.7391
-6.4087
11.8373

aOut =

0.5523
1.0614
0.9424
-0.1880
0.4776
0.5021

wt =

1.0000
1.0000
1.0000
1.0000
0.2339
1.0000
1.0000
0.9385
0.4012
0.7579

thetaOut =

5.6998
3.8636
14.7036

nit =

34

ifail =

0

```
```function g02hm_example
indm = int64(1);
x = [3.4, 6.9, 12.2;
6.4, 2.5, 15.1;
4.9, 5.5, 14.2;
7.3, 1.9, 18.2;
8.8, 3.6, 11.7;
8.4, 1.3, 17.9;
5.3, 3.1, 15;
2.7, 8.1, 7.7;
6.1, 3, 21.9;
5.3, 2.2, 13.9];
a = [1;
0;
1;
0;
0;
1];
theta = [0;
0;
0];
user = [4, 2];
[user, covar, aOut, wt, thetaOut, nit, ifail] = ...
g02hm(@ucv, indm, x, a, theta, 'user', user)

function [userp, u, w] = ucv(t, userp)
cu = userp(1);
u = 1.0;
if (t ~= 0)
t2 = t*t;
if (t2 > cu)
u = cu/t2;
end
end
% w function and derivative
cw = userp(2);
if (t > cw)
w = cw/t;
else
w = 1.0;
end
```
```

user =

4     2

covar =

3.2779
-3.6918
5.2841
4.7391
-6.4087
11.8373

aOut =

0.5523
1.0614
0.9424
-0.1880
0.4776
0.5021

wt =

1.0000
1.0000
1.0000
1.0000
0.2339
1.0000
1.0000
0.9385
0.4012
0.7579

thetaOut =

5.6998
3.8636
14.7036

nit =

34

ifail =

0

```

PDF version (NAG web site, 64-bit version, 64-bit version)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013