hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_univar_outlier_peirce_1var (g07ga)

Purpose

nag_univar_outlier_peirce_1var (g07ga) identifies outlying values using Peirce's criterion.

Syntax

[iout, niout, diff, llamb, ifail] = g07ga(p, y, ldiff, 'n', n, 'mean', mean, 'var', var)
[iout, niout, diff, llamb, ifail] = nag_univar_outlier_peirce_1var(p, y, ldiff, 'n', n, 'mean', mean, 'var', var)

Description

nag_univar_outlier_peirce_1var (g07ga) flags outlying values in data using Peirce's criterion. Let
Peirce's method flags yiyi as a potential outlier if |yiμ|x|yi-μ|x, where x = σ2zx=σ2z and zz is obtained from the solution of
Rm = λmn ( mm (nm)nm )/(nn)
Rm = λ m-n mm (n-m) n-m nn
(1)
where
R = 2 exp((( z2 1 )/2)(1Φ(z)))
R = 2 exp( ( z2 - 1 2 ) ( 1- Φ(z) ) )
(2)
and ΦΦ is the cumulative distribution function for the standard Normal distribution.
As σ̃2σ~2 is unknown an assumption is made that the relationship between σ̃2σ~2 and σ2σ2, hence λλ, depends only on the sum of squares of the rejected observations and the ratio estimated as
λ2 = ( npm z2 )/(npm)
λ2 = n-p-m z2 n-p-m
which gives
z2 = 1 + (npm)/m (1λ2)
z2 = 1+ n-p-m m ( 1-λ2 )
(3)
A value for the cutoff xx is calculated iteratively. An initial value of R = 0.2R=0.2 is used and a value of λλ is estimated using equation (1). Equation (3) is then used to obtain an estimate of zz and then equation (2) is used to get a new estimate for RR. This process is then repeated until the relative change in zz between consecutive iterations is sqrt(ε)ε, where εε is machine precision.
By construction, the cutoff for testing for m + 1m+1 potential outliers is less than the cutoff for testing for mm potential outliers. Therefore Peirce's criterion is used in sequence with the existence of a single potential outlier being investigated first. If one is found, the existence of two potential outliers is investigated etc.
If one of a duplicate series of observations is flagged as an outlier, then all of them are flagged as outliers.

References

Gould B A (1855) On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application The Astronomical Journal 45
Peirce B (1852) Criterion for the rejection of doubtful observations The Astronomical Journal 45

Parameters

Compulsory Input Parameters

1:     p – int64int32nag_int scalar
pp, the number of parameters in the model used in obtaining the yy. If yy is an observed set of values, as opposed to the residuals from fitting a model with pp parameters, then pp should be set to 11, i.e., as if a model just containing the mean had been used.
Constraint: 1pn21pn-2.
2:     y(n) – double array
n, the dimension of the array, must satisfy the constraint n3n3.
yy, the data being tested.
3:     ldiff – int64int32nag_int scalar
The maximum number of values to be returned in arrays diff and llamb.
If ldiff0ldiff0, arrays diff and llamb are not referenced.

Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the array y.
nn, the number of observations.
Constraint: n3n3.
2:     mean – double scalar
If var > 0.0var>0.0, mean must contain μμ, the mean of yy, otherwise mean is not referenced and the mean is calculated from the data supplied in y.
Default: 0.00.0
3:     var – double scalar
If var > 0.0var>0.0, var must contain σ2σ2, the variance of yy, otherwise the variance is calculated from the data supplied in y.
Default: 0.00.0

Input Parameters Omitted from the MATLAB Interface

None.

Output Parameters

1:     iout(n) – int64int32nag_int array
The indices of the values in y sorted in descending order of the absolute difference from the mean, therefore |y(iout(i1))μ| |y(iout(i))μ| | y iouti-1 - μ | | y iouti - μ | , for i = 2,3,,ni=2,3,,n.
2:     niout – int64int32nag_int scalar
The number of potential outliers. The indices for these potential outliers are held in the first niout elements of iout. By construction there can be at most np1n-p-1 values flagged as outliers.
3:     diff(ldiff ) – double array
diff(i)diffi holds |yμ|σ2z|y-μ|-σ2z for observation y(iout(i))yiouti, for i = 1,2,,min (ldiff,niout + 1,np1)i=1,2,,min(ldiff,niout+1,n-p-1).
4:     llamb(ldiff) – double array
llamb(i)llambi holds log(λ2)log(λ2) for observation y(iout(i))yiouti, for i = 1,2,,min (ldiff,niout + 1,np1)i=1,2,,min(ldiff,niout+1,n-p-1).
5:     ifail – int64int32nag_int scalar
ifail = 0ifail=0 unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
  ifail = 1ifail=1
Constraint: n3n3.
  ifail = 2ifail=2
Constraint: 1pn21pn-2.

Accuracy

Not applicable.

Further Comments

One problem with Peirce's algorithm as implemented in nag_univar_outlier_peirce_1var (g07ga) is the assumed relationship between σ2σ2, the variance using the full dataset, and σ̃2σ~2, the variance with the potential outliers removed. In some cases, for example if the data yy were the residuals from a linear regression, this assumption may not hold as the regression line may change significantly when outlying values have been dropped resulting in a radically different set of residuals. In such cases nag_univar_outlier_peirce_2var (g07gb) should be used instead.

Example

function nag_univar_outlier_peirce_1var_example
y = [-0.30; 0.48; 0.63; -0.22; 0.18; -0.44; -0.24; -0.13; -0.05; 0.39; ...
      1.01; 0.06; -1.40; 0.20; 0.10];
p = int64(2);
ldiff = int64(1);

% Get a list of potential outliers
[iout, niout, dif, llamb, ifail] = nag_univar_outlier_peirce_1var(p, y, ldiff);

% Display results
if ifail == 0
  if ldiff > 0
    fprintf('\n  No.  Index    Value       Diff    ln(lambda^2)\n');
  else
    fprintf('\n  No.  Index    Value\n');
  end
  for i=1:double(niout)
    if i > ldiff
      fprintf(' %4d %4d %10.2f\n', i, iout(i), y(iout(i)));
    else
      fprintf(' %4d %4d %10.2f %10.2f %10.2f\n', ...
              i, iout(i), y(iout(i)), dif(i), llamb(i));
    end
  end
end
 

  No.  Index    Value       Diff    ln(lambda^2)
    1   13      -1.40       0.31      -0.30
    2   11       1.01

function g07ga_example
y = [-0.30; 0.48; 0.63; -0.22; 0.18; -0.44; -0.24; -0.13; -0.05; 0.39; ...
      1.01; 0.06; -1.40; 0.20; 0.10];
p = int64(2);
ldiff = int64(1);

% Get a list of potential outliers
[iout, niout, dif, llamb, ifail] = g07ga(p, y, ldiff);

% Display results
if ifail == 0
  if ldiff > 0
    fprintf('\n  No.  Index    Value       Diff    ln(lambda^2)\n');
  else
    fprintf('\n  No.  Index    Value\n');
  end
  for i=1:double(niout)
    if i > ldiff
      fprintf(' %4d %4d %10.2f\n', i, iout(i), y(iout(i)));
    else
      fprintf(' %4d %4d %10.2f %10.2f %10.2f\n', ...
              i, iout(i), y(iout(i)), dif(i), llamb(i));
    end
  end
end
 

  No.  Index    Value       Diff    ln(lambda^2)
    1   13      -1.40       0.31      -0.30
    2   11       1.01


PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2013