hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_univar_outlier_peirce_1var (g07ga)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_univar_outlier_peirce_1var (g07ga) identifies outlying values using Peirce's criterion.

Syntax

[iout, niout, dif, llamb, ifail] = g07ga(p, y, ldiff, 'n', n, 'mean_p', mean_p, 'var', var)
[iout, niout, dif, llamb, ifail] = nag_univar_outlier_peirce_1var(p, y, ldiff, 'n', n, 'mean_p', mean_p, 'var', var)

Description

nag_univar_outlier_peirce_1var (g07ga) flags outlying values in data using Peirce's criterion. Let
Peirce's method flags yi as a potential outlier if yi-μx, where x=σ2z and z is obtained from the solution of
Rm = λ m-n mm n-m n-m nn (1)
where
R = 2 exp z2 - 1 2 1- Φz (2)
and Φ is the cumulative distribution function for the standard Normal distribution.
As σ~2 is unknown an assumption is made that the relationship between σ~2 and σ2, hence λ, depends only on the sum of squares of the rejected observations and the ratio estimated as
λ2 = n-p-m z2 n-p-m  
which gives
z2 = 1+ n-p-m m 1-λ2 (3)
A value for the cutoff x is calculated iteratively. An initial value of R=0.2 is used and a value of λ is estimated using equation (1). Equation (3) is then used to obtain an estimate of z and then equation (2) is used to get a new estimate for R. This process is then repeated until the relative change in z between consecutive iterations is ε, where ε is machine precision.
By construction, the cutoff for testing for m+1 potential outliers is less than the cutoff for testing for m potential outliers. Therefore Peirce's criterion is used in sequence with the existence of a single potential outlier being investigated first. If one is found, the existence of two potential outliers is investigated etc.
If one of a duplicate series of observations is flagged as an outlier, then all of them are flagged as outliers.

References

Gould B A (1855) On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application The Astronomical Journal 45
Peirce B (1852) Criterion for the rejection of doubtful observations The Astronomical Journal 45

Parameters

Compulsory Input Parameters

1:     p int64int32nag_int scalar
p, the number of parameters in the model used in obtaining the y. If y is an observed set of values, as opposed to the residuals from fitting a model with p parameters, then p should be set to 1, i.e., as if a model just containing the mean had been used.
Constraint: 1pn-2.
2:     yn – double array
y, the data being tested.
3:     ldiff int64int32nag_int scalar
The maximum number of values to be returned in arrays dif and llamb.
If ldiff0, arrays dif and llamb are not referenced.

Optional Input Parameters

1:     n int64int32nag_int scalar
Default: the dimension of the array y.
n, the number of observations.
Constraint: n3.
2:     mean_p – double scalar
Default: 0.0
If var>0.0, mean_p must contain μ, the mean of y, otherwise mean_p is not referenced and the mean is calculated from the data supplied in y.
3:     var – double scalar
Default: 0.0
If var>0.0, var must contain σ2, the variance of y, otherwise the variance is calculated from the data supplied in y.

Output Parameters

1:     ioutn int64int32nag_int array
The indices of the values in y sorted in descending order of the absolute difference from the mean, therefore y iouti-1 - μ y iouti - μ , for i=2,3,,n.
2:     niout int64int32nag_int scalar
The number of potential outliers. The indices for these potential outliers are held in the first niout elements of iout. By construction there can be at most n-p-1 values flagged as outliers.
3:     difldiff – double array
difi holds y-μ-σ2z for observation yiouti, for i=1,2,,minldiff,niout+1,n-p-1.
4:     llambldiff – double array
llambi holds logλ2 for observation yiouti, for i=1,2,,minldiff,niout+1,n-p-1.
5:     ifail int64int32nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=1
Constraint: n3.
   ifail=2
Constraint: 1pn-2.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

Not applicable.

Further Comments

One problem with Peirce's algorithm as implemented in nag_univar_outlier_peirce_1var (g07ga) is the assumed relationship between σ2, the variance using the full dataset, and σ~2, the variance with the potential outliers removed. In some cases, for example if the data y were the residuals from a linear regression, this assumption may not hold as the regression line may change significantly when outlying values have been dropped resulting in a radically different set of residuals. In such cases nag_univar_outlier_peirce_2var (g07gb) should be used instead.

Example

This example reads in a series of data and flags any potential outliers.
The dataset used is from Peirce's original paper and consists of fifteen observations on the vertical semidiameter of Venus.
function g07ga_example


fprintf('g07ga example results\n\n');

y = [-0.30;  0.48;  0.63; -0.22; 0.18;
     -0.44; -0.24; -0.13; -0.05; 0.39;
      1.01;  0.06; -1.40;  0.20; 0.10];

p     = int64(2);
ldiff = int64(1);

% Get a list of potential outliers
[iout, niout, dif, llamb, ifail] = ...
  g07ga(p, y, ldiff);

% Display results
fprintf('Number of potential outliers: %2d\n',niout);
fprintf('  No.  Index    Value');
if ldiff > 0
  fprintf('       Diff    ln(lambda^2)');
end
fprintf('\n');

for i=1:niout
  fprintf(' %4d %4d %10.2f', i, iout(i), y(iout(i)));
  if i <= ldiff
    fprintf(' %10.2f %10.2f', dif(i), llamb(i));
  end
  fprintf('\n');
end


g07ga example results

Number of potential outliers:  2
  No.  Index    Value       Diff    ln(lambda^2)
    1   13      -1.40       0.31      -0.30
    2   11       1.01

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015