hide long namesshow long names
hide short namesshow short names
Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_tsa_cp_pelt_user (g13nb)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_tsa_cp_pelt_user (g13nb) detects change points in a univariate time series, that is, the time points at which some feature of the data, for example the mean, changes. Change points are detected using the PELT (Pruned Exact Linear Time) algorithm using one of a provided set of cost function.

Syntax

[tau, user, ifail] = g13nb(n, beta, k, costfn, 'minss', minss, 'user', user)
[tau, user, ifail] = nag_tsa_cp_pelt_user(n, beta, k, costfn, 'minss', minss, 'user', user)

Description

Let y1:n=yj:j=1,2,,n denote a series of data and τ=τi:i=1,2,,m denote a set of m ordered (strictly monotonic increasing) indices known as change points with 1τin and τm=n. For ease of notation we also define τ0=0. The m change points, τ, split the data into m segments, with the ith segment being of length ni and containing yτi-1+1:τi.
Given a user-supplied cost function, Cyτi-1+1:τi nag_tsa_cp_pelt_user (g13nb) solves
minimize m,τ i=1 m Cyτi-1+1:τi + β (1)
where β is a penalty term used to control the number of change points. This minimization is performed using the PELT algorithm of Killick et al. (2012). The PELT algorithm is guaranteed to return the optimal solution to (1) if there exists a constant K such that
C y u+1 : v + C y v+1 : w + K C y u+1 : w (2)
for all u<v<w 

References

Chen J and Gupta A K (2010) Parametric Statistical Change Point Analysis With Applications to Genetics Medicine and Finance Second Edition Birkhäuser
Killick R, Fearnhead P and Eckely I A (2012) Optimal detection of changepoints with a linear computational cost Journal of the American Statistical Association 107:500 1590–1598

Parameters

Compulsory Input Parameters

1:     n int64int32nag_int scalar
n, the length of the time series.
Constraint: n2.
2:     beta – double scalar
β, the penalty term.
There are a number of standard ways of setting β, including:
SIC or BIC
β=p×logn 
AIC
β=2p 
Hannan-Quinn
β=2p×loglogn 
where p is the number of parameters being treated as estimated in each segment. The value of p will depend on the cost function being used.
If no penalty is required then set β=0. Generally, the smaller the value of β the larger the number of suggested change points.
3:     k – double scalar
K, the constant value that satisfies equation (2). If K exists, it is unlikely to be unique in such cases, it is recommened that the largest value of K, that satisfies equation (2), is chosen. No check is made that K is the correct value for the supplied cost function.
4:     costfn – function handle or string containing name of m-file
The cost function, C. costfn must calculate a vector of costs for a number of segments.
[c, user, info] = costfn(ts, r, user, info)

Input Parameters

1:     ts int64int32nag_int scalar
A reference time point.
2:     rnr int64int32nag_int array
Time points which, along with ts, define the segments being considered, 0rin for i=1,2,nr.
3:     user – Any MATLAB object
costfn is called from nag_tsa_cp_pelt_user (g13nb) with the object supplied to nag_tsa_cp_pelt_user (g13nb).
4:     info int64int32nag_int scalar
info=0.

Output Parameters

1:     cnr – double array
The cost function, C, with
ci= Cyri:t ​ if ​t>ri, Cyt:ri ​ otherwise.  
where t=ts and ri=ri.
It should be noted that if t>ri for any value of i then it will be true for all values of i. Therefore the inequality need only be tested once per call to costfn.
2:     user – Any MATLAB object
3:     info int64int32nag_int scalar
Set info to a nonzero value if you wish nag_tsa_cp_pelt_user (g13nb) to terminate with ifail=51.

Optional Input Parameters

1:     minss int64int32nag_int scalar
Default: 2
The minimum distance between two change points, that is τi-τi-1minss.
Constraint: minss2.
2:     user – Any MATLAB object
user is not used by nag_tsa_cp_pelt_user (g13nb), but is passed to costfn. Note that for large objects it may be more efficient to use a global variable which is accessible from the m-files than to use user.

Output Parameters

1:     tauntau int64int32nag_int array
The dimension of the array tau will be ntau
The location of the change points. The ith segment is defined by yτi-1+1 to yτi, where τ0=0 and τi=taui,1im.
2:     user – Any MATLAB object
3:     ifail int64int32nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=11
Constraint: n2.
   ifail=31
Constraint: minss2.
   ifail=51
User requested termination.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

Not applicable.

Further Comments

nag_tsa_cp_pelt (g13na) performs the same calculations for a cost function selected from a provided set of cost functions. If the required cost function belongs to this provided set then nag_tsa_cp_pelt (g13na) can be used without the need to provide a cost function routine.

Example

This example identifies changes in the scale parameter, under the assumption that the data has a gamma distribution, for a simulated dataset with 100 observations. A penalty, β of 3.6 is used and the minimum segment size is set to 3. The shape parameter is fixed at 2.1 across the whole input series.
The cost function used is
Cyτi-1+1:τi = 2 a ni logSi - log a ni  
where a is a shape parameter that is fixed for all segments and ni=τi-τi-1+1.
function g13nb_example


fprintf('g13nb example results\n\n');

% Input series
y = [ 0.00; 0.78; 0.02; 0.17; 0.04; 1.23; 0.24; 1.70; 0.77; 0.06;
      0.67; 0.94; 1.99; 2.64; 2.26; 3.72; 3.14; 2.28; 3.78; 0.83;
      2.80; 1.66; 1.93; 2.71; 2.97; 3.04; 2.29; 3.71; 1.69; 2.76;
      1.96; 3.17; 1.04; 1.50; 1.12; 1.11; 1.00; 1.84; 1.78; 2.39;
      1.85; 0.62; 2.16; 0.78; 1.70; 0.63; 1.79; 1.21; 2.20; 1.34;
      0.04; 0.14; 2.78; 1.83; 0.98; 0.19; 0.57; 1.41; 2.05; 1.17;
      0.44; 2.32; 0.67; 0.73; 1.17; 0.34; 2.95; 1.08; 2.16; 2.27;
      0.14; 0.24; 0.27; 1.71; 0.04; 1.03; 0.12; 0.67; 1.15; 1.10;
      1.37; 0.59; 0.44; 0.63; 0.06; 0.62; 0.39; 2.63; 1.63; 0.42;
      0.73; 0.85; 0.26; 0.48; 0.26; 1.77; 1.53; 1.39; 1.68; 0.43];

% The cost function is a function of the sum of Y, so for
% efficiency we will calculate the cumulative sum
% It should be noted that this may introduce some rounding issues
% with very extreme data, we also pre-pend a value of 0
csy = [0.0; cumsum(y)];

% Shape parameter used in the cost function
a = 2.1;

% The value of K is defined by the cost function being used
% in this example a value of 0.0 is the required value
k = 0;

% The cumulative sum of the input series and shape parameter
% constitute the information that needs to be passed to the
% costfun, so pack them together into a cell array which will
% get passed through the NAG function
user = {csy; a};

% Length of the input series
n = int64(numel(y));

% Penalty term
beta = 3.4;

% Drop small regions
minss = int64(3);

[tau] = g13nb( ...
               n, beta, k, @costfn, 'minss', minss, 'user', user);

% Print the results
fprintf('  -- Change Points --\n');
fprintf('  Number     Position\n');
fprintf(' =====================\n');
for i = 1:numel(tau)
  fprintf(' %4d       %6d\n', i, tau(i));
end

% Plot the results
fig1 = figure;

% Plot the original series
plot(y,'Color','red');

% Mark the change points, drop the last one as it is always
% at the end of the series
xpos = transpose(double(tau(1:end-1))*ones(1,2));
ypos = diag(ylim)*ones(2,numel(tau)-1);
line(xpos,ypos,'Color','black');

% Add labels and titles
title({'{\bf g13nb Example Plot}',
      'Simulated time series and the corresponding changes in scale b',
      'assuming y ~ Ga(2.1,b)'});
xlabel('{\bf Time}');
ylabel('{\bf Value}');



function [c,user,info] = costfn(ts, r, user, info)
  % Cost function, C. This cost function is based on the likelihood of
  % the gamma distribution
  csy = user{1};
  a = user{2};

  % Only need to test which way around ts and r are once
  if (ts<r(1))
    si = csy(r+1) - csy(ts+1);
    dn = double(r - ts);
  else
    si = csy(ts+1) - csy(r+1);
    dn = double(ts - r);
  end
  c = (2*dn*a) .* (log(si) - log(dn*a));

  % Set info nonzero to terminate execution for any reason
  info = int64(0);
g13nb example results

  -- Change Points --
  Number     Position
 =====================
    1            5
    2           12
    3           32
    4           70
    5           73
    6          100
g13nb_fig1.png
This example plot shows the original data series and the estimated change points.

PDF version (NAG web site, 64-bit version, 64-bit version)
Chapter Contents
Chapter Introduction
NAG Toolbox

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015