NAG FL Interface
e02bef (dim1_spline_auto)
1
Purpose
e02bef computes a cubic spline approximation to an arbitrary set of data points. The knots of the spline are located automatically, but a single argument must be specified to control the tradeoff between closeness of fit and smoothness of fit.
2
Specification
Fortran Interface
Subroutine e02bef (start, m, x, y, w, s, nest, n, lamda, c, fp, wrk, lwrk, iwrk, ifail)
Integer, Intent (In) :: m, nest, lwrk
Integer, Intent (Inout) :: n, iwrk(nest), ifail
Real (Kind=nag_wp), Intent (In) :: x(m), y(m), w(m), s
Real (Kind=nag_wp), Intent (Inout) :: lamda(nest), wrk(lwrk)
Real (Kind=nag_wp), Intent (Out) :: c(nest), fp
Character (1), Intent (In) :: start

C Header Interface
#include <nag.h>
void e02bef_ (const char *start, const Integer *m, const double x[], const double y[], const double w[], const double *s, const Integer *nest, Integer *n, double lamda[], double c[], double *fp, double wrk[], const Integer *lwrk, Integer iwrk[], Integer *ifail, const Charlen length_start)

C++ Header Interface
#include <nag.h>
extern "C" {
void e02bef_ (const char *start, const Integer &m, const double x[], const double y[], const double w[], const double &s, const Integer &nest, Integer &n, double lamda[], double c[], double &fp, double wrk[], const Integer &lwrk, Integer iwrk[], Integer &ifail, const Charlen length_start)
}

The routine may be called by the names e02bef or nagf_fit_dim1_spline_auto.
3
Description
e02bef determines a smooth cubic spline approximation $s\left(x\right)$ to the set of data points $\left({x}_{\mathit{r}},{y}_{\mathit{r}}\right)$, with weights ${w}_{\mathit{r}}$, for $\mathit{r}=1,2,\dots ,m$.
The spline is given in the B-spline representation
$$s\left(x\right)=\sum _{i=1}^{n-4}{c}_{i}{N}_{i}\left(x\right),\qquad (1)$$
where
${N}_{i}\left(x\right)$ denotes the normalized cubic B-spline defined upon the knots
${\lambda}_{i},{\lambda}_{i+1},\dots ,{\lambda}_{i+4}$.
The total number $n$ of these knots and their values ${\lambda}_{1},\dots ,{\lambda}_{n}$ are chosen automatically by the routine. The knots ${\lambda}_{5},\dots ,{\lambda}_{n-4}$ are the interior knots; they divide the approximation interval $\left[{x}_{1},{x}_{m}\right]$ into $n-7$ subintervals. The coefficients ${c}_{1},{c}_{2},\dots ,{c}_{n-4}$ are then determined as the solution of the following constrained minimization problem:
minimize
$$\eta =\sum _{i=5}^{n-4}{\delta}_{i}^{2}\qquad (2)$$
subject to the constraint
$$\theta =\sum _{r=1}^{m}{\epsilon}_{r}^{2}\le S,\qquad (3)$$
where
${\delta}_{i}$
stands for the discontinuity jump in the third order derivative of $s\left(x\right)$ at the interior knot ${\lambda}_{i}$,

${\epsilon}_{r}$
denotes the weighted residual ${w}_{r}\left({y}_{r}-s\left({x}_{r}\right)\right)$,
and
$S$
is a non-negative number to be specified by you.
The quantity
$\eta $ can be seen as a measure of the (lack of) smoothness of
$s\left(x\right)$, while closeness of fit is measured through
$\theta $. By means of the argument
$S$, ‘the smoothing factor’, you can then control the balance between these two (usually conflicting) properties. If
$S$ is too large, the spline will be too smooth and signal will be lost (underfit); if
$S$ is too small, the spline will pick up too much noise (overfit). In the extreme cases the routine will return an interpolating spline
$\left(\theta =0\right)$ if
$S$ is set to zero, and the weighted least squares cubic polynomial
$\left(\eta =0\right)$ if
$S$ is set very large. Experimenting with
$S$ values between these two extremes should result in a good compromise. (See
Section 9.2 for advice on choice of
$S$.)
The method employed is outlined in
Section 9.3 and fully described in
Dierckx (1975),
Dierckx (1981) and
Dierckx (1982). It involves an adaptive strategy for locating the knots of the cubic spline (depending on the function underlying the data and on the value of
$S$), and an iterative method for solving the constrained minimization problem once the knots have been determined.
Values of the computed spline, or of its derivatives or definite integral, can subsequently be computed by calling
e02bbf,
e02bcf or
e02bdf, as described in
Section 9.4.
4
References
Dierckx P (1975) An algorithm for smoothing, differentiation and integration of experimental data using spline functions J. Comput. Appl. Math. 1 165–184
Dierckx P (1981) An improved algorithm for curve fitting with spline functions Report TW54 Department of Computer Science, Katholieke Universiteit Leuven
Dierckx P (1982) A fast algorithm for smoothing data on a rectangular grid while using spline functions SIAM J. Numer. Anal. 19 1286–1304
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
5
Arguments

1:
$\mathbf{start}$ – Character(1)
Input

On entry: must be set to 'C' or 'W'.
 ${\mathbf{start}}=\text{'C'}$
 The routine will build up the knot set starting with no interior knots. No values need be assigned to the arguments n, lamda, wrk or iwrk.
 ${\mathbf{start}}=\text{'W'}$
 The routine will restart the knot-placing strategy using the knots found in a previous call of the routine. In this case, the arguments n, lamda, wrk, and iwrk must be unchanged from that previous call. This warm start can save much time in searching for a satisfactory value of ${\mathbf{s}}$.
Constraint:
${\mathbf{start}}=\text{'C'}$ or $\text{'W'}$.

2:
$\mathbf{m}$ – Integer
Input

On entry: $m$, the number of data points.
Constraint:
${\mathbf{m}}\ge 4$.

3:
$\mathbf{x}\left({\mathbf{m}}\right)$ – Real (Kind=nag_wp) array
Input

On entry: the values
${x}_{\mathit{r}}$ of the independent variable (abscissa) $x$, for $\mathit{r}=1,2,\dots ,m$.
Constraint:
${x}_{1}<{x}_{2}<\cdots <{x}_{m}$.

4:
$\mathbf{y}\left({\mathbf{m}}\right)$ – Real (Kind=nag_wp) array
Input

On entry: the values
${y}_{\mathit{r}}$ of the dependent variable (ordinate) $y$, for $\mathit{r}=1,2,\dots ,m$.

5:
$\mathbf{w}\left({\mathbf{m}}\right)$ – Real (Kind=nag_wp) array
Input

On entry: the values
${w}_{\mathit{r}}$ of the weights, for
$\mathit{r}=1,2,\dots ,m$. For advice on the choice of weights, see
Section 2.1.2 in the
E02 Chapter Introduction.
Constraint:
${\mathbf{w}}\left(\mathit{r}\right)>0.0$, for $\mathit{r}=1,2,\dots ,m$.

6:
$\mathbf{s}$ – Real (Kind=nag_wp)
Input

On entry: the smoothing factor,
$S$.
If $S=0.0$, the routine returns an interpolating spline.
If $S$ is smaller than machine precision, it is assumed equal to zero.
For advice on the choice of
$S$, see
Sections 3 and
9.2.
Constraint:
${\mathbf{s}}\ge 0.0$.

7:
$\mathbf{nest}$ – Integer
Input

On entry: an overestimate for the number, $n$, of knots required.
Constraint:
${\mathbf{nest}}\ge 8$. In most practical situations,
${\mathbf{nest}}={\mathbf{m}}/2$ is sufficient.
nest never needs to be larger than
${\mathbf{m}}+4$, the number of knots needed for interpolation
$\left({\mathbf{s}}=0.0\right)$.

8:
$\mathbf{n}$ – Integer
Input/Output

On entry: if the warm start option is used, the value of
n must be left unchanged from the previous call.
On exit: the total number, $n$, of knots of the computed spline.

9:
$\mathbf{lamda}\left({\mathbf{nest}}\right)$ – Real (Kind=nag_wp) array
Input/Output

On entry: if the warm start option is used, the values ${\mathbf{lamda}}\left(1\right),{\mathbf{lamda}}\left(2\right),\dots ,{\mathbf{lamda}}\left({\mathbf{n}}\right)$ must be left unchanged from the previous call.
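On exit: the knots of the spline, i.e., the positions of the interior knots
${\mathbf{lamda}}\left(5\right),{\mathbf{lamda}}\left(6\right),\dots ,{\mathbf{lamda}}\left({\mathbf{n}}-4\right)$ as well as the positions of the additional knots
${\mathbf{lamda}}\left(1\right)={\mathbf{lamda}}\left(2\right)={\mathbf{lamda}}\left(3\right)={\mathbf{lamda}}\left(4\right)={\mathbf{x}}\left(1\right)$
and
${\mathbf{lamda}}\left({\mathbf{n}}-3\right)={\mathbf{lamda}}\left({\mathbf{n}}-2\right)={\mathbf{lamda}}\left({\mathbf{n}}-1\right)={\mathbf{lamda}}\left({\mathbf{n}}\right)={\mathbf{x}}\left({\mathbf{m}}\right)$
needed for the B-spline representation.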

10:
$\mathbf{c}\left({\mathbf{nest}}\right)$ – Real (Kind=nag_wp) array
Output

On exit: the coefficient
${c}_{\mathit{i}}$ of the B-spline ${N}_{\mathit{i}}\left(x\right)$ in the spline approximation $s\left(x\right)$, for $\mathit{i}=1,2,\dots ,n-4$.

11:
$\mathbf{fp}$ – Real (Kind=nag_wp)
Output

On exit: the sum of the squared weighted residuals,
$\theta $, of the computed spline approximation. If
${\mathbf{fp}}=0.0$, this is an interpolating spline.
fp should equal
${\mathbf{s}}$ within a relative tolerance of
$0.001$ unless
$n=8$ when the spline has no interior knots and so is simply a cubic polynomial. For knots to be inserted,
${\mathbf{s}}$ must be set to a value below the value of
fp produced in this case.

12:
$\mathbf{wrk}\left({\mathbf{lwrk}}\right)$ – Real (Kind=nag_wp) array
Communication Array

If the warm start option is used on entry, the values ${\mathbf{wrk}}\left(1\right),\dots ,{\mathbf{wrk}}\left(n\right)$ must be left unchanged from the previous call.

13:
$\mathbf{lwrk}$ – Integer
Input

On entry: the dimension of the array
wrk as declared in the (sub)program from which
e02bef is called.
Constraint:
${\mathbf{lwrk}}\ge 4\times {\mathbf{m}}+16\times {\mathbf{nest}}+41$.

14:
$\mathbf{iwrk}\left({\mathbf{nest}}\right)$ – Integer array
Communication Array

If the warm start option is used, on entry, the values ${\mathbf{iwrk}}\left(1\right),\dots ,{\mathbf{iwrk}}\left(n\right)$ must be left unchanged from the previous call.
This array is used as workspace.

15:
$\mathbf{ifail}$ – Integer
Input/Output

On entry:
ifail must be set to
$0$,
$-1$ or $1$. If you are unfamiliar with this argument you should refer to
Section 4 in the Introduction to the NAG Library FL Interface for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1$ or $1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this argument, the recommended value is
$0$.
When the value $-1$ or $1$ is used it is essential to test the value of ifail on exit.
On exit:
${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
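The sizing constraints on nest and lwrk stated above can be checked before the call. The following Python sketch only restates those documented formulas; the helper names `safe_nest` and `min_lwrk` are hypothetical and are not part of the NAG Library.

```python
def safe_nest(m: int) -> int:
    """Largest value nest ever needs: m + 4, the knot count for interpolation (s = 0)."""
    if m < 4:
        raise ValueError("m must be at least 4")
    return m + 4


def min_lwrk(m: int, nest: int) -> int:
    """Smallest lwrk allowed by the constraint lwrk >= 4*m + 16*nest + 41."""
    if m < 4:
        raise ValueError("m must be at least 4")
    if nest < 8:
        raise ValueError("nest must be at least 8")
    return 4 * m + 16 * nest + 41


if __name__ == "__main__":
    m = 10
    nest = safe_nest(m)       # 14 knots suffice even for interpolation
    print(nest, min_lwrk(m, nest))
```

Choosing nest = m + 4 costs some workspace but rules out the nest-related failure when s = 0; a smaller nest (for example m/2) trades that safety for memory.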
6
Error Indicators and Warnings
If on entry
${\mathbf{ifail}}=0$ or
$-1$, explanatory error messages are output on the current error message unit (as defined by
x04aaf).
Errors or warnings detected by the routine:
 ${\mathbf{ifail}}=1$

On entry, ${\mathbf{lwrk}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{lwrk}}\ge \langle \mathit{\text{value}}\rangle $.
On entry, ${\mathbf{m}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{m}}\ge 4$.
On entry, ${\mathbf{nest}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nest}}\ge 8$.
On entry, ${\mathbf{nest}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{nest}}\ge \langle \mathit{\text{value}}\rangle $ when ${\mathbf{s}}=0.0$.
On entry, $r=\langle \mathit{\text{value}}\rangle $ and ${\mathbf{w}}\left(r\right)=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{w}}\left(r\right)>0.0$ for all $r$.
On entry, ${\mathbf{s}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{s}}\ge 0.0$.
On entry, ${\mathbf{start}}\ne \text{'W'}$ or $\text{'C'}$: ${\mathbf{start}}=\langle \mathit{\text{value}}\rangle $.
 ${\mathbf{ifail}}=2$

On entry, $r=\langle \mathit{\text{value}}\rangle $ and ${\mathbf{w}}\left(r\right)=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{w}}\left(r\right)>0.0$ for all $r$.
 ${\mathbf{ifail}}=3$

On entry, $r=\langle \mathit{\text{value}}\rangle $, ${\mathbf{x}}\left(r-1\right)=\langle \mathit{\text{value}}\rangle $ and ${\mathbf{x}}\left(r\right)=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{x}}\left(r-1\right)<{\mathbf{x}}\left(r\right)$ for all $r$.
 ${\mathbf{ifail}}=4$

The number of knots needed is greater than
nest:
${\mathbf{nest}}=\langle \mathit{\text{value}}\rangle $. Possibly
s is too small:
${\mathbf{s}}=\langle \mathit{\text{value}}\rangle $.
 ${\mathbf{ifail}}=5$

The iterative process has failed to converge. Possibly
s is too small:
${\mathbf{s}}=\langle \mathit{\text{value}}\rangle $.
 ${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please
contact
NAG.
See
Section 7 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See
Section 8 in the Introduction to the NAG Library FL Interface for further information.
 ${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See
Section 9 in the Introduction to the NAG Library FL Interface for further information.
If
${\mathbf{ifail}}={\mathbf{4}}$ or
${\mathbf{5}}$, a spline approximation is returned, but it fails to satisfy the fitting criterion (see
(2) and
(3)) – perhaps by only a small amount, however.
7
Accuracy
On successful exit, the approximation returned is such that its weighted sum of squared residuals
$\theta $ (as in
(3)) is equal to the smoothing factor
$S$, up to a specified relative tolerance of
$0.001$ – except that if
$n=8$,
$\theta $ may be significantly less than
$S$: in this case the computed spline is simply a weighted least squares polynomial approximation of degree
$3$, i.e., a spline with no interior knots.
8
Parallelism and Performance
e02bef makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the
X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the
Users' Note for your implementation for any additional implementation-specific information.
9.1
Timing
The time taken for a call of e02bef depends on the complexity of the shape of the data, the value of the smoothing factor $S$, and the number of data points. If e02bef is to be called for different values of $S$, much time can be saved by setting ${\mathbf{start}}=\text{'W'}$ after the first call.
9.2
Choice of S
If the weights have been correctly chosen (see
Section 2.1.2 in the
E02 Chapter Introduction), the standard deviation of
${w}_{r}{y}_{r}$ would be the same for all
$r$, equal to
$\sigma $, say. In this case, choosing the smoothing factor
$S$ in the range
${\sigma}^{2}\left(m\pm \sqrt{2m}\right)$, as suggested by
Reinsch (1967), is likely to give a good start in the search for a satisfactory value. Otherwise, experimenting with different values of
$S$ will be required from the start, taking account of the remarks in
Section 3.
In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for
$S$ and so determine the least squares cubic polynomial; the value returned in
fp, call it
${\theta}_{0}$, gives an upper bound for
$S$. Then progressively decrease the value of
$S$ to obtain closer fits – say by a factor of
$10$ in the beginning, i.e.,
$S={\theta}_{0}/10$,
$S={\theta}_{0}/100$, and so on, and more carefully as the approximation shows more details.
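The two starting strategies described above can be written down as simple arithmetic. The Python sketch below is illustrative only: the function names `reinsch_range` and `decreasing_schedule` are hypothetical, the value of $\sigma$ is assumed to be known, and ${\theta}_{0}$ is assumed to be the fp value returned by a first call with a very large $S$.

```python
import math


def reinsch_range(sigma: float, m: int) -> tuple[float, float]:
    """Range sigma^2 * (m - sqrt(2m)), sigma^2 * (m + sqrt(2m)) suggested by Reinsch (1967)."""
    half_width = math.sqrt(2.0 * m)
    return sigma ** 2 * (m - half_width), sigma ** 2 * (m + half_width)


def decreasing_schedule(theta0: float, steps: int) -> list[float]:
    """Trial values theta0/10, theta0/100, ... for successive (warm-started) calls."""
    return [theta0 / 10 ** k for k in range(1, steps + 1)]


if __name__ == "__main__":
    print(reinsch_range(1.0, 50))          # bounds for m = 50 points with unit sigma
    print(decreasing_schedule(200.0, 3))   # first three trial smoothing factors
```

The factor of 10 is only the initial step suggested in the text; once the fit begins to show detail, smaller reductions of $S$ are advisable.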
The number of knots of the spline returned, and their location, generally depend on the value of $S$ and on the behaviour of the function underlying the data. However, if e02bef is called with ${\mathbf{start}}=\text{'W'}$, the knots returned may also depend on the smoothing factors of the previous calls. Therefore if, after a number of trials with different values of $S$ and ${\mathbf{start}}=\text{'W'}$, a fit can finally be accepted as satisfactory, it may be worthwhile to call e02bef once more with the selected value for $S$ but now using ${\mathbf{start}}=\text{'C'}$. Often, e02bef then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.
9.3
Outline of Method Used
If
$S=0$, the requisite number of knots is known in advance, i.e.,
$n=m+4$; the interior knots are located immediately as
${\lambda}_{\mathit{i}}={x}_{\mathit{i}-2}$, for
$\mathit{i}=5,6,\dots ,n-4$. The corresponding least squares spline (see
e02baf) is then an interpolating spline and therefore a solution of the problem.
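The knot placement for the interpolating case can be made concrete with a short Python sketch. It places the interior knots at ${x}_{3},\dots ,{x}_{m-2}$, as the formula above implies, and assumes (as is usual for a cubic B-spline representation on $\left[{x}_{1},{x}_{m}\right]$) that the four additional knots at each end coincide with ${x}_{1}$ and ${x}_{m}$. The function name `interpolation_knots` is hypothetical.

```python
def interpolation_knots(x: list[float]) -> list[float]:
    """Knot vector of length m + 4 for the S = 0 (interpolating) case.

    Interior knots are x_3, ..., x_{m-2} in 1-based numbering; the end
    knots are assumed to be x_1 and x_m, each repeated four times.
    """
    m = len(x)
    if m < 4:
        raise ValueError("at least 4 data points are required")
    interior = list(x[2:m - 2])          # 0-based slice for x_3 .. x_{m-2}
    return [x[0]] * 4 + interior + [x[-1]] * 4


if __name__ == "__main__":
    knots = interpolation_knots([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(len(knots), knots)
```

With $m=6$ this gives $n=10$ knots, two of them interior, so the spline has $n-4=6$ coefficients, one per data point, which is what makes interpolation possible.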
If
$S>0$, a suitable knot set is built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen). At each stage, a spline is fitted to the data by least squares (see
e02baf) and
$\theta $, the weighted sum of squares of residuals, is computed. If
$\theta >S$, new knots are added to the knot set to reduce
$\theta $ at the next stage. The new knots are located in intervals where the fit is particularly poor, their number depending on the value of
$S$ and on the progress made so far in reducing
$\theta $. Sooner or later, we find that
$\theta \le S$ and at that point the knot set is accepted. The routine then goes on to compute the (unique) spline which has this knot set and which satisfies the full fitting criterion specified by
(2) and
(3). The theoretical solution has
$\theta =S$. The routine computes the spline by an iterative scheme which is ended when
$\theta =S$ within a relative tolerance of
$0.001$. The main part of each iteration consists of a linear least squares computation of special form, done in a similarly stable and efficient manner as in
e02baf.
An exception occurs when the routine finds at the start that, even with no interior knots $\left(n=8\right)$, the least squares spline already has its weighted sum of squares of residuals $\le S$. In this case, since this spline (which is simply a cubic polynomial) also has an optimal value for the smoothness measure $\eta $, namely zero, it is returned at once as the (trivial) solution. It will usually mean that $S$ has been chosen too large.
For further details of the algorithm and its use, see
Dierckx (1981).
9.4
Evaluation of Computed Spline
The value of the computed spline at a given value
x may be obtained in the real variable
s by the call:
Call e02bbf(n,lamda,c,x,s,ifail)
where
n,
lamda and
c are the output arguments of
e02bef.
The values of the spline and its first three derivatives at a given value
x may be obtained in the real array
s of dimension at least
$4$ by the call:
Call e02bcf(n,lamda,c,x,left,s,ifail)
where if
${\mathbf{left}}=1$, left-hand derivatives are computed and if
${\mathbf{left}}\ne 1$, right-hand derivatives are calculated. The value of
left is only relevant if
x is an interior knot (see
e02bcf).
The value of the definite integral of the spline over the interval
${\mathbf{x}}\left(1\right)$ to
${\mathbf{x}}\left({\mathbf{m}}\right)$ can be obtained in the real variable
dint by the call:
Call e02bdf(n,lamda,c,dint,ifail)
(see
e02bdf).
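To illustrate what evaluating the B-spline representation means, the following Python sketch computes $s\left(x\right)$ directly from a knot vector and coefficient array using the Cox-de Boor recursion. It is not the method used by e02bbf, which should be preferred in practice; the function names `bspline_basis` and `spline_value` are hypothetical, and indices are 0-based, so `lamda[0]` corresponds to ${\lambda}_{1}$.

```python
def bspline_basis(i: int, k: int, t: list[float], x: float) -> float:
    """Normalized B-spline of degree k on knots t[i..i+k+1], evaluated at x."""
    if k == 0:
        # Half-open interval: x equal to the last knot yields 0 in this sketch.
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] > t[i]:                      # skip terms over zero-width spans
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def spline_value(lamda: list[float], c: list[float], x: float) -> float:
    """Sum of c[i] * N_i(x) over the n - 4 cubic B-splines defined by lamda."""
    n = len(lamda)
    return sum(c[i] * bspline_basis(i, 3, lamda, x) for i in range(n - 4))


if __name__ == "__main__":
    knots = [0.0] * 4 + [1.0] * 4            # n = 8: no interior knots
    print(spline_value(knots, [1.0, 1.0, 1.0, 1.0], 0.5))
```

With no interior knots the four basis functions reduce to the cubic Bernstein polynomials on $\left[0,1\right]$, which gives a simple way to check the recursion by hand.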
10
Example
This example reads in a set of data values, followed by a set of values of ${\mathbf{s}}$. For each value of ${\mathbf{s}}$ it calls e02bef to compute a spline approximation, and prints the values of the knots and the B-spline coefficients ${c}_{i}$.
The program includes code to evaluate the computed splines, by calls to
e02bbf, at the points
${x}_{r}$ and at points midway between them. These values are not printed out, however; instead the results are illustrated by plots of the computed splines, together with the data points (indicated by
$\times $) and the positions of the knots (indicated by vertical lines): the effect of decreasing
${\mathbf{s}}$ can be clearly seen.
10.1
Program Text
10.2
Program Data
10.3
Program Results