NAG CL Interface
e02bac (dim1_spline_knots)
1
Purpose
e02bac computes a weighted least squares approximation to an arbitrary set of data points by a cubic spline with knots prescribed by you. Cubic spline interpolation can also be carried out.
2
Specification
void e02bac (Integer m,
const double x[],
const double y[],
const double weights[],
double *ss,
Nag_Spline *spline,
NagError *fail) 

The function may be called by the names: e02bac, nag_fit_dim1_spline_knots or nag_1d_spline_fit_knots.
3
Description
e02bac determines a least squares cubic spline approximation $s\left(x\right)$ to the set of data points $\left({x}_{\mathit{r}},{y}_{\mathit{r}}\right)$ with weights ${w}_{\mathit{r}}$, for $\mathit{r}=1,2,\dots ,m$. The value of $\mathbf{spline}\mathbf{\to}\mathbf{n}=\overline{n}+7$, where $\overline{n}$ is the number of intervals of the spline (one greater than the number of interior knots), and the values of the knots ${\lambda}_{5},{\lambda}_{6},\dots ,{\lambda}_{\overline{n}+3}$, interior to the data interval, are prescribed by you.
$s\left(x\right)$ has the property that it minimizes $\theta $, the sum of squares of the weighted residuals ${\epsilon}_{\mathit{r}}$, for $\mathit{r}=1,2,\dots ,m$, where
$${\epsilon}_{r}={w}_{r}\left(s\left({x}_{r}\right)-{y}_{r}\right),\qquad \theta =\sum _{r=1}^{m}{\epsilon}_{r}^{2}.$$
The function produces this minimizing value of $\theta $ and the coefficients ${c}_{1},{c}_{2},\dots ,{c}_{q}$, where $q=\overline{n}+3$, in the B-spline representation
$$s\left(x\right)=\sum _{i=1}^{q}{c}_{i}{N}_{i}\left(x\right).$$
Here ${N}_{i}\left(x\right)$ denotes the normalized B-spline of degree 3 defined upon the knots ${\lambda}_{i},{\lambda}_{i+1},\dots ,{\lambda}_{i+4}$.
In order to define the full set of Bsplines required, eight additional knots ${\lambda}_{1},{\lambda}_{2},{\lambda}_{3},{\lambda}_{4}$ and ${\lambda}_{\overline{n}+4},{\lambda}_{\overline{n}+5}$, ${\lambda}_{\overline{n}+6},{\lambda}_{\overline{n}+7}$ are inserted automatically by the function. The first four of these are set equal to the smallest ${x}_{r}$ and the last four to the largest ${x}_{r}$.
The representation of $s\left(x\right)$ in terms of B-splines is the most compact form possible in that only $\overline{n}+3$ coefficients, in addition to the $\overline{n}+7$ knots, fully define $s\left(x\right)$.
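The representation can be made concrete with a short C sketch. It assumes the 0-based storage used in the Arguments section (knot ${\lambda}_{i}$ in `lam[i-1]`, coefficient ${c}_{i}$ in `c[i-1]`, `n` knots in total) and evaluates the four possibly nonzero cubic B-splines with a standard triangular form of the Cox–de Boor recurrence. This is an illustration only, not the code used inside e02bac; the helper names (`set_end_knots`, `find_interval`, `cubic_basis`, `spline_value`, `weighted_ss`) are hypothetical. `weighted_ss` computes $\theta $ with ${\epsilon}_{r}={w}_{r}\left(s\left({x}_{r}\right)-{y}_{r}\right)$.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: set the four coincident end knots at each end of
   lam[0..n-1] (n = nbar + 7); lam[4..n-5] must already hold the interior knots. */
static void set_end_knots(int n, double lam[], double x1, double xm)
{
    assert(n >= 8);
    for (int k = 0; k < 4; k++) {
        lam[k] = x1;          /* lambda_1..lambda_4 = smallest abscissa */
        lam[n - 1 - k] = xm;  /* last four knots = largest abscissa     */
    }
}

/* Knot interval l (3 <= l <= n-5) with lam[l] <= x < lam[l+1];
   x equal to the right end point is assigned to the last interval. */
static int find_interval(int n, const double lam[], double x)
{
    int l = 3;
    while (l < n - 5 && x >= lam[l + 1])
        l++;
    return l;
}

/* Values of the four cubic B-splines that can be nonzero on interval l:
   N[t] belongs to the B-spline with 0-based index l-3+t. */
static void cubic_basis(const double lam[], int l, double x, double N[4])
{
    double left[4], right[4];
    N[0] = 1.0;                       /* degree 0: one nonzero B-spline */
    for (int j = 1; j <= 3; j++) {    /* raise the degree one step at a time */
        left[j] = x - lam[l + 1 - j];
        right[j] = lam[l + j] - x;
        double saved = 0.0;
        for (int r = 0; r < j; r++) {
            double temp = N[r] / (right[r + 1] + left[j - r]);
            N[r] = saved + right[r + 1] * temp;
            saved = left[j - r] * temp;
        }
        N[j] = saved;
    }
}

/* s(x) = sum_i c_i N_i(x); c holds q = n - 4 coefficients. */
static double spline_value(int n, const double lam[], const double c[], double x)
{
    double N[4];
    int l = find_interval(n, lam, x);
    cubic_basis(lam, l, x, N);
    double s = 0.0;
    for (int t = 0; t < 4; t++)
        s += c[l - 3 + t] * N[t];
    return s;
}

/* theta = sum_r eps_r^2 with eps_r = w_r (s(x_r) - y_r). */
static double weighted_ss(int m, const double x[], const double y[],
                          const double w[], int n, const double lam[],
                          const double c[])
{
    double theta = 0.0;
    for (int r = 0; r < m; r++) {
        double eps = w[r] * (spline_value(n, lam, c, x[r]) - y[r]);
        theta += eps * eps;
    }
    return theta;
}
```

Two convenient checks: with every coefficient equal to 1 the spline is identically 1 (the B-splines sum to one), and with ${c}_{i}=\left({\lambda}_{i+1}+{\lambda}_{i+2}+{\lambda}_{i+3}\right)/3$ it reproduces $s\left(x\right)=x$.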
The method employed involves forming and then computing the least squares solution of a set of $m$ linear equations in the coefficients ${c}_{i}\left(i=1,2,\dots ,\overline{n}+3\right)$. The equations are formed using a recurrence relation for B-splines that is unconditionally stable (Cox (1972), de Boor (1972)), even for multiple (coincident) knots. The least squares solution is also obtained in a stable manner by using orthogonal transformations, viz. a variant of Givens rotations (Gentleman (1974) and Gentleman (1973)). This requires only one equation to be stored at a time. Full advantage is taken of the structure of the equations, there being at most four nonzero values of ${N}_{i}\left(x\right)$ for any value of $x$ and hence at most four coefficients in each equation.
For further details of the algorithm and its use see Cox (1974), Cox (1975) and Cox and Hayes (1973).
Subsequent evaluation of $s\left(x\right)$ from its B-spline representation may be carried out using e02bbc. If derivatives of $s\left(x\right)$ are also required, e02bcc may be used. e02bdc can be used to compute the definite integral of $s\left(x\right)$.
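The integral over the whole data range has a simple closed form that helps explain why it is cheap to compute: a normalized cubic B-spline integrates to $\left({\lambda}_{i+4}-{\lambda}_{i}\right)/4$ over its support, and with coincident end knots every support lies inside $\left[{x}_{1},{x}_{m}\right]$. The C sketch below applies that identity; `spline_integral` is a hypothetical name, it assumes the knot and coefficient storage described in the Arguments section, and it is not the interface of e02bdc.

```c
#include <assert.h>
#include <math.h>

/* Integral of s(x) = sum_i c_i N_i(x) over [x_1, x_m] for a cubic spline
   with four coincident end knots at each end. lam has n entries and
   c has n - 4; c[i] multiplies the B-spline on lam[i..i+4]. */
static double spline_integral(int n, const double lam[], const double c[])
{
    assert(n >= 8);
    double sum = 0.0;
    for (int i = 0; i < n - 4; i++)
        sum += c[i] * (lam[i + 4] - lam[i]) / 4.0;  /* integral of one B-spline */
    return sum;
}
```

For example, with knots $0,0,0,0,1,2,3,3,3,3$ and all coefficients equal to 1 (so that $s\left(x\right)=1$), the sum is $\left(1+2+3+3+2+1\right)/4=3$, the length of the interval.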
4
References
Cox M G (1972) The numerical evaluation of B-splines J. Inst. Math. Appl. 10 134–149
Cox M G (1974) A data-fitting package for the non-specialist user Software for Numerical Mathematics (ed D J Evans) Academic Press
Cox M G (1975) Numerical methods for the interpolation and approximation of data by spline functions PhD Thesis City University, London
Cox M G and Hayes J G (1973) Curve fitting: a guide and suite of algorithms for the non-specialist user NPL Report NAC 26 National Physical Laboratory
de Boor C (1972) On calculating with B-splines J. Approx. Theory 6 50–62
Gentleman W M (1973) Least squares computations by Givens transformations without square roots J. Inst. Math. Applic. 12 329–336
Gentleman W M (1974) Algorithm AS 75. Basic procedures for large sparse or weighted linear least squares problems Appl. Statist. 23 448–454
Schoenberg I J and Whitney A (1953) On Polya frequency functions III Trans. Amer. Math. Soc. 74 246–259
5
Arguments

1:
$\mathbf{m}$ – Integer
Input

On entry: the number $m$ of data points.
Constraint:
${\mathbf{m}}\ge \mathit{mdist}\ge 4$, where $\mathit{mdist}$ is the number of distinct $x$ values in the data.

2:
$\mathbf{x}\left[{\mathbf{m}}\right]$ – const double
Input

On entry: the values ${x}_{\mathit{r}}$ of the independent variable (abscissa), for $\mathit{r}=1,2,\dots ,m$.
Constraint:
${x}_{1}\le {x}_{2}\le \cdots \le {x}_{m}$.

3:
$\mathbf{y}\left[{\mathbf{m}}\right]$ – const double
Input

On entry: the values ${y}_{\mathit{r}}$ of the dependent variable (ordinate), for $\mathit{r}=1,2,\dots ,m$.

4:
$\mathbf{weights}\left[{\mathbf{m}}\right]$ – const double
Input

On entry: the values ${w}_{\mathit{r}}$ of the weights, for $\mathit{r}=1,2,\dots ,m$. For advice on the choice of weights, see the E02 Chapter Introduction.
Constraint:
${w}_{r}>0$, for $\mathit{r}=1,2,\dots ,m$.

5:
$\mathbf{ss}$ – double *
Output

On exit: the residual sum of squares, $\theta $.

6:
$\mathbf{spline}$ – Nag_Spline *

Pointer to structure of type Nag_Spline with the following members:
n – Integer
Input

On entry: $\overline{n}+7$, where $\overline{n}$ is the number of intervals of the spline (which is one greater than the number of interior knots, i.e., the knots strictly within the range ${x}_{1}$ to ${x}_{m}$) over which the spline is defined.
Constraint:
$8\le \mathbf{n}\le \mathit{mdist}+4$, where $\mathit{mdist}$ is the number of distinct $x$ values in the data.
lamda – double *
Input/Output

On entry: a pointer to which memory of size $\mathbf{n}$ must be allocated. $\mathbf{lamda}\left[\mathit{i}-1\right]$ must be set to the $\left(\mathit{i}-4\right)$th interior knot, ${\lambda}_{\mathit{i}}$, for $\mathit{i}=5,6,\dots ,\overline{n}+3$.
On exit: the input values are unchanged, and $\mathbf{lamda}\left[i\right]$, for $i=0,1,2,3,\mathbf{n}-4,\mathbf{n}-3,\mathbf{n}-2,\mathbf{n}-1$, contain the additional exterior knots introduced by the function.
Constraint:
${\mathbf{x}}\left[0\right]<\mathbf{lamda}\left[4\right]\le \mathbf{lamda}\left[5\right]\le \cdots \le \mathbf{lamda}\left[\mathbf{n}-5\right]<{\mathbf{x}}\left[{\mathbf{m}}-1\right]$.
c – double *
Output

On exit: a pointer to which memory of size $\mathbf{n}-4$ is internally allocated. $\mathbf{c}\left[\mathit{i}-1\right]$ holds the coefficient ${c}_{\mathit{i}}$ of the B-spline ${N}_{\mathit{i}}\left(x\right)$, for $\mathit{i}=1,2,\dots ,\overline{n}+3$.
Note that when the information contained in the pointers $\mathbf{lamda}$ and $\mathbf{c}$ is no longer of use, or before a new call to e02bac with the same spline, you should free this storage using the NAG macro NAG_FREE.

7:
$\mathbf{fail}$ – NagError *
Input/Output

The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
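The constraints on $\mathbf{m}$, $\mathbf{x}$, $\mathbf{weights}$, $\mathbf{spline}\mathbf{\to}\mathbf{n}$ and $\mathbf{spline}\mathbf{\to}\mathbf{lamda}$ can be checked before calling the function. The C sketch below is an illustration under stated assumptions: the status codes in `enum fit_check` are hypothetical and are not NAG's error codes, the order of the checks is arbitrary, and only the interior knots `lam[4..n-5]` are examined.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch only (not NagError codes). */
enum fit_check {
    FIT_OK = 0,
    FIT_X_NOT_SORTED,        /* x[r] > x[r+1] for some r           */
    FIT_WEIGHT_NOT_POSITIVE, /* weights[r] <= 0 for some r         */
    FIT_TOO_FEW_POINTS,      /* fewer than 4 distinct abscissae    */
    FIT_BAD_N,               /* n < 8 or n > mdist + 4             */
    FIT_KNOT_OUTSIDE,        /* interior knot not strictly inside  */
    FIT_KNOTS_NOT_SORTED     /* interior knots decrease somewhere  */
};

/* Number of distinct values (mdist) in a nondecreasing array. */
static int count_distinct(int m, const double x[])
{
    int mdist = (m > 0) ? 1 : 0;
    for (int r = 1; r < m; r++)
        if (x[r] != x[r - 1]) mdist++;
    return mdist;
}

static enum fit_check check_inputs(int m, const double x[], const double weights[],
                                   int n, const double lam[])
{
    for (int r = 0; r + 1 < m; r++)
        if (x[r] > x[r + 1]) return FIT_X_NOT_SORTED;
    for (int r = 0; r < m; r++)
        if (!(weights[r] > 0.0)) return FIT_WEIGHT_NOT_POSITIVE;
    int mdist = count_distinct(m, x);
    if (mdist < 4) return FIT_TOO_FEW_POINTS;        /* m >= mdist >= 4 */
    if (n < 8 || n > mdist + 4) return FIT_BAD_N;    /* 8 <= n <= mdist + 4 */
    /* With n == 8 there are no interior knots to check. */
    if (n > 8 && (!(lam[4] > x[0]) || !(lam[n - 5] < x[m - 1])))
        return FIT_KNOT_OUTSIDE;
    for (int i = 4; i < n - 5; i++)
        if (lam[i] > lam[i + 1]) return FIT_KNOTS_NOT_SORTED;
    return FIT_OK;
}
```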
6
Error Indicators and Warnings
 NE_ALLOC_FAIL

Dynamic memory allocation failed.
 NE_INT_ARG_LT

On entry, $\mathbf{spline}\mathbf{\to}\mathbf{n}$ must not be less than 8: $\mathbf{spline}\mathbf{\to}\mathbf{n}=\langle \mathit{\text{value}}\rangle $.
 NE_KNOTS_DISTINCT_ABSCI_CONS

Too many knots for the number of distinct abscissae, $\mathit{mdist}$: $\mathbf{spline}\mathbf{\to}\mathbf{n}=\langle \mathit{\text{value}}\rangle $, $\mathit{mdist}=\langle \mathit{\text{value}}\rangle $.
These must satisfy the constraint $\mathbf{spline}\mathbf{\to}\mathbf{n}\le \mathit{mdist}+4$.
 NE_KNOTS_OUTSIDE_DATA_INTVL

On entry, user-specified knots must be interior to the data interval; that is, $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[4\right]$ must be greater than ${\mathbf{x}}\left[0\right]$ and $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\mathbf{spline}\mathbf{\to}\mathbf{n}-5\right]$ must be less than ${\mathbf{x}}\left[{\mathbf{m}}-1\right]$. Here $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[4\right]=\langle \mathit{\text{value}}\rangle $, ${\mathbf{x}}\left[0\right]=\langle \mathit{\text{value}}\rangle $, $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $, ${\mathbf{x}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
 NE_NOT_INCREASING

The sequence $\mathbf{spline}\mathbf{\to}\mathbf{lamda}$ is not increasing: $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $, $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
This condition on $\mathbf{spline}\mathbf{\to}\mathbf{lamda}$ applies to the user-specified knots $\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[4\right],\dots ,\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\mathbf{spline}\mathbf{\to}\mathbf{n}-5\right]$.
The sequence $\mathbf{x}$ is not increasing: ${\mathbf{x}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $, ${\mathbf{x}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
 NE_SW_COND_FAIL

The conditions specified by Schoenberg and Whitney fail.
The conditions specified by Schoenberg and Whitney (1953) cannot be satisfied by any subset of the distinct data abscissae. That is, there is no subset of $\mathbf{spline}\mathbf{\to}\mathbf{n}-4$ strictly increasing values, ${\mathbf{x}}\left[{r}_{0}\right],{\mathbf{x}}\left[{r}_{1}\right],\dots ,{\mathbf{x}}\left[{r}_{\mathbf{spline}\mathbf{\to}\mathbf{n}-5}\right]$, among the abscissae such that
${\mathbf{x}}\left[{r}_{0}\right]<\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[4\right]<{\mathbf{x}}\left[{r}_{4}\right]$,
${\mathbf{x}}\left[{r}_{1}\right]<\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[5\right]<{\mathbf{x}}\left[{r}_{5}\right]$,
$\dots $
${\mathbf{x}}\left[{r}_{\mathbf{spline}\mathbf{\to}\mathbf{n}-9}\right]<\mathbf{spline}\mathbf{\to}\mathbf{lamda}\left[\mathbf{spline}\mathbf{\to}\mathbf{n}-5\right]<{\mathbf{x}}\left[{r}_{\mathbf{spline}\mathbf{\to}\mathbf{n}-5}\right]$.
This means that there is no unique solution: there are regions containing too many knots compared with the number of data points.
 NE_WEIGHTS_NOT_POSITIVE

On entry, the weights are not strictly positive: ${\mathbf{weights}}\left[\langle \mathit{\text{value}}\rangle \right]=\langle \mathit{\text{value}}\rangle $.
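Whether the Schoenberg–Whitney conditions hold can be tested cheaply. The C sketch below (the function name `schoenberg_whitney_ok` is hypothetical, and NAG's own test may be organized differently) looks for strictly increasing abscissae ${\mathbf{x}}\left[{r}_{0}\right]<\cdots <{\mathbf{x}}\left[{r}_{\mathbf{n}-5}\right]$ with ${\mathbf{x}}\left[{r}_{j}\right]<\mathbf{lamda}\left[j+4\right]<{\mathbf{x}}\left[{r}_{j+4}\right]$ for the interior knots, $j=0,1,\dots ,\mathbf{n}-9$. Written per chosen value, the $i$th value must exceed $\mathbf{lamda}\left[i\right]$ when $i\ge 4$ and lie below $\mathbf{lamda}\left[i+4\right]$ when $i\le \mathbf{n}-9$. Both bounds are nondecreasing in $i$, so taking the smallest admissible abscissa at each step (a greedy choice) finds a subset whenever one exists.

```c
#include <assert.h>
#include <math.h>

/* x: m nondecreasing abscissae; lam: n knots, interior ones in lam[4..n-5].
   Returns 1 if a Schoenberg-Whitney subset exists, 0 otherwise. */
static int schoenberg_whitney_ok(int m, const double x[], int n, const double lam[])
{
    int r = 0;                 /* next candidate position in x */
    double prev = -INFINITY;   /* last abscissa chosen         */
    for (int i = 0; i <= n - 5; i++) {       /* choose n - 4 values */
        /* Skip repeated abscissae and values not above the lower bound. */
        while (r < m && (x[r] <= prev || (i >= 4 && x[r] <= lam[i])))
            r++;
        if (r == m) return 0;                /* ran out of distinct data */
        /* The smallest admissible value must lie below knot lam[i+4]. */
        if (i <= n - 9 && !(x[r] < lam[i + 4])) return 0;
        prev = x[r++];
    }
    return 1;
}
```

Placing the two interior knots $0.5$ and $0.7$ between the first two of the abscissae $0,1,\dots ,5$ makes the test fail, because only one data point lies where the first two B-splines are nonzero.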
7
Accuracy
The rounding errors committed are such that the computed coefficients are exact for a slightly perturbed set of ordinates ${y}_{r}+\delta {y}_{r}$. The ratio of the root-mean-square value of the $\delta {y}_{r}$ to the root-mean-square value of the ${y}_{r}$ can be expected to be less than a small multiple of $\kappa \times m\times $ machine precision, where $\kappa $ is a condition number for the problem. Values of $\kappa $ for 20–30 practical datasets all proved to lie between $4.5$ and $7.8$ (see Cox (1975)). (Note that for these datasets, replacing the coincident end knots at the endpoints ${x}_{1}$ and ${x}_{m}$ used in the function by various choices of non-coincident exterior knots gave values of $\kappa $ between 16 and $180$. Again see Cox (1975) for further details.) In general we would not expect $\kappa $ to be large unless the choice of knots results in near-violation of the Schoenberg–Whitney conditions.
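For a sense of scale, take machine precision to be ${2}^{-53}\approx 1.11\times {10}^{-16}$ (an assumption corresponding to IEEE double precision arithmetic), $\kappa =7.8$ (the upper end of the range quoted above) and $m=14$ (the number of data points in the example in Section 10). Then
$$\kappa \times m\times {2}^{-53}\approx 7.8\times 14\times 1.11\times {10}^{-16}\approx 1.2\times {10}^{-14},$$
so the relative root-mean-square perturbation of the ordinates would be expected to be a small multiple of about ${10}^{-14}$.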
A cubic spline which adequately fits the data and is free from spurious oscillations is more likely to be obtained if the knots are chosen to be grouped more closely in regions where the function (underlying the data) or its derivatives change more rapidly than elsewhere.
8
Parallelism and Performance
e02bac is not threaded in any implementation.
9
Further Comments
The time taken by e02bac is approximately $C\times \left(2m+\overline{n}+7\right)$ seconds, where $C$ is a machine-dependent constant.
Multiple knots are permitted as long as their multiplicity does not exceed $4$, i.e., the complete set of knots must satisfy ${\lambda}_{\mathit{i}}<{\lambda}_{\mathit{i}+4}$, for $\mathit{i}=1,2,\dots ,\overline{n}+3$ (see Section 6). At a knot of multiplicity one (the usual case), $s\left(x\right)$ and its first two derivatives are continuous. At a knot of multiplicity two, $s\left(x\right)$ and its first derivative are continuous. At a knot of multiplicity three, $s\left(x\right)$ is continuous, and at a knot of multiplicity four, $s\left(x\right)$ is generally discontinuous.
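The multiplicity rule and the resulting smoothness can be expressed in a few lines of C. This is a sketch with hypothetical function names; it assumes the full 0-based knot array `lam[0..n-1]`, so the condition ${\lambda}_{i}<{\lambda}_{i+4}$, $i=1,\dots ,\overline{n}+3$, becomes `lam[i] < lam[i+4]` for `i = 0, ..., n-5`. A knot of multiplicity $\mu $ leaves derivatives up to order $3-\mu $ continuous, with $-1$ meaning that $s\left(x\right)$ itself may jump.

```c
#include <assert.h>

/* 1 if no knot value occurs more than four times in lam[0..n-1]. */
static int knot_multiplicity_ok(int n, const double lam[])
{
    for (int i = 0; i + 4 < n; i++)
        if (!(lam[i] < lam[i + 4])) return 0;
    return 1;
}

/* Multiplicity of the knot value stored at lam[i] (lam nondecreasing). */
static int knot_multiplicity(int n, const double lam[], int i)
{
    int lo = i, hi = i;
    while (lo > 0 && lam[lo - 1] == lam[i]) lo--;
    while (hi < n - 1 && lam[hi + 1] == lam[i]) hi++;
    return hi - lo + 1;
}

/* Highest derivative order of s(x) continuous across an interior knot
   of multiplicity mu (1..4); -1 means s(x) is generally discontinuous. */
static int continuity_order(int mu)
{
    assert(mu >= 1 && mu <= 4);
    return 3 - mu;
}
```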
The function can be used efficiently for cubic spline interpolation, i.e., when $m=\overline{n}+3$. The abscissae must then of course satisfy ${x}_{1}<{x}_{2}<\cdots <{x}_{m}$. Recommended values for the knots in this case are ${\lambda}_{\mathit{i}}={x}_{\mathit{i}-2}$, for $\mathit{i}=5,6,\dots ,\overline{n}+3$.
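With 0-based storage (${\lambda}_{i}$ in `lamda[i-1]`, ${x}_{r}$ in `x[r-1]`), the recommended interpolation knots become `lamda[k] = x[k-2]` for `k = 4, ..., n-5`, where $\mathbf{n}=\overline{n}+7=m+4$. In other words, the interior knots are ${x}_{3},\dots ,{x}_{m-2}$, and neither ${x}_{2}$ nor ${x}_{m-1}$ is used as a knot. The helper below is a hypothetical sketch of this bookkeeping; it sets only the interior knots, since the exterior ones are inserted by the function.

```c
#include <assert.h>

/* Fill the interior knots for cubic spline interpolation of m points
   with strictly increasing x[0..m-1]; lamda needs room for m + 4 values.
   Returns n = nbar + 7 = m + 4 (since m = nbar + 3). */
static int interpolation_knots(int m, const double x[], double lamda[])
{
    assert(m >= 4);
    int n = m + 4;
    for (int k = 4; k <= n - 5; k++)
        lamda[k] = x[k - 2];   /* interior knots x[2], ..., x[m-3] */
    return n;
}
```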
10
Example
Determine a weighted least squares cubic spline approximation with five intervals (four interior knots) to a set of 14 given data points. Tabulate the data and the corresponding values of the approximating spline, together with the residual errors, and also the values of the approximating spline at points halfway between each pair of adjacent data points.
The example program is written in a general form that will enable a cubic spline approximation with $\overline{n}$ intervals ($\overline{n}-1$ interior knots) to be obtained to $m$ data points, with arbitrary positive weights, and the approximation to be tabulated. Note that e02bbc is used to evaluate the approximating spline. The program is self-starting in that any number of datasets can be supplied.
10.1
Program Text
10.2
Program Data
10.3
Program Results