The routine may be called by the names g10abf or nagf_smooth_fit_spline.
3 Description
g10abf fits a cubic smoothing spline to a set of $n$ observations (${x}_{i}$, ${y}_{i}$), for $i=1,2,\dots ,n$. The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable.
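Cubic smoothing splines arise as the unique real-valued solution function $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
$$\sum _{i=1}^{n}{w}_{i}{\{{y}_{i}-f({x}_{i})\}}^{2}+\rho {\int }_{-\infty }^{\infty }{\{{f}^{\prime \prime }(x)\}}^{2}\,dx,$$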
where ${w}_{i}$ is the (optional) weight for the $i$th observation and $\rho $ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter $\rho $ weights these two aspects; larger values of $\rho $ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).
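The trade-off controlled by $\rho $ can be illustrated with a small dense solver. The following is a minimal sketch, not the algorithm used by g10abf (which runs in time of order $n$); it assumes the criterion has the form $\sum {w}_{i}{\{{y}_{i}-f({x}_{i})\}}^{2}+\rho \int {\{{f}^{\prime \prime }(x)\}}^{2}dx$ and uses the standard natural cubic spline matrices. The names `smoothing_spline_fit` and `solve_dense` are hypothetical helpers introduced only for this illustration.

```python
def solve_dense(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(m):
        p = max(range(k, m), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, m):
            f = M[r][k] / M[k][k]
            for col in range(k, m + 1):
                M[r][col] -= f * M[k][col]
    z = [0.0] * m
    for k in range(m - 1, -1, -1):
        s = sum(M[k][col] * z[col] for col in range(k + 1, m))
        z[k] = (M[k][m] - s) / M[k][k]
    return z


def smoothing_spline_fit(x, y, w, rho):
    """Fitted values at the x[i] for the assumed criterion (dense, O(n^3))."""
    n = len(x)
    m = n - 2                       # number of interior points
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    # Q (n x m) maps values to scaled second differences;
    # R (m x m) is the tridiagonal matrix giving the roughness penalty.
    Q = [[0.0] * m for _ in range(n)]
    R = [[0.0] * m for _ in range(m)]
    for j in range(m):
        Q[j][j] = 1.0 / h[j]
        Q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2][j] = 1.0 / h[j + 1]
        R[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < m:
            R[j][j + 1] = R[j + 1][j] = h[j + 1] / 6.0
    # Solve (R + rho * Q^T W^{-1} Q) gamma = Q^T y.
    A = [[R[a][b] + rho * sum(Q[i][a] * Q[i][b] / w[i] for i in range(n))
          for b in range(m)] for a in range(m)]
    rhs = [sum(Q[i][a] * y[i] for i in range(n)) for a in range(m)]
    gamma = solve_dense(A, rhs)
    # Fitted values: yhat = y - rho * W^{-1} Q gamma.
    return [y[i] - rho / w[i] * sum(Q[i][a] * gamma[a] for a in range(m))
            for i in range(n)]


def weighted_rss(y, yhat, w):
    """Weighted residual sum of squares."""
    return sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, yhat, w))
```

With ${\rho }=0$ the sketch interpolates the data, and as $\rho $ grows the fitted values approach the weighted least squares straight line, so the residual sum of squares increases while the curve becomes smoother.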
The fitted values, $\hat{y}={({\hat{y}}_{1},{\hat{y}}_{2},\dots ,{\hat{y}}_{n})}^{\mathrm{T}}$, and weighted residuals, ${r}_{i}$, can be written as
$$\hat{y}=Hy\text{\hspace{1em} and \hspace{1em}}{r}_{i}=\sqrt{{w}_{i}}({y}_{i}-{\hat{y}}_{i})$$
for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}(I-H)$ and the diagonal elements of $H$, ${h}_{ii}$, are the leverages.
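These quantities can be sketched for any linear smoother whose matrix $H$ is available explicitly. The example below is illustrative only: g10abf does not form $H$ as a dense matrix, and the function name `smoother_summaries` is a hypothetical helper, not part of the library.

```python
import math


def smoother_summaries(H, y, w):
    """Return fitted values, weighted residuals, weighted RSS,
    residual degrees of freedom and leverages for yhat = H y."""
    n = len(y)
    yhat = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
    # r_i = sqrt(w_i) * (y_i - yhat_i)
    res = [math.sqrt(w[i]) * (y[i] - yhat[i]) for i in range(n)]
    rss = sum(r * r for r in res)
    leverages = [H[i][i] for i in range(n)]
    # trace(I - H) = n - trace(H)
    df = n - sum(leverages)
    return yhat, res, rss, df, leverages
```

For instance, the matrix with every entry equal to $1/n$ (fitting the mean) has leverages $1/n$ and $n-1$ residual degrees of freedom.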
The parameter $\rho $ can be chosen in a number of ways. The fit can be inspected for a number of different values of $\rho $. Alternatively, the degrees of freedom for the spline, which determine the value of $\rho $, can be specified, or the (generalized) cross-validation can be minimized to give $\rho $; see g10acf for further details.
g10abf requires the ${x}_{i}$ to be strictly increasing. If two or more observations have the same ${x}_{i}$-value then they should be replaced by a single observation with ${y}_{i}$ equal to the (weighted) mean of the $y$ values and weight, ${w}_{i}$, equal to the sum of the weights. This operation can be performed by g10zaf.
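The merging of tied observations described above can be sketched as follows. This is an illustration of the operation only, not the interface of g10zaf; the function name `merge_ties` is hypothetical, and ties are assumed to be detected by exact equality of the $x$ values.

```python
def merge_ties(x, y, w=None):
    """Collapse observations sharing the same x into one observation:
    y becomes the weighted mean, w becomes the sum of the weights.
    Returns strictly increasing x with matching y and w."""
    if w is None:
        w = [1.0] * len(x)          # unit weights when none are supplied
    sums = {}
    for xi, yi, wi in zip(x, y, w):
        sw, swy = sums.get(xi, (0.0, 0.0))
        sums[xi] = (sw + wi, swy + wi * yi)
    xs = sorted(sums)
    ys = [sums[v][1] / sums[v][0] for v in xs]
    ws = [sums[v][0] for v in xs]
    return xs, ys, ws
```

For example, two observations at $x=2$ with $y$ values $2$ and $4$ and weights $1$ and $3$ become a single observation with $y=3.5$ and weight $4$.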
The computation is split into three phases.
(i) Compute matrices needed to fit spline.
(ii) Fit spline for a given value of $\rho $.
(iii) Compute spline coefficients.
When fitting the spline for several different values of $\rho $, phase (i) need only be carried out once and then phase (ii) repeated for different values of $\rho $. If the spline is being fitted as part of an iterative weighted least squares procedure, phases (i) and (ii) have to be repeated for each set of weights. In either case, phase (iii) will often only have to be performed after the final fit has been computed.
4 References
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
5 Arguments
1: $\mathbf{mode}$ – Character(1) Input
On entry: indicates in which mode the routine is to be used.
${\mathbf{mode}}=\text{'P'}$
Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to g10abf or when the coefficients are not required.
${\mathbf{mode}}=\text{'Q'}$
Fitting only is performed. Initialization must have been performed previously by a call to g10abf with ${\mathbf{mode}}=\text{'P'}$. This quick fit may be called repeatedly with different values of rho without re-initialization.
${\mathbf{mode}}=\text{'F'}$
Initialization and full fitting is performed and the function coefficients are calculated.
Constraint:
${\mathbf{mode}}=\text{'P'}$, $\text{'Q'}$ or $\text{'F'}$.
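2: $\mathbf{weight}$ – Character(1) Input
On entry: indicates whether user-defined weights are to be used. If ${\mathbf{weight}}=\text{'W'}$, the weights must be supplied in wt; if ${\mathbf{weight}}=\text{'U'}$, wt is not referenced and unit weights are assumed.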
Constraint:
${\mathbf{weight}}=\text{'W'}$ or $\text{'U'}$.
3: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of distinct observations.
Constraint:
${\mathbf{n}}\ge 3$.
4: $\mathbf{x}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input
On entry: the distinct and ordered values ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
Constraint:
${\mathbf{x}}\left(\mathit{i}\right)<{\mathbf{x}}\left(\mathit{i}+1\right)$, for $\mathit{i}=1,2,\dots ,n-1$.
5: $\mathbf{y}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input
On entry: the values ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
6: $\mathbf{wt}\left(*\right)$ – Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least ${\mathbf{n}}$ if ${\mathbf{weight}}=\text{'W'}$.
On entry: if ${\mathbf{weight}}=\text{'W'}$, wt must contain the $n$ weights. Otherwise wt is not referenced and unit weights are assumed.
Constraint:
if ${\mathbf{weight}}=\text{'W'}$,
${\mathbf{wt}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,n$.
7: $\mathbf{rho}$ – Real (Kind=nag_wp) Input
On entry: $\rho $, the smoothing parameter.
Constraint:
${\mathbf{rho}}\ge 0.0$.
8: $\mathbf{yhat}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the fitted values, ${\hat{y}}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
9: $\mathbf{c}({\mathbf{ldc}},3)$ – Real (Kind=nag_wp) array Input/Output
On entry: if ${\mathbf{mode}}=\text{'Q'}$, c must be unaltered from the previous call to g10abf with ${\mathbf{mode}}=\text{'P'}$. Otherwise c need not be set.
On exit: if ${\mathbf{mode}}=\text{'F'}$, c contains the spline coefficients. More precisely, the value of the spline at $t$ is given by $(({\mathbf{c}}(i,3)\times d+{\mathbf{c}}(i,2))\times d+{\mathbf{c}}(i,1))\times d+{\hat{y}}_{i}$, where ${x}_{i}\le t<{x}_{i+1}$ and $d=t-{x}_{i}$.
If ${\mathbf{mode}}=\text{'P'}$ or $\text{'Q'}$, c contains information that will be used in a subsequent call to g10abf with ${\mathbf{mode}}=\text{'Q'}$.
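The evaluation formula for ${\mathbf{mode}}=\text{'F'}$ can be sketched as below. This is an illustration under stated assumptions: `x`, `yhat` and `c` are taken to be zero-based Python sequences holding the values returned by the routine (so `c[i][0]`, `c[i][1]`, `c[i][2]` stand for ${\mathbf{c}}(i+1,1)$, ${\mathbf{c}}(i+1,2)$, ${\mathbf{c}}(i+1,3)$), and the hypothetical helper `eval_spline` only accepts $t$ with ${x}_{1}\le t<{x}_{n}$, since the text above defines the spline only on those intervals.

```python
from bisect import bisect_right


def eval_spline(t, x, yhat, c):
    """Evaluate the fitted spline at t using the nested (Horner) form
    ((c3*d + c2)*d + c1)*d + yhat_i with d = t - x_i."""
    if not (x[0] <= t < x[-1]):
        raise ValueError("t must satisfy x[0] <= t < x[-1]")
    # Locate the interval i with x[i] <= t < x[i + 1].
    i = bisect_right(x, t) - 1
    d = t - x[i]
    return ((c[i][2] * d + c[i][1]) * d + c[i][0]) * d + yhat[i]
```

At $t={x}_{i}$ the offset $d$ is zero, so the sketch returns the fitted value ${\hat{y}}_{i}$.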
10: $\mathbf{ldc}$ – Integer Input
On entry: the first dimension of the array c as declared in the (sub)program from which g10abf is called.
Constraint:
${\mathbf{ldc}}\ge {\mathbf{n}}-1$.
11: $\mathbf{rss}$ – Real (Kind=nag_wp) Output
On exit: the (weighted) residual sum of squares.
12: $\mathbf{df}$ – Real (Kind=nag_wp) Output
On exit: the residual degrees of freedom.
13: $\mathbf{res}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the (weighted) residuals, ${r}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
14: $\mathbf{h}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Output
On exit: the leverages, ${h}_{\mathit{i}\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
15: $\mathbf{comm}\left(9\times {\mathbf{n}}+14\right)$ – Real (Kind=nag_wp) array Communication Array
On entry: if ${\mathbf{mode}}=\text{'Q'}$, comm must be unaltered from the previous call to g10abf with ${\mathbf{mode}}=\text{'P'}$. Otherwise comm need not be set.
On exit: if ${\mathbf{mode}}=\text{'P'}$ or $\text{'Q'}$, comm contains information that will be used in a subsequent call to g10abf with ${\mathbf{mode}}=\text{'Q'}$.
16: $\mathbf{ifail}$ – Integer Input/Output
On entry: ifail must be set to $0$, $\mathrm{-1}$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $\mathrm{-1}$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $\mathrm{-1}$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry ${\mathbf{ifail}}=0$ or $\mathrm{-1}$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{ldc}}=\langle \mathit{\text{value}}\rangle $ and ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{ldc}}\ge {\mathbf{n}}-1$.
On entry, ${\mathbf{mode}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{mode}}=\text{'P'}$, $\text{'Q'}$ or $\text{'F'}$.
On entry, ${\mathbf{n}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{n}}\ge 3$.
On entry, ${\mathbf{rho}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{rho}}\ge 0.0$.
On entry, ${\mathbf{weight}}=\langle \mathit{\text{value}}\rangle $.
Constraint: ${\mathbf{weight}}=\text{'W'}$ or $\text{'U'}$.
${\mathbf{ifail}}=2$
On entry, at least one element of ${\mathbf{wt}}\le 0.0$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7 Accuracy
Accuracy depends on the value of $\rho $ and the position of the $x$ values. The values of ${x}_{i}-{x}_{i-1}$ and ${w}_{i}$ are scaled and $\rho $ is transformed to avoid underflow and overflow problems.
8 Parallelism and Performance
Background information to multithreading can be found in the Multithreading documentation.
g10abf is not threaded in any implementation.
9 Further Comments
The time taken by g10abf is of order $n$.
Regression splines with a small $(<n)$ number of knots can be fitted by e02baf and e02bef.
10 Example
The data, given by Hastie and Tibshirani (1990), is the age, ${x}_{i}$, and C-peptide concentration (pmol/ml), ${y}_{i}$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input and reduced to a strictly ordered set by g10zaf, and a series of splines is then fitted using a range of values for the smoothing parameter, $\rho $.