# NAG FL Interfaceg10acf (fit_​spline_​parest)

## 1Purpose

g10acf estimates the values of the smoothing parameter and fits a cubic smoothing spline to a set of data.

## 2Specification

Fortran Interface
 Subroutine g10acf ( n, x, y, wt, yhat, c, ldc, rss, df, res, h, crit, rho, u, tol, wk,
 Integer, Intent (In) :: n, ldc, maxcal Integer, Intent (Inout) :: ifail Real (Kind=nag_wp), Intent (In) :: x(n), y(n), wt(*), u, tol Real (Kind=nag_wp), Intent (Inout) :: c(ldc,3), crit Real (Kind=nag_wp), Intent (Out) :: yhat(n), rss, df, res(n), h(n), rho, wk(7*(n+2)) Character (1), Intent (In) :: method, weight
C Header Interface
#include <nag.h>
 void g10acf_ (const char *method, const char *weight, const Integer *n, const double x[], const double y[], const double wt[], double yhat[], double c[], const Integer *ldc, double *rss, double *df, double res[], double h[], double *crit, double *rho, const double *u, const double *tol, const Integer *maxcal, double wk[], Integer *ifail, const Charlen length_method, const Charlen length_weight)
The routine may be called by the names g10acf or nagf_smooth_fit_spline_parest.

## 3Description

For a set of $n$ observations $\left({x}_{\mathit{i}},{y}_{\mathit{i}}\right)$, for $\mathit{i}=1,2,\dots ,n$, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes
 $∑i=1nwiyi-fxi2+ρ ∫-∞∞f′′x2dx,$
where ${w}_{i}$ is the (optional) weight for the $i$th observation and $\rho$ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing parameter $\rho$ weights these two aspects; larger values of $\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, $\stackrel{^}{y}={\left({\stackrel{^}{y}}_{1},{\stackrel{^}{y}}_{2},\dots ,{\stackrel{^}{y}}_{n}\right)}^{\mathrm{T}}$, and weighted residuals, ${r}_{i}$, can be written as:
 $y^=Hy and ri=wiyi-y^i$
for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}\left(I-H\right)$ and the diagonal elements of $H$ are the leverages.
The parameter $\rho$ can be estimated in a number of ways.
1. (i)The degrees of freedom for the spline can be specified, i.e., find $\rho$ such that $\mathrm{trace}\left(H\right)={\nu }_{0}$ for given ${\nu }_{0}$.
2. (ii)Minimize the cross-validation (CV), i.e., find $\rho$ such that the CV is minimized, where
 $CV=1∑i=1nwi ∑i=1n ri1-hii 2.$
3. (iii)Minimize the generalized cross-validation (GCV), i.e., find $\rho$ such that the GCV is minimized, where
 $GCV=n2∑i=1nwi ∑i=1nri2 ∑i=1n1-hii 2 .$
g10acf requires the ${x}_{i}$ to be strictly increasing. If two or more observations have the same ${x}_{i}$ value then they should be replaced by a single observation with ${y}_{i}$ equal to the (weighted) mean of the $y$ values and weight, ${w}_{i}$, equal to the sum of the weights. This operation can be performed by g10zaf.
The algorithm is based on Hutchinson (1986). c05azf is used to solve for $\rho$ given ${\nu }_{0}$ and the method of e04abf/​e04aba is used to minimize the GCV or CV.

## 4References

Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

## 5Arguments

1: $\mathbf{method}$Character(1) Input
On entry: indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value.
${\mathbf{method}}=\text{'C'}$
Cross-validation is used.
${\mathbf{method}}=\text{'D'}$
The degrees of freedom are specified.
${\mathbf{method}}=\text{'G'}$
Generalized cross-validation is used.
Constraint: ${\mathbf{method}}=\text{'C'}$, $\text{'D'}$ or $\text{'G'}$.
2: $\mathbf{weight}$Character(1) Input
On entry: indicates whether user-defined weights are to be used.
${\mathbf{weight}}=\text{'W'}$
User-defined weights should be supplied in wt.
${\mathbf{weight}}=\text{'U'}$
The data is treated as unweighted.
Constraint: ${\mathbf{weight}}=\text{'W'}$ or $\text{'U'}$.
3: $\mathbf{n}$Integer Input
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 3$.
4: $\mathbf{x}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Input
On entry: the distinct and ordered values ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
Constraint: ${\mathbf{x}}\left(\mathit{i}\right)<{\mathbf{x}}\left(\mathit{i}+1\right)$, for $\mathit{i}=1,2,\dots ,n-1$.
5: $\mathbf{y}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Input
On entry: the values ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
6: $\mathbf{wt}\left(*\right)$Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least ${\mathbf{n}}$ if ${\mathbf{weight}}=\text{'W'}$.
On entry: if ${\mathbf{weight}}=\text{'W'}$, wt must contain the $n$ weights. Otherwise wt is not referenced and unit weights are assumed.
Constraint: if ${\mathbf{weight}}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,n$.
7: $\mathbf{yhat}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Output
On exit: the fitted values, ${\stackrel{^}{y}}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
8: $\mathbf{c}\left({\mathbf{ldc}},3\right)$Real (Kind=nag_wp) array Output
On exit: the spline coefficients. More precisely, the value of the spline approximation at $t$ is given by $\left(\left({\mathbf{c}}\left(i,3\right)×d+{\mathbf{c}}\left(i,2\right)\right)×d+{\mathbf{c}}\left(i,1\right)\right)×d+{\stackrel{^}{y}}_{i}$, where ${x}_{i}\le t<{x}_{i+1}$ and $d=t-{x}_{i}$.
9: $\mathbf{ldc}$Integer Input
On entry: the first dimension of the array c as declared in the (sub)program from which g10acf is called.
Constraint: ${\mathbf{ldc}}\ge {\mathbf{n}}-1$.
10: $\mathbf{rss}$Real (Kind=nag_wp) Output
On exit: the (weighted) residual sum of squares.
11: $\mathbf{df}$Real (Kind=nag_wp) Output
On exit: the residual degrees of freedom. If ${\mathbf{method}}=\text{'D'}$ this will be $n-{\mathbf{crit}}$ to the required accuracy.
12: $\mathbf{res}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Output
On exit: the (weighted) residuals, ${r}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
13: $\mathbf{h}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Output
On exit: the leverages, ${h}_{\mathit{i}\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
14: $\mathbf{crit}$Real (Kind=nag_wp) Input/Output
On entry: if ${\mathbf{method}}=\text{'D'}$, the required degrees of freedom for the spline.
If ${\mathbf{method}}=\text{'C'}$ or $\text{'G'}$, crit need not be set.
Constraint: $2.0<{\mathbf{crit}}\le {\mathbf{n}}$.
On exit: if ${\mathbf{method}}=\text{'C'}$, the value of the cross-validation, or if ${\mathbf{method}}=\text{'G'}$, the value of the generalized cross-validation function, evaluated at the value of $\rho$ returned in rho.
15: $\mathbf{rho}$Real (Kind=nag_wp) Output
On exit: the smoothing parameter, $\rho$.
16: $\mathbf{u}$Real (Kind=nag_wp) Input
On entry: the upper bound on the smoothing parameter. If ${\mathbf{u}}\le {\mathbf{tol}}$, ${\mathbf{u}}=1000.0$ will be used instead. See Section 9 for details on how this argument is used.
17: $\mathbf{tol}$Real (Kind=nag_wp) Input
On entry: the accuracy to which the smoothing parameter rho is required. tol should preferably be not much less than $\sqrt{\epsilon }$, where $\epsilon$ is the machine precision. If ${\mathbf{tol}}<\epsilon$, ${\mathbf{tol}}=\sqrt{\epsilon }$ will be used instead.
18: $\mathbf{maxcal}$Integer Input
On entry: the maximum number of spline evaluations to be used in finding the value of $\rho$. If ${\mathbf{maxcal}}<3$, ${\mathbf{maxcal}}=100$ will be used instead.
19: $\mathbf{wk}\left(7×\left({\mathbf{n}}+2\right)\right)$Real (Kind=nag_wp) array Workspace
20: $\mathbf{ifail}$Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{crit}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{method}}=\text{'D'}$, ${\mathbf{crit}}>2.0$.
On entry, ${\mathbf{crit}}=〈\mathit{\text{value}}〉$.
Constraint: if ${\mathbf{method}}=\text{'D'}$, ${\mathbf{crit}}\le {\mathbf{n}}$.
On entry, ${\mathbf{ldc}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ldc}}\ge {\mathbf{n}}-1$.
On entry, method is not valid: ${\mathbf{method}}=〈\mathit{\text{value}}〉$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 3$.
On entry, ${\mathbf{weight}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{weight}}=\text{'W'}$ or $\text{'U'}$.
${\mathbf{ifail}}=2$
On entry, at least one element of ${\mathbf{wt}}\le 0.0$.
${\mathbf{ifail}}=3$
On entry, x is not a strictly ordered array.
${\mathbf{ifail}}=4$
For the specified degrees of freedom, ${\mathbf{rho}}>{\mathbf{u}}$: ${\mathbf{u}}=〈\mathit{\text{value}}〉$.
${\mathbf{ifail}}=5$
Accuracy of tol cannot be achieved: ${\mathbf{tol}}=〈\mathit{\text{value}}〉$.
${\mathbf{ifail}}=6$
maxcal iterations have been performed.
${\mathbf{ifail}}=7$
Optimum value of rho lies above u: ${\mathbf{u}}=〈\mathit{\text{value}}〉$.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7Accuracy

When minimizing the cross-validation or generalized cross-validation, the error in the estimate of $\rho$ should be within $±3\left({\mathbf{tol}}×{\mathbf{rho}}+{\mathbf{tol}}\right)$. When finding $\rho$ for a fixed number of degrees of freedom the error in the estimate of $\rho$ should be within $±2×{\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{rho}}\right)$.
Given the value of $\rho$, the accuracy of the fitted spline depends on the value of $\rho$ and the position of the $x$ values. The values of ${x}_{i}-{x}_{i-1}$ and ${w}_{i}$ are scaled and $\rho$ is transformed to avoid underflow and overflow problems.

## 8Parallelism and Performance

g10acf is not threaded in any implementation.

## 9Further Comments

The time to fit the spline for a given value of $\rho$ is of order $n$.
When finding the value of $\rho$ that gives the required degrees of freedom, the algorithm examines the interval $0.0$ to u. For small degrees of freedom the value of $\rho$ can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and $\rho$ is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval $0.0$ to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely is the interval to contain the required solution, but the process will be less efficient.
Regression splines with a small $\left( number of knots can be fitted by e02baf and e02bef.

## 10Example

This example uses the data given by Hastie and Tibshirani (1990), which consists of the age, ${x}_{i}$, and C-peptide concentration (pmol/ml), ${y}_{i}$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by g10zaf and a spline with $5$ degrees of freedom is fitted by g10acf. The fitted values and residuals are printed.

### 10.1Program Text

Program Text (g10acfe.f90)

### 10.2Program Data

Program Data (g10acfe.d)

### 10.3Program Results

Program Results (g10acfe.r)