NAG CL Interfaceg10acc (fit_​spline_​parest)

Settings help

CL Name Style:

1Purpose

g10acc estimates the values of the smoothing argument and fits a cubic smoothing spline to a set of data.

2Specification

 #include
 void g10acc (Nag_SmoothParamMethods method, Integer n, const double x[], const double y[], const double weights[], double yhat[], double coeff[], double *rss, double *df, double res[], double h[], double *crit, double *rho, double u, double tol, Integer maxcal, NagError *fail)
The function may be called by the names: g10acc, nag_smooth_fit_spline_parest or nag_smooth_spline_estim.

3Description

For a set of $n$ observations $\left({x}_{\mathit{i}},{y}_{\mathit{i}}\right)$, for $\mathit{i}=1,2,\dots ,n$, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function, $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
 $∑ i=1 n w i { y i -f( x i )} 2 + ρ ∫ -∞ ∞ { f ′′ (x)} 2 dx ,$
where ${w}_{i}$ is the (optional) weight for the $i$th observation and $\rho$ is the smoothing argument. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing argument $\rho$ weights these two aspects; larger values of $\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, $\stackrel{^}{y}={\left({\stackrel{^}{y}}_{1},{\stackrel{^}{y}}_{2},\dots ,{\stackrel{^}{y}}_{n}\right)}^{\mathrm{T}}$, and weighted residuals, ${r}_{i}$, can be written as:
 $y ^ = Hy and r i = w i ( y i - y ^ i )$
for a matrix $H$. The residual degrees of freedom for the spline is trace$\left(I-H\right)$ and the diagonal elements of $H$ are the leverages.
The argument $\rho$ can be estimated in a number of ways.
1. 1.The degrees of freedom for the spline can be specified, i.e., find $\rho$ such that trace$\left(H\right)={\nu }_{0}$ for given ${\nu }_{0}$.
2. 2.Minimize the cross-validation (CV), i.e., find $\rho$ such that the CV is minimized, where
 $CV = 1 ∑ i=1 n w i ∑ i=1 n [ r i 1 - h ii ] 2 .$
3. 3.Minimize the generalized cross-validation (GCV), i.e., find $\rho$ such that the GCV is minimized, where
 $GCV = n 2 ∑ i=1 n w i [ ∑ i=1 n r i 2 ( ∑ i=1 n (1- h ii )) 2 ] .$
g10acc requires the ${x}_{i}$ to be strictly increasing. If two or more observations have the same ${x}_{i}$ value then they should be replaced by a single observation with ${y}_{i}$ equal to the (weighted) mean of the $y$ values and weight, ${w}_{i}$, equal to the sum of the weights. This operation can be performed by g10zac.
The algorithm is based on Hutchinson (1986).
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

5Arguments

1: $\mathbf{method}$Nag_SmoothParamMethods Input
On entry: indicates whether the smoothing argument is to be found by minimization of the CV or GCV functions, or by finding the smoothing argument corresponding to a specified degrees of freedom value.
${\mathbf{method}}=\mathrm{Nag_SmoothParamCV}$
Cross-validation is used.
${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$
The degrees of freedom are specified.
${\mathbf{method}}=\mathrm{Nag_SmoothParamGCV}$
Generalized cross-validation is used.
Constraint: ${\mathbf{method}}=\mathrm{Nag_SmoothParamCV}$, $\mathrm{Nag_SmoothParamDF}$ or $\mathrm{Nag_SmoothParamGCV}$.
2: $\mathbf{n}$Integer Input
On entry: the number of observations, $n$.
Constraint: ${\mathbf{n}}\ge 3$.
3: $\mathbf{x}\left[{\mathbf{n}}\right]$const double Input
On entry: the distinct and ordered values ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
Constraint: ${\mathbf{x}}\left[\mathit{i}-1\right]<{\mathbf{x}}\left[\mathit{i}\right]$, for $\mathit{i}=1,2,\dots ,n-1$.
4: $\mathbf{y}\left[{\mathbf{n}}\right]$const double Input
On entry: the values ${y}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
5: $\mathbf{weights}\left[{\mathbf{n}}\right]$const double Input
On entry: weights must contain the $n$ weights, if they are required. Otherwise, weights must be set to NULL.
Constraint: if weights are required, then ${\mathbf{weights}}\left[\mathit{i}-1\right]>0.0$, for $\mathit{i}=1,2,\dots ,n$.
6: $\mathbf{yhat}\left[{\mathbf{n}}\right]$double Output
On exit: the fitted values, ${\stackrel{^}{y}}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
7: $\mathbf{coeff}\left[\left({\mathbf{n}}-1\right)×3\right]$double Output
On exit: the spline coefficients. More precisely, the value of the spline approximation at $t$ is given by $\left(\left({\mathbf{coeff}}\left[\left(i-1\right)×\left(n-1\right)+2\right]×d+{\mathbf{coeff}}\left[\left(i-1\right)×\left(n-1\right)+1\right]\right)×d+{\mathbf{coeff}}\left[\left(i-1\right)×\left(n-1\right)\right]\right)×d+{\stackrel{^}{y}}_{i}$, where ${x}_{i}\le t<{x}_{i+1}$ and $d={t-x}_{i}$.
8: $\mathbf{rss}$double * Output
On exit: the (weighted) residual sum of squares.
9: $\mathbf{df}$double * Output
On exit: the residual degrees of freedom. If ${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$, this will be $n-{\mathbf{crit}}$ to the required accuracy.
10: $\mathbf{res}\left[{\mathbf{n}}\right]$double Output
On exit: the (weighted) residuals, ${r}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
11: $\mathbf{h}\left[{\mathbf{n}}\right]$double Output
On exit: the leverages, ${h}_{\mathit{i}\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
12: $\mathbf{crit}$double * Input/Output
On entry: if ${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$, the required degrees of freedom for the spline.
If ${\mathbf{method}}=\mathrm{Nag_SmoothParamCV}$ or $\mathrm{Nag_SmoothParamGCV}$, crit need not be set.
Constraint: $2.0<{\mathbf{crit}}\le {\mathbf{n}}$.
On exit: if ${\mathbf{method}}=\mathrm{Nag_SmoothParamCV}$, the value of the cross-validation, or if ${\mathbf{method}}=\mathrm{Nag_SmoothParamGCV}$, the value of the generalized cross-validation function, evaluated at the value of $\rho$ returned in rho.
13: $\mathbf{rho}$double * Output
On exit: the smoothing argument, $\rho$.
14: $\mathbf{u}$double Input
On entry: the upper bound on the smoothing argument. See Section 9 for details on how this argument is used.
Suggested value: ${\mathbf{u}}=1000.0$.
Constraint: ${\mathbf{u}}>{\mathbf{tol}}$.
15: $\mathbf{tol}$double Input
On entry: the accuracy to which the smoothing argument rho is required. tol should be preferably not much less than $\sqrt{\epsilon }$, where $\epsilon$ is the machine precision.
Constraint: .
16: $\mathbf{maxcal}$Integer Input
On entry: the maximum number of spline evaluations to be used in finding the value of $\rho$.
Suggested value: ${\mathbf{maxcal}}=30$.
Constraint: ${\mathbf{maxcal}}\ge 3$.
17: $\mathbf{fail}$NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6Error Indicators and Warnings

NE_2_REAL_ARG_LE
On entry, ${\mathbf{u}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{tol}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{u}}>{\mathbf{tol}}$.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
On entry, argument method had an illegal value.
NE_G10AC_ACC
A solution to the accuracy given by tol has not been achieved in maxcal iterations. Try increasing the value of tol and/or maxcal.
NE_G10AC_CG_RHO
${\mathbf{method}}=\mathrm{Nag_SmoothParamCV}$ or $\mathrm{Nag_SmoothParamGCV}$ and the optimal value of ${\mathbf{rho}}>{\mathbf{u}}$. Try a larger value of u.
NE_G10AC_DF_RHO
${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$ and the required value of rho for specified degrees of freedom $>{\mathbf{u}}$. Try a larger value of u.
NE_G10AC_DF_TOL
${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$ and the accuracy given by tol cannot be achieved. Try increasing the value of tol.
NE_INT_ARG_LT
On entry, ${\mathbf{maxcal}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{maxcal}}\ge 3$.
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 3$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_STRICTLY_INCREASING
The sequence x is not strictly increasing: ${\mathbf{x}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{x}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
NE_REAL
On entry, ${\mathbf{crit}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{crit}}>2.0$, if ${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$.
NE_REAL_ARRAY_CONS
On entry, ${\mathbf{weights}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{weights}}\left[\mathit{i}\right]>0$, for $\mathit{i}=0,1,\dots ,n-1$.
NE_REAL_INT_ARG_CONS
On entry, ${\mathbf{crit}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{crit}}\le {\mathbf{n}}$, if ${\mathbf{method}}=\mathrm{Nag_SmoothParamDF}$.
NE_REAL_MACH_PREC
On entry, ${\mathbf{tol}}=⟨\mathit{\text{value}}⟩$, .
Constraint: .

7Accuracy

When minimizing the cross-validation or generalized cross-validation, the error in the estimate of $\rho$ should be within $±3×\left({\mathbf{tol}}×{\mathbf{rho}}+{\mathbf{tol}}\right)$. When finding $\rho$ for a fixed number of degrees of freedom the error in the estimate of $\rho$ should be within $±2×{\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{rho}}\right)$.
Given the value of $\rho$, the accuracy of the fitted spline depends on the value of $\rho$ and the position of the $x$ values. The values of ${x}_{i}-{x}_{i-1}$ and ${w}_{i}$ are scaled and $\rho$ is transformed to avoid underflow and overflow problems.

8Parallelism and Performance

g10acc is not threaded in any implementation.

The time to fit the spline for a given value of $\rho$ is of order $n$.
When finding the value of $\rho$ that gives the required degrees of freedom, the algorithm examines the interval $0.0$ to u. For small degrees of freedom the value of $\rho$ can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and $\rho$ is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval $0.0$ to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely is the interval to contain the required solution, but the process will be less efficient.
Regression splines with a small $\left( number of knots can be fitted by e02bac and e02bec.

10Example

The data, given by Hastie and Tibshirani (1990), is the age, ${x}_{i}$, and C-peptide concentration (pmol/ml), ${y}_{i}$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by g10zac and a spline with 5 degrees of freedom is fitted by g10acc. The fitted values and residuals are printed.

10.1Program Text

Program Text (g10acce.c)

10.2Program Data

Program Data (g10acce.d)

10.3Program Results

Program Results (g10acce.r)