NAG CL Interface
g10abc (fit_​spline)

Settings help

CL Name Style:


1 Purpose

g10abc fits a cubic smoothing spline for a given smoothing parameter.

2 Specification

#include <nag.h>
void  g10abc (Nag_SmoothFitType mode, Integer n, const double x[], const double y[], const double wt[], double rho, double yhat[], double c[], Integer pdc, double *rss, double *df, double res[], double h[], double comm[], NagError *fail)
The function may be called by the names: g10abc, nag_smooth_fit_spline or nag_smooth_spline_fit.

3 Description

g10abc fits a cubic smoothing spline to a set of n observations (xi, yi), for i=1,2,,n. The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable.
Cubic smoothing splines arise as the unique real-valued solution function f, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
i=1nwi(yi-f(xi))2+ρ -(f(x))2dx,  
where wi is the (optional) weight for the ith observation and ρ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter ρ weights these two aspects; larger values of ρ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, y^ = (y^1,y^2,,y^n) T , and weighted residuals, ri, can be written as
y^=Hy  and  ri=wi(yi-y^i)  
for a matrix H. The residual degrees of freedom for the spline is trace(I-H) and the diagonal elements of H, hii, are the leverages.
The parameter ρ can be chosen in a number of ways. The fit can be inspected for a number of different values of ρ. Alternatively the degrees of freedom for the spline, which determines the value of ρ, can be specified, or the (generalized) cross-validation can be minimized to give ρ; see g10acc for further details.
g10abc requires the xi to be strictly increasing. If two or more observations have the same xi-value then they should be replaced by a single observation with yi equal to the (weighted) mean of the y values and weight, wi, equal to the sum of the weights. This operation can be performed by g10zac.
The computation is split into three phases.
  1. (i)Compute matrices needed to fit spline.
  2. (ii)Fit spline for a given value of ρ.
  3. (iii)Compute spline coefficients.
When fitting the spline for several different values of ρ, phase (i) need only be carried out once and then phase (ii) repeated for different values of ρ. If the spline is being fitted as part of an iterative weighted least squares procedure phases (i) and (ii) have to be repeated for each set of weights. In either case, phase (iii) will often only have to be performed after the final fit has been computed.
The algorithm is based on Hutchinson (1986).

4 References

Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

5 Arguments

1: mode Nag_SmoothFitType Input
On entry: indicates in which mode the function is to be used.
mode=Nag_SmoothFitPartial
Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to g10abc or when the coefficients are not required.
mode=Nag_SmoothFitQuick
Fitting only is performed. Initialization must have been performed previously by a call to g10abc with mode=Nag_SmoothFitPartial. This quick fit may be called repeatedly with different values of rho without re-initialization.
mode=Nag_SmoothFitFull
Initialization and full fitting is performed and the function coefficients are calculated.
Constraint: mode=Nag_SmoothFitPartial, Nag_SmoothFitQuick or Nag_SmoothFitFull.
2: n Integer Input
On entry: n, the number of distinct observations.
Constraint: n3.
3: x[n] const double Input
On entry: the distinct and ordered values xi, for i=1,2,,n.
Constraint: x[i-1]<x[i], for i=1,2,,n-1.
4: y[n] const double Input
On entry: the values yi, for i=1,2,,n.
5: wt[dim] const double Input
Note: the dimension, dim, of the array wt must be at least n.
On entry: optionally, the n weights.
If weights are not provided then wt must be set to NULL, in which case unit weights are assumed.
Constraint: wt[i-1]>0.0, for i=1,2,,n.
6: rho double Input
On entry: ρ, the smoothing parameter.
Constraint: rho0.0.
7: yhat[n] double Output
On exit: the fitted values, y^i, for i=1,2,,n.
8: c[pdc×3] double Input/Output
Note: the (i,j)th element of the matrix C is stored in c[(j-1)×pdc+i-1].
On entry: if mode=Nag_SmoothFitQuick, c must be unaltered from the previous call to g10abc with mode=Nag_SmoothFitPartial. Otherwise c need not be set.
On exit: if mode=Nag_SmoothFitFull, c contains the spline coefficients. More precisely, the value of the spline at t is given by ((c[2×pdc+i-1]×d+c[1×pdc+i-1])×d+c[i-1]) × d + y^i , where xit<xi+1 and d=t-xi.
If mode=Nag_SmoothFitPartial or Nag_SmoothFitQuick, c contains information that will be used in a subsequent call to g10abc with mode=Nag_SmoothFitQuick.
9: pdc Integer Input
On entry: the stride separating matrix row elements in the array c.
Constraint: pdcn-1.
10: rss double * Output
On exit: the (weighted) residual sum of squares.
11: df double * Output
On exit: the residual degrees of freedom.
12: res[n] double Output
On exit: the (weighted) residuals, ri, for i=1,2,,n.
13: h[n] double Output
On exit: the leverages, hii, for i=1,2,,n.
14: comm[9×n+14] double Communication Array
On entry: if mode=Nag_SmoothFitQuick, comm must be unaltered from the previous call to g10abc with mode=Nag_SmoothFitPartial. Otherwise comm need not be set.
On exit: if mode=Nag_SmoothFitPartial or Nag_SmoothFitQuick, comm contains information that will be used in a subsequent call to g10abc with mode=Nag_SmoothFitQuick.
15: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_ARRAY_SIZE
On entry, pdc=value and n=value.
Constraint: pdcn-1.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_INT
On entry, n=value.
Constraint: n3.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NE_NOT_STRICTLY_INCREASING
On entry, x is not a strictly ordered array.
NE_REAL
On entry, rho=value.
Constraint: rho0.0.
NE_WEIGHTS_NOT_POSITIVE
On entry, at least one element of wt0.0.

7 Accuracy

Accuracy depends on the value of ρ and the position of the x values. The values of xi-xi-1 and wi are scaled and ρ is transformed to avoid underflow and overflow problems.

8 Parallelism and Performance

g10abc is not threaded in any implementation.

9 Further Comments

The time taken by g10abc is of order n.
Regression splines with a small (<n) number of knots can be fitted by e02bac and e02bec.

10 Example

The data, given by Hastie and Tibshirani (1990), is the age, xi, and C-peptide concentration (pmol/ml), yi, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by g10zac and a series of splines fit using a range of values for the smoothing parameter, ρ.

10.1 Program Text

Program Text (g10abce.c)

10.2 Program Data

Program Data (g10abce.d)

10.3 Program Results

Program Results (g10abce.r)
GnuplotProduced by GNUPLOT 4.6 patchlevel 3 3 3.5 4 4.5 5 5.5 6 6.5 7 0 2 4 6 8 10 12 14 16 C-peptide concentration (pmol/ml) Age (years) Example Program Cubic Smoothing Spline Study of the factors affecting insulin-dependent diabetes mellitus in children Hastie and Tibshirani (1990) gnuplot_plot_1 raw data gnuplot_plot_2 ρ = 1 gnuplot_plot_3 ρ = 10 gnuplot_plot_4 ρ = 100