The function may be called by the names: g10acc, nag_smooth_fit_spline_parest or nag_smooth_spline_estim.
For a set of $n$ observations $(x_i, y_i)$, for $i = 1, 2, \ldots, n$, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function, $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
$$\sum_{i=1}^{n} w_i \{y_i - f(x_i)\}^2 + \rho \int_{-\infty}^{\infty} \{f''(x)\}^2 \, dx,$$
where $w_i$ is the (optional) weight for the $i$th observation and $\rho$ is the smoothing argument. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing argument $\rho$ weights these two aspects; larger values of $\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).
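The fitted values, $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)^T$, and weighted residuals, $r_i$, can be written as:
$$\hat{y} = Hy \quad \text{and} \quad r_i = \sqrt{w_i}\,(y_i - \hat{y}_i)$$
for a matrix $H$. The residual degrees of freedom for the spline is $\mathrm{trace}(I - H)$ and the diagonal elements of $H$ are the leverages.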
The argument $\rho$ can be estimated in a number of ways.
1. The degrees of freedom for the spline can be specified, i.e., find $\rho$ such that $\mathrm{trace}(H) = \nu_0$ for given $\nu_0$.
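2. Minimize the cross-validation (CV), i.e., find $\rho$ such that the CV is minimized, where
$$\mathrm{CV} = \frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} \left[\frac{r_i}{1 - h_{ii}}\right]^2.$$
3. Minimize the generalized cross-validation (GCV), i.e., find $\rho$ such that the GCV is minimized, where
$$\mathrm{GCV} = n^2 \left[\frac{\sum_{i=1}^{n} r_i^2}{\left(\sum_{i=1}^{n} (1 - h_{ii})\right)^2}\right].$$
Here $r_i$ is the $i$th weighted residual and $h_{ii}$ the $i$th leverage.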
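To make the two criteria concrete, the following minimal C sketch evaluates them from weighted residuals and leverages that are assumed to have been computed already for one trial value of the smoothing argument. The function names `cv_score` and `gcv_score` are hypothetical and are not part of the NAG interface, and the normalizing constants are assumptions stated in the comments; a constant factor does not change which value of the smoothing argument minimizes either criterion.

```c
#include <assert.h>
#include <stddef.h>

/* Cross-validation score from weighted residuals r[i], leverages h[i]
 * (the diagonal of the smoother matrix) and weights w[i].
 * Pass w == NULL for unit weights.
 * Assumed normalization: CV = (1 / sum w_i) * sum (r_i / (1 - h_ii))^2. */
double cv_score(size_t n, const double *r, const double *h, const double *w)
{
    double sum_w = 0.0, total = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        double t = r[i] / (1.0 - h[i]);   /* leave-one-out residual */
        total += t * t;
        sum_w += (w != NULL) ? w[i] : 1.0;
    }
    return total / sum_w;
}

/* Generalized cross-validation score.
 * Assumed normalization: GCV = n^2 * sum r_i^2 / (sum (1 - h_ii))^2. */
double gcv_score(size_t n, const double *r, const double *h)
{
    double rss = 0.0, resid_df = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        rss += r[i] * r[i];
        resid_df += 1.0 - h[i];           /* accumulates trace(I - H) */
    }
    return (double)n * (double)n * rss / (resid_df * resid_df);
}
```

GCV replaces each individual leverage by their average, which is why it needs only the residual sum of squares and the residual degrees of freedom.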
g10acc requires the $x_i$ to be strictly increasing. If two or more observations have the same $x_i$ value then they should be replaced by a single observation with $y_i$ equal to the (weighted) mean of the $y$ values and weight, $w_i$, equal to the sum of the weights. This operation can be performed by g10zac.
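The tie-handling rule just described can be sketched in a few lines of C. This is only an illustration of the rule, assuming the data are already sorted by $x$ and that every group of tied observations has positive total weight; `merge_ties` is a hypothetical helper and does not reproduce the g10zac interface.

```c
#include <assert.h>
#include <stddef.h>

/* Collapse runs of equal x in data already sorted by x (non-decreasing).
 * Each run becomes a single observation whose y is the weighted mean of
 * the run and whose weight is the sum of the run's weights.
 * The arrays are updated in place; the return value is the number of
 * distinct x values. For unweighted data fill w with 1.0 before calling. */
size_t merge_ties(size_t n, double *x, double *y, double *w)
{
    size_t m = 0;   /* number of merged observations written so far */
    size_t i = 0;   /* next unread input observation */
    while (i < n) {
        const double xi = x[i];
        double sum_w = 0.0, sum_wy = 0.0;
        /* Accumulate every observation that shares this x value. */
        while (i < n && x[i] == xi) {
            sum_w += w[i];
            sum_wy += w[i] * y[i];
            ++i;
        }
        assert(sum_w > 0.0);           /* assumption: positive total weight */
        assert(i == n || x[i] > xi);   /* assumption: input sorted by x */
        x[m] = xi;
        y[m] = sum_wy / sum_w;
        w[m] = sum_w;
        ++m;
    }
    return m;
}
```

After the call, the first `m` entries of `x` are strictly increasing, which is the condition the fitting function places on its input.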
Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183
1: – Nag_SmoothParamMethods Input
On entry: indicates whether the smoothing argument is to be found by minimization of the CV or GCV functions, or by finding the smoothing argument corresponding to a specified degrees of freedom value.
Cross-validation is used.
The degrees of freedom are specified.
Generalized cross-validation is used.
, or .
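2: – Integer Input
On entry: the number of observations, $n$.
3: – const double Input
On entry: the distinct and ordered values $x_i$, for $i = 1, 2, \ldots, n$.
Constraint: $x_i < x_{i+1}$, for $i = 1, 2, \ldots, n-1$.
4: – const double Input
On entry: the values $y_i$, for $i = 1, 2, \ldots, n$.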
5: – const double Input
On entry: weights must contain the weights, if they are required. Otherwise, weights must be set to NULL.
On entry, and . These arguments must satisfy , if .
On entry, , .
When minimizing the cross-validation or generalized cross-validation, the error in the estimate of $\rho$ should be within . When finding $\rho$ for a fixed number of degrees of freedom the error in the estimate of $\rho$ should be within .
Given the value of $\rho$, the accuracy of the fitted spline depends on the value of $\rho$ and the position of the $x$ values. The values of and are scaled and $\rho$ is transformed to avoid underflow and overflow problems.
8 Parallelism and Performance
g10acc is not threaded in any implementation.
The time to fit the spline for a given value of $\rho$ is of order $n$.
When finding the value of $\rho$ that gives the required degrees of freedom, the algorithm examines the interval $0.0$ to u. For small degrees of freedom the value of $\rho$ can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and $\rho$ is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval $0.0$ to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely is the interval to contain the required solution, but the process will be less efficient.
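The role of the upper bound u can be illustrated with a simplified model. The sketch below assumes that the degrees of freedom take the form $\sum_k 1/(1 + \rho d_k)$ with $d_k \ge 0$ and exactly two $d_k$ equal to zero, so that the degrees of freedom approach two as the smoothing argument grows. It then uses plain bisection on the interval from zero to u. The names `spline_df` and `rho_for_df`, the values of `d`, the choice of bisection and the decision to return u when the target is out of reach are all assumptions made for illustration; none of this is the search that g10acc actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Degrees of freedom of a linear smoother whose matrix has eigenvalues
 * 1 / (1 + rho * d[k]) with d[k] >= 0 (a simplified, assumed model). */
double spline_df(double rho, size_t n, const double *d)
{
    double df = 0.0;
    for (size_t k = 0; k < n; ++k)
        df += 1.0 / (1.0 + rho * d[k]);
    return df;
}

/* Bisection for rho in [0, u] such that spline_df(rho) equals target.
 * The degrees of freedom never increase with rho, so if rho = u still
 * leaves them above the target, no solution exists in the interval and
 * the boundary value u is returned. */
double rho_for_df(double target, double u, size_t n, const double *d,
                  double tol)
{
    double lo = 0.0, hi = u;
    assert(u > 0.0 && tol > 0.0);
    if (spline_df(u, n, d) > target)
        return u;               /* solution lies beyond the interval */
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        if (spline_df(mid, n, d) > target)
            lo = mid;           /* too many degrees of freedom: smooth more */
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With `d = {0, 0, 1, 1, 1}` the degrees of freedom are $2 + 3/(1 + \rho)$, so a target of 3.5 is met at $\rho = 1$, while a target of exactly 2 cannot be met for any finite value and the search stops at u. A larger u widens the set of reachable targets at the cost of more bisection steps.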
Regression splines with a small number of knots can be fitted by e02bac and e02bec.
The data, given by Hastie and Tibshirani (1990), is the age, $x_i$, and C-peptide concentration (pmol/ml), $y_i$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data is input, reduced to a strictly ordered set by g10zac and a spline with 5 degrees of freedom is fitted by g10acc. The fitted values and residuals are printed.