Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

NAG Toolbox: nag_smooth_fit_spline (g10ab)

Purpose

nag_smooth_fit_spline (g10ab) fits a cubic smoothing spline for a given smoothing parameter.

Syntax

[yhat, c, rss, df, res, h, comm, ifail] = g10ab(mode, x, y, rho, c, comm, 'n', n, 'wt', wt)
[yhat, c, rss, df, res, h, comm, ifail] = nag_smooth_fit_spline(mode, x, y, rho, c, comm, 'n', n, 'wt', wt)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 24: drop weight, wt optional
.

Description

nag_smooth_fit_spline (g10ab) fits a cubic smoothing spline to a set of n$n$ observations (xi${x}_{i}$, yi${y}_{i}$), for i = 1,2,,n$i=1,2,\dots ,n$. The spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is unsuitable.
Cubic smoothing splines arise as the unique real-valued solution function f$f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:
 n ∞ ∑ wi(yi − f(xi))2 + ρ ∫ (f ′ ′ (x))2dx, i = 1 − ∞
$∑i=1nwi(yi-f(xi))2+ρ ∫-∞∞(f′′(x))2dx,$
where wi${w}_{i}$ is the (optional) weight for the i$i$th observation and ρ$\rho$ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter ρ$\rho$ weights these two aspects; larger values of ρ$\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, = (1,2,,n)T $\stackrel{^}{y}={\left({\stackrel{^}{y}}_{1},{\stackrel{^}{y}}_{2},\dots ,{\stackrel{^}{y}}_{n}\right)}^{\mathrm{T}}$, and weighted residuals, ri${r}_{i}$, can be written as
 ŷ = Hy  and  ri = sqrt(wi)(yi − ŷi) $y^=Hy and ri=wi(yi-y^i)$
for a matrix H$H$. The residual degrees of freedom for the spline is trace(IH)$\mathrm{trace}\left(I-H\right)$ and the diagonal elements of H$H$, hii${h}_{ii}$, are the leverages.
The parameter ρ$\rho$ can be chosen in a number of ways. The fit can be inspected for a number of different values of ρ$\rho$. Alternatively the degrees of freedom for the spline, which determines the value of ρ$\rho$, can be specified, or the (generalized) cross-validation can be minimized to give ρ$\rho$; see nag_smooth_fit_spline_parest (g10ac) for further details.
nag_smooth_fit_spline (g10ab) requires the xi${x}_{i}$ to be strictly increasing. If two or more observations have the same xi${x}_{i}$-value then they should be replaced by a single observation with yi${y}_{i}$ equal to the (weighted) mean of the y$y$ values and weight, wi${w}_{i}$, equal to the sum of the weights. This operation can be performed by nag_smooth_data_order (g10za).
The computation is split into three phases.
 (i) Compute matrices needed to fit spline. (ii) Fit spline for a given value of ρ$\rho$. (iii) Compute spline coefficients.
When fitting the spline for several different values of ρ$\rho$, phase (i) need only be carried out once and then phase (ii) repeated for different values of ρ$\rho$. If the spline is being fitted as part of an iterative weighted least squares procedure phases (i) and (ii) have to be repeated for each set of weights. In either case, phase (iii) will often only have to be performed after the final fit has been computed.
The algorithm is based on Hutchinson (1986).

References

Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

Parameters

Compulsory Input Parameters

1:     mode – string (length ≥ 1)
Indicates in which mode the function is to be used.
mode = 'P'${\mathbf{mode}}=\text{'P'}$
Initialization and fitting is performed. This partial fit can be used in an iterative weighted least squares context where the weights are changing at each call to nag_smooth_fit_spline (g10ab) or when the coefficients are not required.
mode = 'Q'${\mathbf{mode}}=\text{'Q'}$
Fitting only is performed. Initialization must have been performed previously by a call to nag_smooth_fit_spline (g10ab) with mode = 'P'${\mathbf{mode}}=\text{'P'}$. This quick fit may be called repeatedly with different values of rho without re-initialization.
mode = 'F'${\mathbf{mode}}=\text{'F'}$
Initialization and full fitting is performed and the function coefficients are calculated.
Constraint: mode = 'P'${\mathbf{mode}}=\text{'P'}$, 'Q'$\text{'Q'}$ or 'F'$\text{'F'}$.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n3${\mathbf{n}}\ge 3$.
The distinct and ordered values xi${x}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
Constraint: x(i) < x(i + 1)${\mathbf{x}}\left(\mathit{i}\right)<{\mathbf{x}}\left(\mathit{i}+1\right)$, for i = 1,2,,n1$\mathit{i}=1,2,\dots ,n-1$.
3:     y(n) – double array
n, the dimension of the array, must satisfy the constraint n3${\mathbf{n}}\ge 3$.
The values yi${y}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
4:     rho – double scalar
ρ$\rho$, the smoothing parameter.
Constraint: rho0.0${\mathbf{rho}}\ge 0.0$.
5:     c(ldc,3$3$) – double array
ldc, the first dimension of the array, must satisfy the constraint ldcn1$\mathit{ldc}\ge {\mathbf{n}}-1$.
If mode = 'Q'${\mathbf{mode}}=\text{'Q'}$, c must be unaltered from the previous call to nag_smooth_fit_spline (g10ab) with mode = 'P'${\mathbf{mode}}=\text{'P'}$. Otherwise c need not be set.
6:     comm(9 × n + 14$9×{\mathbf{n}}+14$) – double array
If mode = 'Q'${\mathbf{mode}}=\text{'Q'}$, comm must be unaltered from the previous call to nag_smooth_fit_spline (g10ab) with mode = 'P'${\mathbf{mode}}=\text{'P'}$. Otherwise comm need not be set.

Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the arrays x, y, comm. (An error is raised if these dimensions are not equal.)
n$n$, the number of distinct observations.
Constraint: n3${\mathbf{n}}\ge 3$.
2:     wt( : $:$) – double array
Note: the dimension of the array wt must be at least n${\mathbf{n}}$ if weight = 'W'$\mathit{weight}=\text{'W'}$.
If weight = 'W'$\mathit{weight}=\text{'W'}$, wt must contain the n$n$ weights. Otherwise wt is not referenced and unit weights are assumed.
Constraint: if weight = 'W'$\mathit{weight}=\text{'W'}$, wt(i) > 0.0${\mathbf{wt}}\left(\mathit{i}\right)>0.0$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.

weight ldc

Output Parameters

1:     yhat(n) – double array
The fitted values, i${\stackrel{^}{y}}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
2:     c(ldc,3$3$) – double array
ldcn1$\mathit{ldc}\ge {\mathbf{n}}-1$.
If mode = 'F'${\mathbf{mode}}=\text{'F'}$, c contains the spline coefficients. More precisely, the value of the spline at t$t$ is given by ((c(i,3) × d + c(i,2)) × d + c(i,1)) × d + i $\left(\left({\mathbf{c}}\left(i,3\right)×d+{\mathbf{c}}\left(i,2\right)\right)×d+{\mathbf{c}}\left(i,1\right)\right)×d+{\stackrel{^}{y}}_{i}$, where xit < xi + 1${x}_{i}\le t<{x}_{i+1}$ and d = txi$d=t-{x}_{i}$.
If mode = 'P'${\mathbf{mode}}=\text{'P'}$ or 'Q'$\text{'Q'}$, c contains information that will be used in a subsequent call to nag_smooth_fit_spline (g10ab) with mode = 'Q'${\mathbf{mode}}=\text{'Q'}$.
The (weighted) residual sum of squares.
4:     df – double scalar
The residual degrees of freedom.
5:     res(n) – double array
The (weighted) residuals, ri${r}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
6:     h(n) – double array
The leverages, hii${h}_{\mathit{i}\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
7:     comm(9 × n + 14$9×{\mathbf{n}}+14$) – double array
If mode = 'P'${\mathbf{mode}}=\text{'P'}$ or 'Q'$\text{'Q'}$, comm contains information that will be used in a subsequent call to nag_smooth_fit_spline (g10ab) with mode = 'Q'${\mathbf{mode}}=\text{'Q'}$.
8:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

Error Indicators and Warnings

Errors or warnings detected by the function:
ifail = 1${\mathbf{ifail}}=1$
 On entry, n < 3${\mathbf{n}}<3$, or ldc < n − 1$\mathit{ldc}<{\mathbf{n}}-1$, or rho < 0.0${\mathbf{rho}}<0.0$, or mode ≠ 'Q'${\mathbf{mode}}\ne \text{'Q'}$, 'P'$\text{'P'}$ or 'F'$\text{'F'}$, or weight ≠ 'W'$\mathit{weight}\ne \text{'W'}$ or 'U'$\text{'U'}$.
ifail = 2${\mathbf{ifail}}=2$
 On entry, weight = 'W'$\mathit{weight}=\text{'W'}$ and at least one element of wt ≤ 0.0${\mathbf{wt}}\le 0.0$.
ifail = 3${\mathbf{ifail}}=3$
 On entry, x(i) ≥ x(i + 1)${\mathbf{x}}\left(i\right)\ge {\mathbf{x}}\left(i+1\right)$, for some i$i$, i = 1,2, … ,n − 1$i=1,2,\dots ,n-1$.

Accuracy

Accuracy depends on the value of ρ$\rho$ and the position of the x$x$ values. The values of xixi1${x}_{i}-{x}_{i-1}$ and wi${w}_{i}$ are scaled and ρ$\rho$ is transformed to avoid underflow and overflow problems.

The time taken by nag_smooth_fit_spline (g10ab) is of order n$n$.
Regression splines with a small ( < n)$\left( number of knots can be fitted by nag_fit_1dspline_knots (e02ba) and nag_fit_1dspline_auto (e02be).

Example

```function nag_smooth_fit_spline_example
mode = 'F';
x = [0.9;
1;
1.8;
1.9;
2.2;
4.2;
4.8;
5.1;
5.2;
5.8;
6.9;
7.9;
8.1;
8.5;
8.8;
8.9;
9.8;
9.9;
10.4;
10.5;
10.6;
10.8;
11;
11.1;
11.3;
11.5;
11.8;
11.9;
12.4;
12.5;
12.7;
12.8;
13.2;
13.8;
14.5;
15.5;
15.6];
y = [3;
3.9;
3.4;
3.7;
3.9;
5.1;
4.2;
4.6;
4.85;
5.6;
5.1;
4.8;
5.2;
5.3;
4.1;
4.9;
4.8;
4.9;
5;
5.2;
5;
5.1;
4.4;
4.9;
5.1;
5.5;
4.6;
5.1;
5.2;
4.1;
3.4;
6.6;
5.3;
3.7;
5.7;
4.9;
4.9];
wt = [1;
1;
1;
1;
1;
1;
2;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
1;
1];
rho = 10;
c = zeros(36, 3);
wk = zeros(447, 1);
[yhat, cOut, rss, df, res, h, wkOut, ifail] = ...
nag_smooth_fit_spline(mode, x, y, rho, c, wk, 'wt', wt);
yhat, cOut, rss, df, res, h, ifail
```
```

yhat =

3.3674
3.4008
3.6642
3.7016
3.8214
4.5265
4.6471
4.7561
4.7993
5.0458
5.1204
4.9590
4.9262
4.8595
4.8172
4.8095
4.8676
4.8818
4.9445
4.9521
4.9572
4.9613
4.9614
4.9618
4.9623
4.9568
4.9338
4.9251
4.8943
4.8944
4.9051
4.9138
4.9239
4.8930
4.9938
4.9773
4.9682

cOut =

0.3359         0   -0.1653
0.3310   -0.0496    0.0593
0.3655    0.0928   -0.0596
0.3823    0.0749   -0.0603
0.4109    0.0206   -0.0249
0.1944   -0.1289    0.2331
0.2915    0.2907   -0.1692
0.4202    0.1384   -0.2394
0.4407    0.0665   -0.1938
0.3112   -0.2823    0.0555
-0.1085   -0.0992    0.0463
-0.1679    0.0398   -0.0967
-0.1636   -0.0182    0.0264
-0.1655    0.0135    0.2246
-0.0968    0.2156   -0.0980
-0.0566    0.1862   -0.0574
0.1392    0.0314   -0.0878
0.1428    0.0050   -0.0796
0.0882   -0.1143   -0.0546
0.0637   -0.1307    0.0569
0.0392   -0.1137    0.0954
0.0052   -0.0564    0.1578
0.0016    0.0382   -0.0948
0.0064    0.0098   -0.1504
-0.0077   -0.0804   -0.0884
-0.0505   -0.1335    0.1560
-0.0885    0.0069    0.0058
-0.0869    0.0086    0.0845
-0.0150    0.1353    0.2220
0.0188    0.2019   -0.1354
0.0833    0.1207   -0.8125
0.0831   -0.1230   -0.0539
-0.0413   -0.1878    0.2844
0.0406    0.3242   -0.2523
0.1237   -0.2056    0.0655
-0.0911   -0.0092    0.0307

11.2876

df =

27.7845

res =

-0.3674
0.4992
-0.2642
-0.0016
0.0786
0.5735
-0.6323
-0.1561
0.0717
0.5542
-0.0204
-0.2248
0.2738
0.4405
-0.7172
0.0905
-0.0676
0.0182
0.0555
0.2479
0.0605
0.1387
-0.5614
-0.0874
0.1377
0.5432
-0.3338
0.1749
0.3057
-0.7944
-1.5051
1.6862
0.5318
-1.1930
0.7062
-0.0773
-0.0682

h =

0.4813
0.3955
0.2548
0.2670
0.3463
0.3688
0.3608
0.1614
0.3365
0.2948
0.3899
0.4175
0.1879
0.1839
0.1993
0.2076
0.2013
0.1893
0.1286
0.1188
0.2232
0.1046
0.1050
0.2154
0.1177
0.1300
0.1453
0.1489
0.1482
0.1463
0.1470
0.1503
0.3694
0.2850
0.3666
0.4252
0.4961

ifail =

0

```
```function g10ab_example
mode = 'F';
x = [0.9;
1;
1.8;
1.9;
2.2;
4.2;
4.8;
5.1;
5.2;
5.8;
6.9;
7.9;
8.1;
8.5;
8.8;
8.9;
9.8;
9.9;
10.4;
10.5;
10.6;
10.8;
11;
11.1;
11.3;
11.5;
11.8;
11.9;
12.4;
12.5;
12.7;
12.8;
13.2;
13.8;
14.5;
15.5;
15.6];
y = [3;
3.9;
3.4;
3.7;
3.9;
5.1;
4.2;
4.6;
4.85;
5.6;
5.1;
4.8;
5.2;
5.3;
4.1;
4.9;
4.8;
4.9;
5;
5.2;
5;
5.1;
4.4;
4.9;
5.1;
5.5;
4.6;
5.1;
5.2;
4.1;
3.4;
6.6;
5.3;
3.7;
5.7;
4.9;
4.9];
wt = [1;
1;
1;
1;
1;
1;
2;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
1;
1];
rho = 10;
c = zeros(36, 3);
wk = zeros(447, 1);
[yhat, cOut, rss, df, res, h, wkOut, ifail] = ...
g10ab(mode, x, y, rho, c, wk, 'wt', wt);
yhat, cOut, rss, df, res, h, ifail
```
```

yhat =

3.3674
3.4008
3.6642
3.7016
3.8214
4.5265
4.6471
4.7561
4.7993
5.0458
5.1204
4.9590
4.9262
4.8595
4.8172
4.8095
4.8676
4.8818
4.9445
4.9521
4.9572
4.9613
4.9614
4.9618
4.9623
4.9568
4.9338
4.9251
4.8943
4.8944
4.9051
4.9138
4.9239
4.8930
4.9938
4.9773
4.9682

cOut =

0.3359         0   -0.1653
0.3310   -0.0496    0.0593
0.3655    0.0928   -0.0596
0.3823    0.0749   -0.0603
0.4109    0.0206   -0.0249
0.1944   -0.1289    0.2331
0.2915    0.2907   -0.1692
0.4202    0.1384   -0.2394
0.4407    0.0665   -0.1938
0.3112   -0.2823    0.0555
-0.1085   -0.0992    0.0463
-0.1679    0.0398   -0.0967
-0.1636   -0.0182    0.0264
-0.1655    0.0135    0.2246
-0.0968    0.2156   -0.0980
-0.0566    0.1862   -0.0574
0.1392    0.0314   -0.0878
0.1428    0.0050   -0.0796
0.0882   -0.1143   -0.0546
0.0637   -0.1307    0.0569
0.0392   -0.1137    0.0954
0.0052   -0.0564    0.1578
0.0016    0.0382   -0.0948
0.0064    0.0098   -0.1504
-0.0077   -0.0804   -0.0884
-0.0505   -0.1335    0.1560
-0.0885    0.0069    0.0058
-0.0869    0.0086    0.0845
-0.0150    0.1353    0.2220
0.0188    0.2019   -0.1354
0.0833    0.1207   -0.8125
0.0831   -0.1230   -0.0539
-0.0413   -0.1878    0.2844
0.0406    0.3242   -0.2523
0.1237   -0.2056    0.0655
-0.0911   -0.0092    0.0307

11.2876

df =

27.7845

res =

-0.3674
0.4992
-0.2642
-0.0016
0.0786
0.5735
-0.6323
-0.1561
0.0717
0.5542
-0.0204
-0.2248
0.2738
0.4405
-0.7172
0.0905
-0.0676
0.0182
0.0555
0.2479
0.0605
0.1387
-0.5614
-0.0874
0.1377
0.5432
-0.3338
0.1749
0.3057
-0.7944
-1.5051
1.6862
0.5318
-1.1930
0.7062
-0.0773
-0.0682

h =

0.4813
0.3955
0.2548
0.2670
0.3463
0.3688
0.3608
0.1614
0.3365
0.2948
0.3899
0.4175
0.1879
0.1839
0.1993
0.2076
0.2013
0.1893
0.1286
0.1188
0.2232
0.1046
0.1050
0.2154
0.1177
0.1300
0.1453
0.1489
0.1482
0.1463
0.1470
0.1503
0.3694
0.2850
0.3666
0.4252
0.4961

ifail =

0

```