Integer type:  int32  int64  nag_int  show int32  show int32  show int64  show int64  show nag_int  show nag_int

Chapter Contents
Chapter Introduction
NAG Toolbox

# NAG Toolbox: nag_smooth_fit_spline_parest (g10ac)

## Purpose

nag_smooth_fit_spline_parest (g10ac) estimates the values of the smoothing parameter and fits a cubic smoothing spline to a set of data.

## Syntax

[yhat, c, rss, df, res, h, crit, rho, ifail] = g10ac(method, x, y, crit, 'n', n, 'wt', wt, 'u', u, 'tol', tol, 'maxcal', maxcal)
[yhat, c, rss, df, res, h, crit, rho, ifail] = nag_smooth_fit_spline_parest(method, x, y, crit, 'n', n, 'wt', wt, 'u', u, 'tol', tol, 'maxcal', maxcal)
Note: the interface to this routine has changed since earlier releases of the toolbox:
Mark 24: tol now optional, drop weight, wt optional
.

## Description

For a set of n$n$ observations (xi,yi)$\left({x}_{\mathit{i}},{y}_{\mathit{i}}\right)$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$, the spline provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable.
Cubic smoothing splines arise as the unique real-valued solution function f$f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes
 n ∞ ∑ wi(yi − f(xi))2 + ρ ∫ (f ′ ′ (x))2dx, i = 1 − ∞
$∑i=1nwi(yi-f(xi))2+ρ ∫-∞∞(f′′(x))2dx,$
where wi${w}_{i}$ is the (optional) weight for the i$i$th observation and ρ$\rho$ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve and the second the smoothness of the curve. The value of the smoothing parameter ρ$\rho$ weights these two aspects; larger values of ρ$\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be fitted see Hutchinson and de Hoog (1985) and Reinsch (1967).
The fitted values, = (1,2,,n)T $\stackrel{^}{y}={\left({\stackrel{^}{y}}_{1},{\stackrel{^}{y}}_{2},\dots ,{\stackrel{^}{y}}_{n}\right)}^{\mathrm{T}}$, and weighted residuals, ri${r}_{i}$, can be written as:
 ŷ = Hy  and  ri = sqrt(wi)(yi − ŷi) $y^=Hy and ri=wi(yi-y^i)$
for a matrix H$H$. The residual degrees of freedom for the spline is trace(IH)$\mathrm{trace}\left(I-H\right)$ and the diagonal elements of H$H$ are the leverages.
The parameter ρ$\rho$ can be estimated in a number of ways.
(i) The degrees of freedom for the spline can be specified, i.e., find ρ$\rho$ such that trace(H) = ν0$\mathrm{trace}\left(H\right)={\nu }_{0}$ for given ν0${\nu }_{0}$.
(ii) Minimize the cross-validation (CV), i.e., find ρ$\rho$ such that the CV is minimized, where
 CV = 1/( ∑ i = 1nwi) ∑ i = 1n [(ri)/(1 − hii)]2. $CV=1∑i=1nwi ∑i=1n [ri1-hii ] 2.$
(iii) Minimize the generalized cross-validation (GCV), i.e., find ρ$\rho$ such that the GCV is minimized, where
 GCV = (n2)/( ∑ i = 1nwi) [( ∑ i = 1nri2)/( ( ∑ i = 1n(1 − hii))2)] . $GCV=n2∑i=1nwi [∑i=1nri2 (∑i=1n(1-hii)) 2 ] .$
nag_smooth_fit_spline_parest (g10ac) requires the xi${x}_{i}$ to be strictly increasing. If two or more observations have the same xi${x}_{i}$ value then they should be replaced by a single observation with yi${y}_{i}$ equal to the (weighted) mean of the y$y$ values and weight, wi${w}_{i}$, equal to the sum of the weights. This operation can be performed by nag_smooth_data_order (g10za)
The algorithm is based on Hutchinson (1986). nag_roots_contfn_brent_rcomm (c05az) is used to solve for ρ$\rho$ given ν0${\nu }_{0}$ and the method of nag_opt_one_var_func (e04ab) is used to minimize the GCV or CV.

## References

Hastie T J and Tibshirani R J (1990) Generalized Additive Models Chapman and Hall
Hutchinson M F (1986) Algorithm 642: A fast procedure for calculating minimum cross-validation cubic smoothing splines ACM Trans. Math. Software 12 150–153
Hutchinson M F and de Hoog F R (1985) Smoothing noisy data with spline functions Numer. Math. 47 99–106
Reinsch C H (1967) Smoothing by spline functions Numer. Math. 10 177–183

## Parameters

### Compulsory Input Parameters

1:     method – string (length ≥ 1)
Indicates whether the smoothing parameter is to be found by minimization of the CV or GCV functions, or by finding the smoothing parameter corresponding to a specified degrees of freedom value.
method = 'C'${\mathbf{method}}=\text{'C'}$
Cross-validation is used.
method = 'D'${\mathbf{method}}=\text{'D'}$
The degrees of freedom are specified.
method = 'G'${\mathbf{method}}=\text{'G'}$
Generalized cross-validation is used.
Constraint: method = 'C'${\mathbf{method}}=\text{'C'}$, 'D'$\text{'D'}$ or 'G'$\text{'G'}$.
2:     x(n) – double array
n, the dimension of the array, must satisfy the constraint n3${\mathbf{n}}\ge 3$.
The distinct and ordered values xi${x}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
Constraint: x(i) < x(i + 1)${\mathbf{x}}\left(\mathit{i}\right)<{\mathbf{x}}\left(\mathit{i}+1\right)$, for i = 1,2,,n1$\mathit{i}=1,2,\dots ,n-1$.
3:     y(n) – double array
n, the dimension of the array, must satisfy the constraint n3${\mathbf{n}}\ge 3$.
The values yi${y}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
4:     crit – double scalar
If method = 'D'${\mathbf{method}}=\text{'D'}$, the required degrees of freedom for the spline.
If method = 'C'${\mathbf{method}}=\text{'C'}$ or 'G'$\text{'G'}$, crit need not be set.
Constraint: 2.0 < critn$2.0<{\mathbf{crit}}\le {\mathbf{n}}$.

### Optional Input Parameters

1:     n – int64int32nag_int scalar
Default: The dimension of the arrays x, y. (An error is raised if these dimensions are not equal.)
n$n$, the number of observations.
Constraint: n3${\mathbf{n}}\ge 3$.
2:     wt( : $:$) – double array
Note: the dimension of the array wt must be at least n${\mathbf{n}}$ if weight = 'W'$\mathit{weight}=\text{'W'}$.
If weight = 'W'$\mathit{weight}=\text{'W'}$, wt must contain the n$n$ weights. Otherwise wt is not referenced and unit weights are assumed.
Constraint: if weight = 'W'$\mathit{weight}=\text{'W'}$, wt(i) > 0.0${\mathbf{wt}}\left(\mathit{i}\right)>0.0$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
3:     u – double scalar
The upper bound on the smoothing parameter. If utol${\mathbf{u}}\le {\mathbf{tol}}$, u = 1000.0${\mathbf{u}}=1000.0$ will be used instead. See Section [Further Comments] for details on how this parameter is used.
Default: 0.0$0.0$
4:     tol – double scalar
The accuracy to which the smoothing parameter rho is required. tol should preferably be not much less than sqrt(ε)$\sqrt{\epsilon }$, where ε$\epsilon$ is the machine precision. If tol < ε${\mathbf{tol}}<\epsilon$, tol = sqrt(ε)${\mathbf{tol}}=\sqrt{\epsilon }$ will be used instead.
Default: 0.0$0.0$
5:     maxcal – int64int32nag_int scalar
The maximum number of spline evaluations to be used in finding the value of ρ$\rho$. If maxcal < 3${\mathbf{maxcal}}<3$, maxcal = 100${\mathbf{maxcal}}=100$ will be used instead.
Default: 0$0$

weight ldc wk

### Output Parameters

1:     yhat(n) – double array
The fitted values, i${\stackrel{^}{y}}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
2:     c(ldc,3$3$) – double array
ldcn1$\mathit{ldc}\ge {\mathbf{n}}-1$.
The spline coefficients. More precisely, the value of the spline approximation at t$t$ is given by ((c(i,3) × d + c(i,2)) × d + c(i,1)) × d + i$\left(\left({\mathbf{c}}\left(i,3\right)×d+{\mathbf{c}}\left(i,2\right)\right)×d+{\mathbf{c}}\left(i,1\right)\right)×d+{\stackrel{^}{y}}_{i}$, where xit < xi + 1${x}_{i}\le t<{x}_{i+1}$ and d = txi$d=t-{x}_{i}$.
The (weighted) residual sum of squares.
4:     df – double scalar
The residual degrees of freedom. If method = 'D'${\mathbf{method}}=\text{'D'}$ this will be ncrit$n-{\mathbf{crit}}$ to the required accuracy.
5:     res(n) – double array
The (weighted) residuals, ri${r}_{\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
6:     h(n) – double array
The leverages, hii${h}_{\mathit{i}\mathit{i}}$, for i = 1,2,,n$\mathit{i}=1,2,\dots ,n$.
7:     crit – double scalar
If method = 'C'${\mathbf{method}}=\text{'C'}$, the value of the cross-validation, or if method = 'G'${\mathbf{method}}=\text{'G'}$, the value of the generalized cross-validation function, evaluated at the value of ρ$\rho$ returned in rho.
8:     rho – double scalar
The smoothing parameter, ρ$\rho$.
9:     ifail – int64int32nag_int scalar
${\mathrm{ifail}}={\mathbf{0}}$ unless the function detects an error (see [Error Indicators and Warnings]).

## Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

ifail = 1${\mathbf{ifail}}=1$
Constraint: if method = 'D'${\mathbf{method}}=\text{'D'}$, crit > 2.0${\mathbf{crit}}>2.0$.
Constraint: if method = 'D'${\mathbf{method}}=\text{'D'}$, ${\mathbf{crit}}\le {\mathbf{n}}$.
Constraint: ldcn1$\mathit{ldc}\ge {\mathbf{n}}-1$.
Constraint: n3${\mathbf{n}}\ge 3$.
On entry, method is not valid.
On entry, weight is not valid.
ifail = 2${\mathbf{ifail}}=2$
On entry, at least one element of wt0.0${\mathbf{wt}}\le 0.0$.
ifail = 3${\mathbf{ifail}}=3$
On entry, x is not a strictly ordered array.
ifail = 4${\mathbf{ifail}}=4$
For the specified degrees of freedom, rho > u${\mathbf{rho}}>{\mathbf{u}}$:
W ifail = 5${\mathbf{ifail}}=5$
Accuracy of tol cannot be achieved:
W ifail = 6${\mathbf{ifail}}=6$
maxcal iterations have been performed.
W ifail = 7${\mathbf{ifail}}=7$
Optimum value of rho lies above u:

## Accuracy

When minimizing the cross-validation or generalized cross-validation, the error in the estimate of ρ$\rho$ should be within ± 3(tol × rho + tol)$±3\left({\mathbf{tol}}×{\mathbf{rho}}+{\mathbf{tol}}\right)$. When finding ρ$\rho$ for a fixed number of degrees of freedom the error in the estimate of ρ$\rho$ should be within ± 2 × tol × max (1,rho)$±2×{\mathbf{tol}}×\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left(1,{\mathbf{rho}}\right)$.
Given the value of ρ$\rho$, the accuracy of the fitted spline depends on the value of ρ$\rho$ and the position of the x$x$ values. The values of xixi1${x}_{i}-{x}_{i-1}$ and wi${w}_{i}$ are scaled and ρ$\rho$ is transformed to avoid underflow and overflow problems.

The time to fit the spline for a given value of ρ$\rho$ is of order n$n$.
When finding the value of ρ$\rho$ that gives the required degrees of freedom, the algorithm examines the interval 0.0$0.0$ to u. For small degrees of freedom the value of ρ$\rho$ can be large, as in the theoretical case of two degrees of freedom when the spline reduces to a straight line and ρ$\rho$ is infinite. If the CV or GCV is to be minimized then the algorithm searches for the minimum value in the interval 0.0$0.0$ to u. If the function is decreasing in that range then the boundary value of u will be returned. In either case, the larger the value of u the more likely is the interval to contain the required solution, but the process will be less efficient.
Regression splines with a small ( < n)$\left( number of knots can be fitted by nag_fit_1dspline_knots (e02ba) and nag_fit_1dspline_auto (e02be).

## Example

```function nag_smooth_fit_spline_parest_example
method = 'D';
x = [0.9;
1;
1.8;
1.9;
2.2;
4.2;
4.8;
5.1;
5.2;
5.8;
6.9;
7.9;
8.1;
8.5;
8.8;
8.9;
9.8;
9.9;
10.4;
10.5;
10.6;
10.8;
11;
11.1;
11.3;
11.5;
11.8;
11.9;
12.4;
12.5;
12.7;
12.8;
13.2;
13.8;
14.5;
15.5;
15.6];
y = [3;
3.9;
3.4;
3.7;
3.9;
5.1;
4.2;
4.6;
4.85;
5.6;
5.1;
4.8;
5.2;
5.3;
4.1;
4.9;
4.8;
4.9;
5;
5.2;
5;
5.1;
4.4;
4.9;
5.1;
5.5;
4.6;
5.1;
5.2;
4.1;
3.4;
6.6;
5.3;
3.7;
5.7;
4.9;
4.9];
wt = [1;
1;
1;
1;
1;
1;
2;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
1;
1];
crit = 12;
[yhat, c, rss, df, res, h, critOut, rho, ifail] = ...
nag_smooth_fit_spline_parest(method, x, y, crit, 'wt', wt)
```
```

yhat =

3.3732
3.4061
3.6424
3.6856
3.8395
4.6142
4.5762
4.7146
4.7830
5.1926
5.1836
4.9578
4.9307
4.8452
4.7630
4.7475
4.8500
4.8745
4.9704
4.9774
4.9788
4.9702
4.9614
4.9636
4.9753
4.9747
4.9297
4.9112
4.8517
4.8570
4.9002
4.9316
4.9547
4.7972
5.0757
4.9794
4.9462

c =

0.3349         0   -0.6265
0.3161   -0.1879    0.2026
0.4044    0.2983   -0.2043
0.4579    0.2371   -0.1801
0.5516    0.0750   -0.0785
-0.0909   -0.3962    0.7369
0.2295    0.9303   -0.5260
0.6457    0.4569   -0.7183
0.7155    0.2414   -0.4933
0.4724   -0.6465    0.1905
-0.2583   -0.0178    0.0503
-0.1430    0.1330   -0.4794
-0.1474   -0.1546   -0.0273
-0.2842   -0.1874    0.7361
-0.1979    0.4751   -0.3767
-0.1142    0.3621   -0.1208
0.2440    0.0359   -0.2047
0.2450   -0.0255   -0.1619
0.0981   -0.2683   -0.1123
0.0411   -0.3020    0.2613
-0.0115   -0.2236    0.3326
-0.0610   -0.0241    0.5504
-0.0046    0.3062   -0.3920
0.0448    0.1886   -0.6056
0.0476   -0.1748   -0.3963
-0.0699   -0.4125    0.4855
-0.1863    0.0244   -0.0679
-0.1834    0.0041    0.2490
0.0074    0.3776    0.8337
0.1079    0.6277   -0.4370
0.3065    0.3654   -2.9552
0.2910   -0.5211   -0.1546
-0.2002   -0.7067    1.0047
0.0368    1.1017   -0.8370
0.3488   -0.6560    0.2109
-0.3305   -0.0233    0.0776

10.3516

df =

25.0000

res =

-0.3732
0.4939
-0.2424
0.0144
0.0605
0.4858
-0.5320
-0.1146
0.0948
0.4074
-0.0836
-0.2231
0.2693
0.4548
-0.6630
0.1525
-0.0500
0.0255
0.0296
0.2226
0.0300
0.1298
-0.5614
-0.0900
0.1247
0.5253
-0.3297
0.1888
0.3483
-0.7570
-1.5002
1.6684
0.4884
-1.0972
0.6243
-0.0794
-0.0462

h =

0.5336
0.4273
0.3128
0.3127
0.4477
0.5640
0.4418
0.1889
0.4072
0.4552
0.5922
0.5299
0.2345
0.2446
0.2707
0.2924
0.3006
0.2765
0.1727
0.1542
0.2849
0.1356
0.1373
0.2836
0.1617
0.1857
0.2126
0.2202
0.2057
0.1957
0.1889
0.1932
0.4880
0.4077
0.5591
0.4455
0.5352

critOut =

12

rho =

2.6803

ifail =

0

```
```function g10ac_example
method = 'D';
x = [0.9;
1;
1.8;
1.9;
2.2;
4.2;
4.8;
5.1;
5.2;
5.8;
6.9;
7.9;
8.1;
8.5;
8.8;
8.9;
9.8;
9.9;
10.4;
10.5;
10.6;
10.8;
11;
11.1;
11.3;
11.5;
11.8;
11.9;
12.4;
12.5;
12.7;
12.8;
13.2;
13.8;
14.5;
15.5;
15.6];
y = [3;
3.9;
3.4;
3.7;
3.9;
5.1;
4.2;
4.6;
4.85;
5.6;
5.1;
4.8;
5.2;
5.3;
4.1;
4.9;
4.8;
4.9;
5;
5.2;
5;
5.1;
4.4;
4.9;
5.1;
5.5;
4.6;
5.1;
5.2;
4.1;
3.4;
6.6;
5.3;
3.7;
5.7;
4.9;
4.9];
wt = [1;
1;
1;
1;
1;
1;
2;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
2;
1;
1;
1;
1;
1;
1;
1;
1;
2;
1;
1;
1;
1];
crit = 12;
[yhat, c, rss, df, res, h, critOut, rho, ifail] = ...
g10ac(method, x, y, crit, 'wt', wt)
```
```

yhat =

3.3732
3.4061
3.6424
3.6856
3.8395
4.6142
4.5762
4.7146
4.7830
5.1926
5.1836
4.9578
4.9307
4.8452
4.7630
4.7475
4.8500
4.8745
4.9704
4.9774
4.9788
4.9702
4.9614
4.9636
4.9753
4.9747
4.9297
4.9112
4.8517
4.8570
4.9002
4.9316
4.9547
4.7972
5.0757
4.9794
4.9462

c =

0.3349         0   -0.6265
0.3161   -0.1879    0.2026
0.4044    0.2983   -0.2043
0.4579    0.2371   -0.1801
0.5516    0.0750   -0.0785
-0.0909   -0.3962    0.7369
0.2295    0.9303   -0.5260
0.6457    0.4569   -0.7183
0.7155    0.2414   -0.4933
0.4724   -0.6465    0.1905
-0.2583   -0.0178    0.0503
-0.1430    0.1330   -0.4794
-0.1474   -0.1546   -0.0273
-0.2842   -0.1874    0.7361
-0.1979    0.4751   -0.3767
-0.1142    0.3621   -0.1208
0.2440    0.0359   -0.2047
0.2450   -0.0255   -0.1619
0.0981   -0.2683   -0.1123
0.0411   -0.3020    0.2613
-0.0115   -0.2236    0.3326
-0.0610   -0.0241    0.5504
-0.0046    0.3062   -0.3920
0.0448    0.1886   -0.6056
0.0476   -0.1748   -0.3963
-0.0699   -0.4125    0.4855
-0.1863    0.0244   -0.0679
-0.1834    0.0041    0.2490
0.0074    0.3776    0.8337
0.1079    0.6277   -0.4370
0.3065    0.3654   -2.9552
0.2910   -0.5211   -0.1546
-0.2002   -0.7067    1.0047
0.0368    1.1017   -0.8370
0.3488   -0.6560    0.2109
-0.3305   -0.0233    0.0776

10.3516

df =

25.0000

res =

-0.3732
0.4939
-0.2424
0.0144
0.0605
0.4858
-0.5320
-0.1146
0.0948
0.4074
-0.0836
-0.2231
0.2693
0.4548
-0.6630
0.1525
-0.0500
0.0255
0.0296
0.2226
0.0300
0.1298
-0.5614
-0.0900
0.1247
0.5253
-0.3297
0.1888
0.3483
-0.7570
-1.5002
1.6684
0.4884
-1.0972
0.6243
-0.0794
-0.0462

h =

0.5336
0.4273
0.3128
0.3127
0.4477
0.5640
0.4418
0.1889
0.4072
0.4552
0.5922
0.5299
0.2345
0.2446
0.2707
0.2924
0.3006
0.2765
0.1727
0.1542
0.2849
0.1356
0.1373
0.2836
0.1617
0.1857
0.2126
0.2202
0.2057
0.1957
0.1889
0.1932
0.4880
0.4077
0.5591
0.4455
0.5352

critOut =

12

rho =

2.6803

ifail =

0

```