g10ab:: Smoothing in Statistics (NAG Toolbox)

Cubic smoothing splines arise as the unique real-valued solution function $f$, with absolutely continuous first derivative and squared-integrable second derivative, which minimizes:

$$\sum_{i=1}^{n} w_{i} \left(y_{i} - f(x_{i})\right)^{2} + \rho \int_{-\infty}^{\infty} \left(f''(x)\right)^{2} \, dx,$$

where $w_{i}$ is the (optional) weight for the $i$th observation and $\rho$ is the smoothing parameter. This criterion consists of two parts: the first measures the fit of the curve, and the second the smoothness of the curve. The value of the smoothing parameter $\rho$ weights these two aspects; larger values of $\rho$ give a smoother fitted curve but, in general, a poorer fit. For details of how the cubic spline can be estimated see Hutchinson and de Hoog (1985) and Reinsch (1967).
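To make the trade-off between fit and smoothness concrete, the sketch below solves the criterion directly for the fitted values at the data points, using the standard matrix form of the roughness penalty for a natural cubic spline (a penalty matrix `K = Q*inv(R)*Q'`). It is only an illustration in base MATLAB with dense matrices and hypothetical toy data; it is not claimed to be the algorithm used inside g10ab, and the variable names `Q`, `R`, `K` and `W` are assumptions of this sketch.

```matlab
% Hypothetical toy data (not from the NAG example); x must be strictly increasing.
x   = [0; 1; 2.5; 3; 4.2; 5];
y   = [1.0; 1.8; 1.1; 2.4; 2.0; 3.1];
w   = ones(size(x));            % unit weights
rho = 10;                       % smoothing parameter

n = numel(x);
h = diff(x);                    % spacings between successive x values

% Band matrices of the natural cubic spline:
% Q is n-by-(n-2), R is (n-2)-by-(n-2), symmetric tridiagonal.
Q = zeros(n, n-2);
R = zeros(n-2, n-2);
for j = 1:n-2
  Q(j,   j) =  1/h(j);
  Q(j+1, j) = -1/h(j) - 1/h(j+1);
  Q(j+2, j) =  1/h(j+1);
  R(j, j)   = (h(j) + h(j+1))/3;
  if j < n-2
    R(j, j+1) = h(j+1)/6;
    R(j+1, j) = h(j+1)/6;
  end
end

K = Q*(R\Q');                   % integral of f''^2 equals fvals'*K*fvals
W = diag(w);

% Minimizing (y-g)'*W*(y-g) + rho*g'*K*g over g gives a linear system.
yhat = (W + rho*K) \ (W*y);     % fitted values at the x values

fit_term    = sum(w .* (y - yhat).^2);   % first part of the criterion
smooth_term = yhat' * K * yhat;          % second part, before scaling by rho
```

With `rho = 0` the system reduces to `yhat = y`, and as `rho` grows the fit is pulled toward the weighted least-squares straight line, because `K` maps any linear function of `x` to zero.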

nag_smooth_fit_spline (g10ab) requires the $x_{i}$ to be strictly increasing. If two or more observations have the same $x_{i}$-value then they should be replaced by a single observation with $y_{i}$ equal to the (weighted) mean of the $y$ values and weight, $w_{i}$, equal to the sum of the weights. This operation can be performed by nag_smooth_data_order (g10za).
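The merging rule described above can be sketched in base MATLAB as follows. This is an illustration of the rule only, not the implementation of g10za; the variable names are assumptions, and the six observations are a small subset of the example data that contains two tied ages.

```matlab
% A few observations with ties at x = 5.2 and x = 4.8.
x = [5.2; 8.8; 5.2; 4.8; 4.8; 7.9];
y = [4.8; 4.1; 4.9; 4.5; 3.9; 4.8];
w = ones(size(x));                       % unit weights before merging

% unique returns the sorted distinct x values; idx maps each
% observation to the position of its x value in xu.
[xu, ~, idx] = unique(x);
idx = idx(:);

wu = accumarray(idx, w);                 % merged weight: sum of the weights
yu = accumarray(idx, w .* y) ./ wu;      % merged y: weighted mean of the y values
```

Here `xu` is strictly increasing, the tied observations at 4.8 and 5.2 each receive weight 2, and their merged `y` values are 4.2 and 4.85, matching the corresponding rows of the example results.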

When fitting the spline for several different values of $\rho$, phase (i) need only be carried out once and then phase (ii) repeated for different values of $\rho$. If the spline is being fitted as part of an iterative weighted least squares procedure, phases (i) and (ii) have to be repeated for each set of weights. In either case, phase (iii) will often only have to be performed after the final fit has been computed.


Example

The data, given by Hastie and Tibshirani (1990), are the age, $x_{i}$, and C-peptide concentration (pmol/ml), $y_{i}$, from a study of the factors affecting insulin-dependent diabetes mellitus in children. The data are input, reduced to a strictly ordered set by nag_smooth_data_order (g10za), and a series of splines is fitted using a range of values for the smoothing parameter, $\rho$.

function g10ab_example


fprintf('g10ab example results\n\n');

x =  [ 5.2  8.8 10.5 10.6 10.4  1.8 12.7 15.6  5.8  1.9 ...
       2.2  4.8  7.9  5.2  0.9 11.8  7.9 11.5 10.6  8.5 ...
      11.1 12.8 11.3  1.0 14.5 11.9  8.1 13.8 15.5  9.8 ...
      11.0 12.4 11.1  5.1  4.8  4.2  6.9 13.2  9.9 12.5 ...
      13.2  8.9 10.8];
y =  [ 4.8  4.1  5.2  5.5  5.0  3.4  3.4  4.9  5.6  3.7 ...
       3.9  4.5  4.8  4.9  3.0  4.6  4.8  5.5  4.5  5.3 ...
       4.7  6.6  5.1  3.9  5.7  5.1  5.2  3.7  4.9  4.8 ...
       4.4  5.2  5.1  4.6  3.9  5.1  5.1  6.0  4.9  4.1 ...
       4.6  4.9  5.1];

% Reorder x, remove ties and weight accordingly
[n, x, y, wt, rss, ifail] = g10za( ...
                                   x, y);
x = x(1:n);
y = y(1:n);

rho  = [1 10 100];
nrho = numel(rho);

c    = zeros(n, 3);
comm = zeros(9*n+14, 1);
yhat = zeros(n,nrho);
rss  = zeros(nrho,1);
df   = zeros(nrho,1);

% Initialize and fit for rho(1)
mode = 'P';
[yhat(:,1), c, rss(1), df(1), res, h, comm, ifail] = ...
  g10ab(mode, x, y, rho(1), c, comm, 'wt', wt);

% Fit for subsequent rhos
mode = 'Q';
for j = 2:nrho
  [yhat(:,j), c, rss(j), df(j), res, h, comm, ifail] = ...
  g10ab( ...
         mode, x, y, rho(j), c, comm, 'wt', wt);
end

%  Display results
fprintf('Smoothing coefficient (rho) = ');
fprintf('  %8.2f', rho);
fprintf('\nResidual sum of squares     = ');
fprintf('%10.3f', rss);
fprintf('\nDegrees of freedom          = ');
fprintf('%10.3f', df);
fprintf('\n\n    x       y                            Fitted Values\n');
fprintf('%8.4f%8.4f%24.4f%10.4f%10.4f\n', [x y yhat]');

fig1 = figure;
plot(x,y,'+',x,yhat(:,1),x,yhat(:,2),x,yhat(:,3));
legend('Raw data', '\rho = 1', '\rho = 10', '\rho = 100', ...
       'Location','NorthWest');
xlabel('Age (years)');
ylabel('C-peptide concentration (pmol/ml)');
title({'Cubic smoothing spline', ...
       'Factors affecting insulin-dependent diabetes mellitus', ...
       'in children; Hastie and Tibshirani (1990)'});

g10ab example results

Smoothing coefficient (rho) =       1.00     10.00    100.00
Residual sum of squares     =      9.118    11.288    11.881
Degrees of freedom          =     22.505    27.785    31.191

    x       y                            Fitted Values
  0.9000  3.0000                  3.3784    3.3674    3.3699
  1.0000  3.9000                  3.4173    3.4008    3.4063
  1.8000  3.4000                  3.6144    3.6642    3.6973
  1.9000  3.7000                  3.6639    3.7016    3.7341
  2.2000  3.9000                  3.8607    3.8214    3.8449
  4.2000  5.1000                  4.7441    4.5265    4.5194
  4.8000  4.2000                  4.4914    4.6471    4.6746
  5.1000  4.6000                  4.6708    4.7561    4.7470
  5.2000  4.8500                  4.7704    4.7993    4.7702
  5.8000  5.6000                  5.3426    5.0458    4.8879
  6.9000  5.1000                  5.1728    5.1204    4.9753
  7.9000  4.8000                  4.9467    4.9590    4.9537
  8.1000  5.2000                  4.9556    4.9262    4.9452
  8.5000  5.3000                  4.8742    4.8595    4.9276
  8.8000  4.1000                  4.7305    4.8172    4.9168
  8.9000  4.9000                  4.7024    4.8095    4.9143
  9.8000  4.8000                  4.8394    4.8676    4.9170
  9.9000  4.9000                  4.8746    4.8818    4.9191
 10.4000  5.0000                  4.9971    4.9445    4.9303
 10.5000  5.2000                  4.9997    4.9521    4.9321
 10.6000  5.0000                  4.9921    4.9572    4.9335
 10.8000  5.1000                  4.9603    4.9613    4.9354
 11.0000  4.4000                  4.9396    4.9614    4.9363
 11.1000  4.9000                  4.9494    4.9618    4.9366
 11.3000  5.1000                  4.9926    4.9623    4.9366
 11.5000  5.5000                  5.0116    4.9568    4.9355
 11.8000  4.6000                  4.9372    4.9338    4.9315
 11.9000  5.1000                  4.9042    4.9251    4.9300
 12.4000  5.2000                  4.7929    4.8943    4.9240
 12.5000  4.1000                  4.8042    4.8944    4.9237
 12.7000  3.4000                  4.9020    4.9051    4.9244
 12.8000  6.6000                  4.9752    4.9138    4.9252
 13.2000  5.3000                  5.0173    4.9239    4.9276
 13.8000  3.7000                  4.6164    4.8930    4.9304
 14.5000  5.7000                  5.1883    4.9938    4.9518
 15.5000  4.9000                  4.9854    4.9773    4.9687
 15.6000  4.9000                  4.9167    4.9682    4.9697

On entry, $n < 3$,
or $ldc < n - 1$,
or $rho < 0.0$,
or $mode \neq$ 'Q', 'P' or 'F',
or $weight \neq$ 'W' or 'U'.
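Some of the conditions listed above can be checked before the routine is called. The following is a minimal pre-check sketch in base MATLAB, using hypothetical inputs; the `ok_*` names and the error identifier are assumptions of this sketch and not part of the NAG Toolbox. The test on `x` reflects the strictly increasing requirement stated in the description rather than an entry in the list above.

```matlab
% Hypothetical inputs for illustration.
x    = [0.9; 1.0; 1.8; 1.9; 2.2];
rho  = 10;
mode = 'P';

ok_n    = numel(x) >= 3;                        % n < 3 is an error
ok_rho  = rho >= 0;                             % rho < 0.0 is an error
ok_mode = any(strcmp(mode, {'P', 'Q', 'F'}));   % mode must be 'P', 'Q' or 'F'
ok_x    = all(diff(x) > 0);                     % x must be strictly increasing

if ~(ok_n && ok_rho && ok_mode && ok_x)
  error('g10ab_precheck:badInput', ...
        'Inputs do not satisfy the documented requirements.');
end
```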
