g10ba:: Smoothing in Statistics (NAG Toolbox)

Description

Given a sample of $n$ observations, $x_{1}, x_{2}, \dots, x_{n}$, from a distribution with unknown density function, $f(x)$, an estimate of the density function, $\hat{f}(x)$, may be required. The simplest form of density estimator is the histogram. This may be defined by:

\hat{f} (x) = \frac{1}{n h} n_{j}, a + (j - 1) h < x < a + j h, j = 1, 2, \dots, n_{s},

where $n_{j}$ is the number of observations falling in the interval $a + (j - 1) h$ to $a + j h$, $a$ is the lower bound to the histogram and $b = a + n_{s} h$ is the upper bound. The value $h$ is known as the window width.

To produce a smoother density estimate a kernel method can be used. A kernel function, $K(t)$, satisfies the conditions:

\int_{- \infty}^{\infty} K (t) d t = 1 and K (t) \geq 0 .

The kernel density estimator is then defined as

\hat{f} (x) = \frac{1}{n h} \sum_{i = 1}^{n} K (\frac{x - x_{i}}{h}) .

The choice of $K$ is usually not important, but to ease the computational burden use can be made of the Gaussian kernel, defined as

K (t) = \frac{1}{\sqrt{2 π}} e^{- t^{2} / 2} .
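As a minimal sketch of the definition above, the estimator can be evaluated directly in base MATLAB without calling the NAG routine. The evaluation grid `t`, the five-value sample and all variable names are assumptions made for illustration only.

```matlab
% Direct evaluation of the Gaussian kernel density estimate.
% Illustrative only: this does not call g10ba.
x = [0.114 -0.232 -0.570 1.853 -0.994];   % small illustrative sample
h = 0.4;                                   % window width
t = linspace(-5, 5, 201)';                 % evaluation points (column)
n = numel(x);

% Gaussian kernel K(t) = exp(-t^2/2) / sqrt(2*pi)
K = @(u) exp(-0.5 * u.^2) / sqrt(2 * pi);

% fhat(t) = 1/(n*h) * sum_i K((t - x_i)/h); (t - x) expands to 201-by-5
fhat = sum(K((t - x) / h), 2) / (n * h);

% The estimate should be non-negative and integrate to about 1
area = trapz(t, fhat);
fprintf('Integral of estimate over [-5, 5]: %.6f\n', area);
```

The cost of this direct form grows with the product of the sample size and the number of evaluation points, which is the burden the fft approach is meant to reduce.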

The smoothness of the estimator depends on the window width $h$. The larger the value of $h$, the smoother the density estimate. The value of $h$ can be chosen by examining plots of the smoothed density for different values of $h$ or by using cross-validation methods (see Silverman (1990)).
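The effect of the window width can be illustrated with a short sketch in base MATLAB. The candidate widths, the ten-value sample and the roughness measure (sum of squared second differences on a grid) are assumptions chosen for illustration. They are not part of the NAG routine and are not a substitute for cross-validation.

```matlab
% Compare the roughness of the estimate for several window widths.
x = [0.114 -0.232 -0.570 1.853 -0.994 ...
    -0.374 -1.028  0.509  0.881 -0.453];   % illustrative sample
t = linspace(-5, 5, 201)';                  % evaluation points
K = @(u) exp(-0.5 * u.^2) / sqrt(2 * pi);   % Gaussian kernel

hs = [0.1 0.4 1.0];                 % candidate window widths (assumed)
rough = zeros(size(hs));
for k = 1:numel(hs)
    h = hs(k);
    fhat = mean(K((t - x) / h), 2) / h;     % kernel density estimate
    rough(k) = sum(diff(fhat, 2).^2);       % sum of squared second differences
    fprintf('h = %.1f  roughness = %.3e\n', h, rough(k));
end
```

Calling `plot(t, fhat)` inside the loop gives the visual comparison the text describes.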

Silverman (1982) and Silverman (1990) show how the Gaussian kernel density estimator can be computed using a fast Fourier transform (fft). In order to compute the kernel density estimate over the range $a$ to $b$ the following steps are required.

(i)	Discretize the data to give $n_{s}$ equally spaced points $t_{l}$ with weights $ξ_{l}$ (see Jones and Lotwick (1984)).
(ii)	Compute the fft of the weights $ξ_{l}$ to give $Y_{l}$ .
(iii)	Compute $ζ_{l} = e^{- \frac{1}{2} h^{2} s_{l}^{2}} Y_{l}$ where $s_{l} = 2 π l / (b - a)$ .
(iv)	Find the inverse fft of $ζ_{l}$ to give $\hat{f} (x)$ .

To compute the kernel density estimate for further values of $h$ only steps (iii) and (iv) need be repeated.
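Steps (i) to (iv) can be sketched with the base MATLAB functions `fft` and `ifft`. This is an illustration under stated assumptions, not the routine's internal implementation. It assumes that the grid points are interval midpoints, that discretization is done by linear binning, and that the weights are scaled by $1 / (n δ)$, where $δ = (b - a) / n_{s}$, so that the result is a density. The factor in step (iii) is the Fourier transform of the Gaussian kernel with window width $h$. All variable names are hypothetical.

```matlab
% Sketch of steps (i)-(iv) with base MATLAB fft/ifft; does not call g10ba.
x  = [0.114 -0.232 -0.570 1.853 -0.994 ...
     -0.374 -1.028  0.509  0.881 -0.453];   % illustrative sample
a  = -5;  b = 5;                  % range of the estimate
ns = 100;                         % number of grid points
n  = numel(x);
delta = (b - a) / ns;             % grid spacing
t  = a + ((1:ns)' - 0.5) * delta; % grid points t_l (interval midpoints, assumed)

% (i) Linear binning (assumed scheme): share each observation between its two
%     neighbouring grid points; weights are scaled so that sum(xi)*delta = 1.
%     Assumes every observation lies between t(1) and t(ns).
pos = (x(:) - t(1)) / delta + 1;  % fractional grid index of each observation
lo  = floor(pos);
w   = pos - lo;                   % share given to the upper neighbour
xi  = accumarray([lo; lo + 1], [1 - w; w], [ns 1]) / (n * delta);

% (ii) Transform the weights.
Y = fft(xi);

% (iii)-(iv) Damp each frequency by the Gaussian factor, then invert.
l = (0:ns-1)';
l = min(l, ns - l);               % fold indices so frequencies are symmetric
s = 2 * pi * l / (b - a);
smoothAt = @(h) real(ifft(exp(-0.5 * h^2 * s.^2) .* Y));

% Further window widths reuse Y; only steps (iii) and (iv) are repeated.
fhat04 = smoothAt(0.4);
fhat08 = smoothAt(0.8);
fprintf('Grid integral: %.6f (h = 0.4), %.6f (h = 0.8)\n', ...
        sum(fhat04) * delta, sum(fhat08) * delta);
```

Because the transform is periodic, density near one end of the range wraps round to the other end. In this sketch the range is kept several window widths wider than the data so that the effect is negligible.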

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

Example

function g10ba_example


fprintf('g10ba example results\n\n');

% sample data
x = [  0.114 -0.232 -0.570  1.853 -0.994 ...
      -0.374 -1.028  0.509  0.881 -0.453 ...
       0.588 -0.625 -1.622 -0.567  0.421 ...
      -0.475  0.054  0.817  1.015  0.608 ...
      -1.353 -0.912 -1.136  1.067  0.121 ...
      -0.075 -0.745  1.217 -1.058 -0.894 ...
       1.026 -0.967 -1.065  0.513  0.969 ...
       0.582 -0.985  0.097  0.416 -0.514 ...
       0.898 -0.154  0.617 -0.436 -1.212 ...
      -1.571  0.210 -1.101  1.018 -1.702 ...
      -2.230 -0.648 -0.350  0.446 -2.667 ...
       0.094 -0.380 -2.852 -0.888 -1.481 ...
      -0.359 -0.554  1.531  0.052 -1.715 ...
       1.255 -0.540  0.362 -0.654 -0.272 ...
      -1.810  0.269 -1.918  0.001  1.240 ...
      -0.368 -0.647 -2.282  0.498  0.001 ...
      -3.059 -1.171  0.566  0.948  0.925 ...
       0.825  0.130  0.930  0.523  0.443 ...
      -0.649  0.554 -2.823  0.158 -1.180 ...
       0.610  0.877  0.791 -0.078  1.412 ];

% Control parameters
window = 0.4;
slo    = -5;
shi    = 5;
usefft = false;
fft    = zeros(100,1);

% Perform kernel density estimation
[smooth, t, fft, ifail] = g10ba( ...
                                 x, window, slo, shi, usefft, fft);

% Display the results
fprintf('Window Width Used = %11.4e\n', window);
fprintf('Interval = (%11.4e, %11.4e)\n\n', slo, shi);
fprintf('First 20 output values:\n\n');
fprintf('      Time        Density\n');
fprintf('      Point       Estimate\n');
fprintf(' ---------------------------\n');
fprintf('%13.3e%13.3e\n', [t(1:20), smooth(1:20)]');

fig1 = figure;
plot(t,smooth);
title('Plot of the Smoothed Density (window = 0.4)');
xlabel('t');
ylabel('Density estimate');
set(gca, 'XTick', [-5:5]);

g10ba example results

Window Width Used =  4.0000e-01
Interval = (-5.0000e+00,  5.0000e+00)

First 20 output values:

      Time        Density
      Point       Estimate
 ---------------------------
   -4.950e+00    4.108e-12
   -4.850e+00    3.915e-11
   -4.750e+00    3.309e-10
   -4.650e+00    2.480e-09
   -4.550e+00    1.649e-08
   -4.450e+00    9.730e-08
   -4.350e+00    5.097e-07
   -4.250e+00    2.372e-06
   -4.150e+00    9.817e-06
   -4.050e+00    3.615e-05
   -3.950e+00    1.186e-04
   -3.850e+00    3.475e-04
   -3.750e+00    9.100e-04
   -3.650e+00    2.136e-03
   -3.550e+00    4.504e-03
   -3.450e+00    8.556e-03
   -3.350e+00    1.468e-02
   -3.250e+00    2.283e-02
   -3.150e+00    3.225e-02
   -3.050e+00    4.154e-02

On entry, $n \leq 0$,
or $ns < 2$,
or $shi \leq slo$,
or $window \leq 0.0$.
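The conditions above can be checked before the routine is called. The following is a hypothetical pre-call check in base MATLAB. The variable `inputsOk` and the error identifier are invented for illustration and are not part of the NAG Toolbox.

```matlab
% Hypothetical pre-call check mirroring the documented entry conditions.
x = [0.114 -0.232 -0.570];        % illustrative data
ns = 100;                         % number of grid points
slo = -5;  shi = 5;               % interval limits
window = 0.4;                     % window width
n = numel(x);

% Every condition must hold for the inputs to be acceptable
inputsOk = (n > 0) && (ns >= 2) && (shi > slo) && (window > 0);
if ~inputsOk
    error('g10ba_check:badInput', 'Invalid n, ns, slo, shi or window.');
end
```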

On entry, nag_smooth_kerndens_gauss (g10ba) has been called with $usefft = true$ but the function has not been called previously with $usefft = false$,
or nag_smooth_kerndens_gauss (g10ba) has been called with $usefft = true$ but some of the arguments n, slo, shi, ns have been changed since the previous call to nag_smooth_kerndens_gauss (g10ba) with $usefft = false$.

NAG Toolbox: nag_smooth_withdraw_kerndens_gauss (g10ba)

Purpose

Syntax