g10bb: Smoothing in Statistics (NAG Toolbox)

Syntax

[window, slo, shi, smooth, t, rcomm, ifail] = nag_smooth_kerndens_gauss(x, fcall, rcomm, 'n', n, 'wtype', wtype, 'window', window, 'slo', slo, 'shi', shi, 'ns', ns)

Description

Given a sample of $n$ observations, $x_{1}, x_{2}, \dots, x_{n}$, from a distribution with unknown density function, $f (x)$, an estimate of the density function, $\hat{f} (x)$, may be required. The simplest form of density estimator is the histogram. This may be defined by:

\hat{f} (x) = \frac{1}{n h} n_{j}, a + (j - 1) h < x < a + j h, j = 1, 2, \dots, n_{s},

where $n_{j}$ is the number of observations falling in the interval $a + (j - 1) h$ to $a + j h$, $a$ is the lower bound to the histogram, $b = a + n_{s} h$ is the upper bound and $n_{s}$ is the total number of intervals. The value $h$ is known as the window width.

To produce a smoother density estimate a kernel method can be used. A kernel function, $K (t)$, satisfies the conditions:

\int_{- \infty}^{\infty} K (t) \, d t = 1 \quad \text{and} \quad K (t) \geq 0 .

The kernel density estimator is then defined as

\hat{f} (x) = \frac{1}{n h} \sum_{i = 1}^{n} K (\frac{x - x_{i}}{h}) .

The choice of $K$ is usually not important, but to ease the computational burden use can be made of the Gaussian kernel, defined as

K (t) = \frac{1}{\sqrt{2 π}} e^{- t^{2} / 2} .
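
As an illustration of the two formulas above, the following minimal sketch evaluates the Gaussian kernel density estimate directly from its definition, without any FFT. It is not how g10bb computes the estimate. The five observations are taken from the example data on this page; the window width `h = 0.5` and the evaluation points `grid` are arbitrary assumptions chosen for illustration.

```matlab
% Direct evaluation of the Gaussian kernel density estimate.
% h and grid are illustrative assumptions, not values used by g10bb.
x    = [0.114 -0.232 -0.570 1.853 -0.994];   % small sample
h    = 0.5;                                  % window width (assumed)
grid = linspace(-4, 4, 201);                 % evaluation points (assumed)

n = numel(x);
% u(i,j) = (grid(j) - x(i)) / h for every observation/grid-point pair
u = (repmat(grid, n, 1) - repmat(x(:), 1, numel(grid))) / h;
K = exp(-0.5 * u.^2) / sqrt(2*pi);           % Gaussian kernel K(t)
fhat = sum(K, 1) / (n * h);                  % (1/(n h)) * sum over i of K

% A density estimate should integrate to about 1 over a wide enough range
area = trapz(grid, fhat);
fprintf('Integral of estimate over grid = %.4f\n', area);
```

The cost of this direct form grows with the number of observations times the number of evaluation points, which is the computational burden that an FFT-based approach is meant to reduce.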

The smoothness of the estimator depends on the window width $h$. The larger the value of $h$, the smoother the density estimate. The value of $h$ can be chosen by examining plots of the smoothed density for different values of $h$ or by using cross-validation methods (see Silverman (1990)).
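
The effect of $h$ on smoothness can also be seen numerically. The sketch below evaluates the Gaussian kernel estimate directly for three window widths and reports the total variation of each estimate on the grid. The total variation is used here only as a rough, assumed proxy for roughness; it is not a quantity computed by g10bb, and the widths in `hs` and the grid are arbitrary choices. The ten observations are the first ten values of the example data on this page.

```matlab
% Compare the roughness of the estimate for several window widths.
% hs, grid and the total-variation measure are illustrative assumptions.
x    = [0.114 -0.232 -0.570 1.853 -0.994 ...
       -0.374 -1.028  0.509 0.881 -0.453];
grid = linspace(-6, 6, 601);
hs   = [0.1 0.3 0.9];
n    = numel(x);

rough = zeros(size(hs));
for k = 1:numel(hs)
    h = hs(k);
    u = (repmat(grid, n, 1) - repmat(x(:), 1, numel(grid))) / h;
    fhat = sum(exp(-0.5 * u.^2), 1) / (n * h * sqrt(2*pi));
    rough(k) = sum(abs(diff(fhat)));   % total variation on the grid
    fprintf('h = %4.2f   total variation = %.4f\n', h, rough(k));
end
```

A smaller total variation corresponds to fewer and lower bumps, so the reported values should fall as $h$ increases.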

Silverman (1982) and Silverman (1990) show how the Gaussian kernel density estimator can be computed using a fast Fourier transform (FFT). In order to compute the kernel density estimate over the range $a$ to $b$ the following steps are required.

(i)	Discretize the data to give $n_{s}$ equally spaced points $t_{l}$ with weights $ξ_{l}$ (see Jones and Lotwick (1984)).
(ii)	Compute the FFT of the weights $ξ_{l}$ to give $Y_{l}$ .
(iii)	Compute $ζ_{l} = e^{- \frac{1}{2} h^{2} s_{l}^{2}} Y_{l}$ where $s_{l} = 2 π l / (b - a)$ .
(iv)	Find the inverse FFT of $ζ_{l}$ to give $\hat{f} (x)$ .

To compute the kernel density estimate for further values of $h$ only steps (iii) and (iv) need be repeated.
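
Steps (i) to (iv) can be sketched in plain MATLAB as follows. This is a minimal illustration under stated assumptions, not the implementation used by g10bb: the discretization is a simple linear binning onto interval midpoints (in the spirit of, but not necessarily identical to, the scheme of Jones and Lotwick (1984)), the range `a`, `b`, the value of `ns` and the widths in `hs` are arbitrary choices, and the range is assumed wide enough that wrap-around from the periodic FFT is negligible. The FFT of the weights is computed once and reused for each $h$.

```matlab
% FFT-based Gaussian kernel density estimate (illustrative sketch only).
x  = [0.114 -0.232 -0.570 1.853 -0.994 ...
     -0.374 -1.028  0.509 0.881 -0.453];
a  = -6;  b = 6;                      % range (assumed)
ns = 256;                             % number of points, even (assumed)
n  = numel(x);
delta = (b - a) / ns;
t  = a + ((1:ns) - 0.5) * delta;      % equally spaced points t_l
assert(all(x >= t(1) & x < t(ns)));   % sketch requires data inside grid

% (i) Discretize: split each observation between its two nearest points
pos = (x - a) / delta + 0.5;          % fractional index, t_l at pos = l
lo  = floor(pos);
w   = pos - lo;                       % share given to the right neighbour
xi  = accumarray([lo(:); lo(:) + 1], [1 - w(:); w(:)], [ns 1])';
xi  = xi / (n * delta);               % weights xi_l

% (ii) FFT of the weights, computed once
Y = fft(xi);

% s_l = 2*pi*l/(b - a); the upper half holds the negative frequencies
l = [0:ns/2, -(ns/2 - 1):-1];
s = 2 * pi * l / (b - a);

hs = [0.3 0.6];
maxerr = zeros(size(hs));
for k = 1:numel(hs)
    h = hs(k);
    zeta = exp(-0.5 * h^2 * s.^2) .* Y;    % (iii)
    fhat = real(ifft(zeta));               % (iv)
    % Check against direct evaluation of the estimator at the points t
    u = (repmat(t, n, 1) - repmat(x(:), 1, ns)) / h;
    direct = sum(exp(-0.5 * u.^2), 1) / (n * h * sqrt(2*pi));
    maxerr(k) = max(abs(fhat - direct));
    fprintf('h = %4.2f   max |FFT - direct| = %.2e\n', h, maxerr(k));
end
```

The remaining difference from the direct evaluation comes from the binning in step (i); it shrinks as `ns` is increased for a fixed range.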

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

Further Comments

Example

function g10bb_example


fprintf('g10bb example results\n\n');

% kernel density estimation from 100 values
x  = [ 0.114 -0.232 -0.570  1.853 -0.994 ...
      -0.374 -1.028  0.509  0.881 -0.453 ...
       0.588 -0.625 -1.622 -0.567  0.421 ...
      -0.475  0.054  0.817  1.015  0.608 ...
      -1.353 -0.912 -1.136  1.067  0.121 ...
      -0.075 -0.745  1.217 -1.058 -0.894 ...
       1.026 -0.967 -1.065  0.513  0.969 ...
       0.582 -0.985  0.097  0.416 -0.514 ...
       0.898 -0.154  0.617 -0.436 -1.212 ...
      -1.571  0.210 -1.101  1.018 -1.702 ...
      -2.230 -0.648 -0.350  0.446 -2.667 ...
       0.094 -0.380 -2.852 -0.888 -1.481 ...
      -0.359 -0.554  1.531  0.052 -1.715 ...
       1.255 -0.540  0.362 -0.654 -0.272 ...
      -1.810  0.269 -1.918  0.001  1.240 ...
      -0.368 -0.647 -2.282  0.498  0.001 ...
      -3.059 -1.171  0.566  0.948  0.925 ...
       0.825  0.130  0.930  0.523  0.443 ...
      -0.649  0.554 -2.823  0.158 -1.180 ...
       0.610  0.877  0.791 -0.078  1.412];

% Calculate window width from data.
wtype = int64(2);

% First Call
fcall = int64(1);
ns    = 512;
rcomm = zeros(ns+20,1);

% Perform kernel density estimation
[window, slo, shi, smooth, t, rcomm, ifail] = ...
   g10bb( ...
          x, fcall, rcomm, 'wtype',wtype);

% Display the results
fprintf('Window Width Used = %11.4e\n', window);
fprintf('Interval = (%11.4e,%11.4e)\n\n', slo, shi);
fprintf('First %2d output values:\n\n',20);
fprintf('    Time point      Density estimate\n');
fprintf('    ----------      ----------------\n');
fprintf(' %13.4f     %13.4e\n', [t(1:20), smooth(1:20)]')

fig1 = figure;
plot(t,smooth);
title('Gaussian Kernel Density Estimation');
xlabel('t');
ylabel('Density Estimate');
wind_leg = sprintf('window = %7.4f',window);
legend(wind_leg);
legend('boxoff');

g10bb example results

Window Width Used =  3.7638e-01
Interval = (-4.1882e+00, 2.9822e+00)

First 20 output values:

    Time point      Density estimate
    ----------      ----------------
       -4.1811        3.8281e-06
       -4.1671        4.0305e-06
       -4.1531        4.4233e-06
       -4.1391        5.0212e-06
       -4.1251        5.8461e-06
       -4.1111        6.9279e-06
       -4.0971        8.3048e-06
       -4.0831        1.0025e-05
       -4.0691        1.2145e-05
       -4.0551        1.4736e-05
       -4.0411        1.7881e-05
       -4.0271        2.1677e-05
       -4.0131        2.6239e-05
       -3.9991        3.1700e-05
       -3.9851        3.8214e-05
       -3.9711        4.5960e-05
       -3.9571        5.5141e-05
       -3.9431        6.5990e-05
       -3.9291        7.8775e-05
       -3.9151        9.3796e-05

NAG Toolbox: nag_smooth_kerndens_gauss (g10bb)


Purpose