g10 Chapter Contents
g10 Chapter Introduction
NAG Library Manual

# NAG Library Function Documentnag_kernel_density_gauss (g10bbc)

## 1  Purpose

nag_kernel_density_gauss (g10bbc) performs kernel density estimation using a Gaussian kernel.

## 2  Specification

 #include #include
 void nag_kernel_density_gauss (Integer n, const double x[], Nag_WindowType wtype, double *window, double *slo, double *shi, Integer ns, double smooth[], double t[], Nag_Boolean fcall, double rcomm[], NagError *fail)

## 3  Description

Given a sample of $n$ observations, ${x}_{1},{x}_{2},\dots ,{x}_{n}$, from a distribution with unknown density function, $f\left(x\right)$, an estimate of the density function, $\stackrel{^}{f}\left(x\right)$, may be required. The simplest form of density estimator is the histogram. This may be defined by:
 $f^ x = 1nh nj , a + j-1 h < x < a + j h , j=1,2,…,ns ,$
where ${n}_{j}$ is the number of observations falling in the interval $a+\left(j-1\right)h$ to $a+jh$, $a$ is the lower bound to the histogram, $b={n}_{s}h$ is the upper bound and ${n}_{s}$ is the total number of intervals. The value $h$ is known as the window width. To produce a smoother density estimate a kernel method can be used. A kernel function, $K\left(t\right)$, satisfies the conditions:
 $∫-∞∞Ktdt=1 and Kt≥0.$
The kernel density estimator is then defined as
 $f^x=1nh ∑i= 1nK x-xih .$
The choice of $K$ is usually not important but to ease the computational burden use can be made of the Gaussian kernel defined as
 $Kt=12πe-t2/2.$
The smoothness of the estimator depends on the window width $h$. The larger the value of $h$ the smoother the density estimate. The value of $h$ can be chosen by examining plots of the smoothed density for different values of $h$ or by using cross-validation methods (see Silverman (1990)).
Silverman (1982) and Silverman (1990) show how the Gaussian kernel density estimator can be computed using a fast Fourier transform (FFT). In order to compute the kernel density estimate over the range $a$ to $b$ the following steps are required.
 (i) Discretize the data to give ${n}_{s}$ equally spaced points ${t}_{l}$ with weights ${\xi }_{l}$ (see Jones and Lotwick (1984)). (ii) Compute the FFT of the weights ${\xi }_{l}$ to give ${Y}_{l}$. (iii) Compute ${\zeta }_{l}={e}^{-\frac{1}{2}{h}^{2}{s}_{l}^{2}}{Y}_{l}$ where ${s}_{l}=2\pi l/\left(b-a\right)$. (iv) Find the inverse FFT of ${\zeta }_{l}$ to give $\stackrel{^}{f}\left(x\right)$.
To compute the kernel density estimate for further values of $h$ only steps (iii) and (iv) need be repeated.

## 4  References

Jones M C and Lotwick H W (1984) Remark AS R50. A remark on algorithm AS 176. Kernel density estimation using the Fast Fourier Transform Appl. Statist. 33 120–122
Silverman B W (1982) Algorithm AS 176. Kernel density estimation using the fast Fourier transform Appl. Statist. 31 93–99
Silverman B W (1990) Density Estimation Chapman and Hall

## 5  Arguments

1:     nIntegerInput
On entry: $n$, the number of observations in the sample.
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, n must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
Constraint: ${\mathbf{n}}>0$.
2:     x[n]const doubleInput
On entry: ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$.
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, x must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
3:     wtypeNag_WindowTypeInput
On entry: how the window width, $h$, is to be calculated:
${\mathbf{wtype}}=\mathrm{Nag_WindowSupplied}$
$h$ is supplied in window.
${\mathbf{wtype}}=\mathrm{Nag_RuleOfThumb}$
$h$ is to be calculated from the data, with
 $h=m× 0.9× minq75-q25,σ n0.2$
where ${q}_{75}-{q}_{25}$ is the inter-quartile range and $\sigma$ the standard deviation of the sample, $x$, and $m$ is a multipler supplied in window. The $25%$ and $75%$ quartiles, ${q}_{25}$ and ${q}_{75}$, are calculated using nag_double_quantiles (g01amc). This is the "rule-of-thumb" suggested by Silverman (1990).
Suggested value: ${\mathbf{wtype}}=\mathrm{Nag_RuleOfThumb}$ and ${\mathbf{window}}=1.0$
Constraint: ${\mathbf{wtype}}=\mathrm{Nag_WindowSupplied}$ or $\mathrm{Nag_RuleOfThumb}$.
4:     windowdouble *Input/Output
On entry: if ${\mathbf{wtype}}=\mathrm{Nag_WindowSupplied}$, then $h$, the window width. Otherwise, $m$, the multiplier used in the calculation of $h$.
Suggested value: ${\mathbf{window}}=1.0$ and ${\mathbf{wtype}}=\mathrm{Nag_RuleOfThumb}$
On exit: $h$, the window width actually used.
Constraint: ${\mathbf{window}}>0.0$.
5:     slodouble *Input/Output
On entry: if ${\mathbf{slo}}<{\mathbf{shi}}$ then $a$, the lower limit of the interval on which the estimate is calculated. Otherwise, $a$ and $b$, the lower and upper limits of the interval, are calculated as follows:
 $a = minixi-slo×h b = maxixi+slo×h$
where $h$ is the window width.
For most applications $a$ should be at least three window widths below the lowest data point.
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, slo must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
Suggested value: ${\mathbf{slo}}=3.0$ and ${\mathbf{shi}}=0.0$ which would cause $a$ and $b$ to be set $3$ window widths below and above the lowest and highest data points respectively.
On exit: $a$, the lower limit actually used.
6:     shidouble *Input/Output
On entry: if ${\mathbf{slo}}<{\mathbf{shi}}$ then $b$, the upper limit of the interval on which the estimate is calculated. Otherwise a value for $b$ is calculated from the data as stated in the description of slo and the value supplied in shi is not used.
For most applications $b$ should be at least three window widths above the highest data point.
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, shi must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
On exit: $b$, the upper limit actually used.
7:     nsIntegerInput
On entry: ${n}_{s}$, the number of points at which the estimate is calculated.
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, ns must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
Suggested value: ${\mathbf{ns}}=512$
Constraints:
• ${\mathbf{ns}}\ge 2$;
• The largest prime factor of ns must not exceed $19$, and the total number of prime factors of ns, counting repetitions, must not exceed $20$.
8:     smooth[ns]doubleOutput
On exit: $\stackrel{^}{f}\left({t}_{\mathit{l}}\right)$, for $\mathit{l}=1,2,\dots ,{n}_{s}$, the ${n}_{s}$ values of the density estimate.
9:     t[ns]doubleOutput
On exit: ${t}_{\mathit{l}}$, for $\mathit{l}=1,2,\dots ,{n}_{s}$, the points at which the estimate is calculated.
10:   fcallNag_BooleanInput
On entry: If ${\mathbf{fcall}}=\mathrm{Nag_TRUE}$ then the values of ${Y}_{l}$ are to be calculated by this call to nag_kernel_density_gauss (g10bbc), otherwise it is assumed that the values of ${Y}_{l}$ were calculated by a previous call to this routine and the relevant information is stored in rcomm.
11:   rcomm[${\mathbf{ns}}+20$]doubleCommunication Array
On entry: communication array, used to store information between calls to nag_kernel_density_gauss (g10bbc).
If ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, rcomm must be unchanged since the last call to nag_kernel_density_gauss (g10bbc).
On exit: the last ns elements of rcomm contain the fast Fourier transform of the weights of the discretized data, that is ${\mathbf{rcomm}}\left[\mathit{l}+19\right]={Y}_{\mathit{l}}$, for $\mathit{l}=1,2,\dots ,{n}_{s}$.
12:   failNagError *Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
On entry, argument $⟨\mathit{\text{value}}⟩$ had an illegal value.
NE_ILLEGAL_COMM
rcomm has been corrupted between calls.
NE_INT
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}>0$.
On entry, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ns}}\ge 2$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_PREV_CALL
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
On entry at previous call, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, n must be unchanged since previous call.
On entry, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
On entry at previous call, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, ns must be unchanged since previous call.
On entry, ${\mathbf{shi}}=⟨\mathit{\text{value}}⟩$.
On exit from previous call, ${\mathbf{shi}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, shi must be unchanged since previous call.
On entry, ${\mathbf{slo}}=⟨\mathit{\text{value}}⟩$.
On exit from previous call, ${\mathbf{slo}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{fcall}}=\mathrm{Nag_FALSE}$, slo must be unchanged since previous call.
NE_PRIME_FACTOR
On entry, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
Constraint: Largest prime factor of ns must not exceed $19$.
On entry, ${\mathbf{ns}}=⟨\mathit{\text{value}}⟩$.
Constraint: Total number of prime factors of ns must not exceed $20$.
NE_REAL
On entry, ${\mathbf{window}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{window}}>0.0$.
NW_POTENTIAL_PROBLEM
On entry, ${\mathbf{slo}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{shi}}=⟨\mathit{\text{value}}⟩$.
On entry, $\mathrm{min}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{x}}\right)=⟨\mathit{\text{value}}⟩$ and $\mathrm{max}\phantom{\rule{0.125em}{0ex}}\left({\mathbf{x}}\right)=⟨\mathit{\text{value}}⟩$.
Expected values of at least $⟨\mathit{\text{value}}⟩$ and $⟨\mathit{\text{value}}⟩$ for slo and shi.
All output values have been returned.

## 7  Accuracy

See Jones and Lotwick (1984) for a discussion of the accuracy of this method.

## 8  Parallelism and Performance

Not applicable.

The time for computing the weights of the discretized data is of order $n$, while the time for computing the FFT is of order ${n}_{s}\mathrm{log}\left({n}_{s}\right)$, as is the time for computing the inverse of the FFT.

## 10  Example

Data is read from a file and the density estimated. The first $20$ values are then printed.

### 10.1  Program Text

Program Text (g10bbce.c)

### 10.2  Program Data

Program Data (g10bbce.d)

### 10.3  Program Results

Program Results (g10bbce.r)

This plot shows the estimated density function for the example data for several window widths.