NAG Toolbox: nag_stat_frequency_table (g01ae)

Purpose

nag_stat_frequency_table (g01ae) constructs a frequency distribution of a variable, using class boundary values that are either user-supplied or function-calculated.

Syntax

[cb, ifreq, xmin, xmax, ifail] = g01ae(k, x, 'n', n, 'cb', cb)
[cb, ifreq, xmin, xmax, ifail] = nag_stat_frequency_table(k, x, 'n', n, 'cb', cb)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: iclass is no longer an input parameter; cb was made optional; k was made a compulsory input parameter

Description

The data consists of a sample of $n$ observations of a continuous variable, denoted by ${x}_{i}$, for $\mathit{i}=1,2,\dots ,n$. Let $a=\min\left({x}_{1},\dots ,{x}_{n}\right)$ and $b=\max\left({x}_{1},\dots ,{x}_{n}\right)$.
nag_stat_frequency_table (g01ae) constructs a frequency distribution with $k\left(>1\right)$ classes denoted by ${f}_{i}$, for $\mathit{i}=1,2,\dots ,k$.
The boundary values may be either user-supplied, or function-calculated, and are denoted by ${y}_{j}$, for $\mathit{j}=1,2,\dots ,k-1$.
If the boundary values of the classes are to be function-calculated, then they are determined in one of the following ways:
 (a) if $k>2$, the range of $x$ values is divided into $k-2$ intervals of equal length, and two extreme intervals, defined by the class boundary values ${y}_{1},{y}_{2},\dots ,{y}_{k-1}$;
 (b) if $k=2$, ${y}_{1}=\frac{1}{2}\left(a+b\right)$.
However formed, the values ${y}_{1},\dots ,{y}_{k-1}$ are assumed to be in ascending order. The class frequencies are formed with
• ${f}_{1}=\text{}$ the number of $x$ values in the interval $\left(-\infty ,{y}_{1}\right)$
• ${f}_{i}=\text{}$ the number of $x$ values in the interval $\left[{y}_{i-1},{y}_{i}\right)$, $\text{ }i=2,\dots ,k-1$
• ${f}_{k}=\text{}$ the number of $x$ values in the interval $\left[{y}_{k-1},\infty \right)$,
where [ means inclusive, and ) means exclusive. If the class boundary values are function-calculated and $k>2$, then ${f}_{1}={f}_{k}=0$, and ${y}_{1}$ and ${y}_{k-1}$ are chosen so that ${y}_{1}<a$ and ${y}_{k-1}>b$.
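The boundary and counting rules above can be sketched in plain MATLAB, without calling the NAG routine. This is an illustration only: the sample is invented, and the size of the adjustment applied to the two outer boundaries (`delta`) is an assumption, since the text states only that ${y}_{1}$ lies below $a$ and ${y}_{k-1}$ lies above $b$.

```matlab
% Plain-MATLAB sketch of the class-boundary and counting rules (k > 2 case).
% Not the NAG implementation; the nudge size below is an assumed value.
x = [20.7 21.1 21.6 22.3 22.4 22.6 23.1 23.6];   % invented sample
k = 5;                                            % number of classes

a = min(x);
b = max(x);

% k-1 boundaries that split [a, b] into k-2 intervals of equal length.
y = a + (0:k-2) * (b - a) / (k - 2);

% Move the two outer boundaries just outside the data (assumed adjustment).
delta  = 1e-3 * (b - a);   % hypothetical nudge
y(1)   = a - delta;
y(end) = b + delta;

% Class 1 is (-inf, y(1)), class i is [y(i-1), y(i)), class k is [y(k-1), inf).
f = zeros(1, k);
for i = 1:numel(x)
    c = 1 + sum(x(i) >= y);   % one plus the number of boundaries at or below x(i)
    f(c) = f(c) + 1;
end

disp(f);
```

Because the outer boundaries sit strictly outside the data, the first and last classes are empty here, and the counts sum to the sample size.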
If a frequency distribution is required for a discrete variable, then it is suggested that you supply the class boundary values; function-calculated boundary values may be slightly imprecise (due to the adjustment of ${y}_{1}$ and ${y}_{k-1}$ outlined above) and cause values very close to a class boundary to be assigned to the wrong class.
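For integer-valued data, one way to follow this advice is to place the boundaries halfway between consecutive integers, so that no observation can lie on or near a boundary. The following is a minimal sketch with invented data; it only builds the boundaries and checks their distance from the observations.

```matlab
% Sketch: class boundaries for integer-valued data, placed halfway between
% consecutive integers so that no observation lies on or near a boundary.
x = [0 1 1 2 2 2 3 3 4 5];          % invented counts, for illustration

cb = (min(x):max(x)-1) + 0.5;       % 0.5, 1.5, 2.5, 3.5, 4.5
k  = numel(cb) + 1;                 % k-1 boundaries define k classes

% Smallest distance between any observation and any boundary.
gap = min(min(abs(x(:) - cb(:).')));
fprintf('k = %d, smallest distance to a boundary = %.1f\n', k, gap);
```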

Parameters

Compulsory Input Parameters

1:     $\mathrm{k}$ – int64/int32/nag_int scalar
$k$, the number of classes desired in the frequency distribution. Whether or not class boundary values are user-supplied, k must include the two extreme classes which stretch to $\pm \infty$.
Constraint: ${\mathbf{k}}\ge 2$.
2:     $\mathrm{x}\left({\mathbf{n}}\right)$ – double array
The sample of observations of the variable for which the frequency distribution is required, ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n$. The values may be in any order.

Optional Input Parameters

1:     $\mathrm{n}$ – int64/int32/nag_int scalar
Default: the dimension of the array x.
$n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
2:     $\mathrm{cb}\left({\mathbf{k}}\right)$ – double array
If cb is not supplied, nag_stat_frequency_table (g01ae) calculates $k-1$ class boundary values.
If cb is supplied, its first $k-1$ elements must contain the class boundary values, in ascending order.
Constraint: ${\mathbf{cb}}\left(\mathit{i}\right)<{\mathbf{cb}}\left(\mathit{i}+1\right)$, for $\mathit{i}=1,2,\dots ,k-2$.
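A call with user-supplied boundaries might look like the sketch below, which follows the Syntax section. The data and boundary values are invented, and the treatment of the final element of cb as unused padding is an assumption based on the statement that only the first $k-1$ elements hold boundaries. The call is guarded so that the script still runs when the toolbox is not on the path.

```matlab
% Sketch of a call with user-supplied class boundaries.
x = [21.6; 22.3; 22.4; 23.1; 20.8; 22.6; 23.4; 21.2];   % invented sample
k = int64(4);                       % includes the two open-ended classes

% cb is documented as having k elements, of which the first k-1 are the
% boundaries; the last element here is padding (assumed to be ignored on input).
cb = [21.5; 22.5; 23.5; 0];

% Check the documented constraint cb(i) < cb(i+1), i = 1..k-2, before calling.
assert(all(diff(cb(1:k-1)) > 0), 'class boundaries must be strictly increasing');

if exist('g01ae', 'file')           % only call if the NAG Toolbox is available
    [cbOut, ifreq, xmin, xmax, ifail] = g01ae(k, x, 'cb', cb);
    fprintf('ifail = %d\n', ifail);
    disp(ifreq(:).');
else
    disp('NAG Toolbox not found; call skipped.');
end
```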

Output Parameters

1:     $\mathrm{cb}\left({\mathbf{k}}\right)$ – double array
The first $k-1$ elements of cb contain the class boundary values in ascending order.
2:     $\mathrm{ifreq}\left({\mathbf{k}}\right)$ – int64/int32/nag_int array
The elements of ifreq contain the frequencies in each class, ${f}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,k$. In particular ${\mathbf{ifreq}}\left(1\right)$ contains the frequency of the class below ${\mathbf{cb}}\left(1\right)$, ${f}_{1}$, and ${\mathbf{ifreq}}\left(k\right)$ contains the frequency of the class greater than or equal to ${\mathbf{cb}}\left(k-1\right)$, ${f}_{k}$.
3:     $\mathrm{xmin}$ – double scalar
The smallest value in the sample, $a$.
4:     $\mathrm{xmax}$ – double scalar
The largest value in the sample, $b$.
5:     $\mathrm{ifail}$ – int64/int32/nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{k}}<2$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{n}}<1$.
${\mathbf{ifail}}=3$
 On entry, the user-supplied class boundary values are not in ascending order.
${\mathbf{ifail}}=-99$
 An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
 Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
 Dynamic memory allocation failed.
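As a rough illustration, the ifail values listed above can be turned into readable messages after a call. The sketch below uses a hard-coded ifail value in place of a real call, and the message wording is a paraphrase of this section rather than output produced by the toolbox.

```matlab
% Sketch: mapping an ifail value from g01ae to a short message.
ifail = 3;   % example value; in practice this is the last output of g01ae

switch ifail
    case 0
        msg = 'success';
    case 1
        msg = 'k < 2 on entry';
    case 2
        msg = 'n < 1 on entry';
    case 3
        msg = 'user-supplied class boundaries are not in ascending order';
    case -399
        msg = 'licence key may have expired or is not installed correctly';
    case -999
        msg = 'dynamic memory allocation failed';
    otherwise
        msg = sprintf('unexpected ifail value %d', ifail);
end

fprintf('g01ae: %s\n', msg);
```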

Accuracy

The method used is believed to be stable.

Further Comments

The time taken by nag_stat_frequency_table (g01ae) increases with k and n. It also depends on the distribution of the sample observations.

Example

This example summarises a single dataset of 70 observations. nag_stat_frequency_table (g01ae) is called with $k=7$ and without cb, so the class boundary values are function-calculated. The frequency distribution and the smallest and largest observations are then printed.
```matlab
function g01ae_example

fprintf('g01ae example results\n\n');

x = [22.3; 21.6; 22.6; 22.4; 22.4; 22.4; 22.1; 21.9; 23.1; 23.4; 23.4;
22.6; 22.5; 22.5; 22.1; 22.6; 22.3; 22.4; 21.8; 22.3; 22.1; 23.6;
20.8; 22.2; 23.1; 21.1; 21.7; 21.4; 21.6; 22.5; 21.2; 22.6; 22.2;
22.2; 21.4; 21.7; 23.2; 23.1; 22.3; 22.3; 21.1; 21.4; 21.5; 21.8;
22.8; 21.4; 20.7; 21.6; 23.2; 23.6; 22.7; 21.7; 23.0; 21.9; 22.6;
22.1; 22.2; 23.4; 21.5; 23.0; 22.8; 21.4; 23.2; 21.8; 21.2; 22.0;
22.4; 22.8; 23.2; 23.6];

% Request 7 classes; cb is omitted, so the boundaries are function-calculated.
k = int64(7);
[cb, ifreq, xmin, xmax, ifail] = g01ae(k, x);

fprintf('Number of cases     %3d\n',size(x,1));
fprintf('Number of classes   %3d\n\n',k);
fprintf('Routine-supplied class boundaries\n\n');
fprintf('        Class            Frequency\n');
fprintf('%9s to%7.2f%14d\n', 'Up', cb(1), ifreq(1));
% Interior classes [cb(i-1), cb(i)).
for i = 2:k-1
  fprintf('%7.2f   to%7.2f%14d\n', cb(i-1), cb(i), ifreq(i));
end
fprintf('%7.2f  and%7s%14d\n\n', cb(k-1), 'over', ifreq(k));
fprintf('Total frequency = %5d\n',sum(ifreq));
fprintf('Minimum         = %8.2f\n',xmin);
fprintf('Maximum         = %8.2f\n',xmax);

```
```
g01ae example results

Number of cases      70
Number of classes     7

Routine-supplied class boundaries

        Class            Frequency
       Up to  20.70             0
  20.70   to  21.28             6
  21.28   to  21.86            16
  21.86   to  22.44            21
  22.44   to  23.02            14
  23.02   to  23.60            13
  23.60  and   over             0

Total frequency =    70
Minimum         =    20.70
Maximum         =    23.60
```