
# NAG Toolbox: nag_stat_plot_stem_leaf (g01ar)

## Purpose

nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of observations.

## Syntax

[unit, lines, ifail, plot, sorty] = g01ar(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
[unit, lines, ifail, plot, sorty] = nag_stat_plot_stem_leaf(y, nstepx, nstepy, 'range', range, 'prt', prt, 'n', n, 'unit', unit)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: range was made optional (default 'E'); prt was made optional (default 'P'); unit was made optional (default 0); output parameters were reordered

## Description

nag_stat_plot_stem_leaf (g01ar) produces a stem and leaf display for a single sample of $n$ observations. The stem and leaf display shows data values separated into the form of a ‘stem’ and a ‘leaf’. For example, a value of $473$ could be represented as $47$ $3$ where the stem is $47$ and the leaf is $3$. The data is scaled using a value known as the ‘leaf digit unit’. In the above example the leaf digit unit would be $1.0$.
The following example illustrates a stem and leaf display.
For the $10$ observations:
 $1.8 2.3 2.1 1.9 2.1 2.4 2.0 2.0 1.9 2.1$
the stem and leaf display is:
```
1  1  8
3  1  99
5  2  00
5  2  111
2  2
2  2  3
1  2  4
```
where the leaf digit unit is $0.1$ so that $1$ $8$ represents $1.8$ (i.e., $18×0.1$). The leaf digit unit distinguishes between the numbers $18.0$, $1.8$, $0.18$, etc. which may otherwise all be represented by $1$ $8$.
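The split into stem and leaf can be sketched in a few lines of plain MATLAB. This is only an illustration of the idea, not the code used inside nag_stat_plot_stem_leaf (g01ar); in particular, rounding the scaled values to the nearest integer is an assumption, and the variable names are hypothetical.

```matlab
% Illustrative stem/leaf split for a given leaf digit unit.
% Assumption: scaled values are rounded to the nearest integer.
y    = [1.8 2.3 2.1 1.9 2.1 2.4 2.0 2.0 1.9 2.1];
unit = 0.1;                     % leaf digit unit

scaled = round(y / unit);       % 1.8 -> 18, 2.3 -> 23, ...
stem   = fix(scaled / 10);      % leading digits (truncated toward zero)
leaf   = abs(rem(scaled, 10));  % final digit

for i = 1:numel(y)
    fprintf('%4.1f -> stem %d, leaf %d\n', y(i), stem(i), leaf(i));
end
```

With a unit of $1.0$ the same code would give stem $47$ and leaf $3$ for the value $473$.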
Included in the above display is an initial column specifying the cumulative count of values, up to and including that particular line, from either the top or bottom of the display, whichever is smaller. An exception to this is when the line on which the median lies is reached, in which case the actual count of values on that line is displayed, rather than a cumulative count, and this is highlighted by enclosing the count in parentheses. In this case the median is $2.05$ and thus falls between the two lines at which the cumulative count has reached $n/2$ where $n$ is the number of observations.
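The count column can be reproduced from the number of leaves on each line. The sketch below uses the leaf counts of the $10$-observation display above; the test used to detect the median line (fewer than $n/2$ values above and fewer than $n/2$ below) is one possible formulation assumed for illustration, not necessarily the one used by the function.

```matlab
% Count column of a stem and leaf display (illustrative sketch).
counts  = [1 2 2 3 0 1 1];                 % leaves per line, top to bottom
n       = sum(counts);

fromTop = cumsum(counts);                  % cumulative count from the top
fromBot = fliplr(cumsum(fliplr(counts)));  % cumulative count from the bottom
depth   = min(fromTop, fromBot);           % whichever is smaller

% Assumed test for the median line: fewer than n/2 values on either side.
nAbove       = fromTop - counts;
nBelow       = fromBot - counts;
isMedianLine = (nAbove < n/2) & (nBelow < n/2);

for k = 1:numel(counts)
    if isMedianLine(k)
        fprintf('(%2d)\n', counts(k));     % actual count, in parentheses
    else
        fprintf('%4d\n', depth(k));
    end
end
```

For these counts no line satisfies the median test, which agrees with the median of $2.05$ falling between two lines.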
Some of the other features of the stem and leaf display are illustrated by the following two examples.
For the $30$ observations:
 $-19.0 -3.0 -1.0 0.0 1.0 2.0 2.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 6.0 6.0 6.0 7.0 7.0 8.0 10.0 11.0 11.0 13.0 31.0$
the stem and leaf display may be:
```
 1   1.  9
 1   1*
 1  -0.
 3  -0*  13
15  +0*  012233344444
15  +0.  5555666778
 5   1*  011
 2   1.  3
 1   2*
 1   2.
 1   3*  1
```
In the above display all the data are plotted and the leaf digit unit is $1.0$. Also in this display different leaves, that is different digits, may be plotted on a particular line. In this case we have $5$ possible digits per line, that is $2$ lines per stem, and these are represented as follows:
• * indicates that the line may contain the digits $0$ to $4$;
• . indicates that the line may contain the digits $5$ to $9$.
Alternatively the stem and leaf display may look like:
```
      LO   -19

  2   -0T  3
  3   -0*  1
  5   +0*  01
 10   +0T  22333
( 9)  +0F  444445555
 11   +0S  66677
  6   +0.  8
  5    1*  011
  2    1T  3

      HI   31
```
Again the leaf digit unit is $1.0$ but in this display just the data between the fences, which are the hinges $\pm 1\frac{1}{2}\times$ the inter-hinge range, are plotted. Any data points that fall outside the fences are presented separately in the display under the headings LO for those points below the lower fence and HI for those points above the upper fence.
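As a rough check, the fences for the $30$ observations can be computed directly. The sketch assumes Tukey's usual definition of the hinges (depth $(\lfloor (n+1)/2 \rfloor + 1)/2$ from each end of the sorted sample); the text above does not state which definition the function uses, so treat the hinge calculation as an assumption.

```matlab
% Hinges and fences for the 30 observations (illustrative sketch).
y = [-19 -3 -1 0 1 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7 8 10 11 11 13 31];
s = sort(y);
n = numel(s);

% Assumed hinge definition (Tukey): depth measured from each end.
dMedian = (n + 1) / 2;
dHinge  = (floor(dMedian) + 1) / 2;
lo = floor(dHinge);
hi = ceil(dHinge);
lowerHinge = (s(lo) + s(hi)) / 2;
upperHinge = (s(n + 1 - lo) + s(n + 1 - hi)) / 2;

ihr        = upperHinge - lowerHinge;      % inter-hinge range
lowerFence = lowerHinge - 1.5 * ihr;
upperFence = upperHinge + 1.5 * ihr;

loValues = s(s < lowerFence);              % listed under LO
hiValues = s(s > upperFence);              % listed under HI
fprintf('fences: [%g, %g]\n', lowerFence, upperFence);
```

Under this definition only $-19$ and $31$ lie outside the fences, which matches the LO and HI entries in the display.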
Again in this display different leaves, that is different digits, may be plotted on a particular line. However in this case we have $2$ possible digits per line, that is $5$ lines per stem, and these are represented as follows:
• * indicates that the line may contain the digits $0$ or $1$;
• T indicates that the line may contain the digits $2$ or $3$;
• F indicates that the line may contain the digits $4$ or $5$;
• S indicates that the line may contain the digits $6$ or $7$;
• . indicates that the line may contain the digits $8$ or $9$.
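The two labelling schemes described above amount to a lookup from leaf digit to line symbol. A small sketch follows; the variable names are hypothetical and the code only restates the lists in the text.

```matlab
% Line symbol for each leaf digit (illustrative sketch).
leafDigits = 0:9;
symbols5   = '*TFS.';   % 5 lines per stem, 2 digits per line
symbols2   = '*.';      % 2 lines per stem, 5 digits per line

label5 = symbols5(floor(leafDigits / 2) + 1);
label2 = symbols2(floor(leafDigits / 5) + 1);

for d = leafDigits
    fprintf('digit %d: %c (5 lines per stem), %c (2 lines per stem)\n', ...
            d, label5(d + 1), label2(d + 1));
end
```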
A display may also allow $10$ different digits ($0$ to $9$) per line, that is $1$ line per stem, or just $1$ digit per line, that is $10$ lines per stem, as in the first of the three examples above.
Note that the median here is $4.5$. This falls between two lines in the first display but is highlighted on the second display since it lies on a particular line.
Finally if there are positive and negative numbers on the display these are highlighted by a $+$ or $-$ sign where the distinction is required, that is near the zero-point.
If there are too many leaves to fit in the plot width allowed, nag_stat_plot_stem_leaf (g01ar) plots as many leaves as possible and places an asterisk to the right to indicate that some leaves are not displayed. If this occurs and you wish to be able to plot all the leaves then the width of the plot may be adjusted.
Options also allow the leaf unit and the height of the display to be specified by you or calculated by nag_stat_plot_stem_leaf (g01ar). These arguments may be used to control the type of the display you wish to obtain. Fixing the unit and changing the height of the display may alter the number of lines used per stem, that is the number of different digits per line. nag_stat_plot_stem_leaf (g01ar) will choose a display for the fixed unit that attempts to make as much use of the available height as possible, thus increasing the height may allow for more lines per stem whereas decreasing the height may force the display to use fewer lines per stem. Similarly you may wish to fix the height and vary the leaf digit unit used on the display. See Further Comments for further details.
The display is returned in a character array with the option of printing the display.

## References

Erickson B H and Nosanchuk T A (1985) Understanding Data Open University Press, Milton Keynes
Tukey J W (1977) Exploratory Data Analysis Addison–Wesley
Velleman P F and Hoaglin D C (1981) Applications, Basics, and Computing of Exploratory Data Analysis Duxbury Press, Boston, MA

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{y}\left({\mathbf{n}}\right)$ – double array
The $n$ observations.
2:     $\mathrm{nstepx}$ – int64, int32 or nag_int scalar
The number of character positions to be plotted horizontally.
Constraint: ${\mathbf{nstepx}}\ge 35$.
3:     $\mathrm{nstepy}$ – int64, int32 or nag_int scalar
The maximum number of character positions to be plotted vertically.
If ${\mathbf{nstepy}}\le 0$ a suitable value will be used by nag_stat_plot_stem_leaf (g01ar) for the number of character positions to be plotted vertically. This will clearly be less than or equal to the value of ldplot.
Constraint: ${\mathbf{nstepy}}\le 0$ or ${\mathbf{nstepy}}\ge 5$.

### Optional Input Parameters

1:     $\mathrm{range}$ – string (length ≥ 1)
Default: $\text{'E'}$
Indicates whether you wish to scale the plot to the extremes of the data or to the fences.
${\mathbf{range}}=\text{'E'}$
The display is a plot to the extremes, that is a plot of all the data.
${\mathbf{range}}=\text{'F'}$
The display is a plot of the data between the fences.
Constraint: ${\mathbf{range}}=\text{'E'}$ or $\text{'F'}$.
2:     $\mathrm{prt}$ – string (length ≥ 1)
Default: $\text{'P'}$
Indicates whether the stem and leaf display is to be output to an external file.
${\mathbf{prt}}=\text{'N'}$
The display is not output to an external file.
${\mathbf{prt}}=\text{'P'}$
The display is output to the current advisory message unit as defined by nag_file_set_unit_advisory (x04ab). Only the first $132$ characters of each line are actually printed.
Constraint: ${\mathbf{prt}}=\text{'P'}$ or $\text{'N'}$.
3:     $\mathrm{n}$ – int64, int32 or nag_int scalar
Default: the dimension of the array y.
$n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 2$.
4:     $\mathrm{unit}$ – double scalar
Default: $0$
Indicates the leaf digit unit to be used.
If ${\mathbf{unit}}>0.0$ and is not a power of ten, it will be converted to the nearest power of ten below the input value for unit.
If ${\mathbf{unit}}\le 0.0$, the optimum unit will be used. This is based on the range of the data to be plotted and the number of lines available for the display.
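The conversion of a positive unit to a power of ten can be illustrated as follows. This is a sketch of the rule as stated above, using hypothetical variable names; it is not taken from the function's source.

```matlab
% Round a positive leaf digit unit down to a power of ten (sketch).
requested = [0.25 3 10 250];
used      = 10 .^ floor(log10(requested));

for k = 1:numel(requested)
    fprintf('requested %g -> used %g\n', requested(k), used(k));
end
```

A value that is already a power of ten, such as $10$, is left unchanged.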

### Output Parameters

1:     $\mathrm{unit}$ – double scalar
Contains the actual unit used in the stem and leaf display.
2:     $\mathrm{lines}$ – int64, int32 or nag_int scalar
The actual number of lines needed for the display.
3:     $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).
4:     $\mathrm{plot}\left(\mathit{ldplot},{\mathbf{nstepx}}\right)$ – cell array of strings
The stem and leaf display.
5:     $\mathrm{sorty}\left({\mathbf{n}}\right)$ – double array
The observations sorted into ascending order.

## Error Indicators and Warnings

Errors or warnings detected by the function:
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{n}}<2$, or ${\mathbf{nstepx}}<35$, or $0<{\mathbf{nstepy}}<5$, or $\mathit{ldplot}<5$, or $\mathit{ldplot}<{\mathbf{nstepy}}$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{prt}}\ne \text{'P'}$ or $\text{'N'}$, or ${\mathbf{range}}\ne \text{'E'}$ or $\text{'F'}$.
${\mathbf{ifail}}=3$
The number of lines needed to produce the display exceeds the maximum number of lines allowed. You may wish to increase nstepy.
${\mathbf{ifail}}=4$
One of the observations is too large and causes a value to exceed the maximum integer allowed.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

Accuracy is limited by the number of significant figures that may be represented on the display which will depend on the data, the number of lines available and the unit used.

## Further Comments

nag_stat_plot_stem_leaf (g01ar) uses integer representations of the data. If very large data values are being used they should be scaled before using this function. The largest integer can be found by calling nag_machine_integer_max (x02bb).
If an asterisk is plotted at the end of a line to indicate that some leaves are not displayed, you should increase nstepx if you wish to be able to print the rest of the leaves on that line.
Note that if you request nag_stat_plot_stem_leaf (g01ar) to print the plot only the first $132$ characters of each line are printed. The full plot is stored in the array plot so you do have the option of printing a plot which has more than $132$ characters on a line.
When the leaf digit unit is set, the number of lines per stem is decided as follows:
Let $r$ be the range of the data to be plotted:
• $r$ = largest observation – smallest observation: if all the data to both extremes are to be plotted (that is if ${\mathbf{range}}=\text{'E'}$),
• $r$ = upper fence – lower fence: if only the data between the fences are to be plotted (that is if ${\mathbf{range}}=\text{'F'}$).
Let $l$ be the number of lines available for the plot:
• $l={\mathbf{nstepy}}-4$ if ${\mathbf{nstepy}}>0$,
• $l=\mathit{ldplot}-4$ if ${\mathbf{nstepy}}\le 0$.
The $4$ lines are subtracted to allow space for the display headings. If only the data between the fences are to be plotted then $l$ must be further reduced to allow space to present those values outside the fences. This will involve a minimum of another $4$ lines.
Let $e=\frac{\left(r/{\mathbf{unit}}\right)+1}{l}$, then the number of lines per stem is:
 $\begin{array}{rl}1 & \text{if } 5<e,\\ 2 & \text{if } 2<e\le 5,\\ 5 & \text{if } 1<e\le 2,\\ 10 & \text{if } e\le 1.\end{array}$
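The calculation can be tried on the data of the Example section (smallest observation $-9$, largest $31$, unit $1.0$, ${\mathbf{nstepy}}=20$, ${\mathbf{range}}=\text{'E'}$). The thresholds $1$, $2$ and $5$ in this sketch are an assumption chosen to match the permitted numbers of digits per line ($1$, $2$, $5$ or $10$); the code is illustrative and is not the function's own implementation.

```matlab
% Lines per stem from the range, unit and available height (sketch).
r      = 31 - (-9);            % range of the data, range = 'E'
unit   = 1.0;                  % leaf digit unit
nstepy = 20;
l      = nstepy - 4;           % lines left after the headings

e = (r / unit + 1) / l;        % leaf values needed per line

% Assumed thresholds: 1, 2, 5 or 10 digits per line.
if e <= 1
    linesPerStem = 10;
elseif e <= 2
    linesPerStem = 5;
elseif e <= 5
    linesPerStem = 2;
else
    linesPerStem = 1;
end
fprintf('e = %g, lines per stem = %d\n', e, linesPerStem);
```

Here $e=41/16\approx 2.56$, giving $2$ lines per stem, which agrees with the second display in the example results.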
The time taken by the function increases with $n$.

## Example

A program to produce two stem and leaf displays for a sample of $30$ observations. The first illustrates a plot produced automatically by nag_stat_plot_stem_leaf (g01ar) and the second shows how to print the display under your control.
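```
function g01ar_example

fprintf('g01ar example results\n\n');

% Sample of 30 observations
y = [31;  1;  2;  3;  4;  5;  6;  7;  8; -9;
      1;  2;  3;  4;  5;  6;  7;  8;
      2;  3;  4;  5;  6;  7;
      3;  4;  5;  6;
      4;  5];
nstepx = int64(72);   % plot width in character positions
nstepy = int64(20);   % maximum plot height in lines

% First display: plot the data between the fences; g01ar prints it
[unit, lines, ifail, plot, sorty] = ...
  g01ar(y, nstepx, nstepy, 'range', 'Fences');

% Second display: plot to the extremes and suppress printing by g01ar
[unit, lines, ifail, plot, sorty] = ...
  g01ar(y, nstepx, nstepy, 'range', 'Extremes', 'prt', 'Noprint');

% Print the returned display line by line
fprintf('\n');
for i = 1:lines
  fprintf('%s\n', char(plot(i,1:nstepx)));
end

```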
```
g01ar example results

Stem-and-leaf display
Leaf digit unit = 1.0
1  2  represents  12.

LO  -9

3    0  11
6    0  222
10    0  3333
15    0  44444
15    0  55555
10    0  6666
6    0  777
3    0  88

HI  31

Stem-and-leaf display
Leaf digit unit = 1.0
1  2  represents  12.

1    -0. 9
1    -0*
15    +0* 11222333344444
15    +0. 55555666677788
1     1*
1     1.
1     2*
1     2.
1     3* 1
```