The routine may be called by the names g01arf or nagf_stat_plot_stem_leaf.
3 Description
g01arf produces a stem and leaf display for a single sample of $n$ observations. The stem and leaf display shows data values separated into the form of a ‘stem’ and a ‘leaf’. For example, a value of $473$ could be represented as $47$ $3$, where the stem is $47$ and the leaf is $3$. The data are scaled using a value known as the ‘leaf digit unit’. In the above example the leaf digit unit would be $1.0$.
The following example illustrates a stem and leaf display.
Here the leaf digit unit is $0.1$, so that $1$ $8$ represents $1.8$ (i.e., $18\times 0.1$). The leaf digit unit distinguishes between the numbers $18.0$, $1.8$, $0.18$, etc., which may otherwise all be represented by $1$ $8$.
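The role of the leaf digit unit can be illustrated with a short Python sketch. This is not NAG code: the function name `split_stem_leaf` is hypothetical, and rounding the scaled value to the nearest integer is an assumption (the text does not say whether g01arf rounds or truncates).

```python
def split_stem_leaf(value, unit):
    """Split a value into (sign, stem, leaf) for a given leaf digit unit.

    Assumption: the scaled value is rounded to the nearest integer;
    the routine documented here may truncate instead.
    """
    scaled = round(abs(value) / unit)   # e.g. 1.8 / 0.1 -> 18
    stem, leaf = divmod(scaled, 10)     # last digit is the leaf
    sign = -1 if value < 0 else 1
    return sign, stem, leaf


# 18.0, 1.8 and 0.18 share the same stem and leaf; only the unit differs.
print(split_stem_leaf(473, 1.0))
print(split_stem_leaf(18.0, 1.0), split_stem_leaf(1.8, 0.1), split_stem_leaf(0.18, 0.01))
```

Keeping the sign separate from the stem makes it possible to distinguish a negative value with stem 0 from a positive one.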
Included in the above display is an initial column specifying the cumulative count of values, up to and including that particular line, from either the top or bottom of the display, whichever is smaller. An exception to this is when the line on which the median lies is reached, in which case the actual count of values on that line is displayed, rather than a cumulative count, and this is highlighted by enclosing the count in parentheses. In this case the median is $2.05$ and thus falls between the two lines at which the cumulative count has reached $n/2$ where $n$ is the number of observations.
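The count column described above can be sketched in Python as follows. The function name `depth_column` and its input (a list of leaf counts per line, top to bottom) are hypothetical, and the rule used to decide which line holds the median is an assumption based on the description rather than on the routine's source.

```python
def depth_column(counts):
    """Return the count-column labels for lines with the given leaf counts.

    A line is treated as the median line when the cumulative counts from
    both the top and the bottom exceed n/2; its own count is then shown
    in parentheses. Otherwise the smaller cumulative count is shown.
    """
    n = sum(counts)
    labels = []
    from_top = 0
    for count in counts:
        from_top += count
        from_bottom = n - from_top + count  # cumulative count from the bottom
        if 2 * from_top > n and 2 * from_bottom > n:
            labels.append(f"({count})")
        else:
            labels.append(str(min(from_top, from_bottom)))
    return labels


# Median lies on the middle line:
print(depth_column([3, 4, 3]))
# Median falls between the second and third lines (no parentheses):
print(depth_column([3, 2, 5]))
```

In the second call no line is marked, because the cumulative count reaches exactly $n/2$ at the end of a line and the median falls between two lines.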
Some of the other features of the stem and leaf display are illustrated by the following two examples.
In the above display all the data are plotted and the leaf digit unit is $1.0$. Also in this display different leaves, that is different digits, may be plotted on a particular line. In this case we have $5$ possible digits per line, that is $2$ lines per stem, and these are represented as follows:
* indicates that the line may contain the digits $0$ to $4$;
. indicates that the line may contain the digits $5$ to $9$.
Alternatively the stem and leaf display may look like:
Again the leaf digit unit is $1.0$ but in this display just the data between the fences, which are the hinges $\pm 1\frac{1}{2}\times$ the inter-hinge range, are plotted. Any data points that fall outside the fences are presented separately in the display under the headings LO for those points below the lower fence and HI for those points above the upper fence.
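A minimal Python sketch of the fence calculation is given below. The fences follow the definition in the text (hinges $\pm 1\frac{1}{2}\times$ the inter-hinge range). Computing the hinges by Tukey's depth rule is an assumption, since the text does not state how g01arf obtains them, and all function names are hypothetical.

```python
import math


def value_at_depth(sorted_y, depth):
    """Value at a 1-based depth; a depth ending in .5 averages two neighbours."""
    lo, hi = math.floor(depth), math.ceil(depth)
    return (sorted_y[lo - 1] + sorted_y[hi - 1]) / 2


def fences(y):
    """Return (lower_fence, upper_fence) using Tukey-style hinges (assumed)."""
    s = sorted(y)
    n = len(s)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lower_hinge = value_at_depth(s, hinge_depth)
    upper_hinge = value_at_depth(s[::-1], hinge_depth)  # count from the top end
    step = 1.5 * (upper_hinge - lower_hinge)            # 1.5 x inter-hinge range
    return lower_hinge - step, upper_hinge + step


def split_by_fences(y):
    """Separate values into LO (below), plotted (inside) and HI (above)."""
    lower, upper = fences(y)
    lo = [v for v in y if v < lower]
    inside = [v for v in y if lower <= v <= upper]
    hi = [v for v in y if v > upper]
    return lo, inside, hi


print(fences(range(1, 10)))
print(split_by_fences(list(range(1, 10)) + [50]))
```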
Again in this display different leaves, that is different digits, may be plotted on a particular line. However, in this case we have $2$ possible digits per line, that is $5$ lines per stem, and these are represented as follows:
* indicates that the line may contain the digits $0$ or $1$;
T indicates that the line may contain the digits $2$ or $3$;
F indicates that the line may contain the digits $4$ or $5$;
S indicates that the line may contain the digits $6$ or $7$;
. indicates that the line may contain the digits $8$ or $9$.
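The two marker schemes listed above can be summarised in a small Python sketch. The function name `line_marker` is hypothetical. Returning an empty marker for any other layout is an assumption, since the lists above give symbols only for $2$ and $5$ lines per stem.

```python
def line_marker(leaf, lines_per_stem):
    """Return the line symbol for a leaf digit (0-9)."""
    if lines_per_stem == 2:
        # '*' holds digits 0-4, '.' holds digits 5-9
        return "*" if leaf <= 4 else "."
    if lines_per_stem == 5:
        # '*', 'T', 'F', 'S', '.' hold digit pairs 0-1, 2-3, 4-5, 6-7, 8-9
        return "*TFS."[leaf // 2]
    return ""  # assumption: no marker described for other layouts


print([line_marker(d, 2) for d in range(10)])
print([line_marker(d, 5) for d in range(10)])
```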
A display may also allow $10$ different digits ($0$ to $9$) per line, that is $1$ line per stem, or just $1$ digit per line, that is $10$ lines per stem, as in the first of the three examples above.
Note that the median here is $4.5$. This falls between two lines in the first display but is highlighted on the second display since it lies on a particular line.
Finally if there are positive and negative numbers on the display these are highlighted by a $+$ or $-$ sign where the distinction is required, that is near the zero-point.
If there are too many leaves to fit in the plot width allowed, g01arf plots as many leaves as possible and places an asterisk to the right to indicate that some leaves are not displayed. If this occurs and you wish to be able to plot all the leaves then the width of the plot may be adjusted.
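The truncation behaviour can be pictured with the following Python sketch. It is illustrative only: the function name `render_leaves` and the parameter `max_leaves` (the number of leaf positions available on a line) are hypothetical, and how the plot width maps to that number is not stated in the text.

```python
def render_leaves(leaves, max_leaves):
    """Join leaf digits; append '*' if some leaves had to be dropped."""
    text = "".join(str(d) for d in leaves)
    if len(text) <= max_leaves:
        return text
    return text[:max_leaves] + "*"  # asterisk flags the undisplayed leaves


print(render_leaves([1, 2, 3], 5))
print(render_leaves([1, 2, 3, 4, 5, 6], 4))
```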
Options also allow the leaf unit and the height of the display to be specified by you or calculated by g01arf. These arguments may be used to control the type of the display you wish to obtain. Fixing the unit and changing the height of the display may alter the number of lines used per stem, that is the number of different digits per line. g01arf will choose a display for the fixed unit that attempts to make as much use of the available height as possible, thus increasing the height may allow for more lines per stem whereas decreasing the height may force the display to use fewer lines per stem. Similarly you may wish to fix the height and vary the leaf digit unit used on the display. See Section 9 for further details.
The display is returned in a character array with the option of printing the display.
4 References
Erickson B H and Nosanchuk T A (1985) Understanding Data Open University Press, Milton Keynes
Tukey J W (1977) Exploratory Data Analysis Addison–Wesley
Velleman P F and Hoaglin D C (1981) Applications, Basics, and Computing of Exploratory Data Analysis Duxbury Press, Boston, MA
5 Arguments
1: $\mathbf{range}$ – Character(1) Input
On entry: indicates whether you wish to scale the plot to the extremes of the data or to the fences.
${\mathbf{range}}=\text{'E'}$
The display is a plot to the extremes, that is a plot of all the data.
${\mathbf{range}}=\text{'F'}$
The display is a plot of the data between the fences.
Constraint:
${\mathbf{range}}=\text{'E'}$ or $\text{'F'}$.
2: $\mathbf{prt}$ – Character(1) Input
On entry: indicates whether the stem and leaf display is to be output to an external file.
${\mathbf{prt}}=\text{'N'}$
The display is not output to an external file.
${\mathbf{prt}}=\text{'P'}$
The display is output to the current advisory message unit as defined by x04abf. Only the first $132$ characters of each line are actually printed.
Constraint:
${\mathbf{prt}}=\text{'P'}$ or $\text{'N'}$.
3: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of observations.
Constraint:
${\mathbf{n}}\ge 2$.
4: $\mathbf{y}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input
On entry: the $n$ observations.
5: $\mathbf{nstepx}$ – Integer Input
On entry: the number of character positions to be plotted horizontally.
Constraint:
${\mathbf{nstepx}}\ge 35$.
6: $\mathbf{nstepy}$ – Integer Input
On entry: the maximum number of character positions to be plotted vertically.
If ${\mathbf{nstepy}}\le 0$ a suitable value will be used by g01arf for the number of character positions to be plotted vertically. This will clearly be less than or equal to the value of ldplot.
Constraint:
${\mathbf{nstepy}}\le 0$ or ${\mathbf{nstepy}}\ge 5$.
7: $\mathbf{unit}$ – Real (Kind=nag_wp) Input/Output
On entry: indicates the leaf digit unit to be used.
If ${\mathbf{unit}}>0.0$ and is not a power of ten, it will be converted to the nearest power of ten below the input value for unit.
If ${\mathbf{unit}}\le 0.0$, the optimum unit will be used. This is based on the range of the data to be plotted and the number of lines available for the display.
On exit: contains the actual unit used in the stem and leaf display.
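The conversion of a positive unit to a power of ten can be sketched in Python as below. The function name `normalise_unit` is hypothetical. Returning `None` for a non-positive unit stands in for the automatic choice, which depends on the data range and available lines and is not reproduced here.

```python
import math


def normalise_unit(unit):
    """Round a positive unit down to a power of ten.

    Returns None when unit <= 0, i.e. when the routine would choose
    the unit itself (that choice is not modelled in this sketch).
    """
    if unit <= 0.0:
        return None
    return 10.0 ** math.floor(math.log10(unit))


print(normalise_unit(25.0), normalise_unit(0.5), normalise_unit(100.0))
```

A unit that is already a power of ten, such as $100.0$, is returned unchanged.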
13: $\mathbf{ifail}$ – Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).
6 Error Indicators and Warnings
If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{ldplot}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldplot}}\ge 5$.
On entry, ${\mathbf{ldplot}}=\langle\mathit{\text{value}}\rangle$ and ${\mathbf{nstepy}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{ldplot}}\ge {\mathbf{nstepy}}$.
On entry, ${\mathbf{n}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{n}}\ge 2$.
On entry, ${\mathbf{nstepx}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{nstepx}}\ge 35$.
On entry, ${\mathbf{nstepy}}>0$ and ${\mathbf{nstepy}}<5$: ${\mathbf{nstepy}}=\langle\mathit{\text{value}}\rangle$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{prt}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{prt}}=\text{'P'}$ or $\text{'N'}$.
On entry, ${\mathbf{range}}=\langle\mathit{\text{value}}\rangle$.
Constraint: ${\mathbf{range}}=\text{'E'}$ or $\text{'F'}$.
${\mathbf{ifail}}=3$
Lines needed for display ($\langle\mathit{\text{value}}\rangle$) exceed nstepy ($\langle\mathit{\text{value}}\rangle$).
${\mathbf{ifail}}=4$
A value exceeds maximum allowed for an integer.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.
7 Accuracy
Accuracy is limited by the number of significant figures that may be represented on the display which will depend on the data, the number of lines available and the unit used.
8 Parallelism and Performance
g01arf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.
9 Further Comments
g01arf uses integer representations of the data. If very large data values are being used they should be scaled before using this routine. The largest integer can be found by calling x02bbf.
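A pre-check along these lines can be sketched in Python. It assumes that the integer representation is the observation divided by the leaf digit unit, which the text does not state explicitly. The function name `fits_integer_range` is hypothetical, and the default `max_int` of $2^{31}-1$ is an assumed stand-in for the value returned by x02bbf on a given implementation.

```python
def fits_integer_range(values, unit, max_int=2**31 - 1):
    """Check whether every value, scaled by the unit, fits below max_int.

    Assumptions: the integer form is value / unit, and max_int mimics
    the largest representable integer (implementation dependent).
    """
    return all(abs(v) / unit <= max_int for v in values)


data = [1.0e12, 2.5e12]
print(fits_integer_range(data, 1.0))     # too large at unit 1.0
print(fits_integer_range(data, 1.0e6))   # fits with a coarser unit
```

If the check fails, rescaling the data (or using a larger unit) before the call avoids the overflow condition.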
If an asterisk is plotted at the end of a line to indicate that some leaves are not displayed, you should increase nstepx if you wish to be able to print the rest of the leaves on that line.
Note that if you request g01arf to print the plot, only the first $132$ characters of each line are printed. The full plot is stored in the array plot, so you do have the option of printing a plot which has more than $132$ characters on a line.
When the leaf digit unit is set, the number of lines per stem is decided as follows:
Let $r$ be the range of the data to be plotted:
$r$ = largest observation – smallest observation: if all the data to both extremes are to be plotted (that is if ${\mathbf{range}}=\text{'E'}$),
$r$ = upper fence – lower fence: if only the data between the fences are to be plotted (that is if ${\mathbf{range}}=\text{'F'}$).
Let $l$ be the number of lines available for the plot:
$l={\mathbf{nstepy}}-4$ if ${\mathbf{nstepy}}>0$,
$l={\mathbf{ldplot}}-4$ if ${\mathbf{nstepy}}\le 0$.
The $4$ lines are subtracted to allow space for the display headings. If only the data between the fences are to be plotted then $l$ must be further reduced to allow space to present those values outside the fences. This will involve a minimum of another $4$ lines.
Let $e=\frac{(r/{\mathbf{unit}})+1}{l}$,
then the number of lines per stem is:
$$\begin{array}{rl}1 & \text{if } 5<e\le 10\text{, that is digits per line is } 10\text{,}\\ 2 & \text{if } 2<e\le 5\text{, that is digits per line is } 5\text{,}\\ 5 & \text{if } 1<e\le 2\text{, that is digits per line is } 2\text{,}\\ 10 & \text{if } 0<e\le 1\text{, that is digits per line is } 1\text{.}\end{array}$$
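The rule above translates directly into a short Python sketch. The function names are hypothetical, and the extra reduction of $l$ for a fences plot is left out. Returning `None` when $e>10$ is an assumption: the rule does not cover that case, which presumably corresponds to a display that needs more lines than are available.

```python
def available_lines(nstepy, ldplot):
    """Lines available for the plot, after reserving 4 for the headings."""
    return (nstepy if nstepy > 0 else ldplot) - 4


def lines_per_stem(r, unit, l):
    """Number of lines per stem for data range r, leaf unit and l lines."""
    e = (r / unit + 1) / l
    if 5 < e <= 10:
        return 1    # 10 digits per line
    if 2 < e <= 5:
        return 2    # 5 digits per line
    if 1 < e <= 2:
        return 5    # 2 digits per line
    if 0 < e <= 1:
        return 10   # 1 digit per line
    return None     # assumption: case not covered by the rule


l = available_lines(nstepy=14, ldplot=50)   # 10 lines
for r in (9, 19, 29, 50):
    print(r, lines_per_stem(r, 1.0, l))
```

With the unit fixed at $1.0$ and $10$ lines available, widening the data range from $9$ to $50$ moves the display from $10$ lines per stem down to $1$.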
The time taken by the routine increases with $n$.
10 Example
A program to produce two stem and leaf displays for a sample of $30$ observations. The first illustrates a plot produced automatically by g01arf and the second shows how to print the display under your control.