G03EJF computes a cluster indicator variable from the results of
G03ECF.
Given a distance or dissimilarity matrix for
$n$ objects, cluster analysis aims to group the
$n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see
G03ECF), a hierarchical tree is produced by starting with
$n$ clusters each with a single object and then at each of
$n-1$ stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. G03EJF takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see
G03EHF) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and G03EJF will compute the corresponding distance. However, ties in the distance matrix may make it impossible to produce exactly the number of clusters required.
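The idea of drawing a line across the dendrogram can be sketched in a few lines of Python. This is an illustrative sketch only, not the algorithm used inside G03EJF. It assumes a hypothetical tree representation, a list of `(i, j, distance)` merges in which `i` and `j` name any object in each of the two clusters being joined, rather than the array layout returned by G03ECF. It also assumes that merges at or below the cut distance are applied. The function name `clusters_at_distance` is invented for this example.

```python
def clusters_at_distance(n, merges, dlevel):
    """Label n objects by the clusters that exist at cut distance dlevel.

    merges is a hypothetical list of (i, j, distance) tuples with 1-based
    object numbers; it is not the G03ECF output layout.
    """
    parent = list(range(n + 1))  # index 0 unused; objects are numbered 1..n

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, dist in merges:
        # Assumption: a merge at or below the cut distance is applied.
        if dist <= dlevel:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    # Number the clusters 1, 2, ... in order of first appearance.
    labels = {}
    ic = []
    for obj in range(1, n + 1):
        root = find(obj)
        labels.setdefault(root, len(labels) + 1)
        ic.append(labels[root])
    return ic, len(labels)


# Five objects merged in four stages (made-up distances).
merges = [(1, 2, 1.0), (3, 4, 2.0), (1, 3, 5.0), (1, 5, 9.0)]
ic, k = clusters_at_distance(5, merges, dlevel=3.0)
print(ic, k)
```

Raising the cut distance applies more merges and so yields fewer clusters; a cut below the smallest merge distance leaves every object in its own cluster.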
 1: N – INTEGER, Input
On entry: $n$, the number of objects.
Constraint:
${\mathbf{N}}\ge 2$.
 2: CD(${\mathbf{N}}-1$) – REAL (KIND=nag_wp) array, Input
On entry: the clustering distances in increasing order as returned by
G03ECF.
Constraint:
${\mathbf{CD}}\left(\mathit{i}+1\right)\ge {\mathbf{CD}}\left(\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{N}}-2$.
 3: IORD(N) – INTEGER array, Input
On entry: the objects in dendrogram order as returned by
G03ECF.
 4: DORD(N) – REAL (KIND=nag_wp) array, Input
On entry: the clustering distances corresponding to the order in
IORD.
 5: K – INTEGER, Input/Output
On entry: indicates if a specified number of clusters is required.
If
${\mathbf{K}}>0$ then G03EJF will attempt to find
K clusters.
If
${\mathbf{K}}\le 0$ then G03EJF will find the clusters based on the distance given in
DLEVEL.
Constraint:
${\mathbf{K}}\le {\mathbf{N}}$.
On exit: the number of clusters produced, $k$.
 6: DLEVEL – REAL (KIND=nag_wp), Input/Output
On entry: if
${\mathbf{K}}\le 0$,
DLEVEL must contain the distance at which clusters are produced. Otherwise
DLEVEL need not be set.
Constraint:
if ${\mathbf{K}}\le 0$, ${\mathbf{DLEVEL}}>0.0$.
On exit: if
${\mathbf{K}}>0$ on entry,
DLEVEL contains the distance at which the required number of clusters is found. Otherwise
DLEVEL remains unchanged.
 7: IC(N) – INTEGER array, Output
On exit: ${\mathbf{IC}}\left(\mathit{i}\right)$ indicates to which of $k$ clusters the $\mathit{i}$th object belongs, for $\mathit{i}=1,2,\dots ,n$.
 8: IFAIL – INTEGER, Input/Output

On entry:
IFAIL must be set to
$0$,
$-1\text{ or }1$. If you are unfamiliar with this parameter you should refer to
Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value
$-1\text{ or }1$ is recommended. If the output of error messages is undesirable, then the value
$1$ is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is
$0$.
When the value $-\mathbf{1}\text{ or }\mathbf{1}$ is used it is essential to test the value of IFAIL on exit.
On exit:
${\mathbf{IFAIL}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see
Section 6).
If on entry
${\mathbf{IFAIL}}={\mathbf{0}}$ or
$-{\mathbf{1}}$, explanatory error messages are output on the current error message unit (as defined by
X04AAF).
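The interaction between K, DLEVEL and ties in CD can be illustrated with a short Python sketch. It is not the routine's implementation, and it does not show what G03EJF returns in K or DLEVEL when the request cannot be met. It assumes that a cut at a given distance applies every merge at or below that distance, so merges tied at the same distance happen together. The function name `attainable_cluster_counts` is invented for this example.

```python
def attainable_cluster_counts(cd):
    """Map each attainable number of clusters (below n) to a cut distance.

    cd holds the n-1 clustering distances in non-decreasing order, mirroring
    the constraint on CD. Cutting below cd[0] always gives n clusters and is
    not listed.
    """
    if any(later < earlier for earlier, later in zip(cd, cd[1:])):
        raise ValueError("cd must be in non-decreasing order")
    n = len(cd) + 1
    clusters_at = {}
    for stage, dist in enumerate(cd, start=1):
        # After `stage` merges, n - stage clusters remain. Tied distances
        # overwrite one another, so only the last tied stage survives.
        clusters_at[dist] = n - stage
    return {count: dist for dist, count in clusters_at.items()}


# Made-up distances for five objects; stages 2 and 3 are tied at 2.0.
counts = attainable_cluster_counts([1.0, 2.0, 2.0, 5.0])
print(counts)       # {4: 1.0, 2: 2.0, 1: 5.0}
print(3 in counts)  # False: no cut distance yields exactly three clusters
```

Under these assumptions, a request for three clusters cannot be satisfied, which is why the value of K on exit should be checked against the value supplied on entry.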
The accuracy will depend upon the accuracy of the distances in
CD and
DORD (see
G03ECF).
A fixed number of clusters can be found using the nonhierarchical method used in
G03EFF.
Data consisting of three variables on five objects are input. Euclidean squared distances are computed using
G03EAF and median clustering performed using
G03ECF. A dendrogram is produced by
G03EHF and printed. G03EJF finds two clusters and the results are printed.
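Once a cluster indicator such as IC is available, listing the members of each cluster is a simple grouping step. The Python sketch below shows one way to do this; the indicator values are hypothetical and are not the output of the example program, and the function name `members_by_cluster` is invented for illustration.

```python
from collections import defaultdict


def members_by_cluster(ic):
    """Group 1-based object numbers by their cluster indicator value."""
    groups = defaultdict(list)
    for obj, cluster in enumerate(ic, start=1):
        groups[cluster].append(obj)
    return dict(sorted(groups.items()))


# Hypothetical indicator for five objects in two clusters.
ic = [1, 1, 2, 2, 2]
for cluster, members in members_by_cluster(ic).items():
    print(f"Cluster {cluster}: objects {members}")
```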