# NAG FL Interface g03ejf (cluster_hier_indicator)

## 1 Purpose

g03ejf computes a cluster indicator variable from the results of g03ecf.

## 2 Specification

Fortran Interface

    Subroutine g03ejf (n, cd, iord, dord, k, dlevel, ic, ifail)
    Integer, Intent (In) :: n, iord(n)
    Integer, Intent (Inout) :: k, ifail
    Integer, Intent (Out) :: ic(n)
    Real (Kind=nag_wp), Intent (In) :: cd(n-1), dord(n)
    Real (Kind=nag_wp), Intent (Inout) :: dlevel

C Header Interface

    #include <nag.h>
    void g03ejf_ (const Integer *n, const double cd[], const Integer iord[], const double dord[], Integer *k, double *dlevel, Integer ic[], Integer *ifail)
The routine may be called by the names g03ejf or nagf_mv_cluster_hier_indicator.

## 3 Description

Given a distance or dissimilarity matrix for $n$ objects, cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see g03ecf), a hierarchical tree is produced by starting with $n$ clusters each with a single object and then at each of $n-1$ stages, merging two clusters to form a larger cluster until all objects are in a single cluster. g03ejf takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see g03ehf) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required, and g03ejf will compute the corresponding distance. However, ties in the distance matrix can make it impossible to obtain exactly the requested number of clusters. In that case the actual number of clusters is returned (see ${\mathbf{ifail}}=4$ in Section 6).
If there are $k$ clusters then the indicator variable will assign a value between $1$ and $k$ to each object to indicate to which cluster it belongs. Object $1$ always belongs to cluster $1$.

## 5 Arguments

1: $\mathbf{n}$ – Integer Input
On entry: $n$, the number of objects.
Constraint: ${\mathbf{n}}\ge 2$.
2: $\mathbf{cd}\left({\mathbf{n}}-1\right)$ – Real (Kind=nag_wp) array Input
On entry: the clustering distances in increasing order as returned by g03ecf.
Constraint: ${\mathbf{cd}}\left(\mathit{i}+1\right)\ge {\mathbf{cd}}\left(\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}-2$.
3: $\mathbf{iord}\left({\mathbf{n}}\right)$ – Integer array Input
On entry: the objects in dendrogram order as returned by g03ecf.
4: $\mathbf{dord}\left({\mathbf{n}}\right)$ – Real (Kind=nag_wp) array Input
On entry: the clustering distances corresponding to the order in iord.
5: $\mathbf{k}$ – Integer Input/Output
On entry: indicates if a specified number of clusters is required.
If ${\mathbf{k}}>0$ then g03ejf will attempt to find k clusters.
If ${\mathbf{k}}\le 0$ then g03ejf will find the clusters based on the distance given in dlevel.
Constraint: ${\mathbf{k}}\le {\mathbf{n}}$.
On exit: the number of clusters produced, $k$.
6: $\mathbf{dlevel}$ – Real (Kind=nag_wp) Input/Output
On entry: if ${\mathbf{k}}\le 0$, dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if ${\mathbf{k}}\le 0$, ${\mathbf{dlevel}}>0.0$.
On exit: if ${\mathbf{k}}>0$ on entry, dlevel contains the distance at which the required number of clusters is found. Otherwise dlevel remains unchanged.
7: $\mathbf{ic}\left({\mathbf{n}}\right)$ – Integer array Output
On exit: ${\mathbf{ic}}\left(\mathit{i}\right)$ indicates to which of $k$ clusters the $\mathit{i}$th object belongs, for $\mathit{i}=1,2,\dots ,n$.
8: $\mathbf{ifail}$ – Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{k}}\le 0$ and ${\mathbf{dlevel}}\le 0.0$.
On entry, ${\mathbf{k}}=〈\mathit{\text{value}}〉$ and ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{k}}\le {\mathbf{n}}$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2$.
${\mathbf{ifail}}=2$
On entry, the values of cd are not in increasing order.
On entry, the values of dord and cd are not compatible.
${\mathbf{ifail}}=3$
All data is merged when ${\mathbf{k}}=1$.
All data merged into one cluster at dlevel, ${\mathbf{dlevel}}=〈\mathit{\text{value}}〉$.
No clustering is performed when ${\mathbf{k}}={\mathbf{n}}$.
No clustering takes place below dlevel, ${\mathbf{dlevel}}=〈\mathit{\text{value}}〉$.
${\mathbf{ifail}}=4$
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters is returned in k.
${\mathbf{ifail}}=-99$
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7 Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see g03ecf).

## 8 Parallelism and Performance

g03ejf is not threaded in any implementation.

## 9 Further Comments

A fixed number of clusters can be found using the non-hierarchical method implemented in g03eff.

## 10 Example

Data consisting of three variables on five objects are input. Squared Euclidean distances are computed using g03eaf, and median clustering is performed using g03ecf. A dendrogram is produced by g03ehf and printed. g03ejf then finds two clusters, and the results are printed.

### 10.1 Program Text

Program Text (g03ejfe.f90)

### 10.2 Program Data

Program Data (g03ejfe.d)

### 10.3 Program Results

Program Results (g03ejfe.r)