NAG CL Interface
g03ejc (cluster_hier_indicator)

1 Purpose

g03ejc computes a cluster indicator variable from the results of g03ecc.

2 Specification

 #include <nag.h>
 void g03ejc (Integer n, const double cd[], const Integer iord[], const double dord[], Integer *k, double *dlevel, Integer ic[], NagError *fail)
The function may be called by the names: g03ejc, nag_mv_cluster_hier_indicator or nag_mv_cluster_indicator.

3 Description

Given a distance or dissimilarity matrix for $n$ objects, cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see g03ecc), a hierarchical tree is produced by starting with $n$ clusters each with a single object and then at each of $n-1$ stages, merging two clusters to form a larger cluster until all objects are in a single cluster. g03ejc takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see g03ehc) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and g03ejc will compute the corresponding distance. However, ties in the distance matrix may make it impossible to produce exactly the number of clusters requested.
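Why ties can rule out a particular cluster count can be seen from the sorted clustering distances alone. The following is a minimal sketch, not the NAG implementation: the helper names `clusters_at_level` and `k_is_attainable` are hypothetical, plain `int` stands in for the NAG `Integer` type, and it assumes that a merge whose distance equals the cut level has already taken place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of the NAG Library).
 * cd[0..n-2] holds the n-1 merge distances in non-decreasing order.
 * Assumption: a merge at distance exactly equal to `level` has happened. */
int clusters_at_level(int n, const double cd[], double level)
{
    assert(n >= 2);
    int merges = 0;
    while (merges < n - 1 && cd[merges] <= level)
        ++merges;
    return n - merges;   /* each merge reduces the cluster count by one */
}

/* Hypothetical helper: exactly k clusters exist only if there is a gap
 * between merge number n-k and merge number n-k+1. */
bool k_is_attainable(int n, const double cd[], int k)
{
    assert(n >= 2 && k >= 1 && k <= n);
    if (k == 1 || k == n)
        return true;     /* the trivial partitions */
    return cd[n - k - 1] < cd[n - k];
}
```

For example, with five objects and merge distances 1, 2, 2, 5, the two merges at distance 2 happen together, so the count drops from four clusters directly to two and three clusters cannot be produced.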
If there are $k$ clusters then the indicator variable will assign a value between 1 and $k$ to each object to indicate to which cluster it belongs. Object 1 always belongs to cluster 1.
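The idea of drawing a line across the dendrogram can be sketched in plain C. This is an illustration under stated assumptions, not the algorithm used inside g03ejc: the function name `cut_dendrogram` is hypothetical, plain `int` stands in for `Integer`, `dord[i]` is assumed to hold the distance at which the objects at positions `i` and `i+1` of `iord` are joined (with the last entry unused here), a merge at exactly `dlevel` is assumed to count as completed, and clusters are numbered in order of their lowest-numbered object so that object 1 lands in cluster 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the NAG Library).
 * iord[0..n-1]: object numbers (1-based) in dendrogram order.
 * dord[i]     : assumed join distance between positions i and i+1.
 * ic[0..n-1]  : on return, the cluster number of each object.
 * Returns the number of clusters, or -1 if memory allocation fails. */
int cut_dendrogram(int n, const int iord[], const double dord[],
                   double dlevel, int ic[])
{
    assert(n >= 2);
    int *relabel = calloc((size_t)n + 1, sizeof *relabel);
    if (relabel == NULL)
        return -1;

    /* Pass 1: walk along the dendrogram and start a new provisional
     * group whenever neighbouring objects join above the cut level. */
    int group = 1;
    for (int pos = 0; pos < n; ++pos) {
        ic[iord[pos] - 1] = group;
        if (pos < n - 1 && dord[pos] > dlevel)
            ++group;
    }

    /* Pass 2: renumber groups in order of first appearance by object
     * number, so that object 1 always receives cluster 1. */
    int k = 0;
    for (int obj = 0; obj < n; ++obj) {
        int g = ic[obj];
        if (relabel[g] == 0)
            relabel[g] = ++k;
        ic[obj] = relabel[g];
    }

    free(relabel);
    return k;
}
```

The contiguous-run argument relies on the hierarchy being monotone: two objects share a cluster exactly when every join distance between their positions in the dendrogram order is at or below the cut level.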

4 References

Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5 Arguments

1: $\mathbf{n}$ Integer Input
On entry: the number of objects, $n$.
Constraint: ${\mathbf{n}}\ge 2$.
2: $\mathbf{cd}\left[{\mathbf{n}}-1\right]$ const double Input
On entry: the clustering distances in increasing order as returned by g03ecc.
Constraint: ${\mathbf{cd}}\left[\mathit{i}\right]\ge {\mathbf{cd}}\left[\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}-2$.
3: $\mathbf{iord}\left[{\mathbf{n}}\right]$ const Integer Input
On entry: the objects in the dendrogram order as returned by g03ecc.
4: $\mathbf{dord}\left[{\mathbf{n}}\right]$ const double Input
On entry: the clustering distances corresponding to the order in iord.
5: $\mathbf{k}$ Integer * Input/Output
On entry: indicates whether a specified number of clusters is required.
If ${\mathbf{k}}>0$, g03ejc will attempt to find k clusters.
If ${\mathbf{k}}\le 0$, g03ejc will find the clusters based on the distance given in dlevel.
Constraint: ${\mathbf{k}}\le {\mathbf{n}}$.
On exit: the number of clusters produced, $k$.
6: $\mathbf{dlevel}$ double * Input/Output
On entry: if ${\mathbf{k}}\le 0$, then dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if ${\mathbf{k}}\le 0$, ${\mathbf{dlevel}}>0.0$.
On exit: if ${\mathbf{k}}>0$ on entry, then dlevel contains the distance at which the required number of clusters is found. Otherwise dlevel remains unchanged.
7: $\mathbf{ic}\left[{\mathbf{n}}\right]$ Integer Output
On exit: ${\mathbf{ic}}\left[\mathit{i}-1\right]$ indicates to which of $k$ clusters the $\mathit{i}$th object belongs, for $\mathit{i}=1,2,\dots ,n$.
8: $\mathbf{fail}$ NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).
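Once ic has been filled, a common next step is to summarise the partition. The sketch below tallies how many objects fall in each cluster; the helper name `cluster_sizes` is hypothetical and plain `int` is used in place of `Integer`, so this shows how the indicator values might be consumed rather than anything supplied by the library.

```c
#include <assert.h>

/* Hypothetical helper (not part of the NAG Library).
 * ic[i] is the cluster number (1..k) of object i+1, as described for
 * the ic argument. size[] must have room for k entries; on return
 * size[c-1] holds the number of objects assigned to cluster c. */
void cluster_sizes(int n, const int ic[], int k, int size[])
{
    for (int c = 0; c < k; ++c)
        size[c] = 0;
    for (int i = 0; i < n; ++i) {
        assert(ic[i] >= 1 && ic[i] <= k);  /* indicator values lie in 1..k */
        ++size[ic[i] - 1];
    }
}
```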

6 Error Indicators and Warnings

NE_2_INT_ARG_GT
On entry, ${\mathbf{k}}=⟨\mathit{\text{value}}⟩$ while ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$. These arguments must satisfy ${\mathbf{k}}\le {\mathbf{n}}$.
NE_CLUSTER
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters produced is $⟨\mathit{\text{value}}⟩$.
NE_INCOMP_ARRAYS
Arrays cd and dord are not compatible.
NE_INT_ARG_LT
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 2$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_INCREASING
The sequence cd is not increasing:
${\mathbf{cd}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$, ${\mathbf{cd}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
NE_REAL_INT
On entry, ${\mathbf{dlevel}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{k}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{k}}\le 0$, ${\mathbf{dlevel}}>0.0$.
NW_2_INT
On exit, ${\mathbf{k}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Trivial solution returned.
NW_INT
On exit, ${\mathbf{k}}=1$.
Trivial solution returned.
NW_REAL_REALARR
On entry, ${\mathbf{dlevel}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{cd}}\left[⟨\mathit{\text{value}}⟩\right]=⟨\mathit{\text{value}}⟩$.
Trivial solution returned.
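The documented input constraints can be checked before calling the function, for instance in a test harness. The following is a sketch only: the enum `ArgCheck`, its values and the function `check_args` are hypothetical names, plain `int` stands in for `Integer`, and the order in which the conditions are tested is an assumption, since the order used by g03ejc itself is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; these are NOT the NAG error codes. */
typedef enum {
    ARGS_OK,
    BAD_N,              /* n >= 2 violated                     */
    BAD_K,              /* k <= n violated                     */
    BAD_DLEVEL,         /* k <= 0 but dlevel > 0.0 violated    */
    CD_NOT_INCREASING   /* cd[i] >= cd[i-1] violated           */
} ArgCheck;

/* Hypothetical pre-check mirroring the constraints listed for
 * n, cd, k and dlevel. The check order is an assumption. */
ArgCheck check_args(int n, const double cd[], int k, double dlevel)
{
    if (n < 2)
        return BAD_N;
    assert(cd != NULL);
    if (k > n)
        return BAD_K;
    if (k <= 0 && !(dlevel > 0.0))
        return BAD_DLEVEL;
    for (int i = 1; i <= n - 2; ++i)
        if (cd[i] < cd[i - 1])
            return CD_NOT_INCREASING;
    return ARGS_OK;
}
```

Tied values in cd pass this check, because the constraint is non-strict; ties only matter when a specific number of clusters is requested.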

7 Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see g03ecc).

8 Parallelism and Performance

g03ejc is not threaded in any implementation.

9 Further Comments

A fixed number of clusters can be found using the non-hierarchical method implemented in g03efc.

10 Example

Data consisting of three variables on five objects are input. Euclidean squared distances are computed using g03eac and median clustering performed using g03ecc. A dendrogram is produced by g03ehc and printed. g03ejc finds two clusters and the results are printed.

10.1 Program Text

Program Text (g03ejce.c)

10.2 Program Data

Program Data (g03ejce.d)

10.3 Program Results

Program Results (g03ejce.r)