NAG C Library Manual

NAG Library Function Document
nag_mv_cluster_indicator (g03ejc)

1  Purpose

nag_mv_cluster_indicator (g03ejc) computes a cluster indicator variable from the results of nag_mv_hierar_cluster_analysis (g03ecc).

2  Specification

 #include <nag.h>
 #include <nagg03.h>
 void nag_mv_cluster_indicator (Integer n, const double cd[], const Integer iord[], const double dord[], Integer *k, double *dlevel, Integer ic[], NagError *fail)

3  Description

Given a distance or dissimilarity matrix for $n$ objects, cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_hierar_cluster_analysis (g03ecc)), a hierarchical tree is produced by starting with $n$ clusters each with a single object and then at each of $n-1$ stages, merging two clusters to form a larger cluster until all objects are in a single cluster. nag_mv_cluster_indicator (g03ejc) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see nag_mv_dendrogram (g03ehc)) and drawing a line across at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_indicator (g03ejc) will compute the corresponding distance. However, ties in the distance matrix may make it impossible to produce exactly the number of clusters requested.
If there are $k$ clusters then the indicator variable will assign a value between 1 and $k$ to each object to indicate to which cluster it belongs. Object 1 always belongs to cluster 1.
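The cut described above can be sketched in plain C. The helper below is hypothetical and is not the library's implementation. It assumes that iord holds object numbers starting at 1, that dord[i] is the distance at which the objects in positions i and i+1 of the dendrogram order are joined, that objects joined at a distance strictly less than the cutting distance share a cluster, and that clusters are numbered in order of first appearance, which places object 1 in cluster 1. Plain int stands in for Integer.

```c
#include <assert.h>

/* Hypothetical helper, not part of the NAG Library.
 * iord[0..n-1]: object numbers (1-based) in dendrogram order.
 * dord[0..n-2]: ASSUMED to be the distance at which the objects in
 *               positions i and i+1 of the dendrogram order are joined.
 * ic[0..n-1]  : on return, the cluster label (1..k) of each object.
 * Returns the number of clusters k found at distance dlevel. */
int cut_dendrogram(int n, const int iord[], const double dord[],
                   double dlevel, int ic[])
{
    int pos, i, j, run = 1, k = 0;

    assert(n >= 2 && dlevel > 0.0);

    /* Pass 1: walk the leaves in dendrogram order and start a new run
     * whenever the joining distance reaches the cutting distance.
     * Provisional labels are stored as negative numbers. */
    ic[iord[0] - 1] = -run;
    for (pos = 1; pos < n; ++pos) {
        if (dord[pos - 1] >= dlevel)
            ++run;
        ic[iord[pos] - 1] = -run;
    }

    /* Pass 2: renumber the runs in order of first appearance by object
     * number, so that object 1 always receives label 1. */
    for (i = 0; i < n; ++i) {
        if (ic[i] < 0) {
            int old = ic[i];
            ++k;
            for (j = i; j < n; ++j)
                if (ic[j] == old)
                    ic[j] = k;
        }
    }
    return k;
}
```

For a five-object tree with iord = {2, 4, 1, 3, 5} and dord = {0.5, 3.0, 1.0, 2.0, 3.0}, cutting at a distance of 1.5 gives three clusters under these assumptions, with indicator values 1, 2, 1, 2, 3 for objects 1 to 5.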

4  References

Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5  Arguments

1:     n – Integer    Input
On entry: the number of objects, $n$.
Constraint: ${\mathbf{n}}\ge 2$.
2:     cd[${\mathbf{n}}-1$] – const double    Input
On entry: the clustering distances in increasing order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
Constraint: ${\mathbf{cd}}\left[\mathit{i}\right]\ge {\mathbf{cd}}\left[\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}-2$.
3:     iord[n] – const Integer    Input
On entry: the objects in dendrogram order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
4:     dord[n] – const double    Input
On entry: the clustering distances corresponding to the order in iord.
5:     k – Integer *    Input/Output
On entry: indicates if a specified number of clusters is required.
${\mathbf{k}}>0$
nag_mv_cluster_indicator (g03ejc) will attempt to find k clusters.
${\mathbf{k}}\le 0$
nag_mv_cluster_indicator (g03ejc) will find the clusters based on the distance given in dlevel.
Constraint: ${\mathbf{k}}\le {\mathbf{n}}$.
On exit: the number of clusters produced, $k$.
6:     dlevel – double *    Input/Output
On entry: if ${\mathbf{k}}\le 0$, then dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if ${\mathbf{k}}\le 0$, ${\mathbf{dlevel}}>0.0$.
On exit: if ${\mathbf{k}}>0$ on entry, then dlevel contains the distance at which the required number of clusters is found. Otherwise dlevel remains unchanged.
7:     ic[n] – Integer    Output
On exit: ${\mathbf{ic}}\left[\mathit{i}-1\right]$ indicates to which of $k$ clusters the $\mathit{i}$th object belongs, for $\mathit{i}=1,2,\dots ,n$.
8:     fail – NagError *    Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
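The constraints listed for these arguments can be checked before the call. The function below is a hypothetical pre-check written for illustration only; its name, its return convention and the order of its tests are assumptions and do not describe how the library validates its input. Plain int stands in for Integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-check, not part of the NAG Library.
 * Returns 0 if the documented constraints hold; otherwise returns the
 * position (as numbered in the argument list above) of the first
 * argument found to violate its constraint. */
int check_cluster_args(int n, const double cd[], int k, double dlevel)
{
    int i;

    assert(cd != NULL);

    if (n < 2)
        return 1;                      /* n >= 2 */
    for (i = 1; i <= n - 2; ++i)
        if (cd[i] < cd[i - 1])
            return 2;                  /* cd[i] >= cd[i-1] */
    if (k > n)
        return 5;                      /* k <= n */
    if (k <= 0 && !(dlevel > 0.0))
        return 6;                      /* if k <= 0, dlevel > 0.0 */
    return 0;
}
```

A nonzero result identifies the argument to correct; when k is positive on entry, dlevel is not examined because it need not be set.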

6  Error Indicators and Warnings

NE_2_INT_ARG_GT
On entry, ${\mathbf{k}}=〈\mathit{\text{value}}〉$ while ${\mathbf{n}}=〈\mathit{\text{value}}〉$. These arguments must satisfy ${\mathbf{k}}\le {\mathbf{n}}$.
NE_CLUSTER
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters produced is $〈\mathit{\text{value}}〉$.
NE_INCOMP_ARRAYS
Arrays cd and dord are not compatible.
NE_INT_ARG_LT
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 2$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_NOT_INCREASING
The sequence cd is not increasing:
${\mathbf{cd}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$, ${\mathbf{cd}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
NE_REAL_INT
On entry, ${\mathbf{dlevel}}=〈\mathit{\text{value}}〉$, ${\mathbf{k}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{k}}\le 0$ and ${\mathbf{dlevel}}>0.0$.
NW_2_INT
On exit, ${\mathbf{k}}=〈\mathit{\text{value}}〉$, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Trivial solution returned.
NW_INT
On exit, ${\mathbf{k}}=1$.
Trivial solution returned.
NW_REAL_REALARR
On entry, ${\mathbf{dlevel}}=〈\mathit{\text{value}}〉$, ${\mathbf{cd}}\left[〈\mathit{\text{value}}〉\right]=〈\mathit{\text{value}}〉$.
Trivial solution returned.
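The effect of tied clustering distances on the number of clusters can be illustrated with the sorted distances in cd alone. The two helpers below are hypothetical sketches, not library code. They assume that a merge has taken place when its distance is strictly less than the cutting distance, so that after $j$ merges there are $n-j$ clusters. Plain int stands in for Integer.

```c
#include <assert.h>

/* Hypothetical helper: number of clusters obtained by cutting at dlevel,
 * ASSUMING merges with cd[i] < dlevel have already taken place.
 * cd[0..n-2] must be in increasing order. */
int clusters_at_level(int n, const double cd[], double dlevel)
{
    int merged = 0;

    assert(n >= 2);
    while (merged < n - 1 && cd[merged] < dlevel)
        ++merged;
    return n - merged;
}

/* Hypothetical helper: returns 1 if exactly k clusters (1 <= k <= n) can
 * be separated by a single cutting distance, 0 if a tie prevents it.
 * Exactly k clusters need the first n-k merges done and the next one not
 * done, which requires cd[n-k-1] < cd[n-k] when 1 < k < n. */
int exact_k_possible(int n, const double cd[], int k)
{
    assert(n >= 2 && k >= 1 && k <= n);
    if (k == 1 || k == n)
        return 1;                      /* the trivial solutions */
    return cd[n - k - 1] < cd[n - k];
}
```

With $n=5$ and cd = {0.5, 1.0, 1.0, 3.0}, two or four clusters can be obtained, but three cannot, because the second and third merges occur at the same distance. This is the situation that NE_CLUSTER reports.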

7  Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see nag_mv_hierar_cluster_analysis (g03ecc)).

8  Further Comments

A fixed number of clusters can be found using the non-hierarchical method implemented in nag_mv_kmeans_cluster_analysis (g03efc).

9  Example

Data consisting of three variables on five objects are input. Euclidean squared distances are computed using nag_mv_distance_mat (g03eac) and median clustering performed using nag_mv_hierar_cluster_analysis (g03ecc). A dendrogram is produced by nag_mv_dendrogram (g03ehc) and printed. nag_mv_cluster_indicator (g03ejc) finds two clusters and the results are printed.

9.1  Program Text

Program Text (g03ejce.c)

9.2  Program Data

Program Data (g03ejce.d)

9.3  Program Results

Program Results (g03ejce.r)