naginterfaces.library.mv.cluster_​hier

naginterfaces.library.mv.cluster_hier(method, n, d)[source]

cluster_hier performs hierarchical cluster analysis.

For full information please refer to the NAG Library document for g03ec

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g03/g03ecf.html

Parameters
methodint

Indicates which clustering method is used.

Single link.

Complete link.

Group average.

Centroid.

Median.

Minimum variance.

nint

, the number of objects.

dfloat, array-like, shape

The strictly lower triangle of the distance matrix. must be stored packed by rows, i.e., , must contain .

Returns
dfloat, ndarray, shape

Is overwritten.

ilcint, ndarray, shape

contains the number, , of the cluster merged with cluster (see ), , at step , for .

iucint, ndarray, shape

contains the number, , of the cluster merged with cluster , , at step , for .

cdfloat, ndarray, shape

contains the distance , between clusters and , , merged at step , for .

iordint, ndarray, shape

The objects in dendrogram order.

dordfloat, ndarray, shape

The clustering distances corresponding to the order in . contains the distance at which cluster and merge, for . contains the maximum distance.

Raises
NagValueError
(errno )

On entry, .

Constraint: .

(errno )

On entry, .

Constraint: , , , , or .

(errno )

On entry, at least one element of is negative.

(errno )

Minimum cluster distance not increasing, dendrogram invalid.

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given a distance or dissimilarity matrix for objects (see distance_mat()), cluster analysis aims to group the objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods, a hierarchical tree is produced by starting with clusters, each with a single object and then at each of stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. This process may be represented by a dendrogram (see cluster_hier_dendrogram()).

At each stage, the clusters that are nearest are merged, methods differ as to how the distances between the new cluster and other clusters are computed. For three clusters , and let , and be the number of objects in each cluster and let , and be the distances between the clusters. Let clusters and be merged to give cluster , then the distance from cluster to cluster , can be computed in the following ways.

  1. Single link or nearest neighbour : .

  2. Complete link or furthest neighbour : .

  3. Group average : .

  4. Centroid : .

  5. Median : .

  6. Minimum variance : .

For further details see Everitt (1974) and Krzanowski (1990).

If the clusters are numbered then, for convenience, if clusters and , , merge then the new cluster will be referred to as cluster . Information on the clustering history is given by the values of , and for each of the clustering steps. In order to produce a dendrogram, the ordering of the objects such that the clusters that merge are adjacent is required. This ordering is computed so that the first element is . The associated distances with this ordering are also computed.

References

Everitt, B S, 1974, Cluster Analysis, Heinemann

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press