naginterfaces.library.mv.cluster_hier

naginterfaces.library.mv.cluster_hier(method, n, d)

cluster_hier performs hierarchical cluster analysis.

For full information please refer to the NAG Library document for g03ec:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g03/g03ecf.html

Parameters

method : int

Indicates which clustering method is used.

$\mathrm{method} = 1$

Single link.

$\mathrm{method} = 2$

Complete link.

$\mathrm{method} = 3$

Group average.

$\mathrm{method} = 4$

Centroid.

$\mathrm{method} = 5$

Median.

$\mathrm{method} = 6$

Minimum variance.

n : int

$n$, the number of objects.

d : float, array-like, shape $(n \times (n - 1) / 2)$

The strictly lower triangle of the distance matrix. $D$ must be stored packed by rows, i.e., $d[(i - 1)(i - 2)/2 + j - 1]$, $i > j$, must contain $d_{ij}$.
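The packed layout can be checked with a short sketch. The helper names `packed_index` and `pack_lower` are hypothetical and not part of naginterfaces; they only illustrate the index formula above, using plain Python lists and a made-up $3 \times 3$ distance matrix.

```python
def packed_index(i, j):
    """0-based position in d of d_ij, for 1-based object numbers i > j."""
    if not i > j >= 1:
        raise ValueError("require i > j >= 1")
    return (i - 1) * (i - 2) // 2 + j - 1


def pack_lower(full):
    """Pack the strictly lower triangle of a square matrix by rows."""
    n = len(full)
    return [full[row][col] for row in range(1, n) for col in range(row)]


# Illustrative symmetric distance matrix for n = 3 objects (made-up values).
full = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
d = pack_lower(full)      # order: d_21, d_31, d_32
d_32 = d[packed_index(3, 2)]
```

The packed array has $n(n - 1)/2$ entries, so `len(d)` is 3 here.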

Returns

d : float, ndarray, shape $(n \times (n - 1) / 2)$: Is overwritten.
ilc : int, ndarray, shape $(n - 1)$: $\mathrm{ilc}[l - 1]$ contains the number, $j$, of the cluster merged with cluster $k$ (see $\mathrm{iuc}$), $j < k$, at step $l$, for $l = 1, 2, \dots, n - 1$.
iuc : int, ndarray, shape $(n - 1)$: $\mathrm{iuc}[l - 1]$ contains the number, $k$, of the cluster merged with cluster $j$, $j < k$, at step $l$, for $l = 1, 2, \dots, n - 1$.
cd : float, ndarray, shape $(n - 1)$: $\mathrm{cd}[l - 1]$ contains the distance $d_{jk}$ between clusters $j$ and $k$, $j < k$, merged at step $l$, for $l = 1, 2, \dots, n - 1$.
iord : int, ndarray, shape $(n)$: The objects in dendrogram order.
dord : float, ndarray, shape $(n)$: The clustering distances corresponding to the order in $\mathrm{iord}$. $\mathrm{dord}[l - 1]$ contains the distance at which clusters $\mathrm{iord}[l - 1]$ and $\mathrm{iord}[l]$ merge, for $l = 1, 2, \dots, n - 1$. $\mathrm{dord}[n - 1]$ contains the maximum distance.

Raises

NagValueError

(errno $1$ )

On entry, $n = \langle\mathit{value}\rangle$.

Constraint: $n \geq 2$.

(errno $1$ )

On entry, $\mathrm{method} = \langle\mathit{value}\rangle$.

Constraint: $\mathrm{method} = 1$, $2$, $3$, $4$, $5$ or $6$.

(errno $2$ )

On entry, at least one element of $d$ is negative.

(errno $3$ )

Minimum cluster distance not increasing, dendrogram invalid.
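errno $3$ concerns merge distances that fail to increase from one step to the next. As a rough illustration only, a sequence of merge distances can be scanned for such an inversion. The helper `first_inversion` is hypothetical, and the exact test applied inside the library is not stated here, so this is a sketch of the idea rather than the library's check.

```python
def first_inversion(cd):
    """Return the 1-based step whose merge distance is smaller than the
    previous step's distance, or None if the sequence never decreases."""
    for pos in range(1, len(cd)):
        if cd[pos] < cd[pos - 1]:
            return pos + 1  # cd[pos] belongs to step pos + 1
    return None


# Made-up merge distances for illustration.
ok_history = [1.0, 1.0, 2.5]
bad_history = [1.0, 2.0, 1.5]
```

With these made-up values, `first_inversion(ok_history)` gives `None` and `first_inversion(bad_history)` gives `3`.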

Notes

In the NAG Library the traditional C interface for this routine uses a different algorithmic base. Please contact NAG if you have any questions about compatibility.

Given a distance or dissimilarity matrix for $n$ objects (see distance_mat()), cluster analysis aims to group the $n$ objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods, a hierarchical tree is produced by starting with $n$ clusters, each with a single object and then at each of $n - 1$ stages, merging two clusters to form a larger cluster, until all objects are in a single cluster. This process may be represented by a dendrogram (see cluster_hier_dendrogram()).

At each stage, the two nearest clusters are merged; methods differ in how the distances between the new cluster and the other clusters are computed. For three clusters $i$, $j$ and $k$, let $n_{i}$, $n_{j}$ and $n_{k}$ be the number of objects in each cluster and let $d_{ij}$, $d_{ik}$ and $d_{jk}$ be the distances between the clusters. Let clusters $j$ and $k$ be merged to give cluster $jk$; then the distance from cluster $i$ to cluster $jk$, $d_{i.jk}$, can be computed in the following ways.

Single link or nearest neighbour: $d_{i.jk} = \min(d_{ij}, d_{ik})$.
Complete link or furthest neighbour: $d_{i.jk} = \max(d_{ij}, d_{ik})$.
Group average: $d_{i.jk} = \frac{n_{j}}{n_{j} + n_{k}} d_{ij} + \frac{n_{k}}{n_{j} + n_{k}} d_{ik}$.
Centroid: $d_{i.jk} = \frac{n_{j}}{n_{j} + n_{k}} d_{ij} + \frac{n_{k}}{n_{j} + n_{k}} d_{ik} - \frac{n_{j} n_{k}}{(n_{j} + n_{k})^{2}} d_{jk}$.
Median: $d_{i.jk} = \frac{1}{2} d_{ij} + \frac{1}{2} d_{ik} - \frac{1}{4} d_{jk}$.
Minimum variance: $d_{i.jk} = \{(n_{i} + n_{j}) d_{ij} + (n_{i} + n_{k}) d_{ik} - n_{i} d_{jk}\} / (n_{i} + n_{j} + n_{k})$.

For further details see Everitt (1974) and Krzanowski (1990).
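The six update rules can be written out as one small function. This is a minimal sketch of the formulas as stated above, not the library's implementation; the name `merged_distance` and its argument names are hypothetical, and the `method` codes follow the parameter description.

```python
def merged_distance(method, d_ij, d_ik, d_jk, n_i=1, n_j=1, n_k=1):
    """Distance from cluster i to the merged cluster jk."""
    if method == 1:  # single link
        return min(d_ij, d_ik)
    if method == 2:  # complete link
        return max(d_ij, d_ik)
    if method == 3:  # group average
        return (n_j * d_ij + n_k * d_ik) / (n_j + n_k)
    if method == 4:  # centroid
        size = n_j + n_k
        return (n_j * d_ij + n_k * d_ik) / size - n_j * n_k * d_jk / size**2
    if method == 5:  # median
        return 0.5 * d_ij + 0.5 * d_ik - 0.25 * d_jk
    if method == 6:  # minimum variance
        total = n_i + n_j + n_k
        return ((n_i + n_j) * d_ij + (n_i + n_k) * d_ik - n_i * d_jk) / total
    raise ValueError("method must be 1, 2, 3, 4, 5 or 6")


# Made-up distances between three single-object clusters.
single = merged_distance(1, 4.0, 2.0, 2.0)
complete = merged_distance(2, 4.0, 2.0, 2.0)
```

When clusters $j$ and $k$ each hold one object, the centroid and median rules coincide, which gives a quick sanity check on the centroid formula.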

If the clusters are numbered $1, 2, \dots, n$ then, for convenience, if clusters $j$ and $k$, $j < k$, merge then the new cluster will be referred to as cluster $j$. Information on the clustering history is given by the values of $j$, $k$ and $d_{jk}$ for each of the $n - 1$ clustering steps. In order to produce a dendrogram, an ordering of the objects is required such that the clusters that merge are adjacent. This ordering is computed so that the first element is $1$. The distances associated with this ordering are also computed.
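The numbering convention makes it straightforward to replay a clustering history. The sketch below assumes merge arrays shaped like the returned `ilc` and `iuc` (1-based cluster numbers, one entry per step); the function `replay_merges` and the sample history are hypothetical and were not produced by the library.

```python
def replay_merges(n, ilc, iuc, steps=None):
    """Return {cluster number: member objects} after the first `steps` merges.

    Cluster k is absorbed into cluster j (j < k), and the merged cluster
    keeps the number j, following the convention described above.
    """
    if steps is None:
        steps = n - 1
    clusters = {obj: [obj] for obj in range(1, n + 1)}
    for j, k in zip(ilc[:steps], iuc[:steps]):
        clusters[j].extend(clusters.pop(k))
    return clusters


# Made-up history for n = 4: merge (1, 2), then (3, 4), then (1, 3).
ilc = [1, 3, 1]
iuc = [2, 4, 3]
two_groups = replay_merges(4, ilc, iuc, steps=2)
one_group = replay_merges(4, ilc, iuc)
```

Stopping after $n - g$ steps leaves $g$ clusters, so `steps=2` with $n = 4$ yields two groups.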

References

Everitt, B S, 1974, Cluster Analysis, Heinemann

Krzanowski, W J, 1990, Principles of Multivariate Analysis, Oxford University Press
