nag_mv_cluster_indicator (g03ejc) (PDF version)
g03 Chapter Contents
g03 Chapter Introduction
NAG C Library Manual

NAG Library Function Document

nag_mv_cluster_indicator (g03ejc)


1  Purpose

nag_mv_cluster_indicator (g03ejc) computes a cluster indicator variable from the results of nag_mv_hierar_cluster_analysis (g03ecc).

2  Specification

#include <nag.h>
#include <nagg03.h>
void nag_mv_cluster_indicator (Integer n, const double cd[], const Integer iord[],
     const double dord[], Integer *k, double *dlevel, Integer ic[], NagError *fail)

3  Description

Given a distance or dissimilarity matrix for n objects, cluster analysis aims to group the n objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_hierar_cluster_analysis (g03ecc)), a hierarchical tree is produced by starting with n clusters, each containing a single object, and then, at each of n-1 stages, merging two clusters to form a larger cluster until all objects are in a single cluster. nag_mv_cluster_indicator (g03ejc) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see nag_mv_dendrogram (g03ehc)) and drawing a line across it at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_indicator (g03ejc) will compute the corresponding distance. However, it may not be possible to produce exactly the number of clusters requested because of ties in the distance matrix.
If there are k clusters then the indicator variable will assign a value between 1 and k to each object to indicate to which cluster it belongs. Object 1 always belongs to cluster 1.
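
The idea of drawing a line across the dendrogram can be illustrated with a minimal, self-contained sketch. It is not the NAG implementation. The helper name cut_dendrogram is hypothetical, plain int stands in for the library's Integer type, and the sketch assumes that dord[i] is the distance at which the object in position i of the dendrogram order joins the object in position i+1, that the tree is monotone, and that objects joined at exactly the cut distance belong to the same cluster. Renumbering clusters by first appearance in object order is likewise an assumption, chosen so that object 1 falls in cluster 1.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of the NAG library.
 * Cuts a dendrogram at height dlevel and writes a cluster label
 * (1..k) for each object into ic[]. Returns k, or -1 on bad input.
 * Assumption: dord[i] is the height at which the object at
 * dendrogram position i joins the object at position i+1. */
int cut_dendrogram(int n, const int iord[], const double dord[],
                   double dlevel, int ic[])
{
    if (n < 2)
        return -1;

    int *remap = calloc((size_t)n + 1, sizeof *remap);
    if (remap == NULL)
        return -1;

    /* Pass 1: walk the dendrogram order; a join above the cut
     * level starts a new run of objects, i.e. a new cluster. */
    int run = 1;
    for (int i = 0; i < n; ++i) {
        if (iord[i] < 1 || iord[i] > n) {   /* object labels are 1-based */
            free(remap);
            return -1;
        }
        ic[iord[i] - 1] = run;
        if (i < n - 1 && dord[i] > dlevel)
            ++run;
    }

    /* Pass 2: renumber clusters in order of first appearance by
     * object index, so that object 1 is always in cluster 1. */
    int k = 0;
    for (int j = 0; j < n; ++j) {
        int old = ic[j];
        if (remap[old] == 0)
            remap[old] = ++k;
        ic[j] = remap[old];
    }

    free(remap);
    return k;
}
```

For example, with the dendrogram order 2, 4, 1, 3, 5 and joining distances 2, 9, 1, 4, 9, a cut at distance 5 separates objects {2, 4} from objects {1, 3, 5}, and the renumbering step labels the cluster containing object 1 as cluster 1.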

4  References

Everitt B S (1974) Cluster Analysis Heinemann
Krzanowski W J (1990) Principles of Multivariate Analysis Oxford University Press

5  Arguments

1:     n    Integer    Input
On entry: the number of objects, n.
Constraint: n ≥ 2.
2:     cd[n-1]    const double    Input
On entry: the clustering distances in increasing order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
Constraint: cd[i] ≥ cd[i-1], for i = 1,2,…,n-2.
3:     iord[n]    const Integer    Input
On entry: the objects in the dendrogram order as returned by nag_mv_hierar_cluster_analysis (g03ecc).
4:     dord[n]    const double    Input
On entry: the clustering distances corresponding to the order in iord.
5:     k    Integer *    Input/Output
On entry: indicates if a specified number of clusters is required.
If k > 0, nag_mv_cluster_indicator (g03ejc) will attempt to find k clusters.
If k ≤ 0, nag_mv_cluster_indicator (g03ejc) will find the clusters based on the distance given in dlevel.
Constraint: k ≤ n.
On exit: the number of clusters produced, k.
6:     dlevel    double *    Input/Output
On entry: if k ≤ 0, then dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if k ≤ 0, dlevel > 0.0.
On exit: if k > 0 on entry, then dlevel contains the distance at which the required number of clusters are found. Otherwise dlevel remains unchanged.
7:     ic[n]    Integer    Output
On exit: ic[i-1] indicates to which of k clusters the ith object belongs, for i = 1,2,…,n.
8:     failNagError *Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).
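
The input constraints listed above can be checked before the call. The following is a minimal sketch of such a pre-check. The function check_cluster_args and its return codes are hypothetical illustrations, not NAG identifiers or NAG error codes, and plain int stands in for the library's Integer type.

```c
/* Hypothetical return codes for the sketch; these are NOT NAG error codes. */
enum arg_check { ARGS_OK = 0, BAD_N, CD_NOT_INCREASING, BAD_K, BAD_DLEVEL };

/* Hypothetical helper: mirrors the documented input constraints
 *   n >= 2
 *   cd[i] >= cd[i-1] for i = 1..n-2
 *   k <= n
 *   dlevel > 0.0 whenever k <= 0
 * and reports the first one that is violated. */
enum arg_check check_cluster_args(int n, const double cd[], int k, double dlevel)
{
    if (n < 2)
        return BAD_N;

    /* cd holds n-1 clustering distances in non-decreasing order. */
    for (int i = 1; i <= n - 2; ++i)
        if (cd[i] < cd[i - 1])
            return CD_NOT_INCREASING;

    if (k > n)
        return BAD_K;

    /* dlevel is only read when no cluster count is requested. */
    if (k <= 0 && !(dlevel > 0.0))
        return BAD_DLEVEL;

    return ARGS_OK;
}
```

Writing the last test as !(dlevel > 0.0) rather than dlevel <= 0.0 means a NaN in dlevel is also rejected.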

6  Error Indicators and Warnings

On entry, k = value while n = value. These arguments must satisfy k ≤ n.
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters produced is value.
Arrays cd and dord are not compatible.
On entry, n = value. Constraint: n ≥ 2.
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
The sequence cd is not increasing: cd[value] = value, cd[value] = value.
On entry, dlevel = value, k = value. Constraint: if k ≤ 0, dlevel > 0.0.
On exit, k = value, n = value. Trivial solution returned.
On exit, k = 1. Trivial solution returned.
On entry, dlevel = value, cd[value] = value. Trivial solution returned.
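
The warning about tied clustering distances can be understood from cd alone. With the n-1 distances sorted in increasing order, exactly k clusters remain after the first n-k merges, so a cut yielding k clusters exists only if the last of those merges occurs at a strictly smaller distance than the next one. The sketch below tests this condition. The helper k_clusters_possible is hypothetical, and the sketch assumes a monotone tree in which all merges at a tied distance take place together. It does not reproduce how the library chooses the number of clusters it actually returns.

```c
/* Hypothetical helper, not part of the NAG library.
 * Returns 1 if some cut level gives exactly k clusters, 0 otherwise.
 * cd[0..n-2] are the clustering distances in increasing order. */
int k_clusters_possible(int n, const double cd[], int k)
{
    if (n < 2 || k < 1 || k > n)
        return 0;

    int m = n - k;              /* number of merges already applied */

    /* k = n (no merges) and k = 1 (all merges) are always reachable. */
    if (m == 0 || m == n - 1)
        return 1;

    /* Otherwise a cut level must fit between merge m and merge m+1,
     * which is impossible when the two distances are tied. */
    return cd[m - 1] < cd[m];
}
```

With n = 5 and cd = {1, 2, 2, 9}, the second and third merges are tied, so three clusters cannot be obtained, whereas five, four, two and one can.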

7  Accuracy

The accuracy will depend upon the accuracy of the distances in cd and dord (see nag_mv_hierar_cluster_analysis (g03ecc)).

8  Further Comments

A fixed number of clusters can be found using the non-hierarchical method used in nag_mv_kmeans_cluster_analysis (g03efc).

9  Example

Data consisting of three variables on five objects are input. Euclidean squared distances are computed using nag_mv_distance_mat (g03eac) and median clustering performed using nag_mv_hierar_cluster_analysis (g03ecc). A dendrogram is produced by nag_mv_dendrogram (g03ehc) and printed. nag_mv_cluster_indicator (g03ejc) finds two clusters and the results are printed.

9.1  Program Text

Program Text (g03ejce.c)

9.2  Program Data

Program Data (g03ejce.d)

9.3  Program Results

Program Results (g03ejce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2012