NAG Toolbox: nag_mv_cluster_hier_indicator (g03ej)




Purpose

nag_mv_cluster_hier_indicator (g03ej) computes a cluster indicator variable from the results of nag_mv_cluster_hier (g03ec).


Syntax

[k, dlevel, ic, ifail] = g03ej(cd, iord, dord, k, dlevel, 'n', n)
[k, dlevel, ic, ifail] = nag_mv_cluster_hier_indicator(cd, iord, dord, k, dlevel, 'n', n)


Description

Given a distance or dissimilarity matrix for n objects, cluster analysis aims to group the n objects into a number of more or less homogeneous groups or clusters. With agglomerative clustering methods (see nag_mv_cluster_hier (g03ec)), a hierarchical tree is produced by starting with n clusters, each containing a single object, and then, at each of n-1 stages, merging two clusters to form a larger cluster until all objects are in a single cluster. nag_mv_cluster_hier_indicator (g03ej) takes the information from the tree and produces the clusters that exist at a given distance. This is equivalent to taking the dendrogram (see nag_mv_cluster_hier_dendrogram (g03eh)) and drawing a line across it at a given distance to produce clusters.
As an alternative to giving the distance at which clusters are required, you can specify the number of clusters required and nag_mv_cluster_hier_indicator (g03ej) will compute the corresponding distance. However, it may not be possible to compute the number of clusters required due to ties in the distance matrix.
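The relation between a requested number of clusters and the cut distance can be sketched in plain MATLAB, without any toolbox call. This is an illustration of the arithmetic only, not the routine's implementation: it assumes that k clusters remain after n-k merges and that every merge at a distance less than or equal to the cut distance counts as done. The first cd vector reuses the distances printed in the example on this page; cd_tied and all variable names are hypothetical.

```matlab
% Merge distances in increasing order (values printed in the example on this page)
cd = [1.000 2.000 6.500 14.125];
n  = numel(cd) + 1;                  % number of objects

% Requested number of clusters (sketch assumes 1 < k_req < n)
k_req  = 2;
dlevel = cd(n - k_req);              % distance of the (n-k)th merge
k_got  = n - sum(cd <= dlevel);      % clusters left once merges up to dlevel are done
fprintf('k = %d at distance %.3f\n', k_got, dlevel);

% Hypothetical tied distances: the 2nd and 3rd merges happen at the same distance,
% so a cut cannot separate them and exactly three clusters are not available
cd_tied = [1.0 2.0 2.0 5.0];
k_req   = 3;
dl_tied = cd_tied(n - k_req);
k_tied  = n - sum(cd_tied <= dl_tied);
fprintf('requested %d, obtained %d at distance %.3f\n', k_req, k_tied, dl_tied);
```

In the tied case the sketch yields fewer clusters than requested, which is the situation the routine reports as a warning.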
If there are k clusters then the indicator variable will assign a value between 1 and k to each object to indicate to which cluster it belongs. Object 1 always belongs to cluster 1.
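Cutting the dendrogram at a given distance can be sketched as follows, again in plain MATLAB with no toolbox call. The sketch rests on two assumptions that this page does not state: that dord(i) is the distance at which object iord(i) joins its right-hand neighbour iord(i+1) in dendrogram order, and that clusters are numbered in order of first appearance by object index (which puts object 1 in cluster 1). The iord and dord values are hand-built to agree with the merge distances in the example and are not actual output of nag_mv_cluster_hier (g03ec).

```matlab
% Hypothetical dendrogram order and join distances for five objects
iord   = [1 3 5 2 4];
dord   = [2.0 6.5 14.125 1.0 14.125];
dlevel = 6.5;                        % cut distance

n = numel(iord);

% A new cluster starts at the first object, and wherever the join distance
% between two neighbours in dendrogram order exceeds the cut distance
newcl = [true, dord(1:n-1) > dlevel];
grp   = cumsum(newcl);               % cluster id along dendrogram order

% Map the ids back from dendrogram order to object order
ic       = zeros(1, n);
ic(iord) = grp;

% Renumber clusters by first appearance so that object 1 is in cluster 1
[~, ~, ic] = unique(ic, 'stable');
ic = ic(:)';
k  = max(ic);

fprintf('%d clusters: %s\n', k, mat2str(ic));
```

With these inputs the sketch assigns objects 1, 3 and 5 to cluster 1 and objects 2 and 4 to cluster 2, the same allocation as in the example results.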




Compulsory Input Parameters

1:     cd(n-1) – double array
The clustering distances in increasing order as returned by nag_mv_cluster_hier (g03ec).
Constraint: cd(i+1) ≥ cd(i), for i = 1,2,…,n-2.
2:     iord(n) – int64/int32/nag_int array
The objects in dendrogram order as returned by nag_mv_cluster_hier (g03ec).
3:     dord(n) – double array
The clustering distances corresponding to the order in iord.
4:     k – int64/int32/nag_int scalar
Indicates if a specified number of clusters is required.
If k > 0 then nag_mv_cluster_hier_indicator (g03ej) will attempt to find k clusters.
If k ≤ 0 then nag_mv_cluster_hier_indicator (g03ej) will find the clusters based on the distance given in dlevel.
Constraint: k ≤ n.
5:     dlevel – double scalar
If k ≤ 0, dlevel must contain the distance at which clusters are produced. Otherwise dlevel need not be set.
Constraint: if k ≤ 0, dlevel > 0.0.

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: the dimension of the arrays iord, dord. (An error is raised if these dimensions are not equal.)
n, the number of objects.
Constraint: n ≥ 2.
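
The input constraints listed above can be checked before the routine is called. The following is a minimal sketch in plain MATLAB with no toolbox call. The cd values are those printed in the example on this page, the iord and dord values are assumed for illustration, and the ok_* names are hypothetical.

```matlab
% Illustrative inputs (iord and dord are assumed, not actual g03ec output)
cd     = [1.000 2.000 6.500 14.125];
iord   = int64([1 3 5 2 4]);
dord   = [2.0 6.5 14.125 1.0 14.125];
k      = int64(2);
dlevel = 0;

n = numel(iord);

% n >= 2, iord and dord have n elements, cd has n-1 elements
ok_n    = n >= 2 && numel(dord) == n && numel(cd) == n - 1;
% cd must be in increasing order: cd(i+1) >= cd(i)
ok_cd   = all(diff(cd) >= 0);
% k must not exceed the number of objects
ok_k    = k <= n;
% a positive distance is needed only when no cluster count is requested
ok_dlev = (k > 0) || (dlevel > 0.0);

inputs_ok = ok_n && ok_cd && ok_k && ok_dlev;
fprintf('inputs ok: %d\n', inputs_ok);
```

These checks cover only the constraints stated on this page; the routine also checks that dord is compatible with cd, which the sketch does not attempt.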

Output Parameters

1:     k – int64/int32/nag_int scalar
The number of clusters produced, k.
2:     dlevel – double scalar
If k > 0 on entry, dlevel contains the distance at which the required number of clusters is found. Otherwise dlevel remains unchanged.
3:     ic(n) – int64/int32/nag_int array
ic(i) indicates to which of k clusters the ith object belongs, for i = 1,2,…,n.
4:     ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

   ifail=1
On entry, k > n,
or k ≤ 0 and dlevel ≤ 0.0.
   ifail=2
On entry, cd is not in increasing order,
or dord is incompatible with cd.
   ifail=3
On entry, k = 1,
Note: on exit with this value of ifail the trivial clustering solution is returned.
W  ifail=4
The precise number of clusters requested is not possible because of tied clustering distances. The actual number of clusters, less than the number requested, is returned in k.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.


Accuracy

The accuracy depends on the accuracy of the distances in cd and dord (see nag_mv_cluster_hier (g03ec)).

Further Comments

A fixed number of clusters can be found using the non-hierarchical method used in nag_mv_cluster_kmeans (g03ef).


Example

Data consisting of three variables on five objects are input. Euclidean squared distances are computed using nag_mv_distance_mat (g03ea) and median clustering is performed using nag_mv_cluster_hier (g03ec). The clustering distances and the clusters joined at each stage are printed. nag_mv_cluster_hier_indicator (g03ej) finds two clusters and the results are printed.

function g03ej_example

fprintf('g03ej example results\n\n');

x = [1, 5, 2;
     2, 1, 1;
     3, 4, 3;
     4, 1, 2;
     5, 5, 0];
[n,m]  = size(x);

isx    = ones(m,1,'int64');
isx(1) = int64(0);
s      = ones(m,1);
ld     = (n*(n-1))/2;
d      = zeros(ld,1);

% Compute the distance matrix
update = 'I';
dist = 'S';
scal = 'U';
[s, d, ifail] = g03ea(update, dist, scal, x, isx, s, d);

% Clustering method
method = int64(5);
% Perform clustering
n      = int64(n);
[d, ilc, iuc, cd, iord, dord, ifail] = ...
  g03ec(method, n, d);

row = {'A'; 'B'; 'C'; 'D'; 'E'};
fprintf('  Distance   Clusters Joined\n\n');
for i = 1:n-1
  fprintf('%10.3f     %s %s\n', cd(i), row{ilc(i)}, row{iuc(i)});
end

% k clusters
k = int64(2);
dlevel = 0;

% Compute cluster indicators
[k, dlevel, ic, ifail] = g03ej(cd, iord, dord, k, dlevel);

% Display the indicators
fprintf('\n Allocation to %2d clusters\n', k);
fprintf(' Clusters found at distance %6.3f\n\n', dlevel);
fprintf(' Object  Cluster\n\n');
for i = 1:n
  fprintf('%6s     %2d\n', row{i}, ic(i));
end

g03ej example results

  Distance   Clusters Joined

     1.000     B D
     2.000     A C
     6.500     A E
    14.125     A B

 Allocation to  2 clusters
 Clusters found at distance  6.500

 Object  Cluster

     A      1
     B      2
     C      1
     D      2
     E      1


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015