hclust

fun hclust(data: Array<DoubleArray>, method: String): HierarchicalClustering

Agglomerative Hierarchical Clustering. This method builds a hierarchy of clusters in a bottom-up fashion: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. The results of hierarchical clustering are usually presented as a dendrogram.

In general, the merges are determined in a greedy manner. To decide which clusters should be combined, a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by the use of an appropriate metric and a linkage criterion, which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances.

References

  • David Eppstein. Fast hierarchical clustering and other applications of dynamic closest pairs. SODA 1998.

Parameters

data

The data set.

method

the agglomeration method used to merge clusters. This should be one of "single", "complete", "upgma", "upgmc", "wpgma", "wpgmc", or "ward".
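
A minimal usage sketch, assuming the function is imported from the smile.clustering package and that the returned HierarchicalClustering exposes a partition(k) method yielding one cluster label per observation (the toy data is purely illustrative):

    import smile.clustering.hclust

    fun main() {
        // Toy data set: two well-separated groups of 2-D points (made up for illustration).
        val data = arrayOf(
            doubleArrayOf(0.0, 0.0),
            doubleArrayOf(0.1, 0.2),
            doubleArrayOf(0.2, 0.1),
            doubleArrayOf(5.0, 5.0),
            doubleArrayOf(5.1, 4.9),
            doubleArrayOf(4.9, 5.1)
        )

        // Build the cluster hierarchy with complete linkage
        // (Euclidean distance is assumed as the default for this overload).
        val clustering = hclust(data, "complete")

        // Cut the dendrogram into two flat clusters; partition(k) is assumed to return
        // an IntArray of cluster labels, one per observation.
        val labels = clustering.partition(2)
        println(labels.joinToString(", "))
    }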


fun <T> hclust(data: Array<T>, distance: Distance<T>, method: String): HierarchicalClustering

Agglomerative Hierarchical Clustering. This method builds a hierarchy of clusters in a bottom-up fashion: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. The results of hierarchical clustering are usually presented as a dendrogram.

In general, the merges are determined in a greedy manner. To decide which clusters should be combined, a measure of dissimilarity between sets of observations is required. In most methods of hierarchical clustering, this is achieved by the use of an appropriate metric and a linkage criterion, which specifies the dissimilarity of sets as a function of the pairwise distances of observations in the sets.

Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances.

References

  • David Eppstein. Fast hierarchical clustering and other applications of dynamic closest pairs. SODA 1998.

Parameters

data

The data set.

distance

the distance/dissimilarity measure.

method

the agglomeration method used to merge clusters. This should be one of "single", "complete", "upgma", "upgmc", "wpgma", "wpgmc", or "ward".
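
A sketch of the generic overload with a user-supplied dissimilarity. The Distance interface is assumed to live in smile.math.distance with a single method d(x, y), and partition(k) is assumed as in the vector overload; the string data and length-based distance are purely illustrative:

    import smile.clustering.hclust
    import smile.math.distance.Distance
    import kotlin.math.abs

    fun main() {
        // Observations that are not numeric vectors: a handful of strings.
        val data = arrayOf("kitten", "sitting", "mitten", "sit", "fitting", "knitting")

        // A deliberately simple custom dissimilarity: absolute difference in string length.
        // Any Distance<T> implementation can be plugged in here.
        val lengthDistance = object : Distance<String> {
            override fun d(x: String, y: String): Double = abs(x.length - y.length).toDouble()
        }

        // Single-linkage clustering over the custom distance; no feature vectors are needed,
        // only pairwise dissimilarities between observations.
        val clustering = hclust(data, lengthDistance, "single")

        // Cut the tree into two flat clusters (partition(k) assumed as above).
        println(clustering.partition(2).joinToString(", "))
    }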