smile.clustering.BBDTree

public class BBDTree extends Object

Balanced Box-Decomposition Tree. BBD tree is a specialized k-d tree that vastly speeds up an iteration of k-means. This is used internally by KMeans and batch SOM., and will most likely not need to be used directly.

The structure works as follows:

All data are placed into a tree where we choose child nodes by partitioning all data along a plane parallel to the axis.
We maintain for each node, the bounding box of all data stored at that node.
To do a k-means iteration, we need to assign data to clusters and calculate the sum and the number of data assigned to each cluster. For each node in the tree, we can rule out some cluster centroids as being too far away from every single point in that bounding box. Once only one cluster is left, all data in the node can be assigned to that cluster in batch.

References

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE TRANS. PAMI, 2002.

See Also:

Constructor Summary

Constructors

Constructor

Description

BBDTree(double[][] data)

Constructs a tree out of the given n data points living in R^d.
Method Summary

Modifier and Type

Method

Description

double

clustering(int k, double[][] centroids, double[][] sum, int[] size, int[] group)

Given k cluster centroids, this method assigns data to nearest centroids.

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- BBDTree
  
  public BBDTree(double[][] data)
  
  Constructs a tree out of the given n data points living in R^d.
  
  Parameters:
  
  data - the data points.
Method Details
- clustering
  
  public double clustering(int k, double[][] centroids, double[][] sum, int[] size, int[] group)
  
  Given k cluster centroids, this method assigns data to nearest centroids. The return value is the distortion to the centroids. The parameter sums will hold the sum of data for each cluster. The parameter counts hold the number of data of each cluster. If membership is not null, it should be an array of size n that will be filled with the index of the cluster [0, k) that each data point is assigned to.
  
  Parameters:
  
  k - the number of clusters.
  
  centroids - the current centroids of clusters.
  
  sum - the workspace storing the sum of data in each cluster.
  
  size - the number of samples in each cluster.
  
  group - the class labels.
  
  Returns:
  
  the distortion.

Class BBDTree

References

Constructor Summary

Method Summary

Methods inherited from class Object

Constructor Details

BBDTree

Method Details

clustering