Class CentroidClustering<T,U>

java.lang.Object
smile.clustering.PartitionClustering
smile.clustering.CentroidClustering<T,U>
Type Parameters:
T - the type of centroids.
U - the type of observations. Usually T and U are the same, but in the case of SIB they are different.
All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<T,U>>
Direct Known Subclasses:
CLARANS, DeterministicAnnealing, GMeans, KMeans, KModes, SIB, XMeans

public abstract class CentroidClustering<T,U> extends PartitionClustering implements Comparable<CentroidClustering<T,U>>
In centroid-based clustering, each cluster is represented by a central vector, which need not be a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the k cluster centers and assign each object to its nearest center such that the sum of squared distances from the objects to their assigned centers is minimized.
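The objective described above can be sketched in plain Java. This is a minimal illustration that does not use this library; the class name `KMeansObjectiveSketch` and its methods are hypothetical, and squared Euclidean distance on `double[]` vectors is assumed.

```java
// Hypothetical, standalone sketch of the k-means objective (not part of this library).
class KMeansObjectiveSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to x. */
    static int nearest(double[][] centers, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = squaredDistance(centers[j], x);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Sum of squared distances from each point to its nearest center. */
    static double objective(double[][] centers, double[][] data) {
        double total = 0.0;
        for (double[] x : data) {
            total += squaredDistance(centers[nearest(centers, x)], x);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] centers = {{0, 1}, {10, 1}};
        // Each point lies at squared distance 1 from its nearest center.
        System.out.println(objective(centers, data)); // prints 4.0
    }
}
```

A k-means-type algorithm searches for the centers that make this quantity as small as possible; the sketch only evaluates it for centers that are already given.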

Variations of k-means include restricting the centroids to members of the data set (k-medoids), choosing medians (k-medians clustering), choosing the initial centers less randomly (k-means++), and allowing a fuzzy cluster assignment (fuzzy c-means).

Most k-means-type algorithms require the number of clusters to be specified in advance, which is considered to be one of the biggest drawbacks of these algorithms. Furthermore, the algorithms prefer clusters of approximately similar size, as they will always assign an object to the nearest centroid. This often leads to incorrectly cut borders of clusters (which is not surprising since the algorithm optimizes cluster centers, not cluster borders).
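The border effect can be seen in a small one-dimensional illustration. The numbers are made up for the example: suppose a tight cluster spans roughly [-1, 1] around center 0 and a wide cluster spans roughly [2, 18] around center 10. The class `BorderCutSketch` is hypothetical and does not use this library.

```java
// Hypothetical illustration: nearest-centroid assignment ignores cluster extent.
class BorderCutSketch {
    /** Index of the center closest to x on the real line. */
    static int nearest(double[] centers, double x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = Math.abs(centers[j] - x);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] centers = {0.0, 10.0};
        // x = 4 lies inside the assumed wide cluster [2, 18], yet it is
        // 4 units from center 0 and 6 units from center 10.
        System.out.println(nearest(centers, 4.0)); // prints 0
    }
}
```

The point is assigned to the tight cluster because only distance to the center matters, so the border falls at the midpoint between the two centers regardless of how far each cluster actually extends.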

  • Field Details

    • distortion

      public final double distortion
      The total distortion.
    • centroids

      public final T[] centroids
      The centroid of each cluster.
  • Constructor Details

    • CentroidClustering

      public CentroidClustering(double distortion, T[] centroids, int[] y)
      Constructor.
      Parameters:
      distortion - the total distortion.
      centroids - the centroid of each cluster.
      y - the cluster labels.
  • Method Details

    • compareTo

      public int compareTo(CentroidClustering<T,U> o)
      Specified by:
      compareTo in interface Comparable<CentroidClustering<T,U>>
    • distance

      protected abstract double distance(T x, U y)
      The distance function.
      Parameters:
      x - a centroid (of type T).
      y - an observation (of type U).
      Returns:
      the distance.
    • predict

      public int predict(U x)
      Classifies a new observation.
      Parameters:
      x - a new observation.
      Returns:
      the cluster label.
    • toString

      public String toString()
      Overrides:
      toString in class PartitionClustering
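The members documented above fit together roughly as in the following sketch. It is a standalone illustration that mirrors the shape of this class without depending on it: the names `CentroidModelSketch` and `EuclideanModelSketch` are hypothetical, and two behaviors are assumptions rather than documented facts, namely that `predict` returns the index of the nearest centroid and that `compareTo` orders models by distortion.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical stand-in for a centroid-based model; not the library class.
abstract class CentroidModelSketch<T, U> implements Comparable<CentroidModelSketch<T, U>> {
    final double distortion;
    final T[] centroids;

    CentroidModelSketch(double distortion, T[] centroids) {
        this.distortion = distortion;
        this.centroids = centroids;
    }

    /** Distance between a centroid (type T) and an observation (type U). */
    abstract double distance(T centroid, U observation);

    /** Assumption: the label is the index of the nearest centroid. */
    int predict(U x) {
        int label = 0;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distance(centroids[i], x);
            if (d < best) {
                best = d;
                label = i;
            }
        }
        return label;
    }

    /** Assumption: models with lower distortion sort first. */
    @Override
    public int compareTo(CentroidModelSketch<T, U> o) {
        return Double.compare(distortion, o.distortion);
    }
}

// Concrete case where T and U are both double[] and the distance is squared Euclidean.
class EuclideanModelSketch extends CentroidModelSketch<double[], double[]> {
    EuclideanModelSketch(double distortion, double[][] centroids) {
        super(distortion, centroids);
    }

    @Override
    double distance(double[] c, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < c.length; i++) {
            double d = c[i] - x[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        EuclideanModelSketch a = new EuclideanModelSketch(4.0, new double[][]{{0, 1}, {10, 1}});
        EuclideanModelSketch b = new EuclideanModelSketch(9.5, new double[][]{{0, 0}, {5, 5}});
        System.out.println(a.predict(new double[]{9, 0}));            // prints 1
        // Under the ordering assumed here, the minimum is the lower-distortion model.
        System.out.println(Collections.min(Arrays.asList(a, b)) == a); // prints true
    }
}
```

If the ordering is indeed by distortion, implementing `Comparable` lets callers run an algorithm several times with different initializations and keep the best result with a single `Collections.min` call.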