Class DeterministicAnnealing

java.lang.Object
smile.clustering.PartitionClustering
smile.clustering.CentroidClustering<double[],double[]>
smile.clustering.DeterministicAnnealing
All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<double[],double[]>>

public class DeterministicAnnealing extends CentroidClustering<double[],double[]>
Deterministic annealing clustering. Deterministic annealing extends soft clustering to an annealing process. At each temperature, the algorithm alternates between computing all posterior probabilities and updating the centroid vectors until convergence. Annealing starts at a high temperature, at which all centroid vectors converge to the center of the pattern distribution regardless of their initial positions. Below a critical temperature, the vectors start to split. Decreasing the temperature further leads to more splits until all centroid vectors are separate. If the annealing is sufficiently slow, it can therefore avoid convergence to local minima.
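
The alternation described above can be sketched as follows. This is a simplified illustration, not Smile's implementation: the class and method names are hypothetical, squared Euclidean distance is an assumption, and the cluster-prior weighting used in mass-constrained variants of the algorithm is omitted.

```java
// Illustrative sketch only; all names here are hypothetical and not part of Smile.
final class SoftAssignmentSketch {
    /** Squared Euclidean distance between two vectors of equal length (assumed metric). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Posterior p(j | x), proportional to exp(-dist(x, c_j) / T). */
    static double[] posteriors(double[] x, double[][] centroids, double temperature) {
        int k = centroids.length;
        double[] dist = new double[k];
        double min = Double.POSITIVE_INFINITY;
        for (int j = 0; j < k; j++) {
            dist[j] = squaredDistance(x, centroids[j]);
            min = Math.min(min, dist[j]);
        }
        double[] p = new double[k];
        double z = 0.0;
        for (int j = 0; j < k; j++) {
            // Subtracting the minimum distance avoids underflow at low temperatures.
            p[j] = Math.exp(-(dist[j] - min) / temperature);
            z += p[j];
        }
        for (int j = 0; j < k; j++) {
            p[j] /= z;
        }
        return p;
    }

    /** One update: each centroid becomes the posterior-weighted mean of the data. */
    static double[][] update(double[][] data, double[][] centroids, double temperature) {
        int k = centroids.length;
        int d = data[0].length;
        double[][] sums = new double[k][d];
        double[] weights = new double[k];
        for (double[] x : data) {
            double[] p = posteriors(x, centroids, temperature);
            for (int j = 0; j < k; j++) {
                weights[j] += p[j];
                for (int i = 0; i < d; i++) {
                    sums[j][i] += p[j] * x[i];
                }
            }
        }
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < d; i++) {
                // Keep the old centroid if it received no weight at all.
                sums[j][i] = weights[j] > 0.0 ? sums[j][i] / weights[j] : centroids[j][i];
            }
        }
        return sums;
    }
}
```

At a very high temperature the posteriors are nearly uniform, so every centroid moves to the data mean; at a very low temperature each observation is assigned almost entirely to its nearest centroid, which recovers a hard, k-means-like update.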

References

  1. Kenneth Rose. Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems. Proceedings of the IEEE, 1998.
  • Constructor Details

    • DeterministicAnnealing

      public DeterministicAnnealing(double distortion, double[][] centroids, int[] y)
      Constructor.
      Parameters:
      distortion - the total distortion.
      centroids - the centroids of each cluster.
      y - the cluster labels.
  • Method Details

    • distance

      protected double distance(double[] x, double[] y)
      Description copied from class: CentroidClustering
      The distance function.
      Specified by:
      distance in class CentroidClustering<double[],double[]>
      Parameters:
      x - an observation.
      y - the other observation.
      Returns:
      the distance.
    • fit

      public static DeterministicAnnealing fit(double[][] data, int Kmax)
      Clusters the data into at most Kmax clusters.
      Parameters:
      data - the input data of which each row is an observation.
      Kmax - the maximum number of clusters.
      Returns:
      the model.
    • fit

      public static DeterministicAnnealing fit(double[][] data, int Kmax, double alpha, int maxIter, double tol, double splitTol)
      Clusters the data into at most Kmax clusters.
      Parameters:
      data - the input data of which each row is an observation.
      Kmax - the maximum number of clusters.
      alpha - the cooling factor. The temperature T is decreased as T = T * alpha. alpha must be in (0, 1).
      maxIter - the maximum number of iterations at each temperature.
      tol - the tolerance of the convergence test.
      splitTol - the tolerance for splitting a cluster.
      Returns:
      the model.
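
To get a feel for how alpha affects running time, the geometric schedule T = T * alpha can be sketched on its own. The class name, the initial temperature, and the stopping temperature below are hypothetical illustration values; the source does not say how Smile chooses its starting or final temperature.

```java
// Illustrative sketch only; names and temperature bounds are hypothetical, not Smile's.
final class CoolingScheduleSketch {
    /**
     * Counts how many times T = T * alpha is applied before T drops to
     * or below tMin, starting from t0.
     */
    static int coolingSteps(double t0, double tMin, double alpha) {
        if (alpha <= 0.0 || alpha >= 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1): " + alpha);
        }
        if (t0 <= 0.0 || tMin <= 0.0) {
            throw new IllegalArgumentException("temperatures must be positive");
        }
        int steps = 0;
        double t = t0;
        while (t > tMin) {
            t *= alpha; // geometric cooling, as described for the alpha parameter
            steps++;
        }
        return steps;
    }
}
```

For example, `coolingSteps(100.0, 1.0, 0.5)` returns 7. Because each temperature runs at most maxIter inner iterations, a schedule like this bounds the total work at roughly the step count times maxIter. Values of alpha closer to 1 cool more slowly, which costs more steps but gives the annealing a better chance of avoiding poor local minima.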