Class GMeans

All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<double[],double[]>>

public class GMeans extends CentroidClustering<double[],double[]>
G-Means clustering algorithm, an extension of K-Means that tries to determine the number of clusters automatically by means of a normality test. G-Means is based on a statistical test of the hypothesis that a subset of the data follows a Gaussian distribution. It runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian.
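
The description above does not name the normality test. In the paper by Hamerly and Elkan, the points of a cluster are projected onto the line joining two candidate child centers, and an Anderson-Darling test is applied to the standardized projections. The following is a minimal sketch of that check under those assumptions. The class GaussianCheck and its methods are hypothetical helpers, not part of this class's API, and the implementation behind fit may differ.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating a Gaussianity check; not part of GMeans. */
final class GaussianCheck {
    /** Complementary error function via a Chebyshev-fitted approximation. */
    static double erfc(double x) {
        double z = Math.abs(x);
        double t = 1.0 / (1.0 + 0.5 * z);
        double r = t * Math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196
                + t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 + t * (-1.13520398
                + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))))));
        return x >= 0 ? r : 2.0 - r;
    }

    /** Standard normal cumulative distribution function. */
    static double phi(double z) {
        return 0.5 * erfc(-z / Math.sqrt(2.0));
    }

    /** Projects each point onto v = c1 - c2 (assumes c1 and c2 differ). */
    static double[] project(double[][] points, double[] c1, double[] c2) {
        int d = c1.length;
        double[] v = new double[d];
        double norm2 = 0.0;
        for (int j = 0; j < d; j++) {
            v[j] = c1[j] - c2[j];
            norm2 += v[j] * v[j];
        }
        double[] p = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < d; j++) {
                dot += points[i][j] * v[j];
            }
            p[i] = dot / norm2;
        }
        return p;
    }

    /** Anderson-Darling statistic with the correction for estimated mean and variance. */
    static double andersonDarling(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double xi : x) mean += xi;
        mean /= n;
        double var = 0.0;
        for (double xi : x) var += (xi - mean) * (xi - mean);
        double sd = Math.sqrt(var / (n - 1));

        // Standardize to zero mean and unit variance, then sort.
        double[] z = new double[n];
        for (int i = 0; i < n; i++) z[i] = (x[i] - mean) / sd;
        Arrays.sort(z);

        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // Clamp to avoid log(0) in the extreme tails.
            double lower = Math.max(phi(z[i]), Double.MIN_NORMAL);
            double upper = Math.max(phi(-z[n - 1 - i]), Double.MIN_NORMAL);
            sum += (2.0 * i + 1.0) * (Math.log(lower) + Math.log(upper));
        }
        double a2 = -n - sum / n;
        return a2 * (1.0 + 4.0 / n - 25.0 / ((double) n * n));
    }
}
```

In this sketch a cluster would be kept when the statistic falls below a critical value chosen by the caller, and split into the two child centers otherwise.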

References

  1. G. Hamerly and C. Elkan. Learning the k in k-means. NIPS, 2003.
  • Constructor Details

    • GMeans

      public GMeans(double distortion, double[][] centroids, int[] y)
      Constructor.
      Parameters:
      distortion - the total distortion.
      centroids - the centroids of each cluster.
      y - the cluster labels.
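
As an illustration of how the three constructor arguments relate, the sketch below computes a total distortion from centroids and labels. It assumes that distortion means the sum of squared Euclidean distances from each observation to its assigned centroid, which this page does not state. DistortionSketch is a hypothetical helper, and its data argument is not a constructor parameter.

```java
/** Hypothetical helper; assumes distortion is a sum of squared Euclidean distances. */
final class DistortionSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Sums the squared distance of each row of data to the centroid named by its label. */
    static double distortion(double[][] data, double[][] centroids, int[] y) {
        double total = 0.0;
        for (int i = 0; i < data.length; i++) {
            total += squaredDistance(data[i], centroids[y[i]]);
        }
        return total;
    }
}
```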
  • Method Details

    • distance

      protected double distance(double[] x, double[] y)
      Description copied from class: CentroidClustering
      The distance function.
      Specified by:
      distance in class CentroidClustering<double[],double[]>
      Parameters:
      x - an observation.
      y - the other observation.
      Returns:
      the distance.
    • fit

      public static GMeans fit(double[][] data, int kmax)
      Clusters the data, with the number of clusters determined automatically by the G-Means algorithm.
      Parameters:
      data - the input data, in which each row is an observation.
      kmax - the maximum number of clusters.
      Returns:
      the model.
    • fit

      public static GMeans fit(double[][] data, int kmax, int maxIter, double tol)
      Clusters the data, with the number of clusters determined automatically by the G-Means algorithm.
      Parameters:
      data - the input data, in which each row is an observation.
      kmax - the maximum number of clusters.
      maxIter - the maximum number of iterations for k-means.
      tol - the tolerance of k-means convergence test.
      Returns:
      the model.
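
To illustrate how maxIter and tol could bound the k-means runs, here is a minimal sketch of Lloyd iterations. It assumes that convergence is declared once the decrease in distortion between two consecutive iterations is at most tol; this page does not state the exact criterion. LloydSketch and its run method are hypothetical and are not part of this class.

```java
/** Hypothetical sketch of k-means (Lloyd) iterations bounded by maxIter and tol. */
final class LloydSketch {
    /**
     * Updates centroids in place and returns the distortion measured
     * during the last assignment step.
     */
    static double run(double[][] data, double[][] centroids, int maxIter, double tol) {
        int n = data.length, k = centroids.length, d = data[0].length;
        int[] y = new int[n];
        double distortion = Double.POSITIVE_INFINITY;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each observation to its nearest centroid.
            double current = 0.0;
            for (int i = 0; i < n; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = data[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < best) {
                        best = dist;
                        y[i] = c;
                    }
                }
                current += best;
            }

            // Update step: move each non-empty centroid to the mean of its members.
            double[][] sum = new double[k][d];
            int[] size = new int[k];
            for (int i = 0; i < n; i++) {
                size[y[i]]++;
                for (int j = 0; j < d; j++) sum[y[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (size[c] == 0) continue;
                for (int j = 0; j < d; j++) centroids[c][j] = sum[c][j] / size[c];
            }

            // Convergence test (assumed): stop when the improvement is at most tol.
            boolean converged = distortion - current <= tol;
            distortion = current;
            if (converged) break;
        }
        return distortion;
    }
}
```

A larger tol or a smaller maxIter stops the loop earlier, at the cost of centroids that may not have settled.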