Package smile.clustering
Class GMeans
java.lang.Object
smile.clustering.PartitionClustering
smile.clustering.CentroidClustering<double[],double[]>
smile.clustering.GMeans
- All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<double[],double[]>>
G-Means clustering algorithm, an extension of K-Means that tries to
determine the number of clusters automatically using a normality test.
The G-means algorithm is based on a statistical test for the hypothesis
that a subset of data follows a Gaussian distribution. G-means runs
k-means with increasing k in a hierarchical fashion until the test accepts
the hypothesis that the data assigned to each k-means center are Gaussian.
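The split decision described above can be illustrated with a small, self-contained sketch. It follows the procedure in the cited paper (Hamerly and Elkan, 2003): project a cluster's points onto the line joining two candidate child centers, standardize the projection, and apply an Anderson-Darling normality test. The class name GaussianSplitCheck, its method names, and the erf approximation are assumptions made for illustration; this is not Smile's implementation, and this page does not state how GMeans performs the test internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only (hypothetical class); not Smile's implementation. */
final class GaussianSplitCheck {
    /** Critical value quoted by Hamerly and Elkan for significance level 0.0001. */
    static final double CRITICAL_VALUE = 1.8692;

    /** Standard normal CDF, using the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double x) {
        double z = Math.abs(x) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * z);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-z * z);
        return x >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Keeps a CDF value away from 0 and 1 so the logarithms below stay finite. */
    static double clamp(double p) {
        return Math.min(1.0 - 1e-12, Math.max(1e-12, p));
    }

    /**
     * Corrected Anderson-Darling statistic of the points projected onto c1 - c2.
     * Assumes the projected values are not all identical.
     */
    static double statistic(double[][] points, double[] c1, double[] c2) {
        int n = points.length;
        int d = c1.length;

        // Direction joining the two candidate child centers.
        double[] v = new double[d];
        for (int j = 0; j < d; j++) v[j] = c1[j] - c2[j];

        // Project every point onto v. The scale of v does not matter because
        // the projection is standardized in the next step.
        double[] p = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) p[i] += points[i][j] * v[j];
            mean += p[i];
        }
        mean /= n;

        // Standardize to zero mean and unit variance, then sort.
        double var = 0.0;
        for (double value : p) var += (value - mean) * (value - mean);
        double sd = Math.sqrt(var / (n - 1));
        for (int i = 0; i < n; i++) p[i] = (p[i] - mean) / sd;
        Arrays.sort(p);

        // A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln z_i + ln(1 - z_{n+1-i})]
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double lo = clamp(normalCdf(p[i]));
            double hi = clamp(normalCdf(p[n - 1 - i]));
            sum += (2.0 * i + 1.0) * (Math.log(lo) + Math.log(1.0 - hi));
        }
        double a2 = -n - sum / n;

        // Small-sample correction used in the paper.
        return a2 * (1.0 + 4.0 / n - 25.0 / ((double) n * n));
    }

    /** True if the test accepts the Gaussian hypothesis, so the center is not split. */
    static boolean looksGaussian(double[][] points, double[] c1, double[] c2) {
        return statistic(points, c1, c2) < CRITICAL_VALUE;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] blob = new double[500][2];
        for (double[] row : blob) {
            row[0] = rng.nextGaussian();
            row[1] = rng.nextGaussian();
        }
        double[] c1 = {1.0, 0.0};
        double[] c2 = {-1.0, 0.0};
        System.out.println("statistic = " + statistic(blob, c1, c2));
        System.out.println("keep single center: " + looksGaussian(blob, c1, c2));
    }
}
```

In a full G-means loop, a center whose points fail this check would be replaced by its two children and k-means would be rerun, until every center passes or the cluster limit is reached.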
References
- G. Hamerly and C. Elkan. Learning the k in k-means. NIPS, 2003.
-
Field Summary
Fields inherited from class smile.clustering.CentroidClustering
centroids, distortion
Fields inherited from class smile.clustering.PartitionClustering
k, OUTLIER, size, y
-
Constructor Summary
Constructor | Description
GMeans(double distortion, double[][] centroids, int[] y) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
protected double | distance(double[] x, double[] y) | The distance function.
static GMeans | fit(double[][] data, int kmax) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
static GMeans | fit(double[][] data, int kmax, int maxIter, double tol) | Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
Methods inherited from class smile.clustering.CentroidClustering
compareTo, predict, toString
Methods inherited from class smile.clustering.PartitionClustering
run, seed
-
Constructor Details
-
GMeans
public GMeans(double distortion, double[][] centroids, int[] y)
Constructor.
- Parameters:
distortion - the total distortion.
centroids - the centroids of each cluster.
y - the cluster labels.
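To make the three constructor arguments concrete, here is a minimal sketch of how labels and a total distortion could be derived from data and centroids. It assumes that each observation is labeled with the index of its nearest centroid and that the total distortion is the sum of squared Euclidean distances to the assigned centroid. This page does not confirm either definition, and the class CentroidAssignmentSketch and its methods are hypothetical, not part of Smile.

```java
/** Illustrative sketch only (hypothetical class); not part of Smile. */
final class CentroidAssignmentSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Labels each row with the index of its nearest centroid (first one wins ties). */
    static int[] assign(double[][] data, double[][] centroids) {
        int[] y = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = squaredDistance(data[i], centroids[c]);
                if (dist < best) {
                    best = dist;
                    y[i] = c;
                }
            }
        }
        return y;
    }

    /** Assumed definition: sum of squared distances to the assigned centroid. */
    static double distortion(double[][] data, double[][] centroids, int[] y) {
        double total = 0.0;
        for (int i = 0; i < data.length; i++) {
            total += squaredDistance(data[i], centroids[y[i]]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] centroids = {{0, 1}, {10, 1}};
        int[] y = assign(data, centroids);
        System.out.println(java.util.Arrays.toString(y));
        System.out.println(distortion(data, centroids, y));
    }
}
```

Under these assumptions, the values computed here correspond to the y and distortion arguments, with centroids passed through unchanged.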
-
-
Method Details
-
distance
protected double distance(double[] x, double[] y)
Description copied from class: CentroidClustering
The distance function.
- Specified by:
distance in class CentroidClustering<double[],double[]>
- Parameters:
x - an observation.
y - the other observation.
- Returns:
the distance.
-
fit
public static GMeans fit(double[][] data, int kmax)
Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
- Parameters:
data - the input data, of which each row is an observation.
kmax - the maximum number of clusters.
- Returns:
the model.
-
fit
public static GMeans fit(double[][] data, int kmax, int maxIter, double tol)
Clusters data, with the number of clusters determined automatically by the G-Means algorithm.
- Parameters:
data - the input data, of which each row is an observation.
kmax - the maximum number of clusters.
maxIter - the maximum number of iterations for k-means.
tol - the tolerance of the k-means convergence test.
- Returns:
the model.
-