Package smile.clustering
Class PartitionClustering
java.lang.Object
smile.clustering.PartitionClustering
All Implemented Interfaces:
    Serializable
Direct Known Subclasses:
    CentroidClustering, DBSCAN, DENCLUE, MEC, SpectralClustering
Partition clustering. Partition methods classify the observations into distinct, non-overlapping groups.
Method Summary

static <T extends PartitionClustering & Comparable<? super T>> T
    run(int runs, Supplier<T> clustering)
    Runs a clustering algorithm multiple times and returns the best one (e.g. smallest distortion).

static <T> double[]
    seed(T[] data, T[] medoids, int[] y, ToDoubleBiFunction<T, T> distance)
    Initializes cluster membership of input objects with the k-means++ algorithm.

String
    toString()

Field Details

OUTLIER
    public static final int OUTLIER
    Cluster label for outliers or noise.

k
    public final int k
    The number of clusters.

y
    public final int[] y
    The cluster labels of data.

size
    public final int[] size
    The number of observations in each cluster.


Constructor Details

PartitionClustering
public PartitionClustering(int k, int[] y)
Constructor.
Parameters:
    k - the number of clusters.
    y - the cluster labels.
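To illustrate how the fields above fit together, the per-cluster counts in size can be derived from the label array y. The helper below is a hypothetical plain-Java sketch, not part of Smile; labels outside the range [0, k), such as the OUTLIER constant, are simply not counted.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Smile) relating the size field to the label
// array y: size[j] counts the observations assigned to cluster j.
public class ClusterSizes {
    static int[] sizes(int k, int[] y) {
        int[] size = new int[k];
        for (int label : y) {
            if (label >= 0 && label < k) {
                size[label]++;  // labels outside [0, k), e.g. OUTLIER, are skipped
            }
        }
        return size;
    }

    public static void main(String[] args) {
        int[] y = {0, 1, 1, 2, 0, 1};
        System.out.println(Arrays.toString(sizes(3, y)));  // [2, 3, 1]
    }
}
```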


Method Details

toString

public String toString()
seed
public static <T> double[] seed(T[] data, T[] medoids, int[] y, ToDoubleBiFunction<T, T> distance)
Initializes cluster membership of input objects with the k-means++ algorithm. Many clustering methods, e.g. k-means, need an initial clustering configuration as a seed. K-means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points being clustered; each subsequent cluster center is then chosen from the remaining data points with probability proportional to its squared distance to the point's closest existing cluster center.
The exact algorithm is as follows:
 1. Choose one center uniformly at random from among the data points.
 2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
 3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D^2(x).
 4. Repeat steps 2 and 3 until k centers have been chosen.
 5. Once the initial centers have been chosen, proceed using standard k-means clustering.
References:
 D. Arthur and S. Vassilvitskii. "k-means++: the advantages of careful seeding". ACM-SIAM Symposium on Discrete Algorithms, 1027-1035, 2007.
 Anna D. Peterson, Arka P. Ghosh and Ranjan Maitra. "A systematic evaluation of different methods for initializing the K-means clustering algorithm". 2010.
Type Parameters:
    T - the type of input object.
Parameters:
    data - data objects array of size n.
    medoids - an array of size k to store cluster medoids on output.
    y - an array of size n to store cluster labels on output.
    distance - the distance function.
Returns:
    an array of size n storing the distance of each observation to its nearest medoid.
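The numbered steps above can be sketched in self-contained Java. This is a hypothetical illustration, not Smile's implementation: unlike seed, it returns the chosen center indices rather than filling medoid and label arrays, and it takes an explicit Random source.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of k-means++ seeding (NOT Smile's implementation).
public class KMeansPPSeed {
    /** Picks k center indices from data using k-means++ D^2 weighting. */
    static <T> int[] seed(T[] data, int k, ToDoubleBiFunction<T, T> distance, Random rng) {
        int n = data.length;
        int[] centers = new int[k];
        double[] d = new double[n];            // D(x): distance to nearest chosen center
        Arrays.fill(d, Double.MAX_VALUE);

        // Step 1: choose the first center uniformly at random.
        centers[0] = rng.nextInt(n);

        for (int j = 1; j < k; j++) {
            // Step 2: refresh D(x) against the most recently chosen center.
            T latest = data[centers[j - 1]];
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                double dist = distance.applyAsDouble(data[i], latest);
                if (dist < d[i]) d[i] = dist;
                sum += d[i] * d[i];
            }
            // Step 3: sample the next center with probability proportional to D^2(x)
            // by walking the cumulative sum of squared distances.
            double cutoff = rng.nextDouble() * sum;
            double cost = 0.0;
            int index = 0;
            for (; index < n - 1; index++) {
                cost += d[index] * d[index];
                if (cost >= cutoff) break;
            }
            centers[j] = index;
        }
        // Step 4 is the loop above; step 5 (running k-means proper) is out of scope here.
        return centers;
    }

    public static void main(String[] args) {
        Double[] data = {0.0, 0.1, 10.0, 10.1, 20.0, 20.2};
        int[] centers = seed(data, 3, (a, b) -> Math.abs(a - b), new Random(7));
        System.out.println(centers.length);  // 3
    }
}
```

Because an already-chosen center has D(x) = 0, it carries zero weight in step 3 and is almost surely never picked twice, which is exactly the "spread the centers apart" intuition.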

run
public static <T extends PartitionClustering & Comparable<? super T>> T run(int runs, Supplier<T> clustering)
Runs a clustering algorithm multiple times and returns the best one (e.g. smallest distortion).
Type Parameters:
    T - the data type.
Parameters:
    runs - the number of runs.
    clustering - the clustering algorithm.
Returns:
    the model.
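The documented contract — repeat the clustering and keep the model that ranks first under its natural ordering — can be sketched as below. This is a hypothetical illustration of the pattern, not Smile's source; the stand-in "model" is a plain Double so the example stays self-contained.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch of the multi-restart pattern behind run(int, Supplier):
// invoke the supplier several times and keep the model that compares smallest
// (e.g. lowest distortion). Smile's actual implementation may differ.
public class BestOfRuns {
    static <T extends Comparable<? super T>> T run(int runs, Supplier<T> clustering) {
        T best = clustering.get();
        for (int i = 1; i < runs; i++) {
            T model = clustering.get();
            if (model.compareTo(best) < 0) {
                best = model;  // smaller compares as better
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in "model": a Double distortion produced by a noisy fit.
        Random rng = new Random(42);
        Double best = run(20, rng::nextDouble);
        System.out.println(best >= 0.0 && best < 1.0);  // true
    }
}
```

With Smile itself, a call of this shape would look like `PartitionClustering.run(10, () -> KMeans.fit(data, k))` — a hypothetical usage, assuming a `KMeans.fit` factory method.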
