Package smile.clustering
Class KMedoids<T>
java.lang.Object
smile.clustering.KMedoids<T>
- Type Parameters:
T
- the type of input data.
K-Medoids clustering based on randomized search (CLARANS). The k-medoids
algorithm is an adaptation of the k-means algorithm. Rather than calculate
the mean of the items in each cluster, a representative item, or medoid,
is chosen for each cluster at each iteration. In CLARANS, the process of
finding k medoids from n objects is viewed abstractly as searching through
a certain graph. In the graph, a node is represented by a set of k objects
as selected medoids. Two nodes are neighbors if their sets differ by only
one object. In each iteration, CLARANS considers a set of randomly chosen
neighbor nodes as candidate of new medoids. We will move to the neighbor
node if the neighbor is a better choice for medoids. Otherwise, a local
optima is discovered. The entire process is repeated multiple time to
find better.
CLARANS has two parameters: the maximum number of neighbors examined (maxNeighbor) and the number of local minima obtained (numLocal). The higher the value of maxNeighbor, the closer is CLARANS to PAM, and the longer is each search of a local minima. But the quality of such a local minima is higher and fewer local minima needs to be obtained.
The runtime is proportional to numLocal. As for the relative quality, there is an improvement from numLocal = 1 to numLocal = 2. Performing a second search for a local minimum seems to reduce the impact of "unlucky" randomness that may occur in just one search. However, setting numLocal larger than 2 is not cost-effective, as there is little increase in quality.
References
- R. Ng and J. Han. CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE TRANS. KNOWLEDGE AND DATA ENGINEERING, 2002.
-
Method Summary
Modifier and TypeMethodDescriptionstatic <T> CentroidClustering
<T, T> Fits k-medoids clustering.static <T> CentroidClustering
<T, T> fit
(T[] data, Distance<T> distance, Clustering.Options options) Fits k-medoids clustering.
-
Method Details
-
fit
Fits k-medoids clustering.- Type Parameters:
T
- the type of input data.- Parameters:
data
- the input data of which each row is an observation.distance
- the distance measure.k
- the number of clusters.- Returns:
- the model.
-
fit
public static <T> CentroidClustering<T,T> fit(T[] data, Distance<T> distance, Clustering.Options options) Fits k-medoids clustering.- Type Parameters:
T
- the type of input data.- Parameters:
data
- the input data of which each row is an observation.distance
- the distance measure.options
- the hyperparameters. The parameter maxIter is used as numLocal while the parameter tol will be interpreted as the ratio to calculate maxNeighbor = tol * n * (n-k).- Returns:
- the model.
-