Class HDBSCAN<T>
java.lang.Object
smile.clustering.Partitioning
smile.clustering.HDBSCAN<T>
- Type Parameters:
  T - the data type.
- All Implemented Interfaces:
  Serializable
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN).
HDBSCAN extends DBSCAN by building a hierarchy of density-connected components on the mutual-reachability graph and then selecting stable clusters from the hierarchy.
This implementation follows the core pipeline in the paper and the reference Python implementation:
- estimate core distances with minPoints
- build the mutual-reachability graph
- compute a minimum spanning tree
- convert to a hierarchy and perform stability-based cluster selection with minClusterSize
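The first three steps of the pipeline can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code of this class: the class name `MutualReachabilitySketch` and all of its methods are hypothetical, the distance is Euclidean, the core distance is taken as the distance to the minPoints-th nearest neighbor with the point itself excluded (conventions differ between implementations), and the spanning tree is built with a simple O(n^2) Prim's algorithm on the complete graph.

```java
import java.util.Arrays;

/** Illustrative sketch only; all names here are hypothetical. */
final class MutualReachabilitySketch {
    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Core distance: distance to the minPoints-th nearest neighbor (self excluded, an assumed convention). */
    static double[] coreDistances(double[][] data, int minPoints) {
        int n = data.length;
        double[] core = new double[n];
        for (int i = 0; i < n; i++) {
            double[] d = new double[n - 1];
            for (int j = 0, k = 0; j < n; j++) {
                if (j != i) d[k++] = euclidean(data[i], data[j]);
            }
            Arrays.sort(d);
            core[i] = d[minPoints - 1];
        }
        return core;
    }

    /** Mutual-reachability distance: max(core(a), core(b), d(a, b)). */
    static double mutualReachability(double[][] data, double[] core, int a, int b) {
        return Math.max(euclidean(data[a], data[b]), Math.max(core[a], core[b]));
    }

    /** Prim's algorithm on the complete mutual-reachability graph; returns n - 1 edges {from, to}. */
    static int[][] minimumSpanningTree(double[][] data, double[] core) {
        int n = data.length;
        boolean[] inTree = new boolean[n];
        double[] best = new double[n];   // cheapest known edge weight into the tree
        int[] parent = new int[n];       // tree endpoint of that cheapest edge
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        parent[0] = -1;
        int[][] edges = new int[n - 1][];
        int count = 0;
        for (int iter = 0; iter < n; iter++) {
            // Pick the vertex outside the tree with the cheapest connecting edge.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            }
            inTree[u] = true;
            if (parent[u] >= 0) edges[count++] = new int[] {parent[u], u};
            // Relax the edges from the newly added vertex.
            for (int v = 0; v < n; v++) {
                if (!inTree[v]) {
                    double w = mutualReachability(data, core, u, v);
                    if (w < best[v]) {
                        best[v] = w;
                        parent[v] = u;
                    }
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {2, 0}, {10, 0}};
        double[] core = coreDistances(points, 2);
        System.out.println("core distances: " + Arrays.toString(core));
        for (int[] e : minimumSpanningTree(points, core)) {
            System.out.println(e[0] + " - " + e[1] + " : " + mutualReachability(points, core, e[0], e[1]));
        }
    }
}
```

In this toy data set the isolated point at x = 10 has a large core distance, so every edge that reaches it is expensive in the mutual-reachability graph. That is what lets the later hierarchy step separate sparse points from dense regions.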
References
- Campello, R. J. G. B., Moulavi, D., and Sander, J. Density-Based Clustering Based on Hierarchical Density Estimates. PAKDD, 2013.
- McInnes, L., Healy, J., and Astels, S. hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2017.
Nested Class Summary
Nested Classes
- HDBSCAN.Options: the hyperparameters accepted by the fit methods.
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| HDBSCAN(int k, int[] group, int minPoints, int minClusterSize, double[] coreDistance, double[] stability) | Constructor. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| double[] | coreDistance() | Returns the core distances. |
| static HDBSCAN<double[]> | fit(double[][] data, int minPoints, int minClusterSize) | Clusters the data with Euclidean distance. |
| static HDBSCAN<double[]> | fit(double[][] data, HDBSCAN.Options options) | Clusters the data with Euclidean distance. |
| static <T> HDBSCAN<T> | fit(T[] data, Distance<T> distance, int minPoints, int minClusterSize) | Clusters the data. |
| static <T> HDBSCAN<T> | fit(T[] data, Distance<T> distance, HDBSCAN.Options options) | Clusters the data. |
| int | minClusterSize() | Returns the minimum cluster size. |
| int | minPoints() | Returns the number of neighbors for core-distance estimation. |
| double[] | stability() | Returns the stability scores of selected clusters. |
Constructor Details
HDBSCAN
public HDBSCAN(int k, int[] group, int minPoints, int minClusterSize, double[] coreDistance, double[] stability)
Constructor.
- Parameters:
  k - the number of clusters.
  group - the cluster labels.
  minPoints - the number of neighbors for core-distance estimation.
  minClusterSize - the minimum cluster size.
  coreDistance - the core distance of each point.
  stability - the stability scores of selected clusters.
Method Details
minPoints
public int minPoints()
Returns the number of neighbors for core-distance estimation.
- Returns:
  the number of neighbors for core-distance estimation.
minClusterSize
public int minClusterSize()
Returns the minimum cluster size.
- Returns:
  the minimum cluster size.
coreDistance
public double[] coreDistance()
Returns the core distances.
- Returns:
  the core distance of each point.
stability
public double[] stability()
Returns the stability scores of selected clusters.
- Returns:
  the stability score of each selected cluster.
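For orientation, the cited paper defines the stability of a cluster as the sum, over its points, of (lambda_p - lambda_birth), where lambda = 1 / distance, lambda_birth is the level at which the cluster appears in the hierarchy, and lambda_p is the level at which point p leaves it. The sketch below evaluates that formula for hand-picked values. The class `StabilitySketch` and its method are hypothetical, and whether this class computes its scores in exactly this way is an assumption.

```java
/** Illustrative sketch only; names are hypothetical. */
final class StabilitySketch {
    /**
     * Stability of one cluster: sum over its points of (lambdaLeave[p] - lambdaBirth),
     * where lambda = 1 / distance.
     */
    static double stability(double lambdaBirth, double[] lambdaLeave) {
        double s = 0.0;
        for (double lambda : lambdaLeave) {
            s += lambda - lambdaBirth;
        }
        return s;
    }

    public static void main(String[] args) {
        // The cluster appears at distance 4 (lambda = 0.25); two points leave
        // at distance 2 (lambda = 0.5) and one at distance 1 (lambda = 1.0).
        double lambdaBirth = 1.0 / 4.0;
        double[] lambdaLeave = {1.0 / 2.0, 1.0 / 2.0, 1.0 / 1.0};
        System.out.println(stability(lambdaBirth, lambdaLeave)); // prints 1.25
    }
}
```

A cluster whose points persist over a wide range of density levels accumulates a larger score, which is why the selection step prefers it over short-lived splits.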
fit
public static HDBSCAN<double[]> fit(double[][] data, int minPoints, int minClusterSize)
Clusters the data with Euclidean distance.
- Parameters:
  data - the observations.
  minPoints - the number of neighbors for core-distance estimation.
  minClusterSize - the minimum cluster size.
- Returns:
  the model.
fit
public static HDBSCAN<double[]> fit(double[][] data, HDBSCAN.Options options)
Clusters the data with Euclidean distance.
- Parameters:
  data - the observations.
  options - the hyperparameters.
- Returns:
  the model.
fit
public static <T> HDBSCAN<T> fit(T[] data, Distance<T> distance, int minPoints, int minClusterSize)
Clusters the data.
- Type Parameters:
  T - the data type.
- Parameters:
  data - the observations.
  distance - the distance function.
  minPoints - the number of neighbors for core-distance estimation.
  minClusterSize - the minimum cluster size.
- Returns:
  the model.
fit
public static <T> HDBSCAN<T> fit(T[] data, Distance<T> distance, HDBSCAN.Options options)
Clusters the data.
- Type Parameters:
  T - the data type.
- Parameters:
  data - the observations.
  distance - the distance function.
  options - the hyperparameters.
- Returns:
  the model.