Class DBSCAN<T>
 Type Parameters:
T
 the type of input object.
 All Implemented Interfaces:
Serializable
DBSCAN requires two parameters: radius (i.e. neighborhood radius) and the number of minimum points required to form a cluster (minPts). It starts with an arbitrary starting point that has not been visited. This point's neighborhood is retrieved, and if it contains a sufficient number of points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found within the radius environment of a different point and hence be made part of a cluster.
If a point is found to be part of a cluster, its neighborhood is also part of that cluster. Hence, all points that are found within the neighborhood are added, as is their own neighborhood. This process continues until the cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.
DBSCAN visits each point of the database, possibly multiple times (e.g., as a candidate for different clusters). For practical considerations, however, the time complexity is mostly governed by the number of nearest neighbor queries. DBSCAN executes exactly one such query for each point, and if an indexing structure is used that executes such a neighborhood query in O(log n), an overall runtime complexity of O(n log n) is obtained.
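The expansion procedure described above can be sketched in plain Java. This is a minimal, self-contained illustration with brute-force range queries and Euclidean distance, not the library's implementation; the class and method names are hypothetical. It does mirror the `Clustering.OUTLIER` convention of using `Integer.MAX_VALUE` as the noise label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal DBSCAN sketch (not the smile implementation).
// Labels: -1 = unvisited, Integer.MAX_VALUE = noise/outlier,
// otherwise the zero-based cluster id.
class DbscanSketch {
    static final int UNVISITED = -1;
    static final int OUTLIER = Integer.MAX_VALUE;

    static int[] cluster(double[][] data, int minPts, double radius) {
        int n = data.length;
        int[] label = new int[n];
        Arrays.fill(label, UNVISITED);
        int k = 0; // number of clusters found so far
        for (int i = 0; i < n; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> neighbors = rangeQuery(data, i, radius);
            if (neighbors.size() < minPts) {
                // Tentatively noise; may later become a border point of a cluster.
                label[i] = OUTLIER;
                continue;
            }
            // i is a core point: start a new cluster and expand it.
            label[i] = k;
            ArrayDeque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (label[j] == OUTLIER) label[j] = k; // noise becomes a border point
                if (label[j] != UNVISITED) continue;
                label[j] = k;
                List<Integer> jn = rangeQuery(data, j, radius);
                if (jn.size() >= minPts) queue.addAll(jn); // j is also a core point
            }
            k++;
        }
        return label;
    }

    // Brute-force O(n) range query; an index such as a KD-tree
    // would bring this down to O(log n) for low-dimensional data.
    static List<Integer> rangeQuery(double[][] data, int i, double radius) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < data.length; j++) {
            double d = 0;
            for (int c = 0; c < data[i].length; c++) {
                double diff = data[i][c] - data[j][c];
                d += diff * diff;
            }
            if (Math.sqrt(d) <= radius) result.add(j);
        }
        return result;
    }
}
```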
DBSCAN has many advantages such as
 DBSCAN does not need to know the number of clusters in the data a priori, as opposed to k-means.
 DBSCAN can find arbitrarily shaped clusters. It can even find clusters completely surrounded by (but not connected to) a different cluster. Due to the minPts parameter, the so-called single-link effect (different clusters being connected by a thin line of points) is reduced.
 DBSCAN has a notion of noise. Outliers are labeled as Clustering.OUTLIER, which is Integer.MAX_VALUE.
 DBSCAN requires just two parameters and is mostly insensitive to the ordering of the points in the database. (Only points sitting on the edge of two different clusters might swap cluster membership if the ordering of the points is changed, and the cluster assignment is unique only up to isomorphism.)
On the other hand, DBSCAN also has disadvantages:
 In high-dimensional space, the data are sparse everywhere because of the curse of dimensionality. Therefore, DBSCAN does not work well on high-dimensional data in general.
 DBSCAN does not respond well to data sets with varying densities.
References
 Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise". KDD, 1996.
 Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu (1998). "Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications". 1998.
Field Summary
Modifier and Type	Field	Description
final double	minPts	The minimum number of points required to form a cluster.
final double	radius	The neighborhood radius.
Fields inherited from class smile.clustering.PartitionClustering
k, OUTLIER, size, y

Constructor Summary

Method Summary
Methods inherited from class smile.clustering.PartitionClustering
run, seed, toString

Field Details

minPts
public final double minPts
The minimum number of points required to form a cluster.
radius
public final double radius
The neighborhood radius.


Constructor Details

DBSCAN
Constructor.
Parameters:
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.
nns - the data structure for neighborhood search.
k - the number of clusters.
y - the cluster labels.
core - the flags indicating whether each point is a core point.


Method Details

fit
Clustering the data with a KD-tree. DBSCAN is generally applied to low-dimensional data, where a KD-tree can speed up the nearest neighbor search significantly.
Parameters:
data - the observations.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.
Returns:
the model.

fit
Clustering the data.
Type Parameters:
T - the data type.
Parameters:
data - the observations.
distance - the distance function.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.
Returns:
the model.

fit
Clustering the data.
Type Parameters:
T - the data type.
Parameters:
data - the observations.
nns - the data structure for neighborhood search.
minPts - the minimum number of neighbors for a core data point.
radius - the neighborhood radius.
Returns:
the model.

predict
Classifies a new observation.
Parameters:
x - a new observation.
Returns:
the cluster label. Note that it may be PartitionClustering.OUTLIER.
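One plausible way such a classification rule can work is sketched below, assuming access to a fitted model's core points and their labels (the class, method, and parameter names here are hypothetical, not the library's API): a new observation takes the label of the nearest core point within the neighborhood radius, and otherwise receives the OUTLIER sentinel (Integer.MAX_VALUE).

```java
// Hypothetical sketch of outlier-aware cluster prediction.
class DbscanPredictSketch {
    // corePoints and coreLabels stand in for state a fitted model would hold.
    static int predict(double[] x, double[][] corePoints, int[] coreLabels, double radius) {
        double best = Double.POSITIVE_INFINITY;
        int label = Integer.MAX_VALUE; // OUTLIER sentinel
        for (int i = 0; i < corePoints.length; i++) {
            double d = 0;
            for (int c = 0; c < x.length; c++) {
                double diff = x[c] - corePoints[i][c];
                d += diff * diff;
            }
            d = Math.sqrt(d);
            // Only core points within the neighborhood radius can claim x.
            if (d <= radius && d < best) {
                best = d;
                label = coreLabels[i];
            }
        }
        return label;
    }
}
```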
