Class BIRCH
- All Implemented Interfaces:
Serializable
,VectorQuantizer
BIRCH has several advantages. For example, each clustering decision is made without scanning all data points and currently existing clusters. It exploits the observation that data space is not usually uniformly occupied and not every data point is equally important. It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs. It is also an incremental method that does not require the whole data set in advance.
This implementation produces a clustering in three steps. First step builds a CF (clustering feature) tree by a single scan of database. The second step clusters the leaves of CF tree by hierarchical clustering. Then the user can use the learned model to classify input data in the final step. In total, we scan the database twice.
References
- Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD, 1996.
- See Also:
-
Field Summary
Modifier and TypeFieldDescriptionfinal int
The branching factor of non-leaf nodes.final int
The dimensionality of data.final int
The number of CF entries in the leaf nodes.final double
THe maximum radius of a sub-cluster.Fields inherited from interface smile.vq.VectorQuantizer
OUTLIER
-
Constructor Summary
-
Method Summary
-
Field Details
-
B
public final int BThe branching factor of non-leaf nodes. -
L
public final int LThe number of CF entries in the leaf nodes. -
T
public final double TTHe maximum radius of a sub-cluster. -
d
public final int dThe dimensionality of data.
-
-
Constructor Details
-
BIRCH
public BIRCH(int d, int B, int L, double T) Constructor.- Parameters:
d
- the dimensionality of data.B
- the branching factor of non-leaf nodes, i.e. the maximum number of children nodes.L
- the number entries in the leaf nodes.T
- the maximum radius of a sub-cluster.
-
-
Method Details
-
update
public void update(double[] x) Description copied from interface:VectorQuantizer
Update the codebook with a new observation.- Specified by:
update
in interfaceVectorQuantizer
- Parameters:
x
- a new observation.
-
quantize
public double[] quantize(double[] x) Description copied from interface:VectorQuantizer
Quantize a new observation. Returns Optional.empty if the observation is noise.- Specified by:
quantize
in interfaceVectorQuantizer
- Parameters:
x
- a new observation.- Returns:
- the quantized vector.
-
centroids
public double[][] centroids()Returns the cluster centroids of leaf nodes.- Returns:
- the cluster centroids of leaf nodes.
-