smile.clustering

package smile.clustering

Clustering analysis. Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields.

Hierarchical algorithms find successive clusters using previously established clusters. These algorithms are usually either agglomerative ("bottom-up") or divisive ("top-down"). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and divide it into successively smaller clusters.
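The bottom-up procedure can be illustrated with a minimal sketch in plain Java. It is not the implementation behind HierarchicalClustering: the class name AgglomerativeSketch, the restriction to one-dimensional points, and the choice of single linkage (distance between the two closest members) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of agglomerative clustering; not part of smile.clustering. */
final class AgglomerativeSketch {
    /** Starts with one cluster per point and merges the two closest clusters until k remain. */
    static List<List<Double>> cluster(double[] points, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<List<Double>> clusters = new ArrayList<>();
        for (double p : points) {
            List<Double> singleton = new ArrayList<>();
            singleton.add(p);
            clusters.add(singleton);
        }
        while (clusters.size() > k) {
            int bestI = 0;
            int bestJ = 1;
            double best = Double.POSITIVE_INFINITY;
            // Find the pair of clusters with the smallest single-linkage distance.
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = singleLink(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing bestJ does not shift the position of bestI.
            List<Double> merged = clusters.get(bestI);
            merged.addAll(clusters.remove(bestJ));
        }
        return clusters;
    }

    /** Single linkage (assumed here): distance between the closest pair of members. */
    static double singleLink(List<Double> a, List<Double> b) {
        double min = Double.POSITIVE_INFINITY;
        for (double x : a) {
            for (double y : b) {
                min = Math.min(min, Math.abs(x - y));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(cluster(new double[] {1.0, 1.2, 5.0, 5.1, 9.0}, 3));
    }
}
```

Stopping at a target cluster count is only one way to cut the hierarchy. Recording every merge and its distance instead would give the full tree, and a different dissimilarity measure between clusters could be substituted for singleLink.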

Partitional algorithms typically determine all clusters at once, but can also be used as divisive algorithms in hierarchical clustering. Many partitional clustering algorithms require the number of clusters to be specified before the algorithm runs. When the proper value is not known beforehand, it must be estimated, a problem in its own right for which a number of techniques have been developed; for example, the G-Means and X-Means classes in this package estimate it with a normality test and BIC scores, respectively.
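As a rough illustration of why the cluster count is an input, the sketch below runs the classic Lloyd iteration for k-means on one-dimensional data. It is plain Java and makes no claim about how the KMeans class is implemented; the class name LloydSketch, the fit signature, and the caller-supplied initial centroids are assumptions. The number of clusters is fixed by the length of the centroid array.

```java
/** Hypothetical illustration of a partitional algorithm (Lloyd's k-means); not part of smile.clustering. */
final class LloydSketch {
    /**
     * Alternates assignment and update steps until labels stop changing.
     * The centroids array is updated in place; its length is the number of clusters.
     */
    static int[] fit(double[] x, double[] centroids, int maxIter) {
        int[] labels = new int[x.length];
        java.util.Arrays.fill(labels, -1); // -1 marks "not yet assigned"
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int i = 0; i < x.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(x[i] - centroids[c]) < Math.abs(x[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            // Update step: each centroid moves to the mean of its points.
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (int i = 0; i < x.length; i++) {
                sum[labels[i]] += x[i];
                count[labels[i]]++;
            }
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) {
                    centroids[c] = sum[c] / count[c]; // empty clusters keep their old centroid
                }
            }
            if (!changed) {
                break;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] centroids = {0.0, 6.0};
        int[] labels = fit(new double[] {1.0, 1.2, 0.8, 5.0, 5.2, 4.8}, centroids, 10);
        System.out.println(java.util.Arrays.toString(labels));
        System.out.println(java.util.Arrays.toString(centroids));
    }
}
```

The result depends on the initial centroids, which is one reason practical implementations usually pair this loop with a seeding strategy and several restarts.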

Density-based clustering algorithms are devised to discover arbitrary-shaped clusters. In this approach, a cluster is regarded as a region in which the density of data objects exceeds a threshold.
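The density threshold idea can be sketched in the style of DBSCAN: a point is treated as dense when at least minPts points (itself included) lie within radius eps, and a cluster grows outward from such points. The code below is a simplified plain-Java illustration, not the DBSCAN class of this package; the names DensitySketch, cluster, eps, and minPts, the brute-force neighbor search, and the Euclidean distance are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical DBSCAN-style illustration; not part of smile.clustering. */
final class DensitySketch {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    /** Returns a cluster id per point, or NOISE for points in no dense region. */
    static int[] cluster(double[][] data, double eps, int minPts) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < data.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbors(data, i, eps);
            if (seeds.size() < minPts) {
                labels[i] = NOISE; // may later be relabeled as a border point
                continue;
            }
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) {
                    labels[j] = clusterId; // border point: reachable but not dense itself
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = clusterId;
                List<Integer> reachable = neighbors(data, j, eps);
                if (reachable.size() >= minPts) {
                    queue.addAll(reachable); // j is dense, so the cluster keeps expanding
                }
            }
            clusterId++;
        }
        return labels;
    }

    /** Brute-force search for all points within eps of point i (including i itself). */
    static List<Integer> neighbors(double[][] data, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < data.length; j++) {
            double sum = 0.0;
            for (int d = 0; d < data[i].length; d++) {
                double diff = data[i][d] - data[j][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},
            {10, 10}, {10, 11}, {11, 10}, {11, 11},
            {50, 50}
        };
        System.out.println(Arrays.toString(cluster(data, 1.5, 3)));
    }
}
```

Because clusters grow by chaining dense neighborhoods, their shape is not constrained to be convex, and the number of clusters is a result rather than an input. The brute-force search is quadratic; a spatial index would be the usual replacement on larger data.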

Subspace clustering methods look for clusters that can only be seen in a particular projection (subspace, manifold) of the data. These methods can thus ignore irrelevant attributes. The general problem is also known as correlation clustering, while the special case of axis-parallel subspaces is also known as two-way clustering, co-clustering, or biclustering in bioinformatics. In these methods, not only the objects but also their features are clustered; that is, if the data is represented as a data matrix, the rows and columns are clustered simultaneously. Unlike general subspace methods, however, they usually do not work with arbitrary feature combinations.

Related Packages

Package

Description

smile.clustering.linkage

Cluster dissimilarity measures.

Class

Description

BBDTree

Balanced Box-Decomposition Tree.

CentroidClustering<T,U>

Centroid-based clustering that uses the center of each cluster to group similar data points into clusters.

Clustering

Clustering utility functions.

Clustering.Options

Iterative clustering algorithm hyperparameters.

DBSCAN<T>

Density-Based Spatial Clustering of Applications with Noise.

DENCLUE

DENsity CLUstering.

DENCLUE.Options

DENCLUE hyperparameters.

DeterministicAnnealing

Deterministic annealing clustering.

DeterministicAnnealing.Options

Deterministic annealing hyperparameters.

GMeans

G-Means clustering algorithm, an extension of K-Means that tries to determine the number of clusters automatically using a normality test.

HierarchicalClustering

Agglomerative Hierarchical Clustering.

KMeans

K-Means clustering.

KMedoids<T>

K-Medoids clustering based on randomized search (CLARANS).

KModes

K-Modes clustering.

MEC<T>

Non-parametric Minimum Conditional Entropy Clustering.

MEC.Options

MEC hyperparameters.

Partitioning

Clustering partitions.

SIB

The Sequential Information Bottleneck algorithm.

SpectralClustering

Spectral Clustering.

SpectralClustering.Options

Spectral clustering hyperparameters.

XMeans

X-Means clustering algorithm, an extension of K-Means that tries to determine the number of clusters automatically based on BIC scores.
