mec

fun <T> mec(data: Array<T>, distance: Distance<T>, k: Int, radius: Double): MEC<T>

Nonparametric Minimum Conditional Entropy Clustering. The method performs well, especially when the exact number of clusters is unknown. It can also reveal the structure of the data and identify outliers at the same time.

The clustering criterion is based on the conditional entropy H(C | X), where C is the cluster label and X is an observation. According to Fano's inequality, C can be estimated with a low probability of error only if the conditional entropy H(C | X) is small. MEC also generalizes the criterion by replacing Shannon's entropy with Havrda-Charvat's structural α-entropy. Interestingly, the minimum entropy criterion based on structural α-entropy is equal to the probability of error of the nearest neighbor method when α = 2. To estimate p(C | x) at an observed point x, MEC employs Parzen density estimation, a nonparametric approach.
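For reference, the two entropies named above can be written out as follows. This uses the standard textbook form of the Havrda-Charvat structural α-entropy; the normalization used in the cited paper or in the implementation may differ, so treat the constants as an assumption.

```latex
% Shannon conditional entropy of the cluster label C given an observation X
H(C \mid X) = -\int p(x) \sum_{c} p(c \mid x) \log p(c \mid x) \, dx

% Havrda-Charvat structural \alpha-entropy at a point x (\alpha > 0, \alpha \neq 1)
H_{\alpha}(C \mid x) = \left(2^{1-\alpha} - 1\right)^{-1} \left[ \sum_{c} p(c \mid x)^{\alpha} - 1 \right]

% Special case \alpha = 2
H_{2}(C \mid x) = 2 \left( 1 - \sum_{c} p(c \mid x)^{2} \right)
```

In the limit α → 1 the structural α-entropy reduces to Shannon's entropy.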

MEC is an iterative algorithm that starts from an initial partition produced by any other clustering method, e.g. k-means, CLARANS, hierarchical clustering, etc. Note that a random initialization is NOT appropriate.
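The quantity that the iterations try to reduce can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: it assumes a uniform-kernel Parzen window (label frequencies among the points within `radius`), Euclidean distance, and Shannon entropy, and the helper names `euclidean`, `posterior`, and `conditionalEntropy` are hypothetical.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Euclidean distance between two points of equal dimension.
fun euclidean(a: DoubleArray, b: DoubleArray): Double {
    var sum = 0.0
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sqrt(sum)
}

// Estimate p(c | x_i) as the share of each label among the points within
// `radius` of x_i (uniform-kernel Parzen window; x_i counts as its own neighbor).
fun posterior(data: Array<DoubleArray>, labels: IntArray, k: Int, radius: Double, i: Int): DoubleArray {
    val counts = DoubleArray(k)
    for (j in data.indices) {
        if (euclidean(data[i], data[j]) <= radius) counts[labels[j]] += 1.0
    }
    val total = counts.sum()
    return DoubleArray(k) { c -> counts[c] / total }
}

// Average Shannon entropy (in nats) of the estimated posteriors over all points.
fun conditionalEntropy(data: Array<DoubleArray>, labels: IntArray, k: Int, radius: Double): Double {
    var h = 0.0
    for (i in data.indices) {
        for (p in posterior(data, labels, k, radius, i)) {
            if (p > 0.0) h -= p * ln(p)
        }
    }
    return h / data.size
}

fun main() {
    val data = arrayOf(
        doubleArrayOf(0.0, 0.0), doubleArrayOf(0.1, 0.0), doubleArrayOf(0.0, 0.1),
        doubleArrayOf(5.0, 5.0), doubleArrayOf(5.1, 5.0), doubleArrayOf(5.0, 5.1)
    )
    val clean = intArrayOf(0, 0, 0, 1, 1, 1) // labels agree with the two groups
    val mixed = intArrayOf(0, 1, 0, 1, 0, 1) // labels mixed within each group
    println(conditionalEntropy(data, clean, 2, 1.0))
    println(conditionalEntropy(data, mixed, 2, 1.0))
}
```

A partition whose labels agree within every neighborhood scores zero, while a partition that mixes labels among close points scores higher. This is why a sensible initial partition matters: the iterations only have to repair labels near cluster boundaries.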

References:

  • Haifeng Li, Keshu Zhang, and Tao Jiang. Minimum Entropy Clustering and Applications to Gene Expression Analysis. CSB, 2004.

Parameters

data

the data set.

distance

the distance measure for neighborhood search.

k

the number of clusters. Note that this is just a hint. The final number of clusters may be less.

radius

the neighborhood radius.


fun <T> mec(data: Array<T>, distance: Metric<T>, k: Int, radius: Double): MEC<T>

Nonparametric Minimum Conditional Entropy Clustering.

Parameters

data

the data set.

distance

the distance measure for neighborhood search.

k

the number of clusters. Note that this is just a hint. The final number of clusters may be less.

radius

the neighborhood radius.


fun mec(data: Array<DoubleArray>, k: Int, radius: Double): MEC<DoubleArray>

Nonparametric Minimum Conditional Entropy Clustering. Assumes Euclidean distance.

Parameters

data

the data set.

k

the number of clusters. Note that this is just a hint. The final number of clusters may be less.

radius

the neighborhood radius.


fun <T> mec(data: Array<T>, nns: RNNSearch<T, T>, k: Int, radius: Double, y: IntArray, tol: Double = 1.0E-4): MEC<T>

Nonparametric Minimum Conditional Entropy Clustering.

Parameters

data

the data set.

nns

the data structure for neighborhood search.

k

the number of clusters. Note that this is just a hint. The final number of clusters may be less.

radius

the neighborhood radius.

y

the initial cluster labels, which may be produced by any other clustering method.

tol

the tolerance of convergence test.