tsne

fun tsne(X: Array<DoubleArray>, d: Int = 2, perplexity: Double = 20.0, eta: Double = 200.0, earlyExaggeration: Double = 12.0, maxIter: Int = 1000): TSNE(source)

t-distributed stochastic neighbor embedding. t-SNE is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.

====References:====

  • L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.

  • L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1):33-55, 2012.

  • L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AI-STATS), JMLR W&CP 5:384-391, 2009.

  • L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.

Parameters

X

input data. If X is a square matrix, it is assumed to be the squared distance/dissimilarity matrix.

d

the dimension of the manifold.

perplexity

the perplexity of the conditional distribution.

eta

the learning rate.

earlyExaggeration

Controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. For larger values, the space between natural clusters will be larger in the embedded space. The choice of this parameter is not very critical. If the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high.

maxIter

the number of iterations.