Package smile.manifold
Class TSNE
java.lang.Object
smile.manifold.TSNE
- All Implemented Interfaces:
Serializable
The t-distributed stochastic neighbor embedding. The t-SNE is a nonlinear
dimensionality reduction technique that is particularly well suited
for embedding high-dimensional data into a space of two or three
dimensions, which can then be visualized in a scatter plot. Specifically,
it models each high-dimensional object by a two- or three-dimensional
point in such a way that similar objects are modeled by nearby points
and dissimilar objects are modeled by distant points.
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects such that similar objects have a high probability of being picked, while dissimilar objects have a vanishingly small probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map. Note that while the original algorithm uses the Euclidean distance between objects as the base of its similarity metric, a different distance may be substituted where appropriate.
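The two distributions described above can be sketched in plain Java. This is an illustrative sketch, not Smile's implementation: the class and method names are hypothetical, and it assumes a single fixed Gaussian bandwidth `sigma` for every point, whereas t-SNE normally tunes one bandwidth per point to match the requested perplexity.

```java
final class TsneAffinities {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Stage 1: symmetric joint probabilities over pairs of input objects.
     * ASSUMPTION: one fixed bandwidth sigma is shared by all points.
     */
    static double[][] highDimensionalP(double[][] x, double sigma) {
        int n = x.length;
        double[][] conditional = new double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                if (i == j) continue; // a point is never its own neighbor
                conditional[i][j] = Math.exp(-squaredDistance(x[i], x[j]) / (2.0 * sigma * sigma));
                rowSum += conditional[i][j];
            }
            for (int j = 0; j < n; j++) conditional[i][j] /= rowSum;
        }
        // Symmetrize the conditionals so that all entries of P sum to 1.
        double[][] p = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                p[i][j] = (conditional[i][j] + conditional[j][i]) / (2.0 * n);
            }
        }
        return p;
    }

    /** Stage 2: Student-t (one degree of freedom) joint probabilities over map points. */
    static double[][] lowDimensionalQ(double[][] y) {
        int n = y.length;
        double[][] q = new double[n][n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                q[i][j] = 1.0 / (1.0 + squaredDistance(y[i], y[j]));
                total += q[i][j];
            }
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) q[i][j] /= total;
        }
        return q;
    }
}
```

Both matrices have a zero diagonal and entries that sum to 1, so they can be compared directly with the Kullback–Leibler divergence.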
References
- L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
- L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1):33-55, 2012.
- L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AI-STATS), JMLR W&CP 5:384-391, 2009.
- L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
Field Summary
Modifier and Type    Field          Description
final double[][]     coordinates    The coordinate matrix in embedding space.
Field Details
coordinates
public final double[][] coordinates
The coordinate matrix in embedding space.
Constructor Details
TSNE
public TSNE(double[][] X, int d)
Constructor. Trains t-SNE for 1000 iterations with perplexity = 20 and learning rate = 200.
- Parameters:
X - the input data. If X is a square matrix, it is assumed to be the squared distance/dissimilarity matrix.
d - the dimension of the embedding space.
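Because a square X is interpreted as a squared distance/dissimilarity matrix, it can be useful to build that matrix explicitly. The following is a minimal sketch using squared Euclidean distance; the class and method names are hypothetical and not part of Smile.

```java
final class SquaredDistances {
    /** Returns the n-by-n matrix of squared Euclidean distances between the rows of data. */
    static double[][] pairwiseSquaredEuclidean(double[][] data) {
        int n = data.length;
        double[][] d2 = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < data[i].length; k++) {
                    double diff = data[i][k] - data[j][k];
                    sum += diff * diff;
                }
                // The matrix is symmetric with a zero diagonal.
                d2[i][j] = sum;
                d2[j][i] = sum;
            }
        }
        return d2;
    }
}
```

Per the parameter description, the resulting matrix could be passed as X, for example `new TSNE(d2, 2)`. Any other squared dissimilarity could be substituted in the inner loop.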
TSNE
public TSNE(double[][] X, int d, double perplexity, double eta, int iterations)
Constructor. Trains t-SNE for the given number of iterations.
- Parameters:
X - the input data. If X is a square matrix, it is assumed to be the squared distance/dissimilarity matrix.
d - the dimension of the embedding space.
perplexity - the perplexity of the conditional distribution.
eta - the learning rate.
iterations - the number of iterations.
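To illustrate what the perplexity parameter controls, the sketch below computes the perplexity of one point's Gaussian conditional distribution and bisects on the precision `beta = 1 / (2 * sigma^2)` until a target perplexity is reached. It follows the standard t-SNE formulation; the names, the starting value, and the stopping rule are assumptions, not Smile's internals.

```java
final class PerplexitySearch {
    /**
     * Perplexity exp(H) of the conditional distribution p_j proportional to
     * exp(-beta * sqDist[j]), where H is the Shannon entropy in nats
     * (equivalent to 2^H with H in bits). sqDist holds the squared distances
     * from one point to every OTHER point.
     */
    static double perplexity(double[] sqDist, double beta) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : sqDist) min = Math.min(min, v);
        double[] w = new double[sqDist.length];
        double sum = 0.0;
        for (int j = 0; j < sqDist.length; j++) {
            // Subtracting the minimum avoids underflow and does not change p_j.
            w[j] = Math.exp(-beta * (sqDist[j] - min));
            sum += w[j];
        }
        double entropy = 0.0;
        for (double wj : w) {
            double pj = wj / sum;
            if (pj > 0.0) entropy -= pj * Math.log(pj);
        }
        return Math.exp(entropy);
    }

    /** Bisection on beta; perplexity decreases monotonically as beta grows. */
    static double findBeta(double[] sqDist, double target, int maxIter, double tol) {
        double lo = 0.0;
        double hi = Double.POSITIVE_INFINITY;
        double beta = 1.0; // arbitrary starting guess
        for (int iter = 0; iter < maxIter; iter++) {
            double perp = perplexity(sqDist, beta);
            if (Math.abs(perp - target) < tol) break;
            if (perp > target) {
                // Distribution is too flat: raise beta (narrow the Gaussian).
                lo = beta;
                beta = Double.isInfinite(hi) ? beta * 2.0 : (lo + hi) / 2.0;
            } else {
                // Distribution is too peaked: lower beta (widen the Gaussian).
                hi = beta;
                beta = (lo + hi) / 2.0;
            }
        }
        return beta;
    }
}
```

Intuitively, perplexity acts as a smooth count of effective neighbors: with `beta = 0` every other point is equally likely and the perplexity equals the number of other points.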
Method Details
cost
public double cost()
Returns the cost function value.
- Returns:
the cost function value.
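The class description states that t-SNE minimizes the Kullback–Leibler divergence between the two distributions, so the cost can be sketched as below. Whether `cost()` evaluates exactly this expression is an assumption; the class name is hypothetical.

```java
final class KlCost {
    /**
     * KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij).
     * Terms with p_ij == 0 contribute nothing, by the usual convention.
     */
    static double klDivergence(double[][] p, double[][] q) {
        double cost = 0.0;
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p[i].length; j++) {
                if (i != j && p[i][j] > 0.0) {
                    cost += p[i][j] * Math.log(p[i][j] / q[i][j]);
                }
            }
        }
        return cost;
    }
}
```

The value is zero when the two distributions agree and grows as the map fails to reproduce the pairwise similarities of the input.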
update
public void update(int iterations)
Performs additional iterations.
- Parameters:
iterations - the number of iterations.
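As a rough picture of what a single iteration does, the sketch below applies one plain gradient-descent step using the gradient of the KL divergence from the original t-SNE paper. It is a simplification under stated assumptions: practical implementations typically add momentum and other refinements, and nothing here is taken from Smile's code. The class and method names are hypothetical.

```java
final class TsneGradientStep {
    /**
     * Gradient of KL(P || Q) with respect to each map point:
     * dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2).
     * p must be a symmetric joint probability matrix with a zero diagonal.
     */
    static double[][] gradient(double[][] p, double[][] y) {
        int n = y.length;
        int d = y[0].length;
        double[][] w = new double[n][n]; // unnormalized Student-t weights
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double dist2 = 0.0;
                for (int k = 0; k < d; k++) {
                    double diff = y[i][k] - y[j][k];
                    dist2 += diff * diff;
                }
                w[i][j] = 1.0 / (1.0 + dist2);
                total += w[i][j];
            }
        }
        double[][] grad = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double qij = w[i][j] / total;
                double coeff = 4.0 * (p[i][j] - qij) * w[i][j];
                for (int k = 0; k < d; k++) {
                    grad[i][k] += coeff * (y[i][k] - y[j][k]);
                }
            }
        }
        return grad;
    }

    /** One plain gradient-descent step with learning rate eta; updates y in place. */
    static void step(double[][] p, double[][] y, double eta) {
        double[][] grad = gradient(p, y);
        for (int i = 0; i < y.length; i++) {
            for (int k = 0; k < y[i].length; k++) {
                y[i][k] -= eta * grad[i][k];
            }
        }
    }
}
```

Pairs with `p_ij > q_ij` are pulled together and pairs with `p_ij < q_ij` are pushed apart. The per-point gradients sum to zero, so a step does not shift the centroid of the map.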