Class TSNE

java.lang.Object
smile.manifold.TSNE
All Implemented Interfaces:
Serializable

public class TSNE extends Object implements Serializable
The t-distributed stochastic neighbor embedding. The t-SNE is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.

The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, while dissimilar objects have an infinitesimal probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map. Note that while the original algorithm uses the Euclidean distance between objects as the basis of its similarity metric, another dissimilarity measure can be substituted as appropriate (for this class, by passing a square matrix of squared dissimilarities as X).
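In the notation of van der Maaten and Hinton (2008), the two stages can be written as below. Here d_{ij}^2 is the squared dissimilarity between objects i and j, y_i is the map point of object i, n is the number of objects, and \beta_i = 1/(2\sigma_i^2) is a per-object precision chosen so that the conditional distribution has the requested perplexity. This is the standard formulation from the reference; that this class follows it term for term is an assumption.

```latex
\begin{aligned}
p_{j|i} &= \frac{\exp(-\beta_i d_{ij}^2)}{\sum_{k \neq i} \exp(-\beta_i d_{ik}^2)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},
\qquad
2^{-\sum_{j} p_{j|i} \log_2 p_{j|i}} = \mathrm{Perp}, \\
q_{ij} &= \frac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}},
\qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}.
\end{aligned}
```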

References

  1. L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
  2. L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1):33-55, 2012.
  3. L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AI-STATS), JMLR W&CP 5:384-391, 2009.
  4. L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.
  • Field Summary

    Fields
    Modifier and Type   Field         Description
    final double[][]    coordinates   The coordinate matrix in embedding space.
  • Constructor Summary

    Constructors
    Constructor                                                                Description
    TSNE(double[][] X, int d)                                                  Constructor.
    TSNE(double[][] X, int d, double perplexity, double eta, int iterations)   Constructor.
  • Method Summary

    Modifier and Type   Method                   Description
    double              cost()                   Returns the cost function value.
    void                update(int iterations)   Performs additional iterations.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • coordinates

      public final double[][] coordinates
      The coordinate matrix in embedding space.
  • Constructor Details

    • TSNE

      public TSNE(double[][] X, int d)
      Constructor. Trains t-SNE for 1000 iterations with perplexity = 20 and learning rate = 200.
      Parameters:
      X - the input data. If X is a square matrix, it is assumed to be the squared distance/dissimilarity matrix.
      d - the dimension of embedding space.
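      Because a square X is read as a matrix of squared dissimilarities, Euclidean input can be given either as raw coordinates or as a precomputed matrix. The plain-Java sketch below builds such a matrix from raw data; the class and method names are hypothetical and not part of Smile. One consequence of the square-matrix rule follows directly: a raw data set with exactly as many samples as features is also square and would be interpreted as distances, so in that case passing the precomputed matrix avoids the ambiguity.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Smile.
class SquaredDistances {
    // Builds the n-by-n matrix of squared Euclidean distances between the rows of x:
    // symmetric, with a zero diagonal.
    static double[][] squaredDistances(double[][] x) {
        int n = x.length;
        double[][] d = new double[n][n]; // diagonal stays 0.0
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < x[i].length; k++) {
                    double diff = x[i][k] - x[j][k];
                    s += diff * diff;
                }
                d[i][j] = s;
                d[j][i] = s;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {3, 4}, {1, 0}}; // three points in 2-D
        System.out.println(Arrays.deepToString(squaredDistances(x)));
        // prints [[0.0, 25.0, 1.0], [25.0, 0.0, 20.0], [1.0, 20.0, 0.0]]
    }
}
```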
    • TSNE

      public TSNE(double[][] X, int d, double perplexity, double eta, int iterations)
      Constructor. Trains t-SNE for the given number of iterations.
      Parameters:
      X - the input data. If X is a square matrix, it is assumed to be the squared distance/dissimilarity matrix.
      d - the dimension of embedding space.
      perplexity - the perplexity of the conditional distribution.
      eta - the learning rate.
      iterations - the number of iterations.
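      The perplexity acts as an effective number of neighbors: for each point i, the Gaussian bandwidth is tuned so that 2 raised to the entropy (in bits) of p_{j|i} equals the requested perplexity. The usual way to do this, following van der Maaten and Hinton (2008), is a per-point bisection on the precision beta; the sketch below shows it in plain Java. The class and method names, the 1e-5 tolerance, and the 100-step cap are illustrative assumptions, not a description of how this class calibrates its bandwidths.

```java
// Hypothetical sketch, not part of Smile.
class PerplexityCalibration {
    // Returns p_{j|i} for one point i. dist2[j] is the squared distance from point i
    // to point j (dist2[i] is ignored). beta = 1 / (2 sigma_i^2) is found by bisection
    // so that exp(entropy in nats), equivalently 2^(entropy in bits), equals perplexity.
    // Assumes at least two points and 1 < perplexity <= number of other points.
    static double[] conditionalProbabilities(double[] dist2, int i, double perplexity) {
        int n = dist2.length;
        double targetEntropy = Math.log(perplexity);
        // Shift by the smallest distance so the nearest neighbor gets weight exp(0) = 1;
        // the shift cancels in the normalization and keeps the sum from underflowing.
        double minD = Double.POSITIVE_INFINITY;
        for (int j = 0; j < n; j++) {
            if (j != i) minD = Math.min(minD, dist2[j]);
        }
        double beta = 1.0, lo = 0.0, hi = Double.POSITIVE_INFINITY;
        double[] p = new double[n];
        for (int iter = 0; iter < 100; iter++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                p[j] = (j == i) ? 0.0 : Math.exp(-beta * (dist2[j] - minD));
                sum += p[j];
            }
            double entropy = 0.0; // Shannon entropy in nats
            for (int j = 0; j < n; j++) {
                p[j] /= sum;
                if (p[j] > 0) entropy -= p[j] * Math.log(p[j]);
            }
            double diff = entropy - targetEntropy;
            if (Math.abs(diff) < 1e-5) break;
            if (diff > 0) {
                // Too flat: raise beta (narrower Gaussian)
                lo = beta;
                beta = Double.isInfinite(hi) ? 2 * beta : (beta + hi) / 2;
            } else {
                // Too peaked: lower beta (wider Gaussian)
                hi = beta;
                beta = (beta + lo) / 2;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double[] dist2 = {0.0, 1.0, 2.0, 3.0, 4.0}; // squared distances from point 0
        double[] p = conditionalProbabilities(dist2, 0, 2.0);
        double h = 0.0;
        for (double v : p) if (v > 0) h -= v * Math.log(v);
        System.out.println(Math.exp(h)); // prints a value very close to 2.0
    }
}
```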
  • Method Details

    • cost

      public double cost()
      Returns the cost function value.
      Returns:
      the cost function value.
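      The class overview describes the objective as the Kullback–Leibler divergence between the joint distributions P and Q, so the returned value is presumably that divergence for the current coordinates; the page does not state this explicitly, so treat it as an assumption. A direct O(n^2) evaluation is sketched below in plain Java. KlCost is a hypothetical name, and P is assumed to be a symmetric matrix of joint probabilities that sums to 1 over i != j.

```java
// Hypothetical sketch, not part of Smile.
class KlCost {
    // C = sum_{i != j} p_ij * ln(p_ij / q_ij), where
    // q_ij = (1 + |y_i - y_j|^2)^-1 / sum_{k != l} (1 + |y_k - y_l|^2)^-1.
    static double cost(double[][] p, double[][] y) {
        int n = y.length;
        double[][] w = new double[n][n]; // unnormalized Student-t kernel
        double z = 0.0;                  // normalizer over all ordered pairs i != j
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double d2 = 0.0;
                for (int k = 0; k < y[i].length; k++) {
                    double diff = y[i][k] - y[j][k];
                    d2 += diff * diff;
                }
                w[i][j] = 1.0 / (1.0 + d2);
                z += w[i][j];
            }
        }
        double c = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // Terms with p_ij = 0 contribute nothing (0 * ln 0 is taken as 0)
                if (i != j && p[i][j] > 0) {
                    c += p[i][j] * Math.log(p[i][j] / (w[i][j] / z));
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Two points: Q is 0.5 for both ordered pairs, matching P exactly.
        double[][] p = {{0, 0.5}, {0.5, 0}};
        double[][] y = {{0.0}, {3.0}};
        System.out.println(cost(p, y)); // prints 0.0
    }
}
```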
    • update

      public void update(int iterations)
      Performs additional iterations.
      Parameters:
      iterations - the number of iterations.
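      Since eta is documented as a learning rate, each iteration is presumably a gradient step on the cost, and update continues from the current coordinates. The plain-Java sketch below shows one plain gradient-descent step using the standard t-SNE gradient, plus a loop that repeats it. GradientStep, step, and run are hypothetical names. Practical implementations, including the one described by van der Maaten and Hinton (2008), also use momentum and early exaggeration; this sketch deliberately omits them, so it is not a description of how this class updates its coordinates.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of Smile.
class GradientStep {
    // One plain gradient-descent step: y_i <- y_i - eta * dC/dy_i, with
    // dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2).
    // p is a symmetric joint distribution; y (n points, d coordinates) is updated in place.
    static void step(double[][] p, double[][] y, double eta) {
        int n = y.length, d = y[0].length;
        double[][] w = new double[n][n]; // Student-t kernel (1 + |y_i - y_j|^2)^-1
        double z = 0.0;                  // normalizer, so q_ij = w_ij / z
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double d2 = 0.0;
                for (int k = 0; k < d; k++) {
                    double diff = y[i][k] - y[j][k];
                    d2 += diff * diff;
                }
                w[i][j] = 1.0 / (1.0 + d2);
                z += w[i][j];
            }
        }
        // Compute every gradient before moving any point.
        double[][] grad = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double mult = 4.0 * (p[i][j] - w[i][j] / z) * w[i][j];
                for (int k = 0; k < d; k++) {
                    grad[i][k] += mult * (y[i][k] - y[j][k]);
                }
            }
        }
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < d; k++) {
                y[i][k] -= eta * grad[i][k];
            }
        }
    }

    // Repeats the step from the current map, in the spirit of update(iterations).
    static void run(double[][] p, double[][] y, double eta, int iterations) {
        for (int t = 0; t < iterations; t++) {
            step(p, y, eta);
        }
    }

    public static void main(String[] args) {
        double s = 1.0 / 6.0;
        double[][] p = {{0, s, s}, {s, 0, s}, {s, s, 0}}; // uniform joint P over 3 points
        double[][] y = {{0.0}, {1.0}, {3.0}};              // 1-D starting map
        run(p, y, 1.0, 50);
        System.out.println(Arrays.deepToString(y));
    }
}
```

      Because p_ij and q_ij are symmetric while (y_i - y_j) is antisymmetric, the gradients sum to zero, so plain steps like these leave the centroid of the map unchanged.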