Class RandomProjection

java.lang.Object
smile.feature.extraction.Projection
smile.feature.extraction.RandomProjection
All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform

public class RandomProjection extends Projection
Random projection is a promising dimensionality reduction technique for learning mixtures of Gaussians. According to the Johnson-Lindenstrauss lemma, any n data points in a high-dimensional space can be mapped down to d = O(log n / ε^2) dimensions without distorting their pairwise distances by more than a factor of (1 + ε). However, this reduced dimension is still far too high. With ε = 1, we need 2^d data points, and this usually exceeds n by many orders of magnitude.
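
To make the O(log n / ε^2) bound concrete, the sketch below evaluates one commonly cited explicit form, d >= 4 ln(n) / (ε^2/2 - ε^3/3), valid for 0 < ε < 1. The constant comes from the Dasgupta-Gupta proof of the lemma and is an assumption here, not something this class documents; the class name JLDimension and the method minDimension are hypothetical.

```java
public class JLDimension {
    /**
     * Smallest integer d with d >= 4 ln(n) / (eps^2/2 - eps^3/3).
     * Assumed explicit constant; the lemma itself only states O(log n / eps^2).
     */
    public static int minDimension(long n, double eps) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        if (eps <= 0.0 || eps >= 1.0) {
            throw new IllegalArgumentException("eps must be in (0, 1)");
        }
        double denominator = eps * eps / 2.0 - eps * eps * eps / 3.0;
        return (int) Math.ceil(4.0 * Math.log((double) n) / denominator);
    }

    public static void main(String[] args) {
        // The bound grows only logarithmically in n but quadratically in 1/eps.
        for (double eps : new double[] {0.5, 0.25, 0.1}) {
            System.out.println("n = 1e6, eps = " + eps
                    + " -> d >= " + minDimension(1_000_000L, eps));
        }
    }
}
```

Even for a loose tolerance the bound reaches the hundreds, which is why a dimension that depends only on the number of mixture components is attractive.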

Fortunately, we can reduce the dimension of the data far more drastically for the particular case of mixtures of Gaussians. In fact, we can map the data into just d = O(log k) dimensions, where k is the number of Gaussians. Therefore, the amount of data we will need is only polynomial in k. Note that this projected dimension is independent of the number of data points and of their original dimension. Experiments show that a value of log k works nicely.

Moreover, even if the original clusters are highly eccentric (that is, far from spherical), random projection makes them more spherical. Eccentric clusters are problematic for the EM algorithm because intermediate covariance matrices may become singular or nearly singular. Note also that in sufficiently high dimension, almost the entire mass of a Gaussian distribution lies in a thin shell.
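
The thin-shell remark can be checked numerically. The sketch below draws standard Gaussian vectors in d dimensions and reports their Euclidean norm divided by sqrt(d); as d grows, the ratio concentrates near 1. The class name ThinShell, the method relativeNorm, and the fixed seed are illustrative assumptions.

```java
import java.util.Random;

public class ThinShell {
    /** Norm of one standard Gaussian sample in d dimensions, divided by sqrt(d). */
    public static double relativeNorm(int d, Random rng) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < d; i++) {
            double g = rng.nextGaussian();
            sumOfSquares += g * g;
        }
        return Math.sqrt(sumOfSquares / d);
    }

    public static void main(String[] args) {
        Random rng = new Random(0L); // fixed seed so runs are repeatable
        for (int d : new int[] {2, 100, 10_000}) {
            double min = Double.MAX_VALUE;
            double max = 0.0;
            for (int trial = 0; trial < 100; trial++) {
                double r = relativeNorm(d, rng);
                min = Math.min(min, r);
                max = Math.max(max, r);
            }
            // The [min, max] range narrows around 1 as d increases.
            System.out.println("d = " + d + ": ratio in [" + min + ", " + max + "]");
        }
    }
}
```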

References

  1. S. Dasgupta. Experiments with random projection. UAI, 2000.
  2. D. Achlioptas. Database-friendly random projections. 2001.
  3. C. Hegde, M. Wakin, and R. Baraniuk. Random projections for manifold learning. NIPS, 2007.
  • Constructor Details

    • RandomProjection

      public RandomProjection(Matrix projection, String... columns)
      Constructor.
      Parameters:
      projection - the projection matrix.
      columns - the columns to transform when applied on Tuple/DataFrame.
  • Method Details

    • of

      public static RandomProjection of(int n, int p, String... columns)
      Generates a dense (non-sparse) random projection.
      Parameters:
      n - the dimension of input space.
      p - the dimension of feature space.
      columns - the columns to transform when applied on Tuple/DataFrame.
      Returns:
      the model.
    • sparse

      public static RandomProjection sparse(int n, int p, String... columns)
      Generates a sparse random projection.
      Parameters:
      n - the dimension of input space.
      p - the dimension of feature space.
      columns - the columns to transform when applied on Tuple/DataFrame.
      Returns:
      the model.
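
The difference between the two factory methods can be illustrated with a standalone sketch in plain Java. It does not reproduce this library's implementation: the dense variant assumes i.i.d. Gaussian entries scaled by 1/sqrt(p), and the sparse variant assumes the scheme from Achlioptas's paper (entries sqrt(3/p) times +1, 0, -1 with probabilities 1/6, 2/3, 1/6). The class RandomProjectionSketch and its methods are hypothetical names.

```java
import java.util.Random;

public class RandomProjectionSketch {
    /** Dense p-by-n matrix with i.i.d. N(0, 1/p) entries (assumed construction). */
    public static double[][] dense(int n, int p, Random rng) {
        double[][] w = new double[p][n];
        double scale = 1.0 / Math.sqrt(p);
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < n; j++) {
                w[i][j] = rng.nextGaussian() * scale;
            }
        }
        return w;
    }

    /** Sparse p-by-n matrix following Achlioptas: about two thirds of entries are zero. */
    public static double[][] sparse(int n, int p, Random rng) {
        double[][] w = new double[p][n];
        double scale = Math.sqrt(3.0 / p);
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < n; j++) {
                double u = rng.nextDouble();
                if (u < 1.0 / 6.0) {
                    w[i][j] = scale;
                } else if (u < 2.0 / 6.0) {
                    w[i][j] = -scale;
                } // otherwise the entry stays 0.0
            }
        }
        return w;
    }

    /** Maps an n-dimensional vector x to p dimensions by computing w * x. */
    public static double[] project(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                sum += w[i][j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    private static double norm(double[] v) {
        double s = 0.0;
        for (double a : v) s += a * a;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        int n = 1000; // dimension of input space
        int p = 50;   // dimension of feature space
        Random rng = new Random(42L);
        double[] x = new double[n];
        for (int j = 0; j < n; j++) x[j] = rng.nextGaussian();

        // With either scaling, the projected norm is close to the original in expectation.
        System.out.println("original norm: " + norm(x));
        System.out.println("dense norm:    " + norm(project(dense(n, p, rng), x)));
        System.out.println("sparse norm:   " + norm(project(sparse(n, p, rng), x)));
    }
}
```

The sparse variant needs no Gaussian sampling and lets roughly two thirds of the multiplications be skipped, which is the motivation given in the Achlioptas reference.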