Class KPCA<T>

java.lang.Object
smile.manifold.KPCA<T>
Type Parameters:
T - the data type of model input objects.
All Implemented Interfaces:
Serializable, Function<T,double[]>

public class KPCA<T> extends Object implements Function<T,double[]>, Serializable
Kernel principal component analysis. Kernel PCA is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping.

In practice, a large data set leads to a large kernel/Gram matrix K, and storing K may become a problem. One way to deal with this is to perform clustering on the data set and populate the kernel matrix with the means of those clusters. Since even this method may yield a relatively large K, it is common to compute only the top P eigenvalues and eigenvectors of K.
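
For example, a minimal sketch of keeping only a few components (the Gaussian kernel, its bandwidth, and the synthetic data below are illustrative assumptions, not part of this API):

import smile.manifold.KPCA;
import smile.math.kernel.GaussianKernel;

public class TopComponentsSketch {
    public static void main(String[] args) {
        // Synthetic 200 x 5 data set; any double[][] works here.
        double[][] data = new double[200][5];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) {
                data[i][j] = Math.sin(0.05 * i + j);
            }
        }

        // Keep only the top 3 kernel principal components of the
        // 200 x 200 Gram matrix.
        KPCA<double[]> kpca = KPCA.fit(data, new GaussianKernel(2.0), 3);

        // Training data represented in the nonlinear principal component space.
        double[][] scores = kpca.coordinates();
        System.out.println(scores.length + " observations x " + scores[0].length + " components");
    }
}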

Kernel PCA with an isotropic kernel function is closely related to metric MDS. Carrying out metric MDS on the kernel matrix K produces a configuration of points equivalent to that obtained from the distances sqrt(2(1 - K(xi, xj))) computed in feature space.
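
A small illustration of that equivalence (a sketch assuming an isotropic kernel with K(x, x) = 1, such as Smile's GaussianKernel, and MercerKernel's k(x, y) evaluation method):

import smile.math.kernel.GaussianKernel;

public class FeatureSpaceDistance {
    public static void main(String[] args) {
        GaussianKernel kernel = new GaussianKernel(1.0);
        double[] xi = {0.0, 1.0};
        double[] xj = {1.0, 0.0};

        // For an isotropic kernel with K(x, x) = 1, the Euclidean distance
        // between the images of xi and xj in feature space is
        // sqrt(2 * (1 - K(xi, xj))), the quantity that metric MDS on the
        // kernel matrix reproduces.
        double d = Math.sqrt(2.0 * (1.0 - kernel.k(xi, xj)));
        System.out.println(d);
    }
}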

Kernel PCA also has close connections with Isomap, LLE, and Laplacian eigenmaps.

References

  1. Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.
  • Constructor Summary

    Constructors
    Constructor
    Description
    KPCA(T[] data, MercerKernel<T> kernel, double[] mean, double mu, double[][] coordinates, double[] latent, Matrix projection)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[]
    apply(T x)
    Project a data point to the feature space.
    double[][]
    apply(T[] x)
    Project a set of data to the feature space.
    double[][]
    coordinates()
    Returns the nonlinear principal component scores, i.e., the representation of learning data in the nonlinear principal component space.
    static <T> KPCA<T>
    fit(T[] data, MercerKernel<T> kernel, int k)
    Fits kernel principal component analysis.
    static <T> KPCA<T>
    fit(T[] data, MercerKernel<T> kernel, int k, double threshold)
    Fits kernel principal component analysis.
    Matrix
    projection()
    Returns the projection matrix.
    double[]
    variances()
    Returns the eigenvalues of kernel principal components, ordered from largest to smallest.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface java.util.function.Function

    andThen, compose
  • Constructor Details

    • KPCA

      public KPCA(T[] data, MercerKernel<T> kernel, double[] mean, double mu, double[][] coordinates, double[] latent, Matrix projection)
      Constructor.
      Parameters:
      data - training data.
      kernel - Mercer kernel.
      mean - the row/column means of the kernel matrix.
      mu - the overall mean of the kernel matrix.
      coordinates - the coordinates of projected training data.
      latent - the eigenvalues of kernel principal components.
      projection - the projection matrix.
  • Method Details

    • fit

      public static <T> KPCA<T> fit(T[] data, MercerKernel<T> kernel, int k)
      Fits kernel principal component analysis.
      Type Parameters:
      T - the data type of samples.
      Parameters:
      data - training data.
      kernel - Mercer kernel.
      k - choose up to k principal components (with eigenvalues larger than 0.0001) used for projection.
      Returns:
      the model.
    • fit

      public static <T> KPCA<T> fit(T[] data, MercerKernel<T> kernel, int k, double threshold)
      Fits kernel principal component analysis.
      Type Parameters:
      T - the data type of samples.
      Parameters:
      data - training data.
      kernel - Mercer kernel.
      k - choose top k principal components used for projection.
      threshold - only principal components with eigenvalues larger than the given threshold will be kept.
      Returns:
      the model.
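
      For example, a hedged sketch of fitting with an explicit eigenvalue threshold (the kernel and synthetic data are illustrative assumptions); fewer than k components are kept when eigenvalues fall at or below the threshold:

      import smile.manifold.KPCA;
      import smile.math.kernel.GaussianKernel;

      public class ThresholdSketch {
          public static void main(String[] args) {
              double[][] data = new double[100][4];
              for (int i = 0; i < data.length; i++) {
                  for (int j = 0; j < data[i].length; j++) {
                      data[i][j] = Math.cos(0.1 * i) * (j + 1);
                  }
              }

              // Ask for up to 10 components but discard any whose eigenvalue
              // is not larger than 1E-3.
              KPCA<double[]> kpca = KPCA.fit(data, new GaussianKernel(4.0), 10, 1E-3);

              // Number of components actually retained.
              System.out.println(kpca.variances().length + " components retained");
          }
      }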
    • variances

      public double[] variances()
      Returns the eigenvalues of kernel principal components, ordered from largest to smallest.
      Returns:
      the eigenvalues of kernel principal components, ordered from largest to smallest.
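
      Since the eigenvalues are sorted, a common use is to look at the cumulative share of retained variance per component (a sketch; the kernel choice and data are illustrative assumptions):

      import smile.manifold.KPCA;
      import smile.math.kernel.GaussianKernel;

      public class VarianceShare {
          public static void main(String[] args) {
              double[][] data = new double[150][3];
              for (int i = 0; i < data.length; i++) {
                  for (int j = 0; j < data[i].length; j++) {
                      data[i][j] = Math.sin(0.07 * i) * (j + 1);
                  }
              }

              KPCA<double[]> kpca = KPCA.fit(data, new GaussianKernel(2.0), 5);
              double[] latent = kpca.variances();

              double total = 0.0;
              for (double v : latent) total += v;

              // Cumulative share of the retained kernel-space variance.
              double cumulative = 0.0;
              for (int i = 0; i < latent.length; i++) {
                  cumulative += latent[i];
                  System.out.printf("PC%d: %.3f%n", i + 1, cumulative / total);
              }
          }
      }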
    • projection

      public Matrix projection()
      Returns the projection matrix. The dimension-reduced data can be obtained as y = W * K(x, ·).
      Returns:
      the projection matrix.
    • coordinates

      public double[][] coordinates()
      Returns the nonlinear principal component scores, i.e., the representation of learning data in the nonlinear principal component space. Rows correspond to observations, columns to components.
      Returns:
      the nonlinear principal component scores.
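
      For instance, when only the leading components matter, the first two columns of the score matrix give a 2-D embedding of the training data (a sketch; the kernel settings and data are illustrative assumptions):

      import smile.manifold.KPCA;
      import smile.math.kernel.GaussianKernel;

      public class ScoreColumns {
          public static void main(String[] args) {
              double[][] data = new double[120][4];
              for (int i = 0; i < data.length; i++) {
                  for (int j = 0; j < data[i].length; j++) {
                      data[i][j] = Math.sin(0.2 * i + j);
                  }
              }

              KPCA<double[]> kpca = KPCA.fit(data, new GaussianKernel(3.0), 2);
              double[][] scores = kpca.coordinates();

              // Each row is one training observation; columns 0 and 1 are its
              // coordinates along the first two nonlinear principal components.
              for (int i = 0; i < 5; i++) {
                  System.out.printf("obs %d -> (%.4f, %.4f)%n", i, scores[i][0], scores[i][1]);
              }
          }
      }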
    • apply

      public double[] apply(T x)
      Project a data point to the feature space.
      Specified by:
      apply in interface Function<T,double[]>
    • apply

      public double[][] apply(T[] x)
      Project a set of data to the feature space.
      Parameters:
      x - the data set.
      Returns:
      the projection in the feature space.
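
      A sketch of projecting held-out points with a fitted model (the train/test split and kernel settings are illustrative assumptions):

      import smile.manifold.KPCA;
      import smile.math.kernel.GaussianKernel;

      public class OutOfSampleSketch {
          public static void main(String[] args) {
              double[][] train = new double[100][3];
              double[][] test = new double[10][3];
              for (int i = 0; i < train.length; i++) {
                  for (int j = 0; j < 3; j++) {
                      train[i][j] = Math.sin(0.1 * i + j);
                  }
              }
              for (int i = 0; i < test.length; i++) {
                  for (int j = 0; j < 3; j++) {
                      test[i][j] = Math.sin(0.1 * (i + 100) + j);
                  }
              }

              KPCA<double[]> kpca = KPCA.fit(train, new GaussianKernel(2.0), 2);

              // Project previously unseen points into the same component space.
              double[][] projected = kpca.apply(test);
              System.out.println(projected.length + " x " + projected[0].length);
          }
      }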