Class KernelPCA

java.lang.Object
smile.feature.extraction.Projection
smile.feature.extraction.KernelPCA
All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform

public class KernelPCA extends Projection
Kernel PCA transform. Kernel PCA is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping.

In practice, a large data set leads to a large kernel (Gram) matrix K, and storing K may become a problem. One way to deal with this is to cluster the data set and populate the kernel matrix with the cluster means. Since even this approach may yield a relatively large K, it is common to compute only the top P eigenvalues and eigenvectors of K.
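To illustrate what computing only the leading eigenpairs of K involves, the following is a minimal sketch in plain Java. It is not this library's implementation. It assumes a Gaussian kernel with a hand-picked bandwidth, centers the Gram matrix in feature space, and extracts only the dominant eigenpair by power iteration. The class name `TopEigenSketch` and all of its helper methods are hypothetical names introduced for this example.

```java
/** Sketch only: leading eigenpair of a centered Gram matrix by power iteration. */
final class TopEigenSketch {

    /** Gaussian kernel; the bandwidth sigma is an illustrative choice. */
    static double gaussian(double[] a, double[] b, double sigma) {
        double d2 = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            d2 += d * d;
        }
        return Math.exp(-d2 / (2.0 * sigma * sigma));
    }

    /** Builds the Gram matrix and double-centers it in feature space. */
    static double[][] centeredGram(double[][] x, double sigma) {
        int n = x.length;
        double[][] k = new double[n][n];
        double[] rowMean = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i][j] = gaussian(x[i], x[j], sigma);
                rowMean[i] += k[i][j] / n;
            }
            total += rowMean[i] / n;
        }
        // K is symmetric, so column means equal row means.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i][j] += total - rowMean[i] - rowMean[j];
            }
        }
        return k;
    }

    /** Dense matrix-vector product. */
    static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += a[i][j] * v[j];
            }
        }
        return out;
    }

    /** Euclidean norm. */
    static double norm(double[] v) {
        double s = 0.0;
        for (double e : v) s += e * e;
        return Math.sqrt(s);
    }

    /** Repeatedly applies the matrix and renormalizes; converges to the dominant eigenvector. */
    static double[] dominantEigenvector(double[][] a, int iterations) {
        int n = a.length;
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = i + 1.0; // deterministic, non-constant start
        for (int it = 0; it < iterations; it++) {
            double[] w = multiply(a, v);
            double len = norm(w);
            if (len == 0.0) break; // the matrix annihilates v; nothing to extract
            for (int i = 0; i < n; i++) v[i] = w[i] / len;
        }
        return v;
    }

    /** Rayleigh quotient: the eigenvalue estimate for an approximate eigenvector. */
    static double rayleigh(double[][] a, double[] v) {
        double[] w = multiply(a, v);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < v.length; i++) {
            num += v[i] * w[i];
            den += v[i] * v[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {1, 0.1}, {2, -0.1}, {3, 0.05}, {4, 0}};
        double[][] kc = centeredGram(x, 10.0);
        double[] v = dominantEigenvector(kc, 500);
        System.out.println("leading eigenvalue: " + rayleigh(kc, v));
    }
}
```

Further eigenpairs could be obtained by deflation, but for matrices of realistic size an iterative solver such as Lanczos or ARPACK is the usual choice. The sketch only shows that the top of the spectrum can be reached without a full decomposition.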

Kernel PCA with an isotropic kernel function is closely related to metric MDS. Carrying out metric MDS on the kernel matrix K produces a configuration of points equivalent to the one obtained from the distance $\sqrt{2(1 - K(x_i, x_j))}$ computed in feature space.
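The feature-space distance mentioned above follows from expanding the squared norm: ||phi(x_i) - phi(x_j)||^2 = K(x_i, x_i) + K(x_j, x_j) - 2 K(x_i, x_j), which reduces to 2(1 - K(x_i, x_j)) whenever K(x, x) = 1. The sketch below evaluates that distance for a Gaussian kernel, which satisfies the normalization. The class `KernelDistanceSketch`, its method names, and the bandwidth value are assumptions made for illustration and are not part of this API.

```java
/** Sketch only: feature-space distance induced by a normalized isotropic kernel. */
final class KernelDistanceSketch {

    /** Gaussian kernel, for which K(x, x) = 1 holds. */
    static double gaussian(double[] a, double[] b, double sigma) {
        double d2 = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            d2 += d * d;
        }
        return Math.exp(-d2 / (2.0 * sigma * sigma));
    }

    /** Distance sqrt(2 (1 - K(a, b))) between the feature-space images of a and b. */
    static double featureDistance(double[] a, double[] b, double sigma) {
        double k = gaussian(a, b, sigma);
        // Clamp at zero to guard against tiny negative values from rounding.
        return Math.sqrt(Math.max(0.0, 2.0 * (1.0 - k)));
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {1.0, 0.0};
        System.out.println("feature-space distance: " + featureDistance(a, b, 1.0));
    }
}
```

Because the kernel value lies in (0, 1], this distance is bounded above by sqrt(2) no matter how far apart the inputs are.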

Kernel PCA also has close connections with Isomap, LLE, and Laplacian eigenmaps.

References

  1. Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.
  • Field Details

    • kpca

      public final KPCA<double[]> kpca
      Kernel PCA.
  • Constructor Details

    • KernelPCA

      public KernelPCA(KPCA<double[]> kpca, String... columns)
      Constructor.
      Parameters:
      kpca - kernel PCA object.
      columns - the columns to fit kernel PCA. If empty, all columns will be used.
  • Method Details

    • fit

      public static KernelPCA fit(DataFrame data, MercerKernel<double[]> kernel, int k, String... columns)
      Fits kernel principal component analysis.
      Parameters:
      data - training data.
      kernel - Mercer kernel.
      k - choose up to k principal components (those with eigenvalues larger than 0.0001) used for projection.
      columns - the columns to fit kernel PCA. If empty, all columns will be used.
      Returns:
      the model.
    • fit

      public static KernelPCA fit(DataFrame data, MercerKernel<double[]> kernel, int k, double threshold, String... columns)
      Fits kernel principal component analysis.
      Parameters:
      data - training data.
      kernel - Mercer kernel.
      k - choose the top k principal components used for projection.
      threshold - only principal components with eigenvalues larger than the given threshold will be kept.
      columns - the columns to fit kernel PCA. If empty, all columns will be used.
      Returns:
      the model.
    • apply

      public double[] apply(double[] x)
      Description copied from class: Projection
      Projects a data point to the feature space.
      Overrides:
      apply in class Projection
      Parameters:
      x - the data point.
      Returns:
      the projection in the feature space.
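
The k and threshold parameters of fit describe a selection rule: keep at most k components, and only those whose eigenvalues exceed the threshold. The following plain-Java sketch shows that rule as documented. It is not the library's code, and it assumes the eigenvalues are compared to the threshold without any rescaling, which this page does not state. `ComponentSelectionSketch` and `select` are hypothetical names.

```java
import java.util.Arrays;

/** Sketch only: pick up to k components whose eigenvalues exceed a threshold. */
final class ComponentSelectionSketch {

    /**
     * Returns the indices of the selected eigenvalues, largest first.
     * Assumes k is non-negative and that eigenvalues are compared as given.
     */
    static int[] select(double[] eigenvalues, int k, double threshold) {
        Integer[] order = new Integer[eigenvalues.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort indices by eigenvalue in descending order.
        Arrays.sort(order, (a, b) -> Double.compare(eigenvalues[b], eigenvalues[a]));
        return Arrays.stream(order)
                .filter(i -> eigenvalues[i] > threshold) // drop negligible components
                .limit(k)                                // keep at most k of the rest
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[] eigenvalues = {0.5, 2.0, 0.00005, 1.0};
        // Three eigenvalues exceed 0.0001, so asking for up to 3 keeps all of them.
        System.out.println(Arrays.toString(select(eigenvalues, 3, 1e-4))); // prints [1, 3, 0]
    }
}
```

With this rule the fitted projection may have fewer than k dimensions when the spectrum decays quickly, which is why k is described as an upper bound.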