smile.feature.extraction.PCA

All Implemented Interfaces: Serializable, Function<Tuple,Tuple>, Transform

public class PCA extends Projection

Principal component analysis. PCA is an orthogonal linear transformation that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Among linear transforms, PCA is theoretically optimal for the given data in the least-squares sense. PCA can be thought of as revealing the internal structure of the data in a way that best explains its variance. If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space, PCA supplies the user with a lower-dimensional picture of the data as seen from its (in some sense) most informative viewpoint.
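
The variance-maximizing property described above can be written compactly. The notation is introduced here for illustration and does not appear in the original documentation: x_i are the n samples, \mu is their mean, \Sigma is the sample covariance matrix, w_k is the direction of the k-th principal component, and \lambda_k is its variance. The n - 1 divisor is likewise an assumption.

```latex
\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}

w_1 = \arg\max_{\|w\| = 1} \; w^{\top} \Sigma \, w

w_k = \arg\max_{\substack{\|w\| = 1 \\ w \perp w_1, \dots, w_{k-1}}} \; w^{\top} \Sigma \, w

\lambda_k = w_k^{\top} \Sigma \, w_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0
```

Each w_k is an eigenvector of \Sigma with eigenvalue \lambda_k, which is why the components come out ordered by the variance they explain.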

PCA is mostly used as a tool in exploratory data analysis and for making predictive models. PCA involves the calculation of the eigenvalue decomposition of a data covariance matrix or singular value decomposition of a data matrix, usually after mean centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores and loadings.
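
To make the mean-centering and covariance step concrete, here is a minimal plain-Java sketch. The class `CovarianceSketch` and its methods are hypothetical helpers written for illustration, not part of Smile, and the n - 1 divisor is an assumption rather than something this page states about the library.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile: centers data and forms the sample covariance matrix. */
final class CovarianceSketch {
    /** Column means of a row-major data matrix in which each row is a sample. */
    static double[] columnMeans(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mu = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mu[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mu[j] /= n;
        }
        return mu;
    }

    /** Sample covariance of the mean-centered data (assumed divisor: n - 1). */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[] mu = columnMeans(data);
        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (row[i] - mu[i]) * (row[j] - mu[j]);
                }
            }
        }
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cov[i][j] /= (n - 1);
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 6}, {5, 10}};
        System.out.println(Arrays.toString(columnMeans(data)));
        System.out.println(Arrays.deepToString(covariance(data)));
    }
}
```

The eigenvectors of the resulting matrix give the component directions and its eigenvalues give the component variances; the eigendecomposition itself is left to a linear algebra library.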

As a linear technique, PCA serves several purposes: first, to decorrelate the original variables; second, to carry out data compression, where progressively less numerical precision is spent encoding successive principal components; third, to reconstruct the original input data from a reduced number of variables according to a least-squares criterion; and fourth, to identify potential clusters in the data.
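
The reconstruction purpose can be illustrated with a small plain-Java sketch. It assumes the eigenvectors are orthonormal and stored as the columns of a `double[][]` array; `ReconstructionSketch`, `project`, and `reconstruct` are hypothetical names chosen for this example and are not Smile APIs.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile: projects a sample and reconstructs it. */
final class ReconstructionSketch {
    /** Scores on the first p components: y[k] = sum_i loadings[i][k] * (x[i] - mu[i]). */
    static double[] project(double[] x, double[] mu, double[][] loadings, int p) {
        double[] y = new double[p];
        for (int k = 0; k < p; k++) {
            for (int i = 0; i < x.length; i++) {
                y[k] += loadings[i][k] * (x[i] - mu[i]);
            }
        }
        return y;
    }

    /** Reconstruction from the scores: xHat = mu + sum_k y[k] * (column k of loadings). */
    static double[] reconstruct(double[] y, double[] mu, double[][] loadings) {
        double[] xHat = mu.clone();
        for (int k = 0; k < y.length; k++) {
            for (int i = 0; i < xHat.length; i++) {
                xHat[i] += y[k] * loadings[i][k];
            }
        }
        return xHat;
    }

    public static void main(String[] args) {
        double[] mu = {1.0, 1.0};
        // Columns (0.6, 0.8) and (-0.8, 0.6) form an orthonormal basis.
        double[][] loadings = {{0.6, -0.8}, {0.8, 0.6}};
        double[] x = {3.2, 5.6};

        double[] y = project(x, mu, loadings, 1);      // keep only the first component
        double[] xHat = reconstruct(y, mu, loadings);  // best approximation on that axis
        System.out.println(Arrays.toString(y));
        System.out.println(Arrays.toString(xHat));
    }
}
```

With an orthonormal basis, the squared reconstruction error equals the sum of the squared scores on the discarded components. Keeping all components reproduces the input up to rounding.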

In certain applications, PCA can be misleading. PCA is heavily influenced by outliers in the data. In other situations, the linearity of PCA may be an obstacle to successful data reduction and compression.

Field Summary

Fields inherited from class smile.feature.extraction.Projection
columns, projection, schema

Constructor Summary

| Constructor | Description |
|---|---|
| PCA(double[] mu, double[] eigvalues, Matrix loadings, Matrix projection, String... columns) | Constructor. |


Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| double[] | center() | Returns the center of data. |
| static PCA | cor(double[][] data, String... columns) | Fits principal component analysis with correlation matrix. |
| static PCA | cor(DataFrame data, String... columns) | Fits principal component analysis with correlation matrix. |
| double[] | cumulativeVarianceProportion() | Returns the cumulative proportion of variance contained in principal components, ordered from largest to smallest. |
| static PCA | fit(double[][] data, String... columns) | Fits principal component analysis with covariance matrix. |
| static PCA | fit(DataFrame data, String... columns) | Fits principal component analysis with covariance matrix. |
| PCA | getProjection(double p) | Returns the projection with top principal components that contain (more than) the given percentage of variance. |
| PCA | getProjection(int p) | Returns the projection with given number of principal components. |
| Matrix | loadings() | Returns the variable loading matrix, ordered from largest to smallest by corresponding eigenvalues. |
| protected double[] | postprocess(double[] x) | Postprocess the output vector after projection. |
| double[] | variance() | Returns the principal component variances, ordered from largest to smallest, which are the eigenvalues of the covariance or correlation matrix of learning data. |
| double[] | varianceProportion() | Returns the proportion of variance contained in each principal component, ordered from largest to smallest. |

Methods inherited from class smile.feature.extraction.Projection
apply, apply, apply, apply, preprocess

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.function.Function
andThen, compose

Methods inherited from interface smile.data.transform.Transform
andThen, compose

Constructor Details
- PCA
  
  public PCA(double[] mu, double[] eigvalues, Matrix loadings, Matrix projection, String... columns)
  
  Constructor.
  
  Parameters:
  
  mu - the mean of samples.
  
  eigvalues - the eigenvalues of the principal components.
  
  loadings - the matrix of variable loadings.
  
  projection - the projection matrix.
  
  columns - the columns to transform when applied on Tuple/DataFrame.

Method Details
- fit
  
  public static PCA fit(DataFrame data, String... columns)
  
  Fits principal component analysis with covariance matrix.
  
  Parameters:
  
  data - training data in which each row is a sample.
  
  columns - the columns to fit PCA on. If empty, all columns will be used.
  
  Returns:
  
  the model.
- cor
  
  public static PCA cor(DataFrame data, String... columns)
  
  Fits principal component analysis with correlation matrix.
  
  Parameters:
  
  data - training data in which each row is a sample.
  
  columns - the columns to fit PCA on. If empty, all columns will be used.
  
  Returns:
  
  the model.
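
Fitting on the correlation matrix is equivalent to running PCA on variables that have been standardized to unit variance, which keeps attributes measured on large scales from dominating. The sketch below shows how a covariance matrix relates to the corresponding correlation matrix. `CorrelationSketch` is a hypothetical plain-Java helper for illustration and says nothing about how Smile computes the matrix internally.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile: rescales a covariance matrix to a correlation matrix. */
final class CorrelationSketch {
    /** cor[i][j] = cov[i][j] / (sd[i] * sd[j]), where sd[i] = sqrt(cov[i][i]). */
    static double[][] toCorrelation(double[][] cov) {
        int d = cov.length;
        double[] sd = new double[d];
        for (int i = 0; i < d; i++) {
            sd[i] = Math.sqrt(cov[i][i]);
        }
        double[][] cor = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cor[i][j] = cov[i][j] / (sd[i] * sd[j]);
            }
        }
        return cor;
    }

    public static void main(String[] args) {
        // Two variables with variances 4 and 9 and covariance 2.
        double[][] cov = {{4.0, 2.0}, {2.0, 9.0}};
        System.out.println(Arrays.deepToString(toCorrelation(cov)));
    }
}
```

The diagonal of the result is all ones, so every variable contributes the same total variance to the decomposition.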
- fit
  
  public static PCA fit(double[][] data, String... columns)
  
  Fits principal component analysis with covariance matrix.
  
  Parameters:
  
  data - training data in which each row is a sample.
  
  columns - the columns to transform when applied on Tuple/DataFrame.
  
  Returns:
  
  the model.
- cor
  
  public static PCA cor(double[][] data, String... columns)
  
  Fits principal component analysis with correlation matrix.
  
  Parameters:
  
  data - training data in which each row is a sample.
  
  columns - the columns to transform when applied on Tuple/DataFrame.
  
  Returns:
  
  the model.
- center
  
  public double[] center()
  
  Returns the center of data.
  
  Returns:
  
  the center of data.
- loadings
  
  public Matrix loadings()
  
  Returns the variable loading matrix, whose columns contain the eigenvectors, ordered from largest to smallest by their corresponding eigenvalues.
  
  Returns:
  
  the variable loading matrix.
- variance
  
  public double[] variance()
  
  Returns the principal component variances, ordered from largest to smallest, which are the eigenvalues of the covariance or correlation matrix of the training data.
  
  Returns:
  
  the principal component variances.
- varianceProportion
  
  public double[] varianceProportion()
  
  Returns the proportion of variance contained in each principal component, ordered from largest to smallest.
  
  Returns:
  
  the proportion of variance contained in each principal component.
- cumulativeVarianceProportion
  
  public double[] cumulativeVarianceProportion()
  
  Returns the cumulative proportion of variance contained in principal components, ordered from largest to smallest.
  
  Returns:
  
  the cumulative proportion of variance.
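
The relationship between the component variances, their proportions, and the cumulative proportions can be sketched in plain Java. `VarianceProportionSketch` is a hypothetical helper written for this example. It assumes the eigenvalues are already sorted from largest to smallest, and it is not the library's implementation.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile: variance proportions from sorted eigenvalues. */
final class VarianceProportionSketch {
    /** Each eigenvalue divided by the sum of all eigenvalues. */
    static double[] proportion(double[] eigenvalues) {
        double total = 0.0;
        for (double v : eigenvalues) {
            total += v;
        }
        double[] prop = new double[eigenvalues.length];
        for (int i = 0; i < prop.length; i++) {
            prop[i] = eigenvalues[i] / total;
        }
        return prop;
    }

    /** Running sum of the proportions; the last entry is 1 up to rounding. */
    static double[] cumulative(double[] eigenvalues) {
        double[] prop = proportion(eigenvalues);
        double[] cum = new double[prop.length];
        double running = 0.0;
        for (int i = 0; i < prop.length; i++) {
            running += prop[i];
            cum[i] = running;
        }
        return cum;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {2.0, 1.0, 1.0};
        System.out.println(Arrays.toString(proportion(eigenvalues)));
        System.out.println(Arrays.toString(cumulative(eigenvalues)));
    }
}
```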
- getProjection
  
  public PCA getProjection(int p)
  
  Returns the projection with the given number of principal components.
  
  Parameters:
  
  p - the number of top principal components to use for the projection.
  
  Returns:
  
  a new PCA projection.
- getProjection
  
  public PCA getProjection(double p)
  
  Returns the projection with the top principal components that together contain the given percentage of variance (or more).
  
  Parameters:
  
  p - the required percentage of variance.
  
  Returns:
  
  a new PCA projection.
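
One way to turn a variance threshold into a component count is sketched below. The sketch rests on assumptions this page does not confirm: that p is given as a fraction in (0, 1] rather than on a 0 to 100 scale, that the threshold comparison is inclusive, and that out-of-range values are rejected. `ComponentCountSketch` and `componentsFor` are hypothetical names, not Smile APIs.

```java
/** Hypothetical helper, not part of Smile: picks the number of components for a variance target. */
final class ComponentCountSketch {
    /**
     * Returns the smallest k such that the first k eigenvalues (sorted from
     * largest to smallest) hold at least the fraction p of the total variance.
     */
    static int componentsFor(double[] eigenvalues, double p) {
        if (p <= 0.0 || p > 1.0) {
            // Assumed validation rule; the real library may behave differently.
            throw new IllegalArgumentException("Invalid fraction of variance: " + p);
        }
        double total = 0.0;
        for (double v : eigenvalues) {
            total += v;
        }
        double running = 0.0;
        for (int k = 0; k < eigenvalues.length; k++) {
            running += eigenvalues[k];
            if (running / total >= p) {
                return k + 1;
            }
        }
        // Rounding can leave the running ratio just under p; fall back to all components.
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {2.0, 1.0, 1.0};
        System.out.println(componentsFor(eigenvalues, 0.6));
    }
}
```

The returned count would then play the role of the argument to the int overload.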
- postprocess
  
  protected double[] postprocess(double[] x)
  
  Description copied from class: Projection
  
  Postprocess the output vector after projection.
  
  Overrides:
  
  postprocess in class Projection
  
  Parameters:
  
  x - the output vector of projection.
  
  Returns:
  
  the postprocessed vector.
