Class PCA
- All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform
PCA is mostly used as a tool in exploratory data analysis and for making predictive models. PCA involves the calculation of the eigenvalue decomposition of a data covariance matrix or singular value decomposition of a data matrix, usually after mean centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores and loadings.
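The computation described above can be sketched in plain Java. This is a minimal illustration, not this class's implementation: the class name `PcaPowerIterationSketch` and its helper methods are hypothetical, and power iteration stands in for a full eigenvalue decomposition, so only the leading principal component is recovered.

```java
// Illustrative sketch only; the names here are hypothetical and not part of the library.
class PcaPowerIterationSketch {

    /** Returns the mean of each attribute (column). */
    static double[] columnMeans(double[][] data) {
        int n = data.length, d = data[0].length;
        double[] mu = new double[d];
        for (double[] row : data)
            for (int j = 0; j < d; j++) mu[j] += row[j] / n;
        return mu;
    }

    /** Sample covariance matrix of the mean-centered data (divides by n - 1). */
    static double[][] covariance(double[][] data) {
        int n = data.length, d = data[0].length;
        double[] mu = columnMeans(data);
        double[][] cov = new double[d][d];
        for (double[] row : data)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i][j] += (row[i] - mu[i]) * (row[j] - mu[j]) / (n - 1);
        return cov;
    }

    /**
     * Power iteration: approximates the leading unit eigenvector of a symmetric
     * positive semi-definite matrix. Assumes the start vector is not orthogonal
     * to that eigenvector.
     */
    static double[] leadingEigenvector(double[][] a, int iterations) {
        int d = a.length;
        double[] v = new double[d];
        java.util.Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) w[i] += a[i][j] * v[j];
            double norm = 0;
            for (double x : w) norm += x * x;
            norm = Math.sqrt(norm);
            if (norm == 0) return v; // zero matrix: nothing to iterate on
            for (int i = 0; i < d; i++) v[i] = w[i] / norm;
        }
        return v;
    }

    /** Rayleigh quotient v' A v for a unit vector v: the variance along v. */
    static double rayleigh(double[][] a, double[] v) {
        double q = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a.length; j++) q += v[i] * a[i][j] * v[j];
        return q;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        double[][] cov = covariance(data);
        double[] v = leadingEigenvector(cov, 50);
        System.out.println("first component: " + java.util.Arrays.toString(v));
        System.out.println("its variance: " + rayleigh(cov, v));
    }
}
```

The sample points lie on the line y = 2x, so the leading component should point along (1, 2) and carry all of the variance.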
As a linear technique, PCA serves several purposes: first, to decorrelate the original variables; second, to compress data, encoding successive principal components with decreasing numerical precision; third, to reconstruct the original input data from a reduced number of variables according to a least-squares criterion; and fourth, to identify potential clusters in the data.
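As a rough illustration of the reconstruction purpose, the sketch below projects a point onto the first k components and maps it back. It assumes the loadings matrix has orthonormal columns (one eigenvector per column); the class `PcaReconstructionSketch` and its methods are hypothetical names for this example, not the library's API.

```java
// Illustrative sketch only; names are hypothetical and not part of the library.
class PcaReconstructionSketch {

    /** Scores on the first k components: y = W_k' (x - mu). */
    static double[] project(double[] x, double[] mu, double[][] loadings, int k) {
        double[] y = new double[k];
        for (int c = 0; c < k; c++)
            for (int i = 0; i < x.length; i++)
                y[c] += loadings[i][c] * (x[i] - mu[i]);
        return y;
    }

    /** Least-squares reconstruction from the scores: xhat = mu + W_k y. */
    static double[] reconstruct(double[] y, double[] mu, double[][] loadings) {
        double[] x = mu.clone();
        for (int c = 0; c < y.length; c++)
            for (int i = 0; i < x.length; i++)
                x[i] += loadings[i][c] * y[c];
        return x;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(5);
        // Columns are orthonormal directions: (1, 2)/sqrt(5) and (-2, 1)/sqrt(5).
        double[][] loadings = {{1 / s, -2 / s}, {2 / s, 1 / s}};
        double[] mu = {2.5, 5};
        double[] x = {3.5, 4.5};
        for (int k = 1; k <= 2; k++) {
            double[] xhat = reconstruct(project(x, mu, loadings, k), mu, loadings);
            System.out.println("k=" + k + ": " + java.util.Arrays.toString(xhat));
        }
    }
}
```

With all components kept, the reconstruction returns the original point up to rounding; with fewer, it returns the closest point in the subspace spanned by the retained components.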
In certain applications, PCA can be misleading. It is sensitive to outliers in the data. In other situations, the linearity of PCA may be an obstacle to successful data reduction and compression.
Field Summary
Fields inherited from class Projection
columns, projection, schema
-
Constructor Summary
Constructors
Constructor - Description
PCA(Vector mu, Vector eigvalues, DenseMatrix loadings, DenseMatrix projection, String... columns) - Constructor.
-
Method Summary
Modifier and Type, Method - Description
center() - Returns the center of data.
static PCA cor - Fits principal component analysis with correlation matrix.
static PCA cor - Fits principal component analysis with correlation matrix.
cumulativeVarianceProportion() - Returns the cumulative proportion of variance contained in principal components, ordered from largest to smallest.
static PCA fit - Fits principal component analysis with covariance matrix.
static PCA fit - Fits principal component analysis with covariance matrix.
getProjection(double p) - Returns the projection with top principal components that contain (more than) the given percentage of variance.
getProjection(int p) - Returns the projection with given number of principal components.
loadings() - Returns the variable loading matrix, ordered from largest to smallest by corresponding eigenvalues.
protected double[] postprocess(double[] x) - Postprocess the output vector after projection.
variance() - Returns the principal component variances, ordered from largest to smallest, which are the eigenvalues of the covariance or correlation matrix of learning data.
varianceProportion() - Returns the proportion of variance contained in each principal component, ordered from largest to smallest.
Methods inherited from class Projection
apply, apply, apply, apply, preprocess
-
Constructor Details
-
PCA
public PCA(Vector mu, Vector eigvalues, DenseMatrix loadings, DenseMatrix projection, String... columns)
Constructor.
- Parameters:
mu - the mean of samples.
eigvalues - the eigenvalues of principal components.
loadings - the matrix of variable loadings.
projection - the projection matrix.
columns - the columns to transform when applied on Tuple/DataFrame.
-
-
Method Details
-
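fit
Fits principal component analysis with covariance matrix.
-
cor
Fits principal component analysis with correlation matrix.
-
fit
Fits principal component analysis with covariance matrix.
-
cor
Fits principal component analysis with correlation matrix.
-
center
Returns the center of data.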
-
loadings
Returns the variable loading matrix, ordered from largest to smallest by corresponding eigenvalues. The matrix columns contain the eigenvectors.
- Returns:
- the variable loading matrix.
-
variance
Returns the principal component variances, ordered from largest to smallest, which are the eigenvalues of the covariance or correlation matrix of learning data.
- Returns:
- the principal component variances.
-
varianceProportion
Returns the proportion of variance contained in each principal component, ordered from largest to smallest.
- Returns:
- the proportion of variance contained in each principal component.
-
cumulativeVarianceProportion
Returns the cumulative proportion of variance contained in principal components, ordered from largest to smallest.
- Returns:
- the cumulative proportion of variance.
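The relationship between the component variances and the two proportion vectors can be sketched as follows. This is an illustration under the assumption that the eigenvalues are already sorted from largest to smallest; `VarianceProportionSketch` and its methods are hypothetical names, not this class's implementation.

```java
// Illustrative sketch only; names are hypothetical and not part of the library.
class VarianceProportionSketch {

    /** Each eigenvalue divided by the sum of all eigenvalues. */
    static double[] proportion(double[] eigenvalues) {
        double total = 0;
        for (double ev : eigenvalues) total += ev;
        double[] prop = new double[eigenvalues.length];
        for (int i = 0; i < prop.length; i++) prop[i] = eigenvalues[i] / total;
        return prop;
    }

    /** Running sum of the proportions; the last entry is 1 up to rounding. */
    static double[] cumulative(double[] proportion) {
        double[] cum = new double[proportion.length];
        double running = 0;
        for (int i = 0; i < cum.length; i++) {
            running += proportion[i];
            cum[i] = running;
        }
        return cum;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {4, 3, 2, 1}; // made-up variances for illustration
        double[] prop = proportion(eigenvalues);
        System.out.println(java.util.Arrays.toString(prop));
        System.out.println(java.util.Arrays.toString(cumulative(prop)));
    }
}
```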
-
getProjection
Returns the projection with given number of principal components.
- Parameters:
p - choose top p principal components used for projection.
- Returns:
- a new PCA projection.
-
getProjection
Returns the projection with top principal components that contain (more than) the given percentage of variance.
- Parameters:
p - the required percentage of variance.
- Returns:
- a new PCA projection.
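One plausible way to turn a variance threshold into a component count is sketched below. It assumes that p is given as a fraction in (0, 1] and that a component is kept as soon as the cumulative proportion reaches p; both are assumptions made for this example, and `ComponentCountSketch` is a hypothetical name rather than this class's implementation.

```java
// Illustrative sketch only; names and the exact threshold rule are assumptions.
class ComponentCountSketch {

    /**
     * Returns the smallest k such that the first k eigenvalues (sorted from
     * largest to smallest) account for at least the fraction p of total variance.
     */
    static int componentsFor(double[] eigenvalues, double p) {
        if (p <= 0 || p > 1)
            throw new IllegalArgumentException("p must be in (0, 1]: " + p);
        double total = 0;
        for (double ev : eigenvalues) total += ev;
        double running = 0;
        for (int k = 0; k < eigenvalues.length; k++) {
            running += eigenvalues[k];
            if (running / total >= p) return k + 1;
        }
        return eigenvalues.length; // rounding fallback: keep everything
    }

    public static void main(String[] args) {
        double[] eigenvalues = {4, 3, 2, 1}; // made-up variances for illustration
        System.out.println(componentsFor(eigenvalues, 0.8));
    }
}
```

The selected count can then be passed to the integer overload to build the projection.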
-
postprocess
protected double[] postprocess(double[] x)
Description copied from class: Projection
Postprocess the output vector after projection.
- Overrides:
postprocess in class Projection
- Parameters:
x - the output vector of projection.
- Returns:
- the postprocessed vector.
-