smile.manifold

package smile.manifold

Manifold learning is a popular approach to nonlinear dimensionality reduction: it finds a low-dimensional basis for describing high-dimensional data. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high; although each data point may consist of thousands of features, it may be described as a function of only a few underlying parameters. That is, the data points are actually samples from a low-dimensional manifold embedded in a high-dimensional space. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data.
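To make the manifold assumption concrete, the sketch below samples the classic "Swiss roll" toy data set: every observed point lives in three dimensions, yet it is generated from only two parameters (t, h). The class `SwissRoll`, its methods, and the parameter ranges are illustrative assumptions, not part of Smile.

```java
import java.util.Random;

/** Samples a "Swiss roll": a 2-D sheet rolled up inside 3-D space (illustrative sketch). */
final class SwissRoll {
    /** Maps the intrinsic coordinates (t, h) to a point in R^3. */
    static double[] embed(double t, double h) {
        return new double[] {t * Math.cos(t), h, t * Math.sin(t)};
    }

    /** Draws n parameter pairs; t spans 1.5*pi to 4.5*pi and h spans 0 to 21 (arbitrary choices). */
    static double[][] sampleParams(int n, Random rng) {
        double[][] params = new double[n][];
        for (int i = 0; i < n; i++) {
            double t = 1.5 * Math.PI * (1 + 2 * rng.nextDouble());
            double h = 21 * rng.nextDouble();
            params[i] = new double[] {t, h};
        }
        return params;
    }

    public static void main(String[] args) {
        double[][] params = sampleParams(1000, new Random(42));
        double[][] data = new double[params.length][];
        for (int i = 0; i < params.length; i++) {
            // Each 3-D observation is fully determined by its two hidden parameters.
            data[i] = embed(params[i][0], params[i][1]);
        }
        System.out.printf("(t=%.3f, h=%.3f) -> (%.3f, %.3f, %.3f)%n",
                params[0][0], params[0][1], data[0][0], data[0][1], data[0][2]);
    }
}
```

A manifold learner given only `data` would ideally recover coordinates equivalent to `params`, up to a smooth reparametrization.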

Some prominent approaches are locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and local tangent space alignment (LTSA). These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data, and can be viewed as defining a graph-based kernel for kernel PCA. More recently, techniques have been proposed that, instead of defining a fixed kernel, learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space) while maximizing the distances between points that are not nearest neighbors.
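The graph-based view of these methods can be illustrated with the graph Laplacian that Laplacian eigenmaps builds. This is a minimal sketch, assuming heat-kernel edge weights exp(-|xi - xj|^2 / t) on a symmetrized k-nearest-neighbor graph; `GraphLaplacian`, `laplacian`, `k`, and `t` are hypothetical names, not Smile's API. Laplacian eigenmaps would then use, as the embedding, the eigenvectors of the generalized problem L y = λ D y with the smallest nonzero eigenvalues; that eigen-solve is not shown.

```java
/** Builds L = D - W for a k-NN graph, as used by Laplacian eigenmaps (illustrative sketch). */
final class GraphLaplacian {
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            s += d * d;
        }
        return s;
    }

    /** Heat-kernel weights on a symmetrized k-NN graph; t is the kernel width. */
    static double[][] laplacian(double[][] x, int k, double t) {
        int n = x.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            final int fi = i;
            // Indices of the k nearest neighbors of x[i], excluding i itself.
            int[] nn = java.util.stream.IntStream.range(0, n).filter(j -> j != fi).boxed()
                    .sorted((a, b) -> Double.compare(sqDist(x[fi], x[a]), sqDist(x[fi], x[b])))
                    .limit(k).mapToInt(Integer::intValue).toArray();
            for (int j : nn) {
                double wij = Math.exp(-sqDist(x[i], x[j]) / t);
                w[i][j] = wij; // connect i and j if either is among the other's neighbors
                w[j][i] = wij;
            }
        }
        double[][] lap = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0;
            for (int j = 0; j < n; j++) {
                degree += w[i][j];
                lap[i][j] = -w[i][j];
            }
            lap[i][i] = degree; // w[i][i] is 0, so the diagonal is just the degree
        }
        return lap;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {5, 5}, {6, 5}};
        for (double[] row : laplacian(x, 2, 1.0)) System.out.println(java.util.Arrays.toString(row));
    }
}
```

Every row of L sums to zero, so the constant vector is always an eigenvector with eigenvalue 0; this is why the trivial solution is discarded.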

An alternative approach to neighborhood preservation is to minimize a cost function that measures differences between distances in the input and output spaces. Important examples of such techniques include classical multidimensional scaling (which is equivalent to PCA when the input dissimilarities are Euclidean distances), Isomap (which uses geodesic distances in the data space), diffusion maps (which use diffusion distances in the data space), t-SNE (which minimizes the divergence between distributions over pairs of points), and curvilinear component analysis.
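Isomap's geodesic distances can be approximated by shortest paths through a neighborhood graph. The sketch below assumes a symmetrized k-NN graph with Euclidean edge lengths and computes all-pairs shortest paths with Floyd-Warshall; Isomap would then run classical MDS on the resulting matrix. `Geodesic` and its methods are hypothetical names. Floyd-Warshall (O(n^3)) is chosen for brevity; running Dijkstra's algorithm from each point scales better on sparse graphs.

```java
import java.util.Arrays;

/** Isomap-style geodesic distances via shortest paths in a k-NN graph (illustrative sketch). */
final class Geodesic {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    static double[][] geodesic(double[][] x, int k) {
        int n = x.length;
        double[][] g = new double[n][n];
        for (double[] row : g) Arrays.fill(row, Double.POSITIVE_INFINITY);
        for (int i = 0; i < n; i++) {
            g[i][i] = 0.0;
            final int fi = i;
            Integer[] order = new Integer[n];
            for (int j = 0; j < n; j++) order[j] = j;
            // Sort all points by distance to x[i], then connect the k closest (skipping i).
            Arrays.sort(order, (a, b) -> Double.compare(dist(x[fi], x[a]), dist(x[fi], x[b])));
            int added = 0;
            for (int idx = 0; idx < n && added < k; idx++) {
                int j = order[idx];
                if (j == i) continue;
                double d = dist(x[i], x[j]);
                g[i][j] = d; // symmetric edge
                g[j][i] = d;
                added++;
            }
        }
        // Floyd-Warshall: allow paths through each intermediate vertex m in turn.
        for (int m = 0; m < n; m++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (g[i][m] + g[m][j] < g[i][j]) g[i][j] = g[i][m] + g[m][j];
        return g;
    }

    public static void main(String[] args) {
        int n = 50;
        double[][] arc = new double[n][];
        for (int i = 0; i < n; i++) { // points on a unit semicircle
            double theta = Math.PI * i / (n - 1);
            arc[i] = new double[] {Math.cos(theta), Math.sin(theta)};
        }
        System.out.printf("euclidean=%.4f geodesic=%.4f%n",
                dist(arc[0], arc[n - 1]), geodesic(arc, 2)[0][n - 1]);
    }
}
```

On the semicircle, the straight-line distance between the endpoints is 2, while the graph path follows the arc and comes out close to π, which is the distance Isomap is meant to preserve.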

Multidimensional scaling (MDS) is a set of related statistical techniques often used in information visualization for exploring similarities or dissimilarities in data. An MDS algorithm starts with a matrix of item-item dissimilarities, then assigns a location to each item in N-dimensional space. For sufficiently small N (typically 2 or 3), the resulting locations may be displayed in a 2D plot or 3D visualization.
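Classical MDS (principal coordinates analysis) can be sketched in three steps: square the dissimilarities, double-center them into B = -1/2 J D² J with J = I - (1/n) 1 1ᵀ, and scale the top eigenvectors of B by the square roots of their eigenvalues. This sketch substitutes power iteration with deflation for a full eigendecomposition and assumes Euclidean input, so that B is positive semidefinite. `ClassicalMDS` and its methods are hypothetical names, and this is not necessarily how Smile's `MDS` class is implemented.

```java
import java.util.Random;

/** Classical MDS via double centering and power iteration (illustrative sketch). */
final class ClassicalMDS {
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) out[i] = dot(m[i], v);
        return out;
    }

    /** B = -1/2 * J * D2 * J, where D2 holds squared distances and J = I - (1/n) 1 1^T. */
    static double[][] doubleCenter(double[][] d) {
        int n = d.length;
        double[] rowMean = new double[n];
        double grandMean = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) rowMean[i] += d[i][j] * d[i][j];
            rowMean[i] /= n;
            grandMean += rowMean[i];
        }
        grandMean /= n;
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) // D is symmetric, so column means equal row means
                b[i][j] = -0.5 * (d[i][j] * d[i][j] - rowMean[i] - rowMean[j] + grandMean);
        return b;
    }

    /** Returns n-by-dim coordinates from an n-by-n Euclidean distance matrix. */
    static double[][] embed(double[][] d, int dim, int iters) {
        int n = d.length;
        double[][] b = doubleCenter(d);
        double[][] coords = new double[n][dim];
        Random rng = new Random(0); // fixed seed for reproducible starting vectors
        for (int c = 0; c < dim; c++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = rng.nextGaussian();
            for (int it = 0; it < iters; it++) { // power iteration
                double[] w = multiply(b, v);
                double norm = Math.sqrt(dot(w, w));
                if (norm == 0) break; // nothing left to extract
                for (int i = 0; i < n; i++) v[i] = w[i] / norm;
            }
            double lambda = dot(v, multiply(b, v)); // Rayleigh quotient
            double scale = Math.sqrt(Math.max(lambda, 0)); // negative eigenvalues give no real axis
            for (int i = 0; i < n; i++) coords[i][c] = scale * v[i];
            for (int i = 0; i < n; i++) // deflate so the next pass finds the next eigenpair
                for (int j = 0; j < n; j++) b[i][j] -= lambda * v[i] * v[j];
        }
        return coords;
    }

    public static void main(String[] args) {
        double[][] d = {{0, 3, 4}, {3, 0, 5}, {4, 5, 0}}; // a 3-4-5 right triangle
        for (double[] p : embed(d, 2, 200)) System.out.printf("(%.4f, %.4f)%n", p[0], p[1]);
    }
}
```

The recovered coordinates are unique only up to rotation, reflection, and translation; what they preserve is the pairwise distances.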

The major types of MDS algorithms include:

Classical multidimensional scaling: Takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain.
Metric multidimensional scaling: A superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and to input matrices of known distances, optionally with weights. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization.
Non-metric multidimensional scaling: In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression.
Generalized multidimensional scaling (GMDS): An extension of metric MDS in which the target space is an arbitrary smooth non-Euclidean space. When the dissimilarities are distances on a surface and the target space is another surface, GMDS finds the minimum-distortion embedding of one surface into the other.
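The isotonic-regression step of non-metric MDS can be sketched as follows. This is a minimal sketch, assuming unit weights and no special handling of tied dissimilarities: the pool-adjacent-violators algorithm turns the embedded distances (ordered by input dissimilarity) into monotone disparities, and Kruskal's stress-1, sqrt(Σ(d - d̂)² / Σ d²), measures the misfit. A full non-metric MDS would alternate this step with moving the points to lower the stress; that loop is omitted. `KruskalStress` and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Disparities and Kruskal's stress-1 for one non-metric MDS step (illustrative sketch). */
final class KruskalStress {
    /** Pool-adjacent-violators: least-squares non-decreasing fit to y with unit weights. */
    static double[] isotonic(double[] y) {
        int n = y.length;
        double[] mean = new double[n]; // mean of each pooled block
        int[] size = new int[n];       // number of points in each block
        int blocks = 0;
        for (double v : y) {
            mean[blocks] = v;
            size[blocks] = 1;
            blocks++;
            // Pool the last two blocks while they violate the ordering.
            while (blocks > 1 && mean[blocks - 2] > mean[blocks - 1]) {
                int merged = size[blocks - 2] + size[blocks - 1];
                mean[blocks - 2] = (mean[blocks - 2] * size[blocks - 2]
                        + mean[blocks - 1] * size[blocks - 1]) / merged;
                size[blocks - 2] = merged;
                blocks--;
            }
        }
        double[] fit = new double[n];
        int pos = 0;
        for (int b = 0; b < blocks; b++)
            for (int k = 0; k < size[b]; k++) fit[pos++] = mean[b];
        return fit;
    }

    /** Stress-1 = sqrt(sum (d - dHat)^2 / sum d^2), with pairs ordered by input dissimilarity. */
    static double stress(double[] dissim, double[] dist) {
        int m = dissim.length;
        Integer[] order = new Integer[m];
        for (int i = 0; i < m; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(dissim[a], dissim[b]));
        double[] sorted = new double[m];
        for (int r = 0; r < m; r++) sorted[r] = dist[order[r]];
        double[] dHat = isotonic(sorted); // disparities
        double num = 0, den = 0;
        for (int r = 0; r < m; r++) {
            double diff = sorted[r] - dHat[r];
            num += diff * diff;
            den += sorted[r] * sorted[r];
        }
        return Math.sqrt(num / den);
    }

    public static void main(String[] args) {
        double[] dissim = {1, 2, 3, 4};       // input dissimilarities of four item pairs
        double[] dist = {1.0, 3.0, 2.0, 4.0}; // distances in the current embedding
        System.out.println(Arrays.toString(isotonic(dist)) + " stress=" + stress(dissim, dist));
    }
}
```

A stress of zero means the embedded distances already increase monotonically with the dissimilarities, which is all that non-metric MDS asks for.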

Class summary:

IsoMap: Isometric feature mapping.
IsoMap.Options: IsoMap hyperparameters.
IsotonicMDS: Kruskal's non-metric MDS.
IsotonicMDS.Options: Kruskal's non-metric MDS hyperparameters.
KPCA<T>: Kernel principal component analysis.
KPCA.Options: Kernel PCA hyperparameters.
LaplacianEigenmap: Laplacian Eigenmaps.
LaplacianEigenmap.Options: Laplacian Eigenmaps hyperparameters.
LLE: Locally Linear Embedding.
LLE.Options: LLE hyperparameters.
MDS: Classical multidimensional scaling, also known as principal coordinates analysis.
MDS.Options: MDS hyperparameters.
SammonMapping: Sammon's mapping, an iterative technique that makes interpoint distances in the low-dimensional projection as close as possible to the interpoint distances in the high-dimensional space.
SammonMapping.Options: Sammon's mapping hyperparameters.
TSNE: t-distributed stochastic neighbor embedding (t-SNE).
TSNE.Options: t-SNE hyperparameters.
UMAP: Uniform Manifold Approximation and Projection.
UMAP.Options: UMAP hyperparameters.
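Sammon's mapping minimizes Sammon's stress E = (1/c) Σ_{i<j} (D_ij - d_ij)² / D_ij with c = Σ_{i<j} D_ij, where D_ij are the input distances and d_ij the distances in the projection. The sketch below minimizes E by plain gradient descent with an adaptive step size, swapped in for the Newton-style update Sammon originally described. It is not necessarily how `SammonMapping` works, and `SammonSketch` with its methods are hypothetical names. Pairs with D_ij = 0 are skipped, since the stress is undefined for them.

```java
import java.util.Random;

/** Sammon's mapping by gradient descent with an adaptive step (illustrative sketch). */
final class SammonSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) { double d = a[k] - b[k]; s += d * d; }
        return Math.sqrt(s);
    }

    static double[][] distances(double[][] x) {
        int n = x.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) d[i][j] = dist(x[i], x[j]);
        return d;
    }

    /** E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij with c = sum_{i<j} D_ij. */
    static double stress(double[][] dIn, double[][] y) {
        double c = 0, e = 0;
        for (int i = 0; i < y.length; i++)
            for (int j = i + 1; j < y.length; j++) {
                if (dIn[i][j] == 0) continue;
                double diff = dIn[i][j] - dist(y[i], y[j]);
                c += dIn[i][j];
                e += diff * diff / dIn[i][j];
            }
        return e / c;
    }

    /** dE/dy_i = (2/c) * sum_{j != i} (d_ij - D_ij) / (D_ij * d_ij) * (y_i - y_j). */
    static double[][] gradient(double[][] dIn, double[][] y) {
        int n = y.length, dim = y[0].length;
        double c = 0;
        for (int i = 0; i < n; i++) for (int j = i + 1; j < n; j++) c += dIn[i][j];
        double[][] g = new double[n][dim];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == j || dIn[i][j] == 0) continue;
                double d = Math.max(dist(y[i], y[j]), 1e-12); // guard coincident points
                double coef = 2 * (d - dIn[i][j]) / (c * dIn[i][j] * d);
                for (int k = 0; k < dim; k++) g[i][k] += coef * (y[i][k] - y[j][k]);
            }
        return g;
    }

    static double[][] init(int n, int dim, long seed) {
        Random rng = new Random(seed);
        double[][] y = new double[n][dim];
        for (double[] row : y) for (int k = 0; k < dim; k++) row[k] = rng.nextGaussian();
        return y;
    }

    static double[][] fit(double[][] dIn, int dim, int iters, long seed) {
        double[][] y = init(dIn.length, dim, seed);
        double e = stress(dIn, y), step = 1.0;
        for (int it = 0; it < iters && step > 1e-12; it++) {
            double[][] g = gradient(dIn, y);
            double[][] cand = new double[y.length][dim];
            for (int i = 0; i < y.length; i++)
                for (int k = 0; k < dim; k++) cand[i][k] = y[i][k] - step * g[i][k];
            double eCand = stress(dIn, cand);
            if (eCand < e) { y = cand; e = eCand; step *= 1.2; } // accept, try a larger step
            else step *= 0.5;                                     // reject, shrink the step
        }
        return y;
    }

    public static void main(String[] args) {
        Random rng = new Random(3);
        double[][] x = new double[30][3];
        for (double[] row : x) for (int k = 0; k < 3; k++) row[k] = rng.nextDouble();
        double[][] dIn = distances(x);
        System.out.printf("stress: %.4f -> %.4f%n",
                stress(dIn, init(30, 2, 7)), stress(dIn, fit(dIn, 2, 500, 7)));
    }
}
```

Dividing each squared error by D_ij is what distinguishes Sammon's stress from plain metric stress: it weights small input distances more heavily, so local structure is favored.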
