Package smile.vq

Class SOM

java.lang.Object
smile.vq.SOM
All Implemented Interfaces:
Serializable, VectorQuantizer

public class SOM extends Object implements VectorQuantizer
Self-Organizing Map. An SOM is an unsupervised learning method that produces a low-dimensional (typically two-dimensional), discretized representation (called a map) of the input space of the training samples. The model was first described as an artificial neural network by Teuvo Kohonen and is sometimes called a Kohonen map.

Although SOMs are often viewed as related to feed-forward networks, with the nodes visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation: SOMs use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling.

SOMs belong to the large family of competitive learning and vector quantization methods. An SOM consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors and a position in the map space. The nodes are usually arranged with regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. During the (iterative) learning, each input vector is compared to the weight vector of every neuron. The neuron whose weight vector most closely matches the input is known as the best matching unit (BMU). The weight vectors of the BMU and of the neurons near it in the map are then moved closer to the input vector by a step size determined by the learning rate and the neighborhood function.
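The competitive step described above (find the BMU, then pull it and its map neighbors toward the input) can be sketched in plain Java as follows. This is a minimal illustration, not Smile's implementation: the class name `SomStep`, the fixed learning rate `alpha`, and the Gaussian neighborhood of width `sigma` over grid distance are all assumptions made for the example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of one SOM training step on a rectangular lattice.
 * Assumes a fixed learning rate and a Gaussian neighborhood; not Smile's code.
 */
class SomStep {
    /** Returns {row, col} of the neuron closest to x (squared Euclidean distance). */
    static int[] bestMatchingUnit(double[][][] neurons, double[] x) {
        int[] bmu = {0, 0};
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i < neurons.length; i++) {
            for (int j = 0; j < neurons[i].length; j++) {
                double d = squaredDistance(neurons[i][j], x);
                if (d < best) {
                    best = d;
                    bmu[0] = i;
                    bmu[1] = j;
                }
            }
        }
        return bmu;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Moves every neuron toward x by alpha * h, where h = exp(-g^2 / (2 sigma^2))
     * and g is the grid distance between the neuron and the BMU.
     */
    static void update(double[][][] neurons, double[] x, double alpha, double sigma) {
        int[] bmu = bestMatchingUnit(neurons, x);
        for (int i = 0; i < neurons.length; i++) {
            for (int j = 0; j < neurons[i].length; j++) {
                int di = i - bmu[0];
                int dj = j - bmu[1];
                double h = Math.exp(-(di * di + dj * dj) / (2 * sigma * sigma));
                double[] w = neurons[i][j];
                for (int k = 0; k < w.length; k++) {
                    w[k] += alpha * h * (x[k] - w[k]);
                }
            }
        }
    }

    public static void main(String[] args) {
        // A 1 x 3 lattice of 2-D weight vectors.
        double[][][] neurons = {{{0.0, 0.0}, {1.0, 1.0}, {2.0, 2.0}}};
        double[] x = {1.5, 1.0};
        System.out.println(Arrays.toString(bestMatchingUnit(neurons, x))); // [0, 1]
        update(neurons, x, 0.5, 1.0);
        System.out.println(Arrays.toString(neurons[0][1])); // [1.25, 1.0]
    }
}
```

Because the neighborhood weight `h` decays with grid distance from the BMU, nodes adjacent to the BMU on the map move almost as far as the BMU itself while distant nodes barely move; this is what lets the map preserve the topology of the input space.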

There are two ways to interpret an SOM. First, because the training phase moves the weights of a whole neighborhood in the same direction, similar items tend to excite adjacent neurons. An SOM therefore forms a semantic map where similar samples are mapped close together and dissimilar ones far apart. Second, the neuronal weights can be viewed as pointers into the input space. They form a discrete approximation of the distribution of the training samples: more neurons point to regions with a high concentration of training samples and fewer to regions where samples are scarce.

An SOM may be considered a nonlinear generalization of principal component analysis (PCA). It has been shown, using both artificial and real geophysical data, that SOM has many advantages over conventional feature extraction methods such as Empirical Orthogonal Functions (EOF) or PCA.

It has been shown that SOMs with a small number of nodes behave similarly to K-means, whereas larger SOMs rearrange data in a way that is fundamentally topological in character and display emergent properties. Large maps are therefore preferable to small ones. In maps consisting of thousands of nodes, it is possible to perform cluster operations on the map itself.

A common way to display SOMs is a heat map of the U-matrix. The U-matrix value of a particular node is the minimum, maximum, or average distance between its weight vector and those of its closest neighbors in the map. In a rectangular grid, for instance, one might consider the closest 4 or 8 nodes.
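As a rough sketch of this computation, the hypothetical `UMatrixSketch.umatrix` below uses the 4-connected neighborhood of a rectangular grid and lets the caller choose the minimum, maximum, or average. The class, the `Aggregate` enum, and the use of Euclidean distance are assumptions for illustration; the `umatrix()` method documented below uses the maximum.

```java
/** Hypothetical U-matrix sketch over a rectangular lattice; not Smile's code. */
class UMatrixSketch {
    /** How the distances to the neighbors are combined into one value per node. */
    enum Aggregate { MIN, MAX, AVG }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Uses the 4-connected neighborhood: up, down, left, right. */
    static double[][] umatrix(double[][][] neurons, Aggregate agg) {
        int nrow = neurons.length;
        int ncol = neurons[0].length;
        int[][] offsets = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        double[][] u = new double[nrow][ncol];
        for (int i = 0; i < nrow; i++) {
            for (int j = 0; j < ncol; j++) {
                double min = Double.POSITIVE_INFINITY;
                double max = 0.0;
                double sum = 0.0;
                int count = 0;
                for (int[] off : offsets) {
                    int ni = i + off[0];
                    int nj = j + off[1];
                    // Skip positions that fall outside the grid.
                    if (ni < 0 || ni >= nrow || nj < 0 || nj >= ncol) continue;
                    double d = distance(neurons[i][j], neurons[ni][nj]);
                    min = Math.min(min, d);
                    max = Math.max(max, d);
                    sum += d;
                    count++;
                }
                if (count == 0) {
                    u[i][j] = 0.0; // a 1 x 1 lattice has no neighbors
                } else if (agg == Aggregate.MIN) {
                    u[i][j] = min;
                } else if (agg == Aggregate.MAX) {
                    u[i][j] = max;
                } else {
                    u[i][j] = sum / count;
                }
            }
        }
        return u;
    }
}
```

Nodes with large U-matrix values sit on boundaries between clusters, so they show up as ridges in the heat map.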

References

  1. Teuvo Kohonen. Self-Organizing Maps. Springer, 3rd edition, 2000.
  • Field Summary

    Fields inherited from interface smile.vq.VectorQuantizer

    OUTLIER
  • Constructor Summary

    Constructors
    Constructor
    Description
    SOM(double[][][] neurons, TimeFunction alpha, Neighborhood theta)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    static double[][][]
    lattice(int nrow, int ncol, double[][] samples)
    Creates a lattice whose weight vectors are randomly selected from samples.
    double[][][]
    neurons()
    Returns the lattice of neurons.
    double[]
    quantize(double[] x)
    Quantize a new observation.
    double[][]
    umatrix()
    Calculates the unified distance matrix (u-matrix) for visualization.
    void
    update(double[] x)
    Update the codebook with a new observation.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • SOM

      public SOM(double[][][] neurons, TimeFunction alpha, Neighborhood theta)
      Constructor.
      Parameters:
      neurons - the initial lattice of neurons.
      alpha - the learning rate function.
      theta - the neighborhood function.
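      The alpha and theta parameters are functions of the training step. The sketch below shows one common choice for each: a linearly decaying learning rate and a Gaussian neighborhood whose width shrinks exponentially over time. The names ScheduleSketch, linearDecay, gaussian, and NeighborhoodFn are hypothetical stand-ins and are not Smile's TimeFunction or Neighborhood API.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical learning-rate and neighborhood schedules; not Smile's API. */
class ScheduleSketch {
    /** Weight of a neuron at grid offset (di, dj) from the BMU at step t. */
    interface NeighborhoodFn {
        double of(int di, int dj, int t);
    }

    /** Decays linearly from start to end over the given number of steps, then stays at end. */
    static IntToDoubleFunction linearDecay(double start, double end, int steps) {
        return t -> t >= steps ? end : start + (end - start) * t / steps;
    }

    /** Gaussian neighborhood with width sigma(t) = sigma0 * exp(-t / tau). */
    static NeighborhoodFn gaussian(double sigma0, double tau) {
        return (di, dj, t) -> {
            double sigma = sigma0 * Math.exp(-t / tau);
            return Math.exp(-(di * di + dj * dj) / (2 * sigma * sigma));
        };
    }
}
```

      Shrinking both the step size and the neighborhood width is a common design: early steps organize the map globally, and later steps fine-tune individual neurons.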
  • Method Details

    • lattice

      public static double[][][] lattice(int nrow, int ncol, double[][] samples)
      Creates a lattice whose weight vectors are randomly selected from samples.
      Parameters:
      nrow - the number of rows in the lattice.
      ncol - the number of columns in the lattice.
      samples - the samples from which the initial weight vectors are drawn.
      Returns:
      the lattice.
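      A minimal sketch of this initialization, assuming sampling with replacement and a caller-supplied java.util.Random. The class LatticeSketch and the extra rng parameter are hypothetical and may differ from what SOM.lattice does internally.

```java
import java.util.Random;

/** Hypothetical lattice initialization; not Smile's implementation. */
class LatticeSketch {
    static double[][][] lattice(int nrow, int ncol, double[][] samples, Random rng) {
        double[][][] neurons = new double[nrow][ncol][];
        for (int i = 0; i < nrow; i++) {
            for (int j = 0; j < ncol; j++) {
                // Pick a random sample (with replacement) and copy it,
                // so that training never modifies the original samples.
                neurons[i][j] = samples[rng.nextInt(samples.length)].clone();
            }
        }
        return neurons;
    }
}
```

      Starting from actual samples places every neuron inside the data distribution, which usually converges faster than purely random weights.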
    • update

      public void update(double[] x)
      Description copied from interface: VectorQuantizer
      Update the codebook with a new observation.
      Specified by:
      update in interface VectorQuantizer
      Parameters:
      x - a new observation.
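      Because update consumes one observation at a time, training is typically a loop over several epochs that presents the samples in a freshly shuffled order each time. The sketch below assumes that pattern; TrainingLoopSketch.train is a hypothetical helper, and the Fisher-Yates shuffle and epoch count are illustrative choices rather than anything this class prescribes.

```java
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical online training loop; returns the number of update calls made. */
class TrainingLoopSketch {
    static int train(double[][] samples, int epochs, Consumer<double[]> update, Random rng) {
        int[] order = new int[samples.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        int calls = 0;
        for (int e = 0; e < epochs; e++) {
            // Fisher-Yates shuffle so each epoch sees the samples in a new order.
            for (int i = order.length - 1; i > 0; i--) {
                int k = rng.nextInt(i + 1);
                int tmp = order[i];
                order[i] = order[k];
                order[k] = tmp;
            }
            for (int idx : order) {
                update.accept(samples[idx]);
                calls++;
            }
        }
        return calls;
    }
}
```

      With an SOM instance som, the consumer can be the method reference som::update, since update(double[] x) matches Consumer<double[]>.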
    • neurons

      public double[][][] neurons()
      Returns the lattice of neurons.
      Returns:
      the lattice of neurons.
    • umatrix

      public double[][] umatrix()
      Calculates the unified distance matrix (u-matrix) for visualization. The U-matrix is a popular method of displaying SOMs. The value of each U-matrix entry is the maximum of the distances between a map unit and its neighbors.
      Returns:
      the unified distance matrix.
    • quantize

      public double[] quantize(double[] x)
      Description copied from interface: VectorQuantizer
      Quantize a new observation. Returns Optional.empty if the observation is noise.
      Specified by:
      quantize in interface VectorQuantizer
      Parameters:
      x - a new observation.
      Returns:
      the quantized vector.
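      For an SOM, quantizing an observation amounts to finding its BMU and returning that neuron's weight vector. A minimal sketch under that assumption, using squared Euclidean distance and breaking ties in favor of the first neuron found; QuantizeSketch is a hypothetical name, and returning a copy is a choice of the sketch that may differ from what SOM.quantize returns.

```java
/** Hypothetical quantization by nearest neuron; not Smile's implementation. */
class QuantizeSketch {
    static double[] quantize(double[][][] neurons, double[] x) {
        double[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (double[][] row : neurons) {
            for (double[] w : row) {
                double d = 0.0;
                for (int k = 0; k < w.length; k++) {
                    double diff = w[k] - x[k];
                    d += diff * diff;
                }
                // Strict comparison keeps the first neuron on ties.
                if (d < bestDist) {
                    bestDist = d;
                    best = w;
                }
            }
        }
        // Return a copy so callers cannot modify the lattice.
        return best == null ? null : best.clone();
    }
}
```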