Class Mixture

java.lang.Object
smile.stat.distribution.Mixture
All Implemented Interfaces:
Serializable, Distribution
Direct Known Subclasses:
ExponentialFamilyMixture

public class Mixture extends Object implements Distribution
A finite mixture model is a probabilistic model for density estimation using a mixture distribution. A mixture model can be regarded as a type of unsupervised learning or clustering.

The expectation-maximization (EM) algorithm can be used to estimate the parameters of a parametric mixture model. EM is a method for finding maximum likelihood estimates of parameters when the model depends on unobserved latent variables. It is an iterative method that alternates between an expectation (E) step, which computes the expectation of the log-likelihood evaluated using the current estimate for the latent variables, and a maximization (M) step, which computes the parameters that maximize the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
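
To make the E and M steps concrete, here is a minimal, self-contained sketch of EM for a two-component one-dimensional Gaussian mixture. It is an illustration only and is not Smile's implementation: the class `EmSketch`, its methods, the crude initialization, and the fixed iteration count are all assumptions made for this example.

```java
import java.util.Arrays;

/** Illustrative EM for a two-component 1-D Gaussian mixture; not Smile's code. */
final class EmSketch {
    /** Gaussian density with mean mu and standard deviation sigma. */
    static double gaussian(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Runs a fixed number of EM iterations; returns {w0, mu0, sigma0, w1, mu1, sigma1}. */
    static double[] fit(double[] data, int iterations) {
        int n = data.length;
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        // Crude initial guess (assumes max > min): equal weights, means at the extremes.
        double[] w = {0.5, 0.5};
        double[] mu = {min, max};
        double[] sigma = {(max - min) / 4.0, (max - min) / 4.0};
        double[][] r = new double[n][2]; // r[i][k]: responsibility of component k for data[i]

        for (int iter = 0; iter < iterations; iter++) {
            // E step: posterior of the latent component label under the current parameters.
            for (int i = 0; i < n; i++) {
                double a = w[0] * gaussian(data[i], mu[0], sigma[0]);
                double b = w[1] * gaussian(data[i], mu[1], sigma[1]);
                r[i][0] = a / (a + b);
                r[i][1] = b / (a + b);
            }
            // M step: weighted maximum likelihood estimates given the responsibilities.
            for (int k = 0; k < 2; k++) {
                double nk = 0.0, sum = 0.0, sq = 0.0;
                for (int i = 0; i < n; i++) {
                    nk += r[i][k];
                    sum += r[i][k] * data[i];
                }
                mu[k] = sum / nk;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mu[k];
                    sq += r[i][k] * d * d;
                }
                sigma[k] = Math.max(Math.sqrt(sq / nk), 1e-6); // floor avoids a collapsed component
                w[k] = nk / n;
            }
        }
        return new double[] {w[0], mu[0], sigma[0], w[1], mu[1], sigma[1]};
    }
}
```

A production implementation would typically iterate until the log-likelihood stops improving rather than for a fixed count, and would evaluate the E step in log space for numerical stability.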

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    Mixture.Component
    A component in the mixture distribution is defined by a distribution and its weight in the mixture.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    final Mixture.Component[]
    components
    The components of the finite mixture model.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Mixture(Mixture.Component... components)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    bic(double[] data)
    Returns the BIC score.
    double
    cdf(double x)
    Cumulative distribution function.
    double
    entropy()
    Shannon's entropy.
    int
    length()
    Returns the number of parameters of the distribution.
    double
    logp(double x)
    The density at x in log scale, which may prevent underflow.
    int
    map(double x)
    Returns the index of the component with the maximum a posteriori probability.
    double
    mean()
    Returns the mean of the distribution.
    double
    p(double x)
    The probability density function for a continuous distribution, or the probability mass function for a discrete distribution, at x.
    double[]
    posteriori(double x)
    Returns the posterior probabilities of the components given x.
    double
    quantile(double p)
    Returns the quantile, the value for which the probability to its left is p.
    double
    rand()
    Generates a random number following this distribution.
    int
    size()
    Returns the number of components in the mixture.
    String
    toString()
     
    double
    variance()
    Returns the variance of the distribution.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface smile.stat.distribution.Distribution

    inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rand, rejectionSampling, sd
  • Field Details

    • components

      public final Mixture.Component[] components
      The components of the finite mixture model.
  • Constructor Details

    • Mixture

      public Mixture(Mixture.Component... components)
      Constructor.
      Parameters:
      components - the mixture components, each a distribution paired with its weight.
  • Method Details

    • posteriori

      public double[] posteriori(double x)
      Returns the posterior probabilities of the components given x.
      Parameters:
      x - a real value.
      Returns:
      the posterior probabilities.
    • map

      public int map(double x)
      Returns the index of the component with the maximum a posteriori probability.
      Parameters:
      x - a real value.
      Returns:
      the index of the component with the maximum a posteriori probability.
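
The relationship between `posteriori` and `map` can be sketched with Bayes' rule: the posterior probability of component k given x is proportional to its weight times its density at x, and the MAP component is the one with the largest posterior. The following self-contained sketch assumes hypothetical names (`PosteriorSketch`, `uniform`, `posterior`, `argmax`) and uses uniform densities as stand-in components. It does not call Smile.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative posterior and MAP computation for a finite mixture; not Smile's code. */
final class PosteriorSketch {
    /** Density of a uniform distribution on [a, b], used as a stand-in component. */
    static DoubleUnaryOperator uniform(double a, double b) {
        return x -> (x >= a && x <= b) ? 1.0 / (b - a) : 0.0;
    }

    /**
     * Posterior probability of each component given x.
     * Assumes x has positive density under at least one component.
     */
    static double[] posterior(double[] weights, DoubleUnaryOperator[] densities, double x) {
        double[] post = new double[weights.length];
        double total = 0.0;
        for (int k = 0; k < post.length; k++) {
            post[k] = weights[k] * densities[k].applyAsDouble(x); // prior times likelihood
            total += post[k];
        }
        for (int k = 0; k < post.length; k++) {
            post[k] /= total; // normalize so the posteriors sum to 1
        }
        return post;
    }

    /** Index of the largest entry, i.e. the maximum a posteriori component. */
    static int argmax(double[] post) {
        int best = 0;
        for (int k = 1; k < post.length; k++) {
            if (post[k] > post[best]) best = k;
        }
        return best;
    }
}
```

With equal weights and components uniform on [0, 1] and [0, 2], the posteriors at x = 0.5 are 2/3 and 1/3, so the MAP index is 0. At x = 1.5 only the second component has positive density, so the MAP index is 1.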
    • mean

      public double mean()
      Description copied from interface: Distribution
      Returns the mean of the distribution.
      Specified by:
      mean in interface Distribution
      Returns:
      The mean.
    • variance

      public double variance()
      Description copied from interface: Distribution
      Returns the variance of the distribution.
      Specified by:
      variance in interface Distribution
      Returns:
      The variance.
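
For reference, the textbook moments of a finite mixture with weights w_k, component means mu_k, and component variances var_k are: mean = sum of w_k * mu_k, and variance = sum of w_k * (var_k + mu_k^2) minus mean^2. The sketch below implements these identities with hypothetical names (`MixtureMoments`, `mean`, `variance`). This page does not state how `mean()` and `variance()` are computed internally, so treat it as background rather than a description of the library.

```java
/** Textbook mean and variance of a finite mixture; an illustration, not Smile's code. */
final class MixtureMoments {
    /** Mixture mean: the weight-averaged component means. */
    static double mean(double[] w, double[] mu) {
        double m = 0.0;
        for (int k = 0; k < w.length; k++) {
            m += w[k] * mu[k];
        }
        return m;
    }

    /** Mixture variance: weighted second moments minus the squared mixture mean. */
    static double variance(double[] w, double[] mu, double[] var) {
        double m = mean(w, mu);
        double second = 0.0;
        for (int k = 0; k < w.length; k++) {
            second += w[k] * (var[k] + mu[k] * mu[k]); // E[X^2] of component k
        }
        return second - m * m;
    }
}
```

For two equally weighted components with means 0 and 2 and unit variances, this gives a mean of 1 and a variance of 2. The variance exceeds either component's variance because the spread between the component means also contributes.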
    • entropy

      public double entropy()
      Shannon's entropy. Not supported.
      Specified by:
      entropy in interface Distribution
      Returns:
      Shannon entropy.
    • p

      public double p(double x)
      Description copied from interface: Distribution
      The probability density function for a continuous distribution, or the probability mass function for a discrete distribution, at x.
      Specified by:
      p in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the density.
    • logp

      public double logp(double x)
      Description copied from interface: Distribution
      The density at x in log scale, which may prevent underflow.
      Specified by:
      logp in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the log density.
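
To illustrate why a log-scale density helps with underflow: for a mixture, the log density is the log of a weighted sum of component densities, and far in the tails each density can underflow to zero before the log is taken. One common remedy is the log-sum-exp trick, sketched below. The class `LogDensitySketch` and its `logp` signature are hypothetical, and this page does not say whether Smile's `logp` uses this technique.

```java
/** Log-sum-exp evaluation of a mixture log density; an illustration, not Smile's code. */
final class LogDensitySketch {
    /**
     * Computes log(sum_k weights[k] * exp(logDensities[k])) without leaving log space.
     * logDensities[k] is the log density of component k at the point of interest.
     */
    static double logp(double[] weights, double[] logDensities) {
        double[] terms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < terms.length; k++) {
            terms[k] = Math.log(weights[k]) + logDensities[k];
            max = Math.max(max, terms[k]);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every component has zero density here
        }
        double sum = 0.0;
        for (double t : terms) {
            sum += Math.exp(t - max); // the largest term contributes exp(0) = 1
        }
        return max + Math.log(sum);
    }
}
```

With component log densities of -1000 and -1001, the naive `Math.log(0.5 * Math.exp(-1000) + 0.5 * Math.exp(-1001))` evaluates to negative infinity because both exponentials underflow, while the sketch returns a finite value close to -1000.38.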
    • cdf

      public double cdf(double x)
      Description copied from interface: Distribution
      Cumulative distribution function, that is, the probability to the left of x.
      Specified by:
      cdf in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the probability.
    • rand

      public double rand()
      Description copied from interface: Distribution
      Generates a random number following this distribution.
      Specified by:
      rand in interface Distribution
      Returns:
      a random number.
    • quantile

      public double quantile(double p)
      Description copied from interface: Distribution
      Returns the quantile, the value for which the probability to its left is p. It is the inverse of the cdf.
      Specified by:
      quantile in interface Distribution
      Parameters:
      p - the probability.
      Returns:
      the quantile.
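
Because a mixture cdf is a weighted sum of component cdfs, it generally has no closed-form inverse, so a quantile has to be found numerically. The sketch below inverts a monotone cdf by bisection. All names (`QuantileSketch`, `logisticCdf`, `mixtureCdf`, `quantile`) are hypothetical, logistic components are used only because their cdf is easy to write down, and the search strategy is an assumption rather than a description of Smile's method.

```java
import java.util.function.DoubleUnaryOperator;

/** Numerical inversion of a mixture cdf by bisection; an illustration, not Smile's code. */
final class QuantileSketch {
    /** Cdf of a logistic distribution with location mu and scale s. */
    static DoubleUnaryOperator logisticCdf(double mu, double s) {
        return x -> 1.0 / (1.0 + Math.exp(-(x - mu) / s));
    }

    /** Mixture cdf: the weighted sum of the component cdfs. */
    static DoubleUnaryOperator mixtureCdf(double[] weights, DoubleUnaryOperator[] cdfs) {
        return x -> {
            double sum = 0.0;
            for (int k = 0; k < weights.length; k++) {
                sum += weights[k] * cdfs[k].applyAsDouble(x);
            }
            return sum;
        };
    }

    /** Finds x with cdf(x) = p for 0 < p < 1, assuming cdf is continuous and increasing. */
    static double quantile(DoubleUnaryOperator cdf, double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1): " + p);
        }
        double lo = -1.0, hi = 1.0;
        while (cdf.applyAsDouble(lo) > p) lo *= 2.0; // widen the bracket to the left
        while (cdf.applyAsDouble(hi) < p) hi *= 2.0; // widen the bracket to the right
        for (int i = 0; i < 200 && hi - lo > 1e-12; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf.applyAsDouble(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

For an equally weighted mixture of logistic components centered at -1 and 1, symmetry puts the median at 0, which is a convenient sanity check for the inversion.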
    • length

      public int length()
      Description copied from interface: Distribution
      Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
      Specified by:
      length in interface Distribution
      Returns:
      The number of parameters.
    • size

      public int size()
      Returns the number of components in the mixture.
      Returns:
      the number of components in the mixture.
    • bic

      public double bic(double[] data)
      Returns the BIC score.
      Parameters:
      data - the data used to calculate the likelihood.
      Returns:
      the BIC score.
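
As background, the Bayesian information criterion trades goodness of fit against model size. The sketch below uses the textbook definition, k * ln(n) - 2 * ln(L), where k is the number of parameters, n the sample size, and L the maximized likelihood, so that lower values are better. This page does not state which sign or scaling convention `bic` uses; some libraries report ln(L) - (k / 2) * ln(n) instead, where higher is better, so check before comparing scores. The names `BicSketch`, `logLikelihood`, and `bic` are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Textbook BIC for a fitted density; an illustration, not Smile's code. */
final class BicSketch {
    /** Sum of log densities over the data. */
    static double logLikelihood(double[] data, DoubleUnaryOperator density) {
        double sum = 0.0;
        for (double x : data) {
            sum += Math.log(density.applyAsDouble(x));
        }
        return sum;
    }

    /** BIC = k * ln(n) - 2 * ln(L); under this convention, lower is better. */
    static double bic(double[] data, DoubleUnaryOperator density, int numParameters) {
        return numParameters * Math.log(data.length) - 2.0 * logLikelihood(data, density);
    }
}
```

In the mixture setting, the parameter count k would play the role of `length()` and the density the role of `p(x)`. Comparing scores across fits with different numbers of components is a common way to choose the component count.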
    • toString

      public String toString()
      Overrides:
      toString in class Object