Class GaussianDistribution

java.lang.Object
smile.stat.distribution.GaussianDistribution
All Implemented Interfaces:
Serializable, Distribution, ExponentialFamily

public class GaussianDistribution extends Object implements ExponentialFamily
The normal distribution or Gaussian distribution is a continuous probability distribution that describes data that cluster around a mean. The graph of the associated probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. The normal distribution is often used to model, at least approximately, variables that tend to cluster around the mean.

The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then a linear transform aX + b (for some real numbers a ≠ 0 and b) is also normally distributed. If X1 and X2 are two independent normal random variables, then any linear combination of them is also normally distributed. The converse is also true: if X1 and X2 are independent and their sum X1 + X2 is normally distributed, then both X1 and X2 must also be normal, which is known as Cramér's theorem. Of all probability distributions over the real domain with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with the maximum entropy.

The central limit theorem states that under certain, fairly common conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if X1, …, Xn is a sequence of iid random variables, each having mean μ and variance σ² but otherwise arbitrary distributions, then the central limit theorem states that

√n ((1/n) Σ Xi − μ) → N(0, σ²).

The theorem will hold even if the summands Xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
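The √n scaling in the central limit theorem can be checked numerically. The following is a minimal simulation sketch, not part of this class: `CltSketch` and `scaledMeanVariance` are hypothetical names, and Uniform(0, 1) draws (mean 1/2, variance 1/12) are an arbitrary choice of iid summands. It estimates the variance of √n (sample mean − μ), which should come out close to σ² = 1/12.

```java
import java.util.Random;

final class CltSketch {
    /**
     * Sample variance of sqrt(n) * (mean of n Uniform(0,1) draws - 0.5),
     * estimated over the given number of independent trials.
     */
    static double scaledMeanVariance(int n, int trials, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int t = 0; t < trials; t++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) {
                s += rng.nextDouble();
            }
            // Center at the true mean 0.5 and rescale by sqrt(n).
            double z = Math.sqrt(n) * (s / n - 0.5);
            sum += z;
            sumSq += z * z;
        }
        double mean = sum / trials;
        return (sumSq - trials * mean * mean) / (trials - 1);
    }

    public static void main(String[] args) {
        // Should be near 1/12 if the scaled mean behaves like N(0, 1/12).
        System.out.println(scaledMeanVariance(200, 5000, 42L));
    }
}
```

Dropping the square root (scaling by n instead of √n) would make the estimated variance grow with n rather than settle near σ².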

Therefore, certain other distributions can be approximated by the normal distribution, for example:

  • The binomial distribution B(n, p) is approximately normal N(np, np(1-p)) for large n and for p not too close to zero or one.
  • The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
  • The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
  • The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
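As an illustration of the first approximation in the list, the sketch below compares the exact B(n, p) probability mass at k with the N(np, np(1-p)) density at the same point. It is a standalone example under assumed names (`BinomialNormalSketch`, `binomialPmf`, `normalPdf`) and does not use this class.

```java
final class BinomialNormalSketch {
    /** Exact binomial pmf P(K = k) for B(n, p), accumulated in log space. */
    static double binomialPmf(int n, int k, double p) {
        double logChoose = 0.0;
        // log C(n, k) = sum over i = 1..k of log((n - k + i) / i)
        for (int i = 1; i <= k; i++) {
            logChoose += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logChoose + k * Math.log(p) + (n - k) * Math.log(1 - p));
    }

    /** Density of N(mean, variance) at x. */
    static double normalPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    public static void main(String[] args) {
        int n = 100;
        double p = 0.5;
        int k = 50;
        System.out.println("exact pmf:      " + binomialPmf(n, k, p));
        System.out.println("normal density: " + normalPdf(k, n * p, n * p * (1 - p)));
    }
}
```

For n = 100 and p = 0.5 the two values agree to about three decimal places at the mean; the agreement degrades as p moves toward 0 or 1.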
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    final double
    mu
    The mean.
    final double
    sigma
    The standard deviation.
  • Constructor Summary

    Constructors
    Constructor
    Description
    GaussianDistribution(double mu, double sigma)
    Constructor
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    cdf(double x)
    Cumulative distribution function.
    double
    entropy()
    Returns Shannon entropy of the distribution.
    static GaussianDistribution
    fit(double[] data)
    Estimates the distribution parameters by MLE.
    static GaussianDistribution
    getInstance()
    Returns the standard normal distribution.
    double
    inverseCDF()
    Generates a Gaussian random number with the inverse CDF method.
    int
    length()
    Returns the number of parameters of the distribution.
    double
    logp(double x)
    The density at x in log scale, which may prevent the underflow problem.
    Mixture.Component
    M(double[] x, double[] posteriori)
    The M step in the EM algorithm, which depends on the specific distribution.
    double
    mean()
    Returns the mean of the distribution.
    double
    p(double x)
    The probability density function for continuous distribution or probability mass function for discrete distribution at x.
    double
    quantile(double p)
    The quantile: the probability to the left of quantile(p) is p.
    double
    rand()
    Generates a Gaussian random number with the Box-Muller algorithm.
    double
    sd()
    Returns the standard deviation of the distribution.
    String
    toString()
    double
    variance()
    Returns the variance of the distribution.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface smile.stat.distribution.Distribution

    inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rand, rejectionSampling
  • Field Details

    • mu

      public final double mu
      The mean.
    • sigma

      public final double sigma
      The standard deviation.
  • Constructor Details

    • GaussianDistribution

      public GaussianDistribution(double mu, double sigma)
      Constructor
      Parameters:
      mu - mean.
      sigma - standard deviation.
  • Method Details

    • fit

      public static GaussianDistribution fit(double[] data)
      Estimates the distribution parameters by MLE.
      Parameters:
      data - the training data.
      Returns:
      the distribution.
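For reference, the textbook maximum likelihood estimates for a Gaussian are the sample mean and the square root of the mean squared deviation (dividing by n). The sketch below shows that calculation under an assumed name (`GaussianMleSketch.mle`). It is not the library's code, and this page does not say whether `fit` divides by n or by n - 1.

```java
final class GaussianMleSketch {
    /** Returns {mu, sigma} that maximize the Gaussian likelihood of the data. */
    static double[] mle(double[] data) {
        int n = data.length;
        if (n == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double mu = 0.0;
        for (double x : data) {
            mu += x;
        }
        mu /= n;

        double ss = 0.0;
        for (double x : data) {
            ss += (x - mu) * (x - mu);
        }
        // MLE divides by n; the unbiased variance estimator would divide by n - 1.
        return new double[] {mu, Math.sqrt(ss / n)};
    }

    public static void main(String[] args) {
        double[] est = mle(new double[] {1.0, 2.0, 3.0, 4.0});
        System.out.println("mu = " + est[0] + ", sigma = " + est[1]);
    }
}
```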
    • getInstance

      public static GaussianDistribution getInstance()
      Returns the standard normal distribution.
      Returns:
      the standard normal distribution.
    • length

      public int length()
      Description copied from interface: Distribution
      Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
      Specified by:
      length in interface Distribution
      Returns:
      The number of parameters.
    • mean

      public double mean()
      Description copied from interface: Distribution
      Returns the mean of the distribution.
      Specified by:
      mean in interface Distribution
      Returns:
      The mean.
    • variance

      public double variance()
      Description copied from interface: Distribution
      Returns the variance of the distribution.
      Specified by:
      variance in interface Distribution
      Returns:
      The variance.
    • sd

      public double sd()
      Description copied from interface: Distribution
      Returns the standard deviation of the distribution.
      Specified by:
      sd in interface Distribution
      Returns:
      The standard deviation.
    • entropy

      public double entropy()
      Description copied from interface: Distribution
      Returns Shannon entropy of the distribution.
      Specified by:
      entropy in interface Distribution
      Returns:
      Shannon entropy.
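The differential entropy of N(μ, σ²) has the closed form ½ ln(2πeσ²), which depends only on σ. The sketch below evaluates it in nats (natural logarithm). `GaussianEntropySketch` is a hypothetical name, and the unit used by `entropy()` is an assumption here, since this page does not state it.

```java
final class GaussianEntropySketch {
    /** Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2 * pi * e * sigma^2). */
    static double entropy(double sigma) {
        if (sigma <= 0.0) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }

    public static void main(String[] args) {
        System.out.println(entropy(1.0));
        System.out.println(entropy(2.0)); // larger by ln 2 than the line above
    }
}
```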
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • rand

      public double rand()
      Generates a Gaussian random number with the Box-Muller algorithm.
      Specified by:
      rand in interface Distribution
      Returns:
      a random number.
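The Box-Muller idea is to turn two independent uniform draws into a standard normal draw, then shift and scale it. The sketch below uses the basic trigonometric form under an assumed name (`BoxMullerSketch.next`). The library may well use a different variant (for example the polar form), so treat this as an illustration of the technique rather than of `rand()` itself.

```java
import java.util.Random;

final class BoxMullerSketch {
    /** Draws one N(mu, sigma^2) variate from two uniform draws. */
    static double next(Random rng, double mu, double sigma) {
        // nextDouble() is in [0, 1); flipping it gives (0, 1] and avoids log(0).
        double u1 = 1.0 - rng.nextDouble();
        double u2 = rng.nextDouble();
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mu + sigma * z;
    }

    public static void main(String[] args) {
        Random rng = new Random(7L);
        for (int i = 0; i < 3; i++) {
            System.out.println(next(rng, 3.0, 2.0));
        }
    }
}
```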
    • inverseCDF

      public double inverseCDF()
      Generates a Gaussian random number with the inverse CDF method.
      Returns:
      a random number.
    • p

      public double p(double x)
      Description copied from interface: Distribution
      The probability density function for continuous distribution or probability mass function for discrete distribution at x.
      Specified by:
      p in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the density.
    • logp

      public double logp(double x)
      Description copied from interface: Distribution
      The density at x in log scale, which may prevent the underflow problem.
      Specified by:
      logp in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the log density.
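To see why the log scale helps, note that the Gaussian log density is −½((x − μ)/σ)² − ln σ − ½ ln(2π), which stays finite far in the tails where the density itself rounds to zero in double precision. The sketch below uses hypothetical names (`LogDensitySketch.logp`, `LogDensitySketch.p`) and is independent of this class.

```java
final class LogDensitySketch {
    /** Log density of N(mu, sigma^2) at x. */
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2.0 * Math.PI);
    }

    /** Density of N(mu, sigma^2) at x, obtained by exponentiating the log density. */
    static double p(double x, double mu, double sigma) {
        return Math.exp(logp(x, mu, sigma));
    }

    public static void main(String[] args) {
        // Forty standard deviations out, the density underflows but the log density does not.
        System.out.println(p(40.0, 0.0, 1.0));
        System.out.println(logp(40.0, 0.0, 1.0));
    }
}
```

Summing log densities over many observations (a log-likelihood) avoids the same underflow that multiplying the densities would hit.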
    • cdf

      public double cdf(double x)
      Description copied from interface: Distribution
      Cumulative distribution function. That is, the probability to the left of x.
      Specified by:
      cdf in interface Distribution
      Parameters:
      x - a real number.
      Returns:
      the probability.
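The Gaussian CDF can be written as ½(1 + erf((x − μ)/(σ√2))). Java's standard library has no erf, so the sketch below substitutes the Abramowitz and Stegun approximation 7.1.26 (absolute error around 1.5e-7). The names `NormalCdfSketch`, `erf` and `cdf` are hypothetical, and this is not necessarily how `cdf` is computed in the library.

```java
final class NormalCdfSketch {
    /** Approximate erf via Abramowitz and Stegun formula 7.1.26. */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Probability that an N(mu, sigma^2) variate falls to the left of x. */
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    public static void main(String[] args) {
        System.out.println(cdf(0.0, 0.0, 1.0));
        System.out.println(cdf(1.96, 0.0, 1.0));
    }
}
```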
    • quantile

      public double quantile(double p)
      The quantile: the probability to the left of quantile(p) is p. This is the inverse of cdf.

      Original algorithm and Perl implementation can be found at this page.

      Specified by:
      quantile in interface Distribution
      Parameters:
      p - the probability.
      Returns:
      the quantile.
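The inverse relationship between quantile and cdf can be illustrated by solving cdf(x) = p numerically. The sketch below does this by plain bisection, which is deliberately not the algorithm this method references. The names (`NormalQuantileSketch`, `quantile`) are hypothetical, and the CDF relies on an approximate erf (Abramowitz and Stegun 7.1.26), so accuracy is limited, especially for p very close to 0 or 1.

```java
final class NormalQuantileSketch {
    /** Approximate erf via Abramowitz and Stegun formula 7.1.26 (error about 1.5e-7). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Finds x with cdf(x) close to p by bisection; p must lie strictly between 0 and 1. */
    static double quantile(double p, double mu, double sigma) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        // Bracket of ten standard deviations on either side of the mean.
        double lo = mu - 10.0 * sigma;
        double hi = mu + 10.0 * sigma;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, mu, sigma) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        double q = quantile(0.975, 0.0, 1.0);
        System.out.println(q);
        System.out.println(cdf(q, 0.0, 1.0)); // round trip back to roughly 0.975
    }
}
```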
    • M

      public Mixture.Component M(double[] x, double[] posteriori)
      Description copied from interface: ExponentialFamily
      The M step in the EM algorithm, which depends on the specific distribution.
      Specified by:
      M in interface ExponentialFamily
      Parameters:
      x - the input data for estimation.
      posteriori - the posterior probabilities of the data points.
      Returns:
      the (unnormalized) weight of this distribution in the mixture.
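For a Gaussian component, the standard M step re-estimates the mean and standard deviation as posterior-weighted averages, and the sum of the posteriors serves as the unnormalized mixture weight. The sketch below shows that update under assumed names (`GaussianMStepSketch.mStep`) and returns a plain array instead of a `Mixture.Component`. It illustrates the textbook update and is not taken from the library's source.

```java
final class GaussianMStepSketch {
    /** Returns {unnormalized weight, mu, sigma} from data x and per-point posteriors. */
    static double[] mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) {
            throw new IllegalArgumentException("x and posteriori must have the same length");
        }
        double weight = 0.0;
        double mu = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mu += posteriori[i] * x[i];
        }
        if (weight <= 0.0) {
            throw new IllegalArgumentException("posteriors must sum to a positive value");
        }
        mu /= weight;

        double variance = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            variance += posteriori[i] * d * d;
        }
        variance /= weight;
        return new double[] {weight, mu, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        double[] r = mStep(new double[] {0.0, 10.0}, new double[] {0.75, 0.25});
        System.out.println("weight = " + r[0] + ", mu = " + r[1] + ", sigma = " + r[2]);
    }
}
```

With all posteriors equal to 1, the update reduces to the ordinary sample mean and the divide-by-n standard deviation.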