# Class GaussianDistribution

java.lang.Object
smile.stat.distribution.GaussianDistribution
All Implemented Interfaces:
`Serializable`, `Distribution`, `ExponentialFamily`

public class GaussianDistribution extends Object implements ExponentialFamily
The normal (Gaussian) distribution is a continuous probability distribution that describes data clustering around a mean. The graph of its probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. The normal distribution is often used to model variables that tend to cluster around a mean.

The family of normal distributions is closed under linear transformations. That is, if `X` is normally distributed, then the linear transform `aX + b` (for real numbers `a ≠ 0` and `b`) is also normally distributed. If `X1` and `X2` are two independent normal random variables, then any linear combination of them is also normally distributed. The converse also holds: if `X1` and `X2` are independent and their sum `X1 + X2` is normally distributed, then both `X1` and `X2` must be normal. This result is known as Cramér's theorem. Of all probability distributions over the real line with mean `μ` and variance `σ²`, the normal distribution `N(μ, σ²)` is the one with the maximum entropy.
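
For reference, the closure property can be written with explicit parameters. These are standard results; the subscripted symbols `μ1`, `σ1`, `μ2`, `σ2` are introduced here only for illustration.

```latex
\begin{align*}
X \sim N(\mu, \sigma^2) &\;\Longrightarrow\; aX + b \sim N(a\mu + b,\; a^2\sigma^2), \qquad a \neq 0, \\
X_1 \sim N(\mu_1, \sigma_1^2),\; X_2 \sim N(\mu_2, \sigma_2^2) \text{ independent}
  &\;\Longrightarrow\; X_1 + X_2 \sim N(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2).
\end{align*}
```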

The central limit theorem states that, under certain fairly common conditions, the sum of a large number of random variables is approximately normally distributed. For example, if `X1`, …, `Xn` is a sequence of iid random variables, each with mean `μ` and variance `σ²` but otherwise arbitrary distribution, then the central limit theorem states that

√n (1⁄n Σ Xi − μ) → N(0, σ²) in distribution.

The theorem will hold even if the summands `Xi` are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
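
The iid case can be checked numerically. The following is a minimal simulation sketch, not part of Smile: the class `CltSketch` and its method are hypothetical names, and Uniform(0, 1) summands (mean 0.5, variance 1/12) are an arbitrary choice. It measures how often the scaled sample mean falls within 1.96 standard deviations of zero, which should be close to 0.95 if the limit is normal.

```java
/** Hypothetical demo class, not part of Smile. */
final class CltSketch {
    /**
     * Repeats `trials` times: average n Uniform(0,1) draws, form
     * z = sqrt(n) * (average - 0.5), and count how often |z| <= 1.96 * sigma,
     * where sigma^2 = 1/12 is the variance of Uniform(0,1).
     * Returns the observed fraction.
     */
    static double fractionWithin(int n, int trials, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double sigma = Math.sqrt(1.0 / 12.0);
        int inside = 0;
        for (int t = 0; t < trials; t++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                total += rng.nextDouble();
            }
            double z = Math.sqrt(n) * (total / n - 0.5);
            if (Math.abs(z) <= 1.96 * sigma) {
                inside++;
            }
        }
        return (double) inside / trials;
    }
}
```

Calling `CltSketch.fractionWithin(50, 20000, 42L)` should return a value near 0.95, even though the individual summands are far from normal.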

Therefore, certain other distributions can be approximated by the normal distribution, for example:

• The binomial distribution `B(n, p)` is approximately normal `N(np, np(1-p))` for large n and for p not too close to zero or one.
• The `Poisson(λ)` distribution is approximately normal `N(λ, λ)` for large values of λ.
• The chi-squared distribution `χ²(k)` is approximately normal `N(k, 2k)` for large k.
• The Student's t-distribution `t(ν)` is approximately normal `N(0, 1)` when ν is large.
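
As an illustration of the first approximation in the list, the sketch below compares the exact `B(n, p)` probability mass at `k` with the `N(np, np(1-p))` density at the same point. It is a self-contained sketch that does not use Smile; `BinomialNormalSketch` and its methods are hypothetical names.

```java
/** Hypothetical demo class, not part of Smile. */
final class BinomialNormalSketch {
    /** Exact Binomial(n, p) probability mass at k, accumulated in log space. */
    static double binomialPmf(int n, int k, double p) {
        double logChoose = 0.0;
        for (int i = 1; i <= k; i++) {
            // C(n, k) = product over i = 1..k of (n - k + i) / i
            logChoose += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logChoose + k * Math.log(p) + (n - k) * Math.log(1.0 - p));
    }

    /** Density of N(mean, variance) at x. */
    static double normalPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }
}
```

For `n = 100` and `p = 0.5`, `binomialPmf(100, 50, 0.5)` and `normalPdf(50, 50.0, 25.0)` agree to about three decimal places. The agreement degrades as `p` approaches zero or one.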
## Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| `final double` | `mu` | The mean. |
| `final double` | `sigma` | The standard deviation. |
## Constructor Summary

| Constructor | Description |
| --- | --- |
| `GaussianDistribution(double mu, double sigma)` | Constructor. |
## Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `double` | `cdf(double x)` | Cumulative distribution function. |
| `double` | `entropy()` | Returns Shannon entropy of the distribution. |
| `static GaussianDistribution` | `fit(double[] data)` | Estimates the distribution parameters by MLE. |
| `static GaussianDistribution` | `getInstance()` | Returns the standard normal distribution. |
| `double` | `inverseCDF()` | Generates a Gaussian random number with the inverse CDF method. |
| `int` | `length()` | Returns the number of parameters of the distribution. |
| `double` | `logp(double x)` | The density at x in log scale, which may prevent the underflow problem. |
| `Mixture.Component` | `M(double[] x, double[] posteriori)` | The M step in the EM algorithm, which depends on the specific distribution. |
| `double` | `mean()` | Returns the mean of the distribution. |
| `double` | `p(double x)` | The probability density function for a continuous distribution, or the probability mass function for a discrete distribution, at x. |
| `double` | `quantile(double p)` | The quantile: the probability to the left of `quantile(p)` is `p`. |
| `double` | `rand()` | Generates a Gaussian random number with the Box-Muller algorithm. |
| `double` | `sd()` | Returns the standard deviation of the distribution. |
| `String` | `toString()` | |
| `double` | `variance()` | Returns the variance of the distribution. |

### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

### Methods inherited from interface smile.stat.distribution.Distribution

`inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rand, rejectionSampling`
## Field Details

### mu

public final double mu
The mean.
### sigma

public final double sigma
The standard deviation.
## Constructor Details

### GaussianDistribution

public GaussianDistribution(double mu, double sigma)
Constructs a Gaussian distribution with the given mean and standard deviation.
Parameters:
`mu` - mean.
`sigma` - standard deviation.
## Method Details

### fit

public static GaussianDistribution fit(double[] data)
Estimates the distribution parameters by MLE.
Parameters:
`data` - the training data.
Returns:
the distribution.
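
The textbook maximum likelihood estimates for a Gaussian are the sample mean and the square root of the mean squared deviation (dividing by n). The sketch below shows that computation in plain Java. `GaussianMleSketch` is a hypothetical name, and this page does not state whether Smile's `fit` divides by n or by n - 1, so treat this as an illustration of the estimator rather than of the library's code.

```java
/** Hypothetical demo class, not part of Smile. */
final class GaussianMleSketch {
    /** Returns {mu, sigma} that maximize the Gaussian likelihood of the data. */
    static double[] fit(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        double mu = sum / data.length;

        double squaredDeviations = 0.0;
        for (double x : data) {
            squaredDeviations += (x - mu) * (x - mu);
        }
        // The MLE of the variance divides by n, not by n - 1.
        double sigma = Math.sqrt(squaredDeviations / data.length);
        return new double[] {mu, sigma};
    }
}
```

For the data `{2, 4, 4, 4, 5, 5, 7, 9}` this yields a mean of 5 and a standard deviation of 2.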
### getInstance

public static GaussianDistribution getInstance()
Returns the standard normal distribution.
Returns:
the standard normal distribution.
### length

public int length()
Description copied from interface: `Distribution`
Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
Specified by:
`length` in interface `Distribution`
Returns:
The number of parameters.
### mean

public double mean()
Description copied from interface: `Distribution`
Returns the mean of distribution.
Specified by:
`mean` in interface `Distribution`
Returns:
The mean.
### variance

public double variance()
Description copied from interface: `Distribution`
Returns the variance of distribution.
Specified by:
`variance` in interface `Distribution`
Returns:
The variance.
### sd

public double sd()
Description copied from interface: `Distribution`
Returns the standard deviation of distribution.
Specified by:
`sd` in interface `Distribution`
Returns:
The standard deviation.
### entropy

public double entropy()
Description copied from interface: `Distribution`
Returns Shannon entropy of the distribution.
Specified by:
`entropy` in interface `Distribution`
Returns:
Shannon entropy.
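
For a Gaussian, the differential entropy has the closed form ½ ln(2πeσ²), which depends only on the standard deviation. The sketch below evaluates it in nats. `GaussianEntropySketch` is a hypothetical name, and the unit used by Smile's `entropy()` is not stated on this page.

```java
/** Hypothetical demo class, not part of Smile. */
final class GaussianEntropySketch {
    /** Differential entropy of N(mu, sigma^2) in nats; it is independent of mu. */
    static double entropy(double sigma) {
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }
}
```

Doubling `sigma` increases the entropy by exactly ln 2, because the formula is ln(σ) plus a constant.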
### toString

public String toString()
Overrides:
`toString` in class `Object`
### rand

public double rand()
Generates a Gaussian random number with the Box-Muller algorithm.
Specified by:
`rand` in interface `Distribution`
Returns:
a random number.
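
The basic form of the Box-Muller transform turns two independent uniform draws into a standard normal variate, which is then shifted and scaled. The following is a minimal sketch of that basic form. `BoxMullerSketch` is a hypothetical name, and Smile's `rand()` may use a different variant (for example the polar form, or caching the second variate), which this page does not specify.

```java
/** Hypothetical demo class, not part of Smile. */
final class BoxMullerSketch {
    /** Draws one N(mu, sigma^2) variate using the basic Box-Muller transform. */
    static double next(java.util.Random rng, double mu, double sigma) {
        // nextDouble() lies in [0, 1); subtracting from 1 gives (0, 1] and avoids log(0).
        double u1 = 1.0 - rng.nextDouble();
        double u2 = rng.nextDouble();
        // Radius from u1 and angle from u2 give a standard normal variate.
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mu + sigma * z;
    }
}
```

The transform also yields a second independent variate, using `Math.sin` in place of `Math.cos`, which this sketch discards for simplicity.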
### inverseCDF

public double inverseCDF()
Generates a Gaussian random number with the inverse CDF method.
Returns:
a random number.
### p

public double p(double x)
Description copied from interface: `Distribution`
The probability density function for continuous distribution or probability mass function for discrete distribution at x.
Specified by:
`p` in interface `Distribution`
Parameters:
`x` - a real number.
Returns:
the density.
### logp

public double logp(double x)
Description copied from interface: `Distribution`
The density at x in log scale, which may prevent the underflow problem.
Specified by:
`logp` in interface `Distribution`
Parameters:
`x` - a real number.
Returns:
the log density.
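
The underflow problem is easy to reproduce. In the sketch below, the density far in the tail rounds to zero in double precision, while the log density remains finite. `LogDensitySketch` is a hypothetical class that evaluates the standard Gaussian formulas directly and is not Smile's implementation.

```java
/** Hypothetical demo class, not part of Smile. */
final class LogDensitySketch {
    /** Log density of N(mu, sigma^2) at x, computed without exponentiating. */
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2.0 * Math.PI);
    }

    /** Density of N(mu, sigma^2) at x; underflows to 0.0 far in the tails. */
    static double p(double x, double mu, double sigma) {
        return Math.exp(logp(x, mu, sigma));
    }
}
```

For the standard normal at `x = 40`, `p` returns `0.0` because exp(-800.9) is below the smallest positive double, whereas `logp` returns about -800.92. Summing log densities therefore keeps a log-likelihood finite where a product of densities would collapse to zero.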
### cdf

public double cdf(double x)
Description copied from interface: `Distribution`
Cumulative distribution function, that is, the probability to the left of `x`.
Specified by:
`cdf` in interface `Distribution`
Parameters:
`x` - a real number.
Returns:
the probability.
### quantile

public double quantile(double p)
The quantile: the probability to the left of `quantile(p)` is `p`. This is the inverse of `cdf`.

Specified by:
`quantile` in interface `Distribution`
Parameters:
`p` - the probability.
Returns:
the quantile.
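
The inverse relationship between the quantile and the CDF can be illustrated with a self-contained sketch. It assumes a simple error-function approximation (Abramowitz and Stegun formula 7.1.26, absolute error about 1.5e-7) and inverts the CDF by bisection. `NormalCdfSketch` is a hypothetical name, and Smile's own `cdf` and `quantile` are likely implemented differently and more accurately.

```java
/** Hypothetical demo class, not part of Smile. */
final class NormalCdfSketch {
    /** erf(x) via Abramowitz and Stegun 7.1.26 (absolute error about 1.5e-7). */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(x));
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double y = 1.0 - poly * Math.exp(-x * x);
        return x >= 0 ? y : -y; // erf is an odd function
    }

    /** Probability that an N(mu, sigma^2) variable is at most x. */
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Inverts cdf by bisection; p must lie strictly between 0 and 1. */
    static double quantile(double p, double mu, double sigma) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = mu - 10.0 * sigma;
        double hi = mu + 10.0 * sigma;
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, mu, sigma) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }
}
```

With this sketch, `cdf(quantile(p, mu, sigma), mu, sigma)` returns `p` up to rounding, and `quantile(0.975, 0, 1)` is close to the familiar 1.96. The approximation is too coarse for extreme tail probabilities.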
### M

public Mixture.Component M(double[] x, double[] posteriori)
Description copied from interface: `ExponentialFamily`
The M step in the EM algorithm, which depends on the specific distribution.
Specified by:
`M` in interface `ExponentialFamily`
Parameters:
`x` - the input data for estimation
`posteriori` - the posteriori probability.
Returns:
the (unnormalized) weight of this distribution in the mixture.
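
For a Gaussian component, the textbook M step is a weighted maximum likelihood update: the posterior probabilities act as weights for the mean and variance, and their sum is the unnormalized component weight. The sketch below shows that update in plain Java. `GaussianMStepSketch` is a hypothetical name, and it returns a plain array instead of a `Mixture.Component`, whose construction is not described on this page.

```java
/** Hypothetical demo class, not part of Smile. */
final class GaussianMStepSketch {
    /**
     * Weighted MLE update for one Gaussian component.
     * Returns {weight, mu, sigma}, where weight is the sum of the posteriors.
     */
    static double[] mStep(double[] x, double[] posteriori) {
        double weight = 0.0;
        double mu = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mu += posteriori[i] * x[i];
        }
        mu /= weight;

        double variance = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            variance += posteriori[i] * d * d;
        }
        double sigma = Math.sqrt(variance / weight);
        return new double[] {weight, mu, sigma};
    }
}
```

A sample with posterior probability zero has no influence on the update. For `x = {1, 2, 3, 10}` with posteriors `{1, 1, 1, 0}`, the result is a weight of 3, a mean of 2 and a standard deviation of √(2/3).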