Class GaussianDistribution
All Implemented Interfaces: Serializable, Distribution, ExponentialFamily
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then a linear transform aX + b (for some real numbers a ≠ 0 and b) is also normally distributed. If X_{1} and X_{2} are two independent normal random variables, then any linear combination of them is also normally distributed. The converse is also true: if X_{1} and X_{2} are independent and their sum X_{1} + X_{2} is normally distributed, then both X_{1} and X_{2} must also be normal. This result is known as Cramér's theorem. Of all probability distributions over the real line with mean μ and variance σ^{2}, the normal distribution N(μ, σ^{2}) is the one with the maximum entropy.
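The closure properties above can be made concrete with the standard parameter formulas: aX + b has mean aμ + b and standard deviation |a|σ, and the sum of two independent normals has mean μ_{1} + μ_{2} and variance σ_{1}^{2} + σ_{2}^{2}. The following is a minimal sketch; the class and method names are hypothetical and are not part of this library.

```java
/** Illustrative sketch only; NormalClosureSketch is a hypothetical helper, not a library class. */
class NormalClosureSketch {
    /** Returns {mean, sd} of aX + b when X ~ N(mu, sigma^2) and a != 0. */
    static double[] linearTransform(double mu, double sigma, double a, double b) {
        if (a == 0.0) throw new IllegalArgumentException("a must be nonzero");
        return new double[] { a * mu + b, Math.abs(a) * sigma };
    }

    /** Returns {mean, sd} of X1 + X2 for independent X1 ~ N(mu1, sigma1^2) and X2 ~ N(mu2, sigma2^2). */
    static double[] sum(double mu1, double sigma1, double mu2, double sigma2) {
        // Variances add for independent variables, so the sd is sqrt(sigma1^2 + sigma2^2).
        return new double[] { mu1 + mu2, Math.hypot(sigma1, sigma2) };
    }

    public static void main(String[] args) {
        double[] t = linearTransform(1.0, 2.0, -3.0, 4.0);
        System.out.println("aX + b: mean=" + t[0] + ", sd=" + t[1]);
        double[] s = sum(0.0, 3.0, 1.0, 4.0);
        System.out.println("X1 + X2: mean=" + s[0] + ", sd=" + s[1]);
    }
}
```

Note that the standard deviation scales with |a|, not a, so a negative coefficient flips the variable around its mean without producing a negative spread.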
The central limit theorem states that, under certain fairly common conditions, the sum of a large number of random variables is approximately normally distributed. For example, if X_{1}, …, X_{n} is a sequence of iid random variables, each with mean μ and variance σ^{2} but with otherwise arbitrary distributions, then the central limit theorem states that
√n ((1/n) Σ X_{i} − μ) → N(0, σ^{2}) in distribution as n → ∞.
The theorem holds even if the summands X_{i} are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
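The iid case of the theorem can be checked empirically. The sketch below, which uses only java.util.Random and hypothetical names, draws n Uniform(0, 1) values (μ = 0.5, σ^{2} = 1/12), forms √n (mean − μ), and reports how often that quantity falls within ±1.96σ. If the normal approximation is good, the fraction should be close to 0.95.

```java
import java.util.Random;

/** Illustrative sketch only; CltSketch is a hypothetical helper, not a library class. */
class CltSketch {
    /** Fraction of replications with |sqrt(n) * (mean - 0.5)| <= 1.96 * sigma for Uniform(0, 1) summands. */
    static double coverage(int n, int reps, long seed) {
        Random rng = new Random(seed);
        double sigma = Math.sqrt(1.0 / 12.0); // sd of a single Uniform(0, 1) draw
        int inside = 0;
        for (int r = 0; r < reps; r++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) total += rng.nextDouble();
            double z = Math.sqrt(n) * (total / n - 0.5);
            if (Math.abs(z) <= 1.96 * sigma) inside++;
        }
        return (double) inside / reps;
    }

    public static void main(String[] args) {
        System.out.println("coverage = " + coverage(100, 20000, 42L));
    }
}
```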
Therefore, certain other distributions can be approximated by the normal distribution, for example:
 The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and for p not too close to zero or one.
 The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
 The chi-squared distribution χ^{2}(k) is approximately normal N(k, 2k) for large k.
 The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
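As a worked instance of the first approximation, the sketch below compares the exact B(n, p) probability mass at k with the N(np, np(1 − p)) density at the same point. It is a self-contained illustration with hypothetical names and assumes 0 < p < 1.

```java
/** Illustrative sketch only; BinomialNormalSketch is a hypothetical helper, not a library class. */
class BinomialNormalSketch {
    /** Exact binomial pmf, computed in log space to avoid overflow in C(n, k). Assumes 0 < p < 1. */
    static double binomialPmf(int n, int k, double p) {
        double logPmf = k * Math.log(p) + (n - k) * Math.log(1.0 - p);
        // log C(n, k) = sum over i = 1..k of log((n - k + i) / i)
        for (int i = 1; i <= k; i++) logPmf += Math.log((double) (n - k + i) / i);
        return Math.exp(logPmf);
    }

    /** Density of N(mu, sigma^2) at x. */
    static double normalPdf(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        int n = 100;
        double p = 0.5;
        double mu = n * p;
        double sigma = Math.sqrt(n * p * (1.0 - p));
        System.out.println("binomial pmf at 50: " + binomialPmf(n, 50, p));
        System.out.println("normal pdf at 50:   " + normalPdf(50.0, mu, sigma));
    }
}
```

For n = 100 and p = 0.5 the two values agree to about three decimal places; the agreement degrades as p approaches zero or one.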

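Field Summary
Modifier and Type | Field | Description
final double | mu | The mean.
final double | sigma | The standard deviation.

Constructor Summary
Constructor | Description
GaussianDistribution(double mu, double sigma) | Constructor.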

Method Summary
Modifier and Type | Method | Description
double | cdf(double x) | Cumulative distribution function.
double | entropy() | Returns Shannon entropy of the distribution.
static GaussianDistribution | fit(double[] data) | Estimates the distribution parameters by MLE.
static GaussianDistribution | getInstance() | Returns the standard normal distribution.
double | inverseCDF() | Generates a Gaussian random number with the inverse CDF method.
int | length() | Returns the number of parameters of the distribution.
double | logp(double x) | The density at x in log scale, which may prevent the underflow problem.
 | M(double[] x, double[] posteriori) | The M step in the EM algorithm, which depends on the specific distribution.
double | mean() | Returns the mean of the distribution.
double | p(double x) | The probability density function for continuous distribution or probability mass function for discrete distribution at x.
double | quantile(double p) | The quantile, the probability to the left of quantile(p) is p.
double | rand() | Generates a Gaussian random number with the Box-Muller algorithm.
double | sd() | Returns the standard deviation of the distribution.
String | toString() |
double | variance() | Returns the variance of the distribution.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.stat.distribution.Distribution
inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rand, rejectionSampling

Field Details

mu
public final double mu
The mean.

sigma
public final double sigma
The standard deviation.


Constructor Details

GaussianDistribution
public GaussianDistribution(double mu, double sigma)
Constructor.
Parameters:
mu - mean.
sigma - standard deviation.


Method Details

fit
public static GaussianDistribution fit(double[] data)
Estimates the distribution parameters by MLE.
Parameters:
data - the training data.
Returns:
the distribution.
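For reference, the textbook maximum likelihood estimates for a normal sample are the sample mean and the square root of the mean squared deviation (dividing by n, not n − 1). The sketch below shows that computation with hypothetical names. It is not this class's implementation, which may use a different normalization for the standard deviation.

```java
/** Illustrative sketch only; GaussianMleSketch is a hypothetical helper, not a library class. */
class GaussianMleSketch {
    /** Returns {muHat, sigmaHat}, the textbook MLE for a normal sample (variance divides by n). */
    static double[] fit(double[] data) {
        if (data.length == 0) throw new IllegalArgumentException("empty data");
        double sum = 0.0;
        for (double x : data) sum += x;
        double mu = sum / data.length;
        double ss = 0.0;
        for (double x : data) ss += (x - mu) * (x - mu);
        return new double[] { mu, Math.sqrt(ss / data.length) };
    }

    public static void main(String[] args) {
        double[] est = fit(new double[] { 1, 2, 3, 4, 5 });
        System.out.println("mu=" + est[0] + ", sigma=" + est[1]);
    }
}
```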

getInstance
public static GaussianDistribution getInstance()
Returns the standard normal distribution.
Returns:
the standard normal distribution.

length
public int length()
Description copied from interface: Distribution
Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
Specified by:
length in interface Distribution
Returns:
The number of parameters.

mean
public double mean()
Description copied from interface: Distribution
Returns the mean of the distribution.
Specified by:
mean in interface Distribution
Returns:
The mean.

variance
public double variance()
Description copied from interface: Distribution
Returns the variance of the distribution.
Specified by:
variance in interface Distribution
Returns:
The variance.

sd
public double sd()
Description copied from interface: Distribution
Returns the standard deviation of the distribution.
Specified by:
sd in interface Distribution
Returns:
The standard deviation.

entropy
public double entropy()
Description copied from interface: Distribution
Returns Shannon entropy of the distribution.
Specified by:
entropy in interface Distribution
Returns:
Shannon entropy.
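The differential entropy of N(μ, σ^{2}) has the closed form ½ ln(2πeσ^{2}), which depends only on σ. The sketch below evaluates that formula in nats; the names are hypothetical, and the unit used by this class is not stated in the documentation above.

```java
/** Illustrative sketch only; GaussianEntropySketch is a hypothetical helper, not a library class. */
class GaussianEntropySketch {
    /** Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2 * pi * e * sigma^2). */
    static double entropy(double sigma) {
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }

    public static void main(String[] args) {
        System.out.println("entropy(sigma=1) = " + entropy(1.0));
        System.out.println("entropy(sigma=2) = " + entropy(2.0));
    }
}
```

Doubling σ adds exactly ln 2 to the entropy, which is a quick sanity check on any implementation.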

toString

rand
public double rand()
Generates a Gaussian random number with the Box-Muller algorithm.
Specified by:
rand in interface Distribution
Returns:
a random number.
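For readers unfamiliar with the technique, the following is a minimal sketch of the basic (trigonometric) Box-Muller transform using java.util.Random. The names are hypothetical, and the library's own implementation may use a different variant of the algorithm.

```java
import java.util.Random;

/** Illustrative sketch only; BoxMullerSketch is a hypothetical helper, not a library class. */
class BoxMullerSketch {
    /** Draws one N(mu, sigma^2) value from two independent uniforms (basic Box-Muller form). */
    static double next(Random rng, double mu, double sigma) {
        double u1 = 1.0 - rng.nextDouble(); // in (0, 1], so log(u1) is finite
        double u2 = rng.nextDouble();
        double z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
        return mu + sigma * z;
    }

    public static void main(String[] args) {
        Random rng = new Random(7L);
        for (int i = 0; i < 3; i++) System.out.println(next(rng, 2.0, 3.0));
    }
}
```

The transform also yields a second independent normal value (using the sine in place of the cosine), which this sketch discards for simplicity.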

inverseCDF
public double inverseCDF()
Generates a Gaussian random number with the inverse CDF method.
Returns:
a random number.

p
public double p(double x)
Description copied from interface: Distribution
The probability density function for continuous distribution or probability mass function for discrete distribution at x.
Specified by:
p in interface Distribution
Parameters:
x - a real number.
Returns:
the density.

logp
public double logp(double x)
Description copied from interface: Distribution
The density at x in log scale, which may prevent the underflow problem.
Specified by:
logp in interface Distribution
Parameters:
x - a real number.
Returns:
the log density.
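The underflow problem is easy to reproduce. The sketch below, with hypothetical names, computes the normal log density directly from its closed form and shows that exponentiating it far in the tail yields zero in double precision while the log value stays finite.

```java
/** Illustrative sketch only; GaussianLogDensitySketch is a hypothetical helper, not a library class. */
class GaussianLogDensitySketch {
    /** Log density of N(mu, sigma^2) at x: -z^2 / 2 - ln(sigma) - ln(2 * pi) / 2. */
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2.0 * Math.PI);
    }

    /** Density obtained by exponentiating the log density; underflows to 0 far in the tails. */
    static double p(double x, double mu, double sigma) {
        return Math.exp(logp(x, mu, sigma));
    }

    public static void main(String[] args) {
        System.out.println("p(40)    = " + p(40.0, 0.0, 1.0));
        System.out.println("logp(40) = " + logp(40.0, 0.0, 1.0));
    }
}
```

This is why likelihoods over many observations are usually accumulated as sums of log densities instead of products of densities.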

cdf
public double cdf(double x)
Description copied from interface: Distribution
Cumulative distribution function. That is the probability to the left of x.
Specified by:
cdf in interface Distribution
Parameters:
x - a real number.
Returns:
the probability.

quantile
public double quantile(double p)
The quantile: the value such that the probability to the left of quantile(p) is p. This is the inverse of cdf. The original algorithm and a Perl implementation can be found at this page.
Specified by:
quantile in interface Distribution
Parameters:
p - the probability.
Returns:
the quantile.
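To illustrate the inverse relationship between quantile and cdf, the sketch below inverts an approximate normal CDF by bisection. This is deliberately not the algorithm referenced above: the names are hypothetical, the error function uses the Abramowitz and Stegun approximation 7.1.26 (absolute error about 1.5e-7), and the result is only meaningful for p well inside (0, 1).

```java
/** Illustrative sketch only; GaussianQuantileSketch is a hypothetical helper, not a library class. */
class GaussianQuantileSketch {
    /** erf via Abramowitz and Stegun formula 7.1.26 (an approximation, not exact). */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(x));
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                    + t * (-1.453152027 + t * 1.061405429))));
        double y = 1.0 - poly * Math.exp(-x * x);
        return x >= 0 ? y : -y;
    }

    /** Approximate CDF of N(mu, sigma^2) at x. */
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Quantile by bisection on cdf over [mu - 10 sigma, mu + 10 sigma]. */
    static double quantile(double p, double mu, double sigma) {
        if (!(p > 0.0 && p < 1.0)) throw new IllegalArgumentException("p must be in (0, 1)");
        double lo = mu - 10.0 * sigma, hi = mu + 10.0 * sigma;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, mu, sigma) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        System.out.println("quantile(0.975) = " + quantile(0.975, 0.0, 1.0));
    }
}
```

Bisection is slower than a dedicated rational approximation but needs nothing beyond a monotone CDF, which makes it a convenient cross-check.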

M
Description copied from interface: ExponentialFamily
The M step in the EM algorithm, which depends on the specific distribution.
Specified by:
M in interface ExponentialFamily
Parameters:
x - the input data for estimation
posteriori - the posteriori probability.
Returns:
the (unnormalized) weight of this distribution in the mixture.
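For a Gaussian mixture component, the M step is conventionally a weighted maximum likelihood fit: the unnormalized weight is the sum of the posterior probabilities, and the mean and variance are posterior-weighted averages. The sketch below shows that textbook update with hypothetical names and a plain array as the return value. It does not reproduce this method's actual return type, which is not shown above.

```java
/** Illustrative sketch only; GaussianMStepSketch is a hypothetical helper, not a library class. */
class GaussianMStepSketch {
    /** Returns {unnormalized weight, mu, sigma} from data x and posterior probabilities. */
    static double[] mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) throw new IllegalArgumentException("length mismatch");
        double weight = 0.0, mean = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mean += posteriori[i] * x[i];
        }
        if (weight <= 0.0) throw new IllegalArgumentException("posterior weights sum to zero");
        mean /= weight;
        double var = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean;
            var += posteriori[i] * d * d;
        }
        return new double[] { weight, mean, Math.sqrt(var / weight) };
    }

    public static void main(String[] args) {
        double[] r = mStep(new double[] { 0.0, 2.0 }, new double[] { 1.0, 3.0 });
        System.out.println("weight=" + r[0] + ", mu=" + r[1] + ", sigma=" + r[2]);
    }
}
```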
