All Superinterfaces: Serializable

All Known Subinterfaces: ExponentialFamily

All Known Implementing Classes: BernoulliDistribution, BetaDistribution, BinomialDistribution, ChiSquareDistribution, DiscreteDistribution, DiscreteExponentialFamilyMixture, DiscreteMixture, EmpiricalDistribution, ExponentialDistribution, ExponentialFamilyMixture, FDistribution, GammaDistribution, GaussianDistribution, GaussianMixture, GeometricDistribution, HyperGeometricDistribution, KernelDensity, LogisticDistribution, LogNormalDistribution, Mixture, NegativeBinomialDistribution, PoissonDistribution, ShiftedGeometricDistribution, TDistribution, WeibullDistribution

public interface Distribution extends Serializable

Probability distribution of a univariate random variable. A probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous). When the random variable takes values in the set of real numbers, the probability distribution is completely described by the cumulative distribution function, whose value at each real x is the probability that the random variable is less than or equal to x.

Both rejection sampling and inverse transform sampling are implemented as general approaches to generating random samples from the probability density function or the quantile function, respectively. A quantile function based on bisection search is also provided.

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double | cdf(double x) | Cumulative distribution function. |
| double | entropy() | Returns the Shannon entropy of the distribution. |
| default double | inverseTransformSampling() | Use inverse transform sampling (also known as the inverse probability integral transform, the inverse transformation method, or the Smirnov transform) to draw a sample from the given distribution. |
| int | length() | Returns the number of parameters of the distribution. |
| default double | likelihood(double[] x) | The likelihood of the sample set following this distribution. |
| default double | logLikelihood(double[] x) | The log likelihood of the sample set following this distribution. |
| double | logp(double x) | The density at x in log scale, which may prevent the underflow problem. |
| double | mean() | Returns the mean of the distribution. |
| double | p(double x) | The probability density function for a continuous distribution, or the probability mass function for a discrete distribution, at x. |
| double | quantile(double p) | The quantile, that is, the value such that the probability to its left is p. |
| default double | quantile(double p, double xmin, double xmax) | Inverts the CDF of a continuous distribution by bisection root finding on "cdf(x) = p". |
| default double | quantile(double p, double xmin, double xmax, double eps) | Inverts the CDF of a continuous distribution by bisection root finding on "cdf(x) = p". |
| double | rand() | Generates a random number following this distribution. |
| default double[] | rand(int n) | Generates a set of random numbers following this distribution. |
| default double | rejectionSampling(double pmax, double xmin, double xmax) | Use the rejection technique to draw a sample from the given distribution. |
| default double | sd() | Returns the standard deviation of the distribution. |
| double | variance() | Returns the variance of the distribution. |

Method Details
- length
  
  int length()
  
  Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
  
  Returns:
  
  The number of parameters.
- mean
  
  double mean()
  
  Returns the mean of the distribution.
  
  Returns:
  
  The mean.
- variance
  
  double variance()
  
  Returns the variance of the distribution.
  
  Returns:
  
  The variance.
- sd
  
  default double sd()
  
  Returns the standard deviation of the distribution.
  
  Returns:
  
  The standard deviation.
- entropy
  
  double entropy()
  
  Returns the Shannon entropy of the distribution.
  
  Returns:
  
  Shannon entropy.
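  
  As a worked illustration of what a Shannon entropy value looks like, the sketch below computes it for a Bernoulli variable. The class `BernoulliEntropySketch` is hypothetical and not part of this interface, and the use of the natural logarithm (entropy in nats) is an assumption, since the text above does not state the log base.

  ```java
  final class BernoulliEntropySketch {
      // Shannon entropy, in nats (natural log is an assumption), of a
      // Bernoulli variable with success probability p.
      static double entropy(double p) {
          if (p < 0.0 || p > 1.0) {
              throw new IllegalArgumentException("Invalid p: " + p);
          }
          // By convention 0 * log(0) is taken as 0, so the degenerate
          // cases have zero entropy.
          if (p == 0.0 || p == 1.0) {
              return 0.0;
          }
          return -p * Math.log(p) - (1.0 - p) * Math.log(1.0 - p);
      }

      public static void main(String[] args) {
          System.out.println(entropy(0.5)); // about 0.6931, i.e. ln 2
          System.out.println(entropy(0.9)); // smaller: the outcome is more predictable
      }
  }
  ```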
- rand
  
  double rand()
  
  Generates a random number following this distribution.
  
  Returns:
  
  a random number.
- rand
  
  default double[] rand(int n)
  
  Generates a set of random numbers following this distribution.
  
  Parameters:
  
  n - the number of random numbers to generate.
  
  Returns:
  
  an array of random numbers.
- p
  
  double p(double x)
  
  The probability density function for a continuous distribution, or the probability mass function for a discrete distribution, at x.
  
  Parameters:
  
  x - a real number.
  
  Returns:
  
  the density.
- logp
  
  double logp(double x)
  
  The density at x in log scale, which may prevent the underflow problem.
  
  Parameters:
  
  x - a real number.
  
  Returns:
  
  the log density.
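  
  The underflow problem mentioned above can be seen with a small standalone sketch. It uses the standard Gaussian density as an example; the class `LogDensitySketch` and its methods are hypothetical stand-ins, not the implementation behind this interface.

  ```java
  final class LogDensitySketch {
      // Density of the standard Gaussian N(0, 1).
      static double p(double x) {
          return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
      }

      // Log density computed directly, without going through exp().
      static double logp(double x) {
          return -0.5 * x * x - 0.5 * Math.log(2.0 * Math.PI);
      }

      public static void main(String[] args) {
          double x = 40.0;
          System.out.println(p(x));            // underflows to 0.0
          System.out.println(Math.log(p(x)));  // -Infinity
          System.out.println(logp(x));         // finite, about -800.92
      }
  }
  ```

  Far in the tail the density is too small for a double, so taking the log afterwards loses all information, while the log-scale formula stays finite.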
- cdf
  
  double cdf(double x)
  
  Cumulative distribution function, that is, the probability that the random variable is less than or equal to x.
  
  Parameters:
  
  x - a real number.
  
  Returns:
  
  the probability.
- quantile
  
  double quantile(double p)
  
  The quantile, that is, the value such that the probability to its left is p. It is the inverse of the cdf.
  
  Parameters:
  
  p - the probability.
  
  Returns:
  
  the quantile.
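  
  To make the inverse relationship between the quantile and the cdf concrete, here is a minimal sketch for an exponential distribution, where both functions have closed forms. The class `ExponentialQuantileSketch` and the `lambda` rate parameter are illustrative assumptions, not the signatures of any implementing class.

  ```java
  final class ExponentialQuantileSketch {
      // CDF of an exponential distribution with rate lambda.
      static double cdf(double lambda, double x) {
          return x <= 0.0 ? 0.0 : 1.0 - Math.exp(-lambda * x);
      }

      // Quantile: solves cdf(x) = p analytically for x.
      static double quantile(double lambda, double p) {
          if (p < 0.0 || p > 1.0) {
              throw new IllegalArgumentException("Invalid p: " + p);
          }
          // log1p(-p) equals log(1 - p) but is more accurate for small p.
          return -Math.log1p(-p) / lambda;
      }

      public static void main(String[] args) {
          double lambda = 2.0;
          double median = quantile(lambda, 0.5);
          System.out.println(median);              // about 0.3466, i.e. ln(2) / 2
          System.out.println(cdf(lambda, median)); // about 0.5, recovering p
      }
  }
  ```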
- likelihood
  
  default double likelihood(double[] x)
  
  The likelihood of the sample set following this distribution.
  
  Parameters:
  
  x - a set of samples.
  
  Returns:
  
  the likelihood.
- logLikelihood
  
  default double logLikelihood(double[] x)
  
  The log likelihood of the sample set following this distribution.
  
  Parameters:
  
  x - a set of samples.
  
  Returns:
  
  the log likelihood.
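  
  The sketch below illustrates why a log likelihood is useful alongside the plain likelihood. It assumes independent samples, so that the likelihood is the product of the densities and the log likelihood is the sum of the log densities; the class `LikelihoodSketch` is hypothetical and uses the standard Gaussian as an example rather than reproducing the default methods of this interface.

  ```java
  final class LikelihoodSketch {
      private static final double HALF_LOG_2PI = 0.5 * Math.log(2.0 * Math.PI);

      // Log density of the standard Gaussian N(0, 1).
      static double logp(double x) {
          return -0.5 * x * x - HALF_LOG_2PI;
      }

      // Naive likelihood: the product of the individual densities.
      static double likelihood(double[] x) {
          double product = 1.0;
          for (double xi : x) {
              product *= Math.exp(logp(xi));
          }
          return product;
      }

      // Log likelihood: the sum of the individual log densities.
      static double logLikelihood(double[] x) {
          double sum = 0.0;
          for (double xi : x) {
              sum += logp(xi);
          }
          return sum;
      }

      public static void main(String[] args) {
          double[] small = {0.0, 1.0, -1.0};
          System.out.println(likelihood(small));     // about 0.0234
          System.out.println(logLikelihood(small));  // about -3.7568

          double[] large = new double[1000];         // 1000 observations at x = 0
          System.out.println(likelihood(large));     // underflows to 0.0
          System.out.println(logLikelihood(large));  // about -918.94
      }
  }
  ```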
- rejectionSampling
  
  default double rejectionSampling(double pmax, double xmin, double xmax)
  
  Use the rejection technique to draw a sample from the given distribution. WARNING: this simulation technique can take a very long time. Rejection sampling is also commonly called the acceptance-rejection method or the "accept-reject algorithm". It generates sample values from an arbitrary probability density function f(x) by using an instrumental distribution g(x), under the only restriction that f(x) < M g(x), where M > 1 is an appropriate bound on f(x) / g(x).
  
  Rejection sampling is usually used in cases where the form of f(x) makes sampling difficult. Instead of sampling directly from the distribution f(x), we use an envelope distribution M g(x) from which sampling is easier. The samples drawn from M g(x) are probabilistically accepted or rejected.
  
  This method relates to the general field of Monte Carlo techniques, including Markov chain Monte Carlo algorithms that also use a proxy distribution to achieve simulation from the target distribution f(x). It forms the basis for algorithms such as the Metropolis algorithm.
  
  Parameters:
  
  pmax - the scale of the uniform instrumental distribution, which should be no smaller than the maximum of the density on [xmin, xmax].
  
  xmin - the lower bound of random variable range.
  
  xmax - the upper bound of random variable range.
  
  Returns:
  
  a random number.
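  
  The accept-reject idea with a uniform instrumental distribution can be sketched as follows. This is a standalone illustration under stated assumptions, not the default method itself: the class `RejectionSamplingSketch`, the `pdf` and `rng` parameters, and the Beta(2, 2) target density are all hypothetical choices, and `pmax` is assumed to bound the density on [xmin, xmax].

  ```java
  import java.util.Random;
  import java.util.function.DoubleUnaryOperator;

  final class RejectionSamplingSketch {
      // Draws one sample from pdf restricted to [xmin, xmax], using a
      // uniform envelope of height pmax over that interval.
      static double sample(DoubleUnaryOperator pdf, double pmax,
                           double xmin, double xmax, Random rng) {
          while (true) {
              // Propose a point uniformly inside the envelope rectangle.
              double x = xmin + rng.nextDouble() * (xmax - xmin);
              double y = rng.nextDouble() * pmax;
              // Accept the proposal if it falls under the density curve.
              if (y <= pdf.applyAsDouble(x)) {
                  return x;
              }
          }
      }

      public static void main(String[] args) {
          // Beta(2, 2) density 6x(1 - x) on [0, 1]; its maximum is 1.5 at x = 0.5.
          DoubleUnaryOperator beta22 = x -> 6.0 * x * (1.0 - x);
          Random rng = new Random(42L);
          double sum = 0.0;
          int n = 10_000;
          for (int i = 0; i < n; i++) {
              sum += sample(beta22, 1.5, 0.0, 1.0, rng);
          }
          System.out.println(sum / n); // sample mean, close to the true mean 0.5
      }
  }
  ```

  The looser the bound `pmax` is relative to the density, the more proposals are rejected, which is one reason the technique can take a long time.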
- inverseTransformSampling
  
  default double inverseTransformSampling()
  
  Use inverse transform sampling (also known as the inverse probability integral transform, the inverse transformation method, or the Smirnov transform) to draw a sample from the given distribution. This method generates random samples from any probability distribution given its cumulative distribution function (cdf). Provided the distribution is continuous, the method is generally applicable and can be computationally efficient if the cdf can be inverted analytically, but it may be too computationally expensive in practice for some distributions. The Box-Muller transform is an example of an algorithm that is less general but more computationally efficient. Even for simple distributions, inverse transform sampling can often be improved on, given substantial research effort, by methods such as the ziggurat algorithm and rejection sampling.
  
  Returns:
  
  a random number.
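  
  A minimal sketch of the technique: draw a uniform number u in [0, 1) and map it through the quantile function. The class `InverseTransformSketch` and its `quantile` and `rng` parameters are hypothetical, and the exponential distribution is used only because its cdf inverts analytically.

  ```java
  import java.util.Random;
  import java.util.function.DoubleUnaryOperator;

  final class InverseTransformSketch {
      // Draws one sample by pushing a uniform variate through the quantile function.
      static double sample(DoubleUnaryOperator quantile, Random rng) {
          double u = rng.nextDouble(); // uniform on [0, 1)
          return quantile.applyAsDouble(u);
      }

      public static void main(String[] args) {
          double lambda = 2.0;
          // Quantile function of an exponential distribution with rate lambda.
          DoubleUnaryOperator quantile = p -> -Math.log1p(-p) / lambda;
          Random rng = new Random(7L);
          double sum = 0.0;
          int n = 10_000;
          for (int i = 0; i < n; i++) {
              sum += sample(quantile, rng);
          }
          System.out.println(sum / n); // sample mean, close to 1 / lambda = 0.5
      }
  }
  ```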
- quantile
  
  default double quantile(double p, double xmin, double xmax, double eps)
  
  Inverts the CDF of a continuous distribution by bisection root finding on "cdf(x) = p".
  
  Parameters:
  
  p - the probability.
  
  xmin - the lower bound of search range.
  
  xmax - the upper bound of search range.
  
  eps - the convergence tolerance, a small positive number close to zero.
  
  Returns:
  
  the quantile.
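  
  Bisection on "cdf(x) = p" can be sketched as below. This is an illustration rather than the default method: it assumes the cdf is non-decreasing and that the quantile lies in [xmin, xmax], and the class `BisectionQuantileSketch`, the stopping rule on the interval width, and the logistic cdf used in the example are all assumptions.

  ```java
  import java.util.function.DoubleUnaryOperator;

  final class BisectionQuantileSketch {
      // Finds x in [xmin, xmax] with cdf(x) = p by repeatedly halving the interval.
      static double quantile(DoubleUnaryOperator cdf, double p,
                             double xmin, double xmax, double eps) {
          if (eps <= 0.0) {
              throw new IllegalArgumentException("Invalid epsilon: " + eps);
          }
          double lo = xmin;
          double hi = xmax;
          while (hi - lo > eps) {
              double mid = 0.5 * (lo + hi);
              if (cdf.applyAsDouble(mid) > p) {
                  hi = mid; // the root lies in the lower half
              } else {
                  lo = mid; // the root lies in the upper half
              }
          }
          return 0.5 * (lo + hi);
      }

      public static void main(String[] args) {
          // Standard logistic cdf, whose exact quantile is log(p / (1 - p)).
          DoubleUnaryOperator logisticCdf = x -> 1.0 / (1.0 + Math.exp(-x));
          double q = quantile(logisticCdf, 0.75, -10.0, 10.0, 1E-6);
          System.out.println(q);             // about 1.0986
          System.out.println(Math.log(3.0)); // exact value for comparison
      }
  }
  ```

  Each iteration halves the search interval, so the number of cdf evaluations grows only logarithmically in (xmax - xmin) / eps.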
- quantile
  
  default double quantile(double p, double xmin, double xmax)
  
  Inverts the CDF of a continuous distribution by bisection root finding on "cdf(x) = p". The default epsilon is 1E-6.
  
  Parameters:
  
  p - the probability.
  
  xmin - the lower bound of search range.
  
  xmax - the upper bound of search range.
  
  Returns:
  
  the quantile.
