Interface Distribution

All Superinterfaces:
Serializable
All Known Subinterfaces:
ExponentialFamily
All Known Implementing Classes:
BernoulliDistribution, BetaDistribution, BinomialDistribution, ChiSquareDistribution, DiscreteDistribution, DiscreteExponentialFamilyMixture, DiscreteMixture, EmpiricalDistribution, ExponentialDistribution, ExponentialFamilyMixture, FDistribution, GammaDistribution, GaussianDistribution, GaussianMixture, GeometricDistribution, HyperGeometricDistribution, KernelDensity, LogisticDistribution, LogNormalDistribution, Mixture, NegativeBinomialDistribution, PoissonDistribution, ShiftedGeometricDistribution, TDistribution, WeibullDistribution

public interface Distribution extends Serializable
Probability distribution of a univariate random variable. A probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous). When the random variable takes values in the set of real numbers, the probability distribution is completely described by the cumulative distribution function, whose value at each real x is the probability that the random variable is less than or equal to x.

Both rejection sampling and inverse transform sampling are implemented as general approaches to generating random samples from the probability density function or the quantile function. In addition, a default quantile function based on bisection search is provided.

  • Method Summary

    Modifier and Type
    Method
    Description
    double
    cdf(double x)
    Cumulative distribution function.
    double
    entropy()
    Returns Shannon entropy of the distribution.
    default double
    inverseTransformSampling()
    Use inverse transform sampling (also known as the inverse probability integral transform or inverse transformation method or Smirnov transform) to draw a sample from the given distribution.
    int
    length()
    Returns the number of parameters of the distribution.
    default double
    likelihood(double[] x)
    The likelihood of the sample set following this distribution.
    default double
    logLikelihood(double[] x)
    The log likelihood of the sample set following this distribution.
    double
    logp(double x)
    The density at x in log scale, which may prevent the underflow problem.
    double
    mean()
    Returns the mean of the distribution.
    double
    p(double x)
    The probability density function for continuous distribution or probability mass function for discrete distribution at x.
    double
    quantile(double p)
    The quantile: the value x such that the probability to the left of x is p.
    default double
    quantile(double p, double xmin, double xmax)
    Inversion of CDF by bisection numeric root finding of "cdf(x) = p" for continuous distribution.
    default double
    quantile(double p, double xmin, double xmax, double eps)
    Inversion of CDF by bisection numeric root finding of "cdf(x) = p" for continuous distribution.
    double
    rand()
    Generates a random number following this distribution.
    default double[]
    rand(int n)
    Generates a set of random numbers following this distribution.
    default double
    rejectionSampling(double pmax, double xmin, double xmax)
    Use the rejection technique to draw a sample from the given distribution.
    default double
    sd()
    Returns the standard deviation of the distribution.
    double
    variance()
    Returns the variance of the distribution.
  • Method Details

    • length

      int length()
      Returns the number of parameters of the distribution. The "length" is in the sense of the minimum description length principle.
      Returns:
      The number of parameters.
    • mean

      double mean()
      Returns the mean of the distribution.
      Returns:
      The mean.
    • variance

      double variance()
      Returns the variance of the distribution.
      Returns:
      The variance.
    • sd

      default double sd()
      Returns the standard deviation of the distribution.
      Returns:
      The standard deviation.
    • entropy

      double entropy()
      Returns Shannon entropy of the distribution.
      Returns:
      Shannon entropy.
    • rand

      double rand()
      Generates a random number following this distribution.
      Returns:
      a random number.
    • rand

      default double[] rand(int n)
      Generates a set of random numbers following this distribution.
      Parameters:
      n - the number of random numbers to generate.
      Returns:
      an array of random numbers.
    • p

      double p(double x)
      The probability density function for continuous distribution or probability mass function for discrete distribution at x.
      Parameters:
      x - a real number.
      Returns:
      the density.
    • logp

      double logp(double x)
      The density at x in log scale, which may prevent the underflow problem.
      Parameters:
      x - a real number.
      Returns:
      the log density.
    • cdf

      double cdf(double x)
      Cumulative distribution function. That is, the probability that the random variable is less than or equal to x (the probability to the left of x).
      Parameters:
      x - a real number.
      Returns:
      the probability.
    • quantile

      double quantile(double p)
      The quantile: the value x such that the probability to the left of x is p. It is the inverse of the cdf.
      Parameters:
      p - the probability.
      Returns:
      the quantile.
    • likelihood

      default double likelihood(double[] x)
      The likelihood of the sample set following this distribution.
      Parameters:
      x - a set of samples.
      Returns:
      the likelihood.
    • logLikelihood

      default double logLikelihood(double[] x)
      The log likelihood of the sample set following this distribution.
      Parameters:
      x - a set of samples.
      Returns:
      the log likelihood.
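
The relationship between likelihood and logLikelihood can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the class name `LikelihoodSketch` is hypothetical, a standard Gaussian log density stands in for logp(x), and the samples are assumed to be independent, so that the log likelihood is the sum of the log densities and the likelihood is its exponential.

```java
/** Hypothetical illustration; not part of the library. */
final class LikelihoodSketch {
    /** Log density of the standard Gaussian, a stand-in for logp(x). */
    static double logp(double x) {
        return -0.5 * x * x - 0.5 * Math.log(2 * Math.PI);
    }

    /** Sum of the log densities over the samples (assumes independent samples). */
    static double logLikelihood(double[] x) {
        double sum = 0.0;
        for (double xi : x) {
            sum += logp(xi);
        }
        return sum;
    }

    /** Product of the densities, computed as the exponential of the log likelihood. */
    static double likelihood(double[] x) {
        return Math.exp(logLikelihood(x));
    }
}
```

With a few thousand samples the likelihood typically underflows to 0.0 in double precision, while the log likelihood remains a finite, usable number. This is why the log scale is usually preferred for model fitting and comparison.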
    • rejectionSampling

      default double rejectionSampling(double pmax, double xmin, double xmax)
      Use the rejection technique to draw a sample from the given distribution. WARNING: this simulation technique can take a very long time. Rejection sampling, also commonly called the acceptance-rejection method or accept-reject algorithm, generates samples from an arbitrary probability density function f(x) by using an instrumental distribution g(x), under the only restriction that f(x) < M g(x), where M > 1 is an appropriate bound on f(x) / g(x).

      Rejection sampling is usually used in cases where the form of f(x) makes sampling difficult. Instead of sampling directly from the distribution f(x), we use an envelope distribution M g(x) where sampling is easier. These samples from M g(x) are probabilistically accepted or rejected.

      This method relates to the general field of Monte Carlo techniques, including Markov chain Monte Carlo algorithms that also use a proxy distribution to achieve simulation from the target distribution f(x). It forms the basis for algorithms such as the Metropolis algorithm.

      Parameters:
      pmax - the scale of the uniform instrumental distribution; it should be an upper bound on the density over [xmin, xmax].
      xmin - the lower bound of the random variable's range.
      xmax - the upper bound of the random variable's range.
      Returns:
      a random number.
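
The accept-reject loop described above can be sketched with a uniform envelope of height pmax over [xmin, xmax]. This is a minimal sketch under stated assumptions, not the library's code: the class `RejectionSamplingSketch`, the `sample` helper, and the explicit `Random` argument are hypothetical, and the density is passed in as a function instead of being read from the distribution.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration; not part of the library. */
final class RejectionSamplingSketch {
    /**
     * Draws one sample from the density p on [xmin, xmax].
     * Assumes p(x) <= pmax on the whole interval.
     */
    static double sample(DoubleUnaryOperator p, double pmax,
                         double xmin, double xmax, Random rng) {
        while (true) {
            // Propose a point uniformly inside the box [xmin, xmax] x [0, pmax].
            double x = xmin + (xmax - xmin) * rng.nextDouble();
            double y = pmax * rng.nextDouble();
            // Accept the proposal only if it falls under the density curve.
            if (y <= p.applyAsDouble(x)) {
                return x;
            }
        }
    }
}
```

For example, `RejectionSamplingSketch.sample(t -> 6 * t * (1 - t), 1.5, 0.0, 1.0, new Random())` draws from the density 6x(1 - x) on [0, 1], whose maximum is 1.5. The expected number of proposals per accepted sample is pmax * (xmax - xmin), which is why a loose bound or a wide range can make the method very slow.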
    • inverseTransformSampling

      default double inverseTransformSampling()
      Use inverse transform sampling (also known as the inverse probability integral transform or inverse transformation method or Smirnov transform) to draw a sample from the given distribution. This is a method for generating sample numbers at random from any probability distribution given its cumulative distribution function (cdf). Subject to the restriction that the distribution is continuous, this method is generally applicable (and can be computationally efficient if the cdf can be analytically inverted), but may be too computationally expensive in practice for some probability distributions. The Box-Muller transform is an example of an algorithm which is less general but more computationally efficient. It is often the case that, even for simple distributions, the inverse transform sampling method can be improved on, given substantial research effort, e.g. the ziggurat algorithm and rejection sampling.
      Returns:
      a random number.
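
As an illustration of the idea, the following sketch applies inverse transform sampling to an exponential distribution, whose cdf 1 - exp(-lambda x) can be inverted analytically. It is a hedged, self-contained example and not the library's implementation: the class `InverseTransformSketch`, its methods, and the explicit `Random` argument are hypothetical names introduced here.

```java
import java.util.Random;

/** Hypothetical illustration; not part of the library. */
final class InverseTransformSketch {
    /** Analytic inverse of the exponential cdf F(x) = 1 - exp(-lambda * x). */
    static double quantile(double u, double lambda) {
        return -Math.log(1.0 - u) / lambda;
    }

    /** Draws u uniformly from [0, 1) and maps it through the inverse cdf. */
    static double sample(double lambda, Random rng) {
        return quantile(rng.nextDouble(), lambda);
    }
}
```

When no closed-form inverse exists, the same scheme still works with a numerically inverted cdf, at the cost of one root search per sample.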
    • quantile

      default double quantile(double p, double xmin, double xmax, double eps)
      Inversion of CDF by bisection numeric root finding of "cdf(x) = p" for continuous distribution.
      Parameters:
      p - the probability.
      xmin - the lower bound of search range.
      xmax - the upper bound of search range.
      eps - the tolerance of the search, a small positive number close to zero.
      Returns:
      the quantile.
    • quantile

      default double quantile(double p, double xmin, double xmax)
      Inversion of CDF by bisection numeric root finding of "cdf(x) = p" for continuous distribution. The default epsilon is 1E-6.
      Parameters:
      p - the probability.
      xmin - the lower bound of search range.
      xmax - the upper bound of search range.
      Returns:
      the quantile.
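
The bisection search behind these two overloads can be sketched as follows. This is an illustrative, self-contained version under stated assumptions, not the library's code: the class `BisectionQuantileSketch` is hypothetical, the cdf is passed in as a function, the quantile is assumed to lie in [xmin, xmax], and the stopping rule (interval width at most eps) is an assumption about how the tolerance is used.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration; not part of the library. */
final class BisectionQuantileSketch {
    /**
     * Solves cdf(x) = p for x in [xmin, xmax] by bisection.
     * Assumes cdf is continuous and non-decreasing on the interval.
     */
    static double quantile(DoubleUnaryOperator cdf, double p,
                           double xmin, double xmax, double eps) {
        if (eps <= 0.0) {
            throw new IllegalArgumentException("eps must be positive");
        }
        while (xmax - xmin > eps) {
            double xmid = 0.5 * (xmin + xmax);
            if (cdf.applyAsDouble(xmid) > p) {
                xmax = xmid; // the root lies in the lower half
            } else {
                xmin = xmid; // the root lies in the upper half
            }
        }
        return 0.5 * (xmin + xmax);
    }
}
```

For instance, inverting the logistic cdf 1 / (1 + exp(-x)) at p = 0.75 over [-10, 10] with eps = 1E-6 converges to ln 3 in about 25 halvings, since each iteration halves the search interval.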