Interface ModelSelection


public interface ModelSelection
Model selection criteria. Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).

A good model selection technique will balance goodness of fit with simplicity. More complex models will be better able to adapt their shape to fit the data, but the additional parameters may not represent anything useful. Goodness of fit is generally determined using a likelihood ratio approach, or an approximation of this, leading to a chi-squared test. The complexity is generally measured by counting the number of parameters in the model.

The most commonly used criteria are the Akaike information criterion and the Bayesian information criterion. The formula for BIC is similar to the formula for AIC, but with a different penalty for the number of parameters. With AIC the penalty is 2k, whereas with BIC the penalty is log(n) * k.

AIC and BIC are each approximately correct with respect to a different goal and under a different set of asymptotic assumptions. Both sets of assumptions have been criticized as unrealistic.

AIC is better in situations when a false negative finding would be considered more misleading than a false positive, and BIC is better in situations where a false positive is as misleading as, or more misleading than, a false negative.
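
To make the penalty difference concrete, here is a minimal sketch in plain Java (no external dependencies). The log-likelihoods, parameter counts, and sample size are made-up illustration values; the point is that for n of 8 or more, log(n) > 2, so BIC penalizes extra parameters harder than AIC, and the two criteria can prefer different models on the same data.

    public class CriterionComparison {
        // AIC = 2 * k - 2 * logL
        static double aic(double logL, int k) {
            return 2.0 * k - 2.0 * logL;
        }

        // BIC = k * log(n) - 2 * logL
        static double bic(double logL, int k, int n) {
            return k * Math.log(n) - 2.0 * logL;
        }

        public static void main(String[] args) {
            int n = 1000;                 // hypothetical sample size
            double simpleLogL = -520.0;   // simpler model: 3 free parameters
            double complexLogL = -510.0;  // more complex model: 10 free parameters

            // Lower is better for both criteria. Here AIC prefers the complex model
            // (1040.0 vs 1046.0), while BIC prefers the simple one (1060.7 vs 1089.1).
            System.out.printf("AIC: simple=%.1f complex=%.1f%n",
                    aic(simpleLogL, 3), aic(complexLogL, 10));
            System.out.printf("BIC: simple=%.1f complex=%.1f%n",
                    bic(simpleLogL, 3), bic(complexLogL, 10));
        }
    }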

  • Method Summary

    Static Methods

    Modifier and Type    Method                            Description
    static double        AIC(double logL, int k)           Akaike information criterion.
    static double        BIC(double logL, int k, int n)    Bayesian information criterion.
  • Method Details

    • AIC

      static double AIC(double logL, int k)
      Akaike information criterion. AIC = 2 * k - 2 * log(L), where L is the likelihood of the estimated model and k is the number of free parameters to be estimated. See the usage sketch after these method details.
      Parameters:
      logL - the log-likelihood of the estimated model.
      k - the number of free parameters to be estimated in the model.
      Returns:
      the AIC score.
    • BIC

      static double BIC(double logL, int k, int n)
      Bayesian information criterion. BIC = k * log(n) - 2 * log(L), where L is the likelihood of the estimated model, k is the number of free parameters to be estimated in the model, and n is the number of samples.
      Parameters:
      logL - the log-likelihood of the estimated model.
      k - the number of free parameters to be estimated in the model.
      n - the number of samples.
      Returns:
      the BIC score.
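
A minimal usage sketch of the two static methods documented above. The import below is an assumption (the package of this interface is not shown on this page); adjust it to wherever ModelSelection lives in your build. The numeric inputs are illustrative only.

    import smile.validation.ModelSelection;  // assumed package; not confirmed by this page

    public class ModelSelectionExample {
        public static void main(String[] args) {
            double logL = -1234.5;  // log-likelihood of a fitted candidate model (illustrative)
            int k = 7;              // number of free parameters estimated in the model
            int n = 500;            // number of samples the model was fitted on

            double aic = ModelSelection.AIC(logL, k);     // 2 * k - 2 * logL
            double bic = ModelSelection.BIC(logL, k, n);  // k * log(n) - 2 * logL

            // Lower scores are better; compare candidate models fitted to the same data.
            System.out.printf("AIC = %.2f, BIC = %.2f%n", aic, bic);
        }
    }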