Interface ModelSelection
A good model selection technique will balance goodness of fit with simplicity. More complex models will be better able to adapt their shape to fit the data, but the additional parameters may not represent anything useful. Goodness of fit is generally determined using a likelihood ratio approach, or an approximation of this, leading to a chi-squared test. The complexity is generally measured by counting the number of parameters in the model.
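To make the likelihood-ratio idea concrete, here is a minimal standalone sketch. It is not part of this interface. It assumes two nested models fitted to the same data, where the richer model has exactly one extra free parameter, so the statistic is compared against the chi-squared distribution with 1 degree of freedom. The class name, method name, and log-likelihood values are hypothetical.

```java
// Sketch: likelihood-ratio test for two nested models fitted to the same data.
// Assumption: the simpler model is a special case of the richer one, and the
// richer model has exactly one extra free parameter (1 degree of freedom).
public class LikelihoodRatioSketch {
    // Upper 5% critical value of the chi-squared distribution with 1 degree
    // of freedom (approximately 3.841).
    static final double CHI2_CRITICAL_DF1_ALPHA05 = 3.841;

    // LR statistic: 2 * (log L_complex - log L_simple). Under the null
    // hypothesis that the simpler model suffices, it is asymptotically
    // chi-squared distributed.
    static double lrStatistic(double logLSimple, double logLComplex) {
        return 2.0 * (logLComplex - logLSimple);
    }

    public static void main(String[] args) {
        // Hypothetical log-likelihoods of two nested fits.
        double logLSimple = -120.0;
        double logLComplex = -118.5;
        double stat = lrStatistic(logLSimple, logLComplex);
        boolean rejectSimple = stat > CHI2_CRITICAL_DF1_ALPHA05;
        System.out.printf("LR statistic = %.3f, prefer %s model%n",
                stat, rejectSimple ? "complex" : "simple");
    }
}
```

Here the extra parameter raises the log-likelihood by only 1.5. That gives a statistic of 3.0, which falls short of 3.841, so the test does not justify the richer model.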
The most commonly used criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The formula for BIC is similar to the formula for AIC, but with a different penalty for the number of parameters k: with AIC the penalty is 2k, whereas with BIC the penalty is log(n) * k, where n is the number of samples.
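The two penalties can be compared directly with a short sketch. It assumes that log is the natural logarithm, as in the usual definition of BIC. The class and method names are hypothetical illustrations, not part of this interface. Because log(n) exceeds 2 once n is larger than e^2 (about 7.39), BIC penalizes each parameter more heavily than AIC for any sample of 8 or more observations.

```java
// Sketch: compares the complexity penalties of AIC (2k) and BIC (k * log(n)).
// Assumption: log is the natural logarithm (Math.log).
public class PenaltyComparison {
    static double aicPenalty(int k) {
        return 2.0 * k;
    }

    static double bicPenalty(int k, int n) {
        return k * Math.log(n);
    }

    public static void main(String[] args) {
        int k = 5; // hypothetical number of free parameters
        for (int n : new int[] {5, 8, 100, 10000}) {
            System.out.printf("n=%d  AIC penalty=%.2f  BIC penalty=%.2f%n",
                    n, aicPenalty(k), bicPenalty(k, n));
        }
    }
}
```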
AIC and BIC are each approximately correct with respect to a different goal and a different set of asymptotic assumptions, and both sets of assumptions have been criticized as unrealistic.
AIC is preferable when a false negative (omitting a parameter that matters) would be more misleading than a false positive, whereas BIC is preferable when a false positive (keeping a spurious parameter) is as misleading as, or more misleading than, a false negative.
Method Details
AIC
static double AIC(double logL, int k)
Akaike information criterion. AIC = 2 * k - 2 * log(L), where L is the likelihood of the estimated model and k is the number of free parameters to be estimated in the model.
Parameters:
logL - the log-likelihood of the estimated model.
k - the number of free parameters to be estimated in the model.
Returns:
the AIC score.
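The formula above can be sketched as a standalone Java method. This is an illustration of the documented formula, not the library's source, and the class name `AicSketch`, the method `aic`, and the log-likelihood values are hypothetical. Because the method takes logL = log(L) directly, the likelihood itself never needs to be computed, which avoids underflow for large samples. Lower scores are better.

```java
// Sketch of the documented formula AIC = 2 * k - 2 * log(L),
// where logL is already log(L). Not the library implementation.
public class AicSketch {
    static double aic(double logL, int k) {
        return 2.0 * k - 2.0 * logL;
    }

    public static void main(String[] args) {
        // Hypothetical fits: the richer model has a higher log-likelihood
        // but three extra parameters.
        double simple = aic(-120.0, 2);  // 2*2 - 2*(-120.0) = 244.0
        double complex = aic(-118.5, 5); // 2*5 - 2*(-118.5) = 247.0
        System.out.println("simple AIC  = " + simple);
        System.out.println("complex AIC = " + complex);
        // Lower AIC is better.
        System.out.println(simple < complex ? "prefer simple" : "prefer complex");
    }
}
```

With these numbers the gain of 1.5 in log-likelihood does not pay for three extra parameters, so the simpler model has the lower AIC.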
BIC
static double BIC(double logL, int k, int n)
Bayesian information criterion. BIC = k * log(n) - 2 * log(L), where L is the likelihood of the estimated model, k is the number of free parameters to be estimated in the model, and n is the number of samples.
Parameters:
logL - the log-likelihood of the estimated model.
k - the number of free parameters to be estimated in the model.
n - the number of samples.
Returns:
the BIC score.
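A typical use of BIC is to score several candidate models fitted to the same n samples and keep the one with the lowest score. The sketch below implements the documented formula as a standalone method rather than calling the library. It assumes log is the natural logarithm. The class name `BicSketch`, the method `bic`, and the candidate log-likelihoods are hypothetical.

```java
// Sketch of the documented formula BIC = k * log(n) - 2 * log(L),
// using the natural logarithm, and of choosing the lowest-BIC candidate.
public class BicSketch {
    static double bic(double logL, int k, int n) {
        return k * Math.log(n) - 2.0 * logL;
    }

    public static void main(String[] args) {
        int n = 200; // hypothetical sample size shared by all candidates
        // Hypothetical candidates: {log-likelihood, number of free parameters}.
        double[][] candidates = {
            {-310.0, 2},
            {-301.0, 4},
            {-299.5, 8},
        };
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double score = bic(candidates[i][0], (int) candidates[i][1], n);
            System.out.printf("model %d: BIC = %.3f%n", i, score);
            if (score < bestScore) { // lower BIC is better
                bestScore = score;
                best = i;
            }
        }
        System.out.println("selected model " + best);
    }
}
```

With these numbers the four-parameter model (index 1) has the lowest BIC. The eight-parameter model fits slightly better, but its extra 4 * log(200) penalty outweighs the gain.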