adaboost

fun adaboost(formula: Formula, data: DataFrame, ntrees: Int = 500, maxDepth: Int = 20, maxNodes: Int = 6, nodeSize: Int = 1): AdaBoost

AdaBoost (Adaptive Boosting) classifier with decision trees. In principle, AdaBoost is a meta-algorithm and can be used in conjunction with many other learning algorithms to improve their performance. In practice, AdaBoost with decision trees is probably the most popular combination. AdaBoost is adaptive in the sense that each subsequent classifier is tweaked in favor of the instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. However, in some problems it can be less susceptible to overfitting than most learning algorithms.

AdaBoost calls a weak classifier repeatedly over a series of T rounds, producing T classifiers in total. On each call, a distribution of weights is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on the examples that were previously misclassified.
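The reweighting step described above can be sketched in plain Kotlin. This is only an illustration of the classic binary update, not the code used by this library: the function name `reweight`, its parameters, and the choice of alpha = 0.5 * ln((1 - err) / err) are assumptions made for the example.

```kotlin
import kotlin.math.exp
import kotlin.math.ln

/**
 * One round of the classic binary AdaBoost weight update (illustrative sketch).
 *
 * weights: current example weights, assumed to sum to 1.
 * correct: correct[i] is true if this round's weak classifier got example i right.
 * Returns the classifier's vote (alpha) and the renormalized weights.
 */
fun reweight(weights: DoubleArray, correct: BooleanArray): Pair<Double, DoubleArray> {
    require(weights.size == correct.size) { "weights and correct must have the same length" }

    // Weighted error of this round's weak classifier.
    val err = weights.indices.filter { !correct[it] }.sumOf { weights[it] }
    require(err > 0.0 && err < 0.5) { "weighted error must lie strictly between 0 and 0.5" }

    // A lower error gives the classifier a larger vote.
    val alpha = 0.5 * ln((1.0 - err) / err)

    // Increase the weights of misclassified examples and decrease the others.
    val updated = DoubleArray(weights.size) { i ->
        weights[i] * exp(if (correct[i]) -alpha else alpha)
    }

    // Renormalize so the weights form a distribution again.
    val total = updated.sum()
    return alpha to DoubleArray(updated.size) { updated[it] / total }
}

fun main() {
    // Four equally weighted examples; the weak classifier misses the last one.
    val weights = DoubleArray(4) { 0.25 }
    val correct = booleanArrayOf(true, true, true, false)
    val (alpha, next) = reweight(weights, correct)
    println("alpha = $alpha")
    println("next weights = ${next.joinToString()}")
}
```

With this update, the misclassified example ends up carrying about half of the total weight, so the next weak classifier is pushed to get it right.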

The basic AdaBoost algorithm only handles binary classification problems. For multi-class classification, a common approach is to reduce the problem to multiple two-class problems. This implementation is a multi-class AdaBoost that does not require such a reduction.
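For orientation, the multi-class formulation in Zhu et al. (listed in the references) changes the vote assigned to the m-th weak classifier by adding a term that depends on the number of classes K. Whether this implementation uses exactly this form is an assumption here; the formula below is the one given in that paper, with err^(m) denoting the weighted error of the m-th classifier.

```latex
\alpha^{(m)} = \ln\frac{1 - \mathrm{err}^{(m)}}{\mathrm{err}^{(m)}} + \ln(K - 1)
```

The vote is positive whenever err^(m) < (K - 1)/K, so a weak classifier only needs to beat random guessing among K classes rather than achieve an error below 1/2. For K = 2 the extra term vanishes and the expression reduces to the binary case.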

References

  • Yoav Freund and Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1995.

  • Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class AdaBoost, 2009.

Return

AdaBoost model.

Parameters

formula

a symbolic description of the model to be fitted.

data

the data frame of the explanatory and response variables.

ntrees

the number of trees.

maxDepth

the maximum depth of the tree.

maxNodes

the maximum number of leaf nodes in the tree.

nodeSize

the minimum size of leaf nodes.