Class AdaBoost
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, ToIntFunction<Tuple>, Classifier<Tuple>, DataFrameClassifier, SHAP<Tuple>, TreeSHAP
AdaBoost (Adaptive Boosting) classifier with decision trees. In principle,
AdaBoost is a meta-algorithm, and can be used in conjunction with many other
learning algorithms to improve their performance. In practice, AdaBoost with
decision trees is probably the most popular combination. AdaBoost is adaptive
in the sense that subsequent classifiers built are tweaked in favor of those
instances misclassified by previous classifiers. AdaBoost is sensitive to
noisy data and outliers. However, in some problems it can be less susceptible
to the over-fitting problem than most learning algorithms.
AdaBoost calls a weak classifier repeatedly in a series of T rounds, producing an ensemble of T classifiers. For each call, a distribution of weights over the training examples is updated to indicate how important each example is for the classification. On each round, the weights of incorrectly classified examples are increased (or alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples.
The basic AdaBoost algorithm handles only the binary classification problem. For multi-class classification, a common approach is to reduce the multi-class problem to multiple two-class problems. This implementation is a multi-class AdaBoost that requires no such reduction.
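The round-by-round reweighting described above can be sketched in a few lines. The following is a hedged illustration of one SAMME-style boosting round (the multi-class scheme of Zhu et al.), not Smile's internal code; the class and method names are made up for the example.

```java
// One SAMME boosting round: compute the tree weight alpha from the
// weighted error, then boost the weights of misclassified examples.
public class SammeRound {

    /**
     * Updates the example weights in place and returns the tree weight.
     *
     * @param w    example weights (renormalized on return)
     * @param y    true class labels
     * @param yhat the weak learner's predicted labels
     * @param k    the number of classes
     * @return alpha, the weight of this tree in the final vote
     */
    public static double update(double[] w, int[] y, int[] yhat, int k) {
        double err = 0.0;
        for (int i = 0; i < w.length; i++) {
            if (y[i] != yhat[i]) err += w[i];   // weighted training error
        }

        // SAMME tree weight; for k == 2 this reduces to classic AdaBoost.
        double alpha = Math.log((1.0 - err) / err) + Math.log(k - 1.0);

        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            if (y[i] != yhat[i]) w[i] *= Math.exp(alpha);  // boost mistakes
            sum += w[i];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= sum;                        // renormalize to a distribution
        }
        return alpha;
    }
}
```

Because misclassified examples gain weight, the next tree's weighted error is dominated by the previous tree's mistakes, which is exactly the adaptivity described above.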
References
- Yoav Freund, Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1995.
- Ji Zhu, Hui Zou, Saharon Rosset and Trevor Hastie. Multi-class AdaBoost, 2009.
Nested Class Summary
Nested Classes
- static final record: AdaBoost hyperparameters.
- static final record: Training status per tree.
Nested classes/interfaces inherited from interface Classifier
- Classifier.Trainer<T,M>
Nested classes/interfaces inherited from interface DataFrameClassifier
- DataFrameClassifier.Trainer<M>
-
Field Summary
Fields inherited from class AbstractClassifier
classes -
Constructor Summary
Constructors
- AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance): Constructor.
- AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, IntSet labels): Constructor.
-
Method Summary
Methods
- static AdaBoost fit: Fits an AdaBoost model.
- static AdaBoost fit(Formula formula, DataFrame data, AdaBoost.Options options): Fits an AdaBoost model.
- formula(): Returns the formula associated with the model.
- double[] importance(): Returns the variable importance.
- int predict: Predicts the class label of an instance.
- int predict: Predicts the class label of an instance and also calculates the posteriori probabilities.
- schema(): Returns the predictor schema.
- int size(): Returns the number of trees in the model.
- boolean soft(): Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
- int[][] test: Tests the model on a validation dataset.
- trees(): Returns the decision trees.
- void trim(int ntrees): Trims the tree model set to a smaller size in case of over-fitting.
Methods inherited from class AbstractClassifier
classes, numClasses
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Classifier
applyAsDouble, applyAsInt, classes, numClasses, online, predict, predict, predict, predict, predict, predict, score, update, update, update
Methods inherited from interface DataFrameClassifier
predict, predict
-
Constructor Details
-
AdaBoost
public AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance)
Constructor.
- Parameters:
formula - a symbolic description of the model to be fitted.
k - the number of classes.
trees - the forest of decision trees.
alpha - the weight of each decision tree.
error - the weighted error of each decision tree during training.
importance - the variable importance.
-
AdaBoost
public AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, IntSet labels)
Constructor.
- Parameters:
formula - a symbolic description of the model to be fitted.
k - the number of classes.
trees - the forest of decision trees.
alpha - the weight of each decision tree.
error - the weighted error of each decision tree during training.
importance - the variable importance.
labels - the class label encoder.
-
-
Method Details
-
fit
-
fit
Fits an AdaBoost model.
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
options - the hyperparameters.
- Returns:
- the model.
-
formula
Description copied from interface: DataFrameClassifier
Returns the formula associated with the model.
- Specified by:
formula in interface DataFrameClassifier
- Specified by:
formula in interface TreeSHAP
- Returns:
- the formula associated with the model.
-
schema
Description copied from interface: DataFrameClassifier
Returns the predictor schema.
- Specified by:
schema in interface DataFrameClassifier
- Returns:
- the predictor schema.
-
importance
public double[] importance()
Returns the variable importance. Every time a node is split on a variable, the impurity criterion (Gini, information gain, etc.) of the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over all trees in the forest gives a simple measure of variable importance.
- Returns:
- the variable importance.
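As a hedged illustration of the aggregation described above (the names are hypothetical, not Smile's internals): given each tree's per-variable impurity decreases, the ensemble importance is simply their element-wise sum.

```java
// Sums per-tree impurity decreases into a single importance vector.
public class EnsembleImportance {
    public static double[] importance(double[][] perTreeDecrease) {
        int p = perTreeDecrease[0].length;       // number of variables
        double[] total = new double[p];
        for (double[] tree : perTreeDecrease) {
            for (int j = 0; j < p; j++) {
                total[j] += tree[j];             // accumulate over all trees
            }
        }
        return total;
    }
}
```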
-
size
public int size()
Returns the number of trees in the model.
- Returns:
- the number of trees in the model.
-
trees
Returns the decision trees. -
trim
public void trim(int ntrees)
Trims the tree model set to a smaller size in case of over-fitting. If the extra decision trees in the model do not improve the performance, removing them reduces the model size and also improves the speed of prediction.
- Parameters:
ntrees - the new (smaller) size of the tree model set.
-
soft
public boolean soft()
Description copied from interface: Classifier
Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
- Specified by:
soft in interface Classifier<Tuple>
- Returns:
- true if soft classifier.
-
predict
Description copied from interface: Classifier
Predicts the class label of an instance.
- Specified by:
predict in interface Classifier<Tuple>
- Parameters:
x - the instance to be classified.
- Returns:
- the predicted class label.
-
predict
Predicts the class label of an instance and also calculates the posteriori probabilities.
- Specified by:
predict in interface Classifier<Tuple>
- Parameters:
x - an instance to be classified.
posteriori - the posteriori probabilities on output.
- Returns:
- the predicted class label.
-
test
Tests the model on a validation dataset.
- Parameters:
data - the validation data.
- Returns:
- the predictions with the first 1, 2, ..., decision trees.
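Both this staged evaluation and trim(int) rest on the fact that the ensemble prediction is a weighted vote over trees, so using only the first n trees simply truncates the sum. A hedged sketch with hypothetical names, not Smile's internals:

```java
// Weighted majority vote over the first ntrees members of the ensemble.
public class StagedVote {
    /**
     * @param votes  votes[i] is the class predicted by tree i for one instance
     * @param alpha  alpha[i] is the weight of tree i
     * @param ntrees how many leading trees to use (ntrees <= votes.length)
     * @param k      the number of classes
     */
    public static int predict(int[] votes, double[] alpha, int ntrees, int k) {
        double[] score = new double[k];
        for (int i = 0; i < ntrees; i++) {
            score[votes[i]] += alpha[i];   // each tree votes with weight alpha
        }
        int best = 0;
        for (int c = 1; c < k; c++) {
            if (score[c] > score[best]) best = c;
        }
        return best;
    }
}
```

Evaluating this vote for n = 1, 2, ... shows how accuracy evolves as trees are added, which is how one decides where to trim.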
-