Package smile.classification
Class AdaBoost
- All Implemented Interfaces:
  Serializable, ToDoubleFunction<Tuple>, ToIntFunction<Tuple>, Classifier<Tuple>, DataFrameClassifier, SHAP<Tuple>, TreeSHAP
AdaBoost (Adaptive Boosting) classifier with decision trees. In principle,
AdaBoost is a meta-algorithm, and can be used in conjunction with many other
learning algorithms to improve their performance. In practice, AdaBoost with
decision trees is probably the most popular combination. AdaBoost is adaptive
in the sense that subsequent classifiers built are tweaked in favor of those
instances misclassified by previous classifiers. AdaBoost is sensitive to
noisy data and outliers. However, in some problems it can be less susceptible
to the over-fitting problem than most learning algorithms.
AdaBoost calls a weak classifier repeatedly over a series of T rounds. For each call, a distribution of weights is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the next classifier focuses more on those examples.
The basic AdaBoost algorithm handles only binary classification. For multi-class classification, a common approach is to reduce the multi-class problem to multiple two-class problems. This implementation is a multi-class AdaBoost that requires no such reduction, as sketched below.
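The multi-class scheme follows the SAMME algorithm of Zhu et al. (see the references below). The following is a minimal sketch of one boosting round's weight update in plain Java; it is illustrative only, not Smile's internal code, and all identifiers are made up for the example.

    // One SAMME boosting round (Zhu et al., 2009): compute the weighted error
    // of the current weak learner, derive its vote weight alpha, and re-weight
    // the training examples. Illustrative sketch, not Smile's implementation.
    static double sammeRound(double[] w, int[] predicted, int[] truth, int k) {
        double error = 0.0;
        for (int i = 0; i < w.length; i++) {
            if (predicted[i] != truth[i]) error += w[i];
        }
        // The log(k - 1) term is what extends binary AdaBoost to k classes:
        // alpha stays positive as long as the weak learner beats random
        // guessing, whose accuracy is 1/k rather than 1/2.
        double alpha = Math.log((1.0 - error) / error) + Math.log(k - 1.0);
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            // Increase the weights of misclassified examples.
            if (predicted[i] != truth[i]) w[i] *= Math.exp(alpha);
            sum += w[i];
        }
        // Renormalize so the weights again form a distribution.
        for (int i = 0; i < w.length; i++) w[i] /= sum;
        return alpha;
    }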
References
- Yoav Freund, Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1995.
- Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class AdaBoost, 2009.
Nested Class Summary
Nested classes/interfaces inherited from interface smile.classification.Classifier
Classifier.Trainer<T, M extends Classifier<T>>
Nested classes/interfaces inherited from interface smile.classification.DataFrameClassifier
DataFrameClassifier.Trainer<M extends DataFrameClassifier>
-
Field Summary
Fields inherited from class smile.classification.AbstractClassifier
classes
-
Constructor Summary
AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance)
    Constructor.
AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, IntSet labels)
    Constructor.
-
Method Summary
static AdaBoost fit(Formula formula, DataFrame data)
    Fits an AdaBoost model.
static AdaBoost fit(Formula formula, DataFrame data, Properties params)
    Fits an AdaBoost model.
static AdaBoost fit(Formula formula, DataFrame data, int ntrees, int maxDepth, int maxNodes, int nodeSize)
    Fits an AdaBoost model.
Formula formula()
    Returns the formula associated with the model.
double[] importance()
    Returns the variable importance.
int predict(Tuple x)
    Predicts the class label of an instance.
int predict(Tuple x, double[] posteriori)
    Predicts the class label of an instance and also calculates the posteriori probabilities.
StructType schema()
    Returns the predictor schema.
int size()
    Returns the number of trees in the model.
boolean soft()
    Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
int[][] test(DataFrame data)
    Tests the model on a validation dataset.
DecisionTree[] trees()
    Returns the decision trees.
void trim(int ntrees)
    Trims the tree model set to a smaller size in case of over-fitting.
Methods inherited from class smile.classification.AbstractClassifier
classes, numClasses
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface smile.classification.Classifier
applyAsDouble, applyAsInt, classes, numClasses, online, predict, predict, predict, predict, predict, predict, score, update, update, update
Methods inherited from interface smile.classification.DataFrameClassifier
predict, predict
-
Constructor Details
-
AdaBoost
public AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance)
Constructor.
- Parameters:
  formula - a symbolic description of the model to be fitted.
  k - the number of classes.
  trees - the forest of decision trees.
  alpha - the weight of each decision tree.
  error - the weighted error of each decision tree during training.
  importance - the variable importance.
-
AdaBoost
public AdaBoost(Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, IntSet labels)
Constructor.
- Parameters:
  formula - a symbolic description of the model to be fitted.
  k - the number of classes.
  trees - the forest of decision trees.
  alpha - the weight of each decision tree.
  error - the weighted error of each decision tree during training.
  importance - the variable importance.
  labels - the class label encoder.
-
-
Method Details
-
fit
public static AdaBoost fit(Formula formula, DataFrame data)
Fits an AdaBoost model.
- Parameters:
  formula - a symbolic description of the model to be fitted.
  data - the data frame of the explanatory and response variables.
- Returns:
  the model.
-
fit
public static AdaBoost fit(Formula formula, DataFrame data, Properties params)
Fits an AdaBoost model.
- Parameters:
  formula - a symbolic description of the model to be fitted.
  data - the data frame of the explanatory and response variables.
  params - the hyperparameters.
- Returns:
  the model.
-
fit
public static AdaBoost fit(Formula formula, DataFrame data, int ntrees, int maxDepth, int maxNodes, int nodeSize)
Fits an AdaBoost model.
- Parameters:
  formula - a symbolic description of the model to be fitted.
  data - the data frame of the explanatory and response variables.
  ntrees - the number of trees.
  maxDepth - the maximum depth of the tree.
  maxNodes - the maximum number of leaf nodes in the tree.
  nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.
- Returns:
  the model.
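For example, a model can be fitted directly from a data frame. A minimal usage sketch; the file path and the response column name "class" are hypothetical, and the hyperparameter values are just illustrative.

    import smile.classification.AdaBoost;
    import smile.data.DataFrame;
    import smile.data.formula.Formula;
    import smile.io.Read;

    public class AdaBoostExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical ARFF dataset with a nominal "class" column.
            DataFrame data = Read.arff("data/weather.arff");
            // 200 trees, maximum depth 20, at most 6 leaf nodes, node size 5.
            AdaBoost model = AdaBoost.fit(Formula.lhs("class"), data, 200, 20, 6, 5);
            System.out.println("trees in ensemble: " + model.size());
        }
    }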
-
formula
public Formula formula()
Description copied from interface: DataFrameClassifier
Returns the formula associated with the model.
- Specified by:
  formula in interface DataFrameClassifier
  formula in interface TreeSHAP
- Returns:
  the formula associated with the model.
-
schema
public StructType schema()
Description copied from interface: DataFrameClassifier
Returns the predictor schema.
- Specified by:
  schema in interface DataFrameClassifier
- Returns:
  the predictor schema.
-
importance
public double[] importance()
Returns the variable importance. Every time a node is split on a variable, the impurity criterion (Gini, information gain, etc.) of the two descendant nodes is less than that of the parent node. Adding up these decreases for each variable over all trees in the forest gives a simple measure of variable importance.
- Returns:
  the variable importance.
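A sketch of pairing the importance scores with predictor names. It assumes the scores align with the schema fields by index, and uses the StructField accessor of recent Smile releases; older versions expose the name differently.

    double[] importance = model.importance();
    for (int i = 0; i < importance.length; i++) {
        // Assumes importance[i] corresponds to the i-th predictor in schema().
        System.out.printf("%-20s %.4f%n", model.schema().field(i).name(), importance[i]);
    }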
-
size
public int size()
Returns the number of trees in the model.
- Returns:
  the number of trees in the model.
-
trees
public DecisionTree[] trees()
Returns the decision trees.
-
trim
public void trim(int ntrees)
Trims the tree model set to a smaller size in case of over-fitting. If the extra decision trees in the model don't improve performance, removing them reduces the model size and also speeds up prediction.
- Parameters:
  ntrees - the new (smaller) size of the tree model set.
-
soft
public boolean soft()
Description copied from interface: Classifier
Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
- Specified by:
  soft in interface Classifier<Tuple>
- Returns:
  true if this is a soft classifier.
-
predict
public int predict(Tuple x)
Description copied from interface: Classifier
Predicts the class label of an instance.
- Specified by:
  predict in interface Classifier<Tuple>
- Parameters:
  x - the instance to be classified.
- Returns:
  the predicted class label.
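For instance, a single row of a data frame can be classified directly; the `test` frame below is hypothetical.

    // DataFrame.get(i) returns the i-th row as a Tuple.
    Tuple x = test.get(0);
    int label = model.predict(x);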
-
predict
public int predict(Tuple x, double[] posteriori)
Predicts the class label of an instance and also calculates the posteriori probabilities. Not supported.
- Specified by:
  predict in interface Classifier<Tuple>
- Parameters:
  x - an instance to be classified.
  posteriori - the a posteriori probabilities on output.
- Returns:
  the predicted class label.
-
test
public int[][] test(DataFrame data)
Tests the model on a validation dataset.
- Parameters:
  data - the validation data.
- Returns:
  the predictions with the first 1, 2, ... decision trees.
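A sketch of a validation workflow built on test(): score each ensemble prefix, then trim to the best-performing one. The `validation` frame and `formula` variable are assumed to exist, and both the label-extraction call and the package location of the Accuracy metric are assumptions that may vary across Smile versions.

    import smile.validation.metric.Accuracy;

    int[][] pred = model.test(validation);             // pred[t]: predictions with t+1 trees
    int[] truth = formula.y(validation).toIntArray();  // assumed way to get ground-truth labels

    int best = 1;
    double bestAccuracy = 0.0;
    for (int t = 0; t < pred.length; t++) {
        double accuracy = Accuracy.of(truth, pred[t]);
        if (accuracy > bestAccuracy) {
            bestAccuracy = accuracy;
            best = t + 1;
        }
    }
    model.trim(best);  // keep only the best-performing prefix of the ensemble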
-