smile.classification.AbstractClassifier<Tuple>

smile.classification.GradientTreeBoost

All Implemented Interfaces:: Serializable, ToDoubleFunction<Tuple>, ToIntFunction<Tuple>, Classifier<Tuple>, DataFrameClassifier, SHAP<Tuple>

public class GradientTreeBoost extends AbstractClassifier<Tuple> implements DataFrameClassifier, SHAP<Tuple>

Gradient boosting for classification. Gradient boosting is typically used with decision trees (especially CART regression trees) of a fixed size as base learners. For this special case Friedman proposes a modification to gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the t-th step would fit a regression tree to pseudo-residuals. Let J be the number of its leaves. The tree partitions the input space into J disjoint regions and predicts a constant value in each region. The parameter J controls the maximum allowed level of interaction between variables in the model. With J = 2 (decision stumps), no interaction between variables is allowed. With J = 3 the model may include effects of the interaction between up to two variables, and so on. Hastie et al. comment that typically 4 <= J <= 8 work well for boosting and results are fairly insensitive to the choice of in this range, J = 2 is insufficient for many applications, and J > 10 is unlikely to be required.

Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this over-fitting effect by constraining the fitting procedure. One natural regularization parameter is the number of gradient boosting iterations T (i.e. the number of trees in the model when the base learner is a decision tree). Increasing T reduces the error on training set, but setting it too high may lead to over-fitting. An optimal value of T is often selected by monitoring prediction error on a separate validation data set.

Another regularization approach is the shrinkage which times a parameter η (called the "learning rate") to update term. Empirically it has been found that using small learning rates (such as η < 0.1) yields dramatic improvements in model's generalization ability over gradient boosting without shrinking (η = 1). However, it comes at the price of increasing computational time both during training and prediction: lower learning rate requires more iterations.

Soon after the introduction of gradient boosting Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method. Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement. Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

Subsample size is some constant fraction f of the size of the training set. When f = 1, the algorithm is deterministic and identical to the one described above. Smaller values of f introduce randomness into the algorithm and help prevent over-fitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller datasets at each iteration. Typically, f is set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, sub-sampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation dataset, but often underestimate actual performance improvement and the optimal number of iterations.

Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes. It's used in the tree building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances. Imposing this limit helps to reduce variance in predictions at leaves.

References

J. H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine, 1999.
J. H. Friedman. Stochastic Gradient Boosting, 1999.

See Also:

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final record

GradientTreeBoost.Options

Gradient tree boosting hyperparameters.

static final record

GradientTreeBoost.TrainingStatus

Training status per tree.

Nested classes/interfaces inherited from interface smile.classification.Classifier
Classifier.Trainer<T,M extends Classifier<T>>

Nested classes/interfaces inherited from interface smile.classification.DataFrameClassifier
DataFrameClassifier.Trainer<M extends DataFrameClassifier>
Field Summary

Fields inherited from class smile.classification.AbstractClassifier
classes
Constructor Summary

Constructors

Constructor

Description

GradientTreeBoost(Formula formula, RegressionTree[][] trees, double shrinkage, double[] importance)

Constructor of multi-class.

GradientTreeBoost(Formula formula, RegressionTree[][] trees, double shrinkage, double[] importance, IntSet labels)

Constructor of multi-class.

GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance)

Constructor of binary class.

GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance, IntSet labels)

Constructor of binary class.
Method Summary

Modifier and Type

Method

Description

static GradientTreeBoost

fit(Formula formula, DataFrame data)

Fits a gradient tree boosting for classification.

static GradientTreeBoost

fit(Formula formula, DataFrame data, GradientTreeBoost.Options options)

Fits a gradient tree boosting for classification.

Formula

formula()

Returns the formula associated with the model.

double[]

importance()

Returns the variable importance.

int

predict(Tuple x)

Predicts the class label of an instance.

int

predict(Tuple x, double[] posteriori)

Predicts the class label of an instance and also calculate a posteriori probabilities.

StructType

schema()

Returns the predictor schema.

double[]

shap(DataFrame data)

Returns the average of absolute SHAP values over a data frame.

double[]

shap(Tuple x)

Returns the SHAP values.

int

size()

Returns the number of trees in the model.

boolean

soft()

Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.

int[][]

test(DataFrame data)

Test the model on a validation dataset.

RegressionTree[][]

trees()

Returns the regression trees.

GradientTreeBoost

trim(int ntrees)

Trims the tree model set to a smaller size in case of over-fitting.

Methods inherited from class smile.classification.AbstractClassifier
classes, numClasses

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface smile.classification.Classifier
applyAsDouble, applyAsInt, classes, numClasses, online, predict, predict, predict, predict, predict, predict, score, update, update, update

Methods inherited from interface smile.classification.DataFrameClassifier
predict, predict

Methods inherited from interface smile.feature.importance.SHAP
shap

Constructor Details
- GradientTreeBoost
  
  public GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance)
  
  Constructor of binary class.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  trees - the regression trees.
  
  b - the intercept
  
  shrinkage - the shrinkage parameter in (0, 1] controls the learning rate of procedure.
  
  importance - variable importance
- GradientTreeBoost
  
  public GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance, IntSet labels)
  
  Constructor of binary class.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  trees - the regression trees.
  
  b - the intercept
  
  shrinkage - the shrinkage parameter in (0, 1] controls the learning rate of procedure.
  
  importance - variable importance
  
  labels - class labels
- GradientTreeBoost
  
  public GradientTreeBoost(Formula formula, RegressionTree[][] trees, double shrinkage, double[] importance)
  
  Constructor of multi-class.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  trees - the regression trees, one row per class.
  
  shrinkage - the shrinkage parameter in (0, 1] controls the learning rate of procedure.
  
  importance - variable importance
- GradientTreeBoost
  
  public GradientTreeBoost(Formula formula, RegressionTree[][] trees, double shrinkage, double[] importance, IntSet labels)
  
  Constructor of multi-class.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  trees - the regression trees, one row per class.
  
  shrinkage - the shrinkage parameter in (0, 1] controls the learning rate of procedure.
  
  importance - variable importance
  
  labels - the class label encoder.
Method Details
- fit
  
  public static GradientTreeBoost fit(Formula formula, DataFrame data)
  
  Fits a gradient tree boosting for classification.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  data - the data frame of the explanatory and response variables.
  
  Returns:
  
  the model.
- fit
  
  public static GradientTreeBoost fit(Formula formula, DataFrame data, GradientTreeBoost.Options options)
  
  Fits a gradient tree boosting for classification.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  data - the data frame of the explanatory and response variables.
  
  options - the hyperparameters.
  
  Returns:
  
  the model.
- formula
  
  public Formula formula()
  
  Description copied from interface: DataFrameClassifier
  
  Returns the formula associated with the model.
  
  Specified by:
  
  formula in interface DataFrameClassifier
  
  Returns:
  
  the formula associated with the model.
- schema
  
  public StructType schema()
  
  Description copied from interface: DataFrameClassifier
  
  Returns the predictor schema.
  
  Specified by:
  
  schema in interface DataFrameClassifier
  
  Returns:
  
  the predictor schema.
- importance
  
  public double[] importance()
  
  Returns the variable importance. Every time a split of a node is made on variable the impurity criterion for the two descendent nodes is less than the parent node. Adding up the decreases for each individual variable over all trees in the forest gives a simple measure of variable importance.
  
  Returns:
  
  the variable importance
- size
  
  public int size()
  
  Returns the number of trees in the model.
  
  Returns:
  
  the number of trees in the model.
- trees
  
  public RegressionTree[][] trees()
  
  Returns the regression trees.
  
  Returns:
  
  the regression trees.
- trim
  
  public GradientTreeBoost trim(int ntrees)
  
  Trims the tree model set to a smaller size in case of over-fitting. Or if extra decision trees in the model don't improve the performance, we may remove them to reduce the model size and also improve the speed of prediction.
  
  Parameters:
  
  ntrees - the new (smaller) size of tree model set.
  
  Returns:
  
  the trimmed model.
- predict
  
  public int predict(Tuple x)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance.
  
  Specified by:
  
  predict in interface Classifier<Tuple>
  
  Parameters:
  
  x - the instance to be classified.
  
  Returns:
  
  the predicted class label.
- soft
  
  public boolean soft()
  
  Description copied from interface: Classifier
  
  Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
  
  Specified by:
  
  soft in interface Classifier<Tuple>
  
  Returns:
  
  true if soft classifier.
- predict
  
  public int predict(Tuple x, double[] posteriori)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance and also calculate a posteriori probabilities. Classifiers may NOT support this method since not all classification algorithms are able to calculate such a posteriori probabilities.
  
  Specified by:
  
  predict in interface Classifier<Tuple>
  
  Parameters:
  
  x - an instance to be classified.
  
  posteriori - a posteriori probabilities on output.
  
  Returns:
  
  the predicted class label
- test
  
  public int[][] test(DataFrame data)
  
  Test the model on a validation dataset.
  
  Parameters:
  
  data - the test data set.
  
  Returns:
  
  the predictions with first 1, 2, ..., decision trees.
- shap
  
  public double[] shap(DataFrame data)
  
  Returns the average of absolute SHAP values over a data frame.
  
  Parameters:
  
  data - the data set.
  
  Returns:
  
  the average of absolute SHAP values.
- shap
  
  public double[] shap(Tuple x)
  
  Description copied from interface: SHAP
  
  Returns the SHAP values. For regression, the length of SHAP values is same as the number of features. For classification, SHAP values are of p x k, where p is the number of features and k is the classes. The first k elements are the SHAP values of first feature over k classes, respectively. The rest features follow accordingly.
  
  Specified by:
  
  shap in interface SHAP<Tuple>
  
  Parameters:
  
  x - an instance.
  
  Returns:
  
  the SHAP values.

Class GradientTreeBoost

References

Nested Class Summary

Nested classes/interfaces inherited from interface smile.classification.Classifier

Nested classes/interfaces inherited from interface smile.classification.DataFrameClassifier

Field Summary

Fields inherited from class smile.classification.AbstractClassifier

Constructor Summary

Method Summary

Methods inherited from class smile.classification.AbstractClassifier

Methods inherited from class java.lang.Object

Methods inherited from interface smile.classification.Classifier

Methods inherited from interface smile.classification.DataFrameClassifier

Methods inherited from interface smile.feature.importance.SHAP

Constructor Details

GradientTreeBoost

GradientTreeBoost

GradientTreeBoost

GradientTreeBoost

Method Details

fit

fit

formula

schema

importance

size

trees

trim

predict

soft

predict

test

shap

shap