Package smile.regression
Class RegressionTree
java.lang.Object
smile.base.cart.CART
smile.regression.RegressionTree
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, SHAP<Tuple>, DataFrameRegression, Regression<Tuple>
Regression tree. A classification/regression tree can be learned by
splitting the training set into subsets based on an attribute value
test. This process is repeated on each derived subset in a recursive
manner called recursive partitioning.
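The recursive partitioning described above can be sketched in plain Java. This is a minimal illustration on a single numeric feature, assuming squared error as the split criterion; the class RecursivePartitionSketch and its methods are hypothetical names for illustration and are not part of Smile.

```java
/** Hypothetical illustration of recursive partitioning; not Smile's implementation. */
final class RecursivePartitionSketch {
    /** A leaf (left == null) stores a mean; an internal node tests "x <= threshold". */
    static final class Node {
        double value;      // mean response of the samples in this node
        double threshold;  // split point, used only by internal nodes
        Node left, right;  // children; both null for a leaf
    }

    /** Builds a tree over the index range [lo, hi); x must be sorted ascending. */
    static Node build(double[] x, double[] y, int lo, int hi, int nodeSize) {
        Node node = new Node();
        double sum = 0.0;
        for (int i = lo; i < hi; i++) sum += y[i];
        node.value = sum / (hi - lo);

        // Attribute value test: pick the cut that minimizes the children's squared error.
        int best = -1;
        double bestError = squaredError(y, lo, hi);
        for (int i = lo + nodeSize; i <= hi - nodeSize; i++) {
            if (x[i] == x[i - 1]) continue; // cannot cut between equal values
            double error = squaredError(y, lo, i) + squaredError(y, i, hi);
            if (error < bestError) {
                bestError = error;
                best = i;
            }
        }
        if (best < 0) return node; // no split improves the fit: keep a leaf

        // Repeat the same process on each derived subset.
        node.threshold = (x[best - 1] + x[best]) / 2;
        node.left = build(x, y, lo, best, nodeSize);
        node.right = build(x, y, best, hi, nodeSize);
        return node;
    }

    /** Sum of squared deviations from the mean over [lo, hi). */
    static double squaredError(double[] y, int lo, int hi) {
        double mean = 0.0;
        for (int i = lo; i < hi; i++) mean += y[i];
        mean /= (hi - lo);
        double error = 0.0;
        for (int i = lo; i < hi; i++) error += (y[i] - mean) * (y[i] - mean);
        return error;
    }

    /** Walks from the root to a leaf and returns the leaf mean. */
    static double predict(Node node, double x) {
        while (node.left != null) node = x <= node.threshold ? node.left : node.right;
        return node.value;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {1.0, 1.0, 1.0, 5.0, 5.0, 5.0};
        Node root = build(x, y, 0, x.length, 1);
        System.out.println(predict(root, 2.5)); // prints 1.0
        System.out.println(predict(root, 4.5)); // prints 5.0
    }
}
```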
Classification and regression tree techniques have a number of advantages over many alternative techniques:
- Simple to understand and interpret.
- In most cases, the interpretation of results summarized in a tree is very simple. This simplicity is useful not only for purposes of rapid classification of new observations, but can also often yield a much simpler "model" for explaining why observations are classified or predicted in a particular manner.
- Able to handle both numerical and categorical data.
- Other techniques are usually specialized in analyzing datasets that have only one type of variable.
- Tree methods are nonparametric and nonlinear.
- The final results of using tree methods for classification or regression can be summarized in a series of (usually few) logical if-then conditions (tree nodes). Therefore, there is no implicit assumption that the underlying relationships between the predictor variables and the dependent variable are linear, follow some specific non-linear link function, or are even monotonic in nature. Thus, tree methods are particularly well suited for data mining tasks, where there is often little a priori knowledge and no coherent set of theories or predictions regarding which variables are related and how. In those types of data analytics, tree methods can often reveal simple relationships between just a few variables that could easily have gone unnoticed using other analytic techniques.
Some techniques such as bagging, boosting, and random forest use more than one decision tree for their analysis.
-
Nested Class Summary
Nested classes/interfaces inherited from interface smile.regression.DataFrameRegression
DataFrameRegression.Trainer<M extends DataFrameRegression>
-
Constructor Summary
Constructor - Description
RegressionTree(DataFrame x, Loss loss, StructField response, int maxDepth, int maxNodes, int nodeSize, int mtry, int[] samples, int[][] order) - Constructor.
-
Method Summary
Modifier and Type, Method - Description
findBestSplit(LeafNode leaf, int j, double impurity, int lo, int hi) - Finds the best split for given column.
static RegressionTree fit(Formula formula, DataFrame data) - Fits a regression tree.
static RegressionTree fit(Formula formula, DataFrame data, int maxDepth, int maxNodes, int nodeSize) - Fits a regression tree.
static RegressionTree fit(Formula formula, DataFrame data, Properties params) - Fits a regression tree.
formula() - Returns the model formula, or null if the tree is part of an ensemble algorithm.
protected double impurity - Returns the impurity of node.
protected LeafNode newNode(int[] nodeSamples) - Creates a new leaf node.
double predict(Tuple x) - Predicts the dependent variable of an instance.
schema() - Returns the schema of predictors.
Methods inherited from class smile.base.cart.CART
clear, dot, findBestSplit, importance, order, predictors, root, shap, shap, size, split, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.regression.DataFrameRegression
predict
Methods inherited from interface smile.regression.Regression
applyAsDouble, online, predict, predict, predict, update, update, update
-
Constructor Details
-
RegressionTree
public RegressionTree(DataFrame x, Loss loss, StructField response, int maxDepth, int maxNodes, int nodeSize, int mtry, int[] samples, int[][] order)
Constructor. Fits a regression tree for AdaBoost and Random Forest.
- Parameters:
x - the data frame of the explanatory variables.
loss - the loss function.
response - the metadata of the response variable.
maxDepth - the maximum depth of the tree.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of input variables considered for splitting at each node. sqrt(p) seems to give generally good performance, where p is the number of variables.
samples - the sample set of instances for stochastic learning. samples[i] is the number of times instance i is sampled.
order - the index of training values in ascending order. Note that only numeric attributes need to be sorted.
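As a rough illustration of the samples and order arguments, the following sketch builds a bootstrap count array and an ascending-order index for one numeric column. The class SamplesAndOrderSketch and its helper methods are hypothetical, and the exact conventions the constructor expects (for example, how non-numeric columns are represented in order) are assumptions that should be checked against the library source.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical helpers showing the shape of the samples and order arrays. */
final class SamplesAndOrderSketch {
    /** Draws n instances with replacement; samples[i] counts how often instance i was drawn. */
    static int[] bootstrapCounts(int n, Random random) {
        int[] samples = new int[n];
        for (int k = 0; k < n; k++) {
            samples[random.nextInt(n)]++;
        }
        return samples;
    }

    /** Returns the row indices that sort the column in ascending order (an argsort). */
    static int[] ascendingOrder(double[] column) {
        return IntStream.range(0, column.length)
                .boxed()
                .sorted((a, b) -> Double.compare(column[a], column[b]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[] column = {3.0, 1.0, 2.0};
        // Assumption: order[j] holds the argsort of numeric column j.
        int[][] order = {ascendingOrder(column)};
        System.out.println(Arrays.toString(order[0])); // prints [1, 2, 0]

        int[] samples = bootstrapCounts(column.length, new Random(42));
        System.out.println(Arrays.stream(samples).sum()); // prints 3
    }
}
```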
-
-
Method Details
-
impurity
Description copied from class: CART
Returns the impurity of node.
-
newNode
Description copied from class: CART
Creates a new leaf node.
-
findBestSplit
Description copied from class: CART
Finds the best split for given column.
- Specified by:
findBestSplit in class CART
- Parameters:
leaf - the node to split.
j - the column to split on.
impurity - the impurity of node.
lo - the lower bound of sample index in the node.
hi - the upper bound of sample index in the node.
- Returns:
- the best split.
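To make the idea of a per-column split search concrete, here is a minimal sketch that scans one numeric column in ascending order and keeps the cut with the largest reduction in the sum of squared errors. It assumes squared error as the impurity measure; the class BestSplitSketch, its Split type, and its simplified signature are hypothetical and differ from the library's LeafNode-based method.

```java
/** Hypothetical single-column split search; not the library's findBestSplit. */
final class BestSplitSketch {
    /** A candidate split: the threshold and the impurity reduction it achieves. */
    static final class Split {
        final double threshold;
        final double gain;

        Split(double threshold, double gain) {
            this.threshold = threshold;
            this.gain = gain;
        }
    }

    /**
     * Returns the best split of the node's samples, or null if none is valid.
     * x and y hold the node's samples, already sorted by x in ascending order.
     */
    static Split findBestSplit(double[] x, double[] y, int nodeSize) {
        int n = x.length;
        double total = 0.0;
        for (double v : y) total += v;

        Split best = null;
        double leftSum = 0.0;
        for (int i = 0; i < n - 1; i++) {
            leftSum += y[i];
            int nl = i + 1;  // samples going left
            int nr = n - nl; // samples going right
            // Skip cuts that violate the minimum leaf size or fall between equal values.
            if (nl < nodeSize || nr < nodeSize || x[i] == x[i + 1]) continue;

            double rightSum = total - leftSum;
            // Reduction in squared error: L^2/nl + R^2/nr - T^2/n.
            double gain = leftSum * leftSum / nl + rightSum * rightSum / nr - total * total / n;
            if (best == null || gain > best.gain) {
                best = new Split((x[i] + x[i + 1]) / 2, gain);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {1, 1, 5, 5};
        Split split = findBestSplit(x, y, 1);
        System.out.println(split.threshold); // prints 2.5
    }
}
```

Keeping running sums means each candidate cut costs constant time once the column is sorted, which is one reason a presorted index per numeric column is useful.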
-
fit
Fits a regression tree.
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
- Returns:
- the model.
-
fit
Fits a regression tree. The hyper-parameters in params include smile.cart.node.size and smile.cart.max.nodes.
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
params - the hyper-parameters.
- Returns:
- the model.
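A minimal sketch of preparing the hyper-parameters with java.util.Properties follows. The two keys are the ones listed above; the helper name treeParams is hypothetical, and the assumption that values are supplied as integer strings is not confirmed by this page.

```java
import java.util.Properties;

/** Hypothetical helper that assembles the hyper-parameters for fit. */
final class TreeParamsSketch {
    /** Stores the minimum leaf size and maximum number of leaf nodes under the documented keys. */
    static Properties treeParams(int nodeSize, int maxNodes) {
        Properties params = new Properties();
        // Assumption: the library parses these values as integers.
        params.setProperty("smile.cart.node.size", String.valueOf(nodeSize));
        params.setProperty("smile.cart.max.nodes", String.valueOf(maxNodes));
        return params;
    }

    public static void main(String[] args) {
        Properties params = treeParams(5, 100);
        // The result would be passed as the params argument of
        // RegressionTree.fit(formula, data, params).
        System.out.println(params.getProperty("smile.cart.node.size")); // prints 5
        System.out.println(params.getProperty("smile.cart.max.nodes")); // prints 100
    }
}
```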
-
fit
public static RegressionTree fit(Formula formula, DataFrame data, int maxDepth, int maxNodes, int nodeSize)
Fits a regression tree.
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
maxDepth - the maximum depth of the tree.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
- Returns:
- the model.
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
predict in interface Regression<Tuple>
- Parameters:
x - an instance.
- Returns:
- the predicted value of dependent variable.
-
formula
Returns the model formula, or null if the tree is part of an ensemble algorithm.
- Specified by:
formula in interface DataFrameRegression
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-