public class RandomForest extends java.lang.Object implements SoftClassifier<double[]>, java.io.Serializable
Each tree is constructed using the following algorithm:

1. If the number of cases in the training set is N, sample N cases at random with replacement from the original data. This sample is the training set for growing the tree.
2. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant while the forest grows.
3. Each tree is grown to the largest extent possible. There is no pruning.
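The procedure above can be exercised end to end with a short sketch. The toy data here is synthetic, and the `smile.classification` package location is assumed from this page's class names:

```java
import smile.classification.RandomForest;

public class TrainSketch {
    public static void main(String[] args) {
        // Hypothetical toy data: label 1 when the first feature is positive.
        java.util.Random rng = new java.util.Random(42);
        double[][] x = new double[200][2];
        int[] y = new int[200];
        for (int i = 0; i < x.length; i++) {
            x[i][0] = rng.nextGaussian();
            x[i][1] = rng.nextGaussian();
            y[i] = x[i][0] > 0 ? 1 : 0;
        }

        // 100 trees; mtry defaults to floor(sqrt(dim)) per the constructor docs below.
        RandomForest forest = new RandomForest(x, y, 100);
        System.out.println("trees: " + forest.size());
        System.out.println("prediction: " + forest.predict(new double[]{2.0, 0.0}));
    }
}
```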
Nested Class Summary

static class RandomForest.Trainer
    Trainer for random forest classifiers.

Constructor Summary

RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees)
RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int mtry)
RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample)
RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
RandomForest(double[][] x, int[] y, int ntrees)
RandomForest(double[][] x, int[] y, int ntrees, int mtry)

Method Summary

double error()
    Returns the out-of-bag estimate of the error rate.

DecisionTree[] getTrees()
    Returns the decision trees.

double[] importance()
    Returns the variable importance.

int predict(double[] x)
    Predicts the class label of an instance.

int predict(double[] x, double[] posteriori)
    Predicts the class label of an instance and also calculates the posterior probabilities.

int size()
    Returns the number of trees in the model.

double[] test(double[][] x, int[] y)
    Tests the model on a validation dataset.

double[][] test(double[][] x, int[] y, ClassificationMeasure[] measures)
    Tests the model on a validation dataset with the given performance measures.

void trim(int ntrees)
    Trims the tree model set to a smaller size in case of overfitting.
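The soft-classification variant of predict fills a caller-supplied posterior array. A minimal sketch, assuming the `smile.classification` package and using synthetic one-dimensional data:

```java
import smile.classification.RandomForest;

public class PosteriorSketch {
    public static void main(String[] args) {
        // Toy training data: class 1 iff the single feature is positive.
        double[][] x = new double[100][1];
        int[] y = new int[100];
        for (int i = 0; i < 100; i++) {
            x[i][0] = (i % 2 == 0) ? 1.0 : -1.0;
            y[i] = x[i][0] > 0 ? 1 : 0;
        }

        RandomForest forest = new RandomForest(x, y, 50);

        // posteriori must be pre-allocated with one slot per class;
        // predict writes the class probabilities into it on output.
        double[] posteriori = new double[2];
        int label = forest.predict(new double[]{1.0}, posteriori);
        System.out.println(label + " " + posteriori[label]);
    }
}
```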

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public RandomForest(double[][] x, int[] y, int ntrees)
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.

public RandomForest(double[][] x, int[] y, int ntrees, int mtry)
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.

public RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.

public RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int mtry)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.

public RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
nodeSize - the minimum size of leaf nodes.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.

public RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
nodeSize - the minimum size of leaf nodes.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.

public RandomForest(Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
nodeSize - the minimum size of leaf nodes.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.
classWeight - the priors of the classes. The weight of each class is roughly the ratio of samples in each class. For example, if there are 400 positive samples and 100 negative samples, classWeight should be [1, 4] (assuming label 0 is negative and label 1 is positive).

Method Detail

public double error()
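Since each tree sees only a bootstrap sample, the held-out cases give a free validation estimate, which can guide trim(). A hedged sketch on synthetic data, assuming the `smile.classification` package:

```java
import smile.classification.RandomForest;

public class OobSketch {
    public static void main(String[] args) {
        // Hypothetical toy data: label depends on the sum of the two features.
        java.util.Random rng = new java.util.Random(7);
        double[][] x = new double[300][2];
        int[] y = new int[300];
        for (int i = 0; i < x.length; i++) {
            x[i][0] = rng.nextGaussian();
            x[i][1] = rng.nextGaussian();
            y[i] = x[i][0] + x[i][1] > 0 ? 1 : 0;
        }

        RandomForest forest = new RandomForest(x, y, 200);
        double oob = forest.error();  // out-of-bag error estimate with 200 trees
        forest.trim(100);             // shrink the ensemble to 100 trees
        System.out.printf("OOB error: %.3f, trees after trim: %d%n", oob, forest.size());
    }
}
```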
public double[] importance()

public int size()

public void trim(int ntrees)
Parameters:
ntrees - the new (smaller) size of the tree model set.

public int predict(double[] x)
Specified by:
predict in interface Classifier<double[]>
Parameters:
x - the instance to be classified.

public int predict(double[] x, double[] posteriori)
Specified by:
predict in interface SoftClassifier<double[]>
Parameters:
x - the instance to be classified.
posteriori - the array to store the posterior probabilities on output.

public double[] test(double[][] x, int[] y)
Parameters:
x - the test data set.
y - the test data response values.

public double[][] test(double[][] x, int[] y, ClassificationMeasure[] measures)
Parameters:
x - the test data set.
y - the test data labels.
measures - the performance measures of classification.

public DecisionTree[] getTrees()
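The test methods above evaluate the model on held-out data. A sketch under stated assumptions: the `smile.classification` and `smile.validation` package locations are inferred from this page's class names, and the interpretation of the returned array (one entry per ensemble size) is an assumption based on the array return types, not confirmed by this page:

```java
import smile.classification.RandomForest;
import smile.validation.Accuracy;
import smile.validation.ClassificationMeasure;

public class TestSketch {
    public static void main(String[] args) {
        // Hypothetical synthetic train/validation split.
        java.util.Random rng = new java.util.Random(1);
        double[][] x = new double[200][2];
        int[] y = new int[200];
        double[][] tx = new double[50][2];
        int[] ty = new int[50];
        for (int i = 0; i < 200; i++) {
            x[i][0] = rng.nextGaussian(); x[i][1] = rng.nextGaussian();
            y[i] = x[i][0] > 0 ? 1 : 0;
        }
        for (int i = 0; i < 50; i++) {
            tx[i][0] = rng.nextGaussian(); tx[i][1] = rng.nextGaussian();
            ty[i] = tx[i][0] > 0 ? 1 : 0;
        }

        RandomForest forest = new RandomForest(x, y, 100);

        // Assumption: test(x, y) reports performance as trees are added.
        double[] acc = forest.test(tx, ty);
        System.out.println("final entry: " + acc[acc.length - 1]);

        // With explicit measures: one column per supplied measure.
        double[][] results = forest.test(tx, ty, new ClassificationMeasure[]{new Accuracy()});
        System.out.println("rows: " + results.length);
    }
}
```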