Interface Bootstrap


public interface Bootstrap
The bootstrap is a general tool for assessing statistical accuracy. The basic idea is to randomly draw samples with replacement from the training data, each sample being the same size as the original training set. This is done many times (say k = 100), producing k bootstrap datasets. The model is then refit to each of the bootstrap datasets, and the behavior of the fits is examined over the k replications.
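The resampling idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `BootstrapSketch`, its methods, and the `seed` parameter are hypothetical and are not part of this interface. The "model" here is just the sample mean, so the spread of the refits gives a bootstrap estimate of its standard error.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of the documented API. */
final class BootstrapSketch {
    /** Draws k index sets, each of size n, sampled with replacement from 0..n-1. */
    static int[][] sample(int n, int k, long seed) {
        Random rng = new Random(seed);
        int[][] bags = new int[k][n];
        for (int round = 0; round < k; round++) {
            for (int i = 0; i < n; i++) {
                bags[round][i] = rng.nextInt(n); // the same index may be drawn more than once
            }
        }
        return bags;
    }

    /** Mean of the data values selected by one bootstrap index set. */
    static double mean(double[] data, int[] bag) {
        double sum = 0.0;
        for (int index : bag) sum += data[index];
        return sum / bag.length;
    }

    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 4.0, 5.0, 7.0, 9.0};
        int k = 100;
        int[][] bags = sample(data.length, k, 42L);

        // "Refit" the statistic on every bootstrap dataset.
        double[] means = new double[k];
        double grandMean = 0.0;
        for (int round = 0; round < k; round++) {
            means[round] = mean(data, bags[round]);
            grandMean += means[round] / k;
        }

        // The spread of the k refits estimates the standard error of the mean.
        double sumSquares = 0.0;
        for (double m : means) sumSquares += (m - grandMean) * (m - grandMean);
        System.out.printf("bootstrap standard error of the mean: %.3f%n",
                Math.sqrt(sumSquares / (k - 1)));
    }
}
```

Working with index sets rather than copied data keeps the sketch independent of the element type, which is also why a sampler can be driven by the sample count alone.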
  • Method Details

    • of

      static Bag[] of(int n, int k)
      Bootstrap sampling.
      Parameters:
      n - the number of samples in the original dataset.
      k - the number of rounds of bootstrap.
      Returns:
      the bootstrap samplings, one per round.
    • of

      static Bag[] of(int[] category, int k)
      Stratified bootstrap sampling.
      Parameters:
      category - the strata labels.
      k - the number of rounds of bootstrap.
      Returns:
      the bootstrap samplings, one per round.
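One common way to stratify a bootstrap is to resample with replacement inside each stratum, so that every class keeps its original count. The sketch below assumes that scheme; it is not taken from this library's implementation, and the class `StratifiedBootstrapSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper for illustration; not part of the documented API. */
final class StratifiedBootstrapSketch {
    /** Draws one bootstrap index set that preserves the size of every stratum. */
    static int[] sample(int[] category, Random rng) {
        // Group row indices by stratum label.
        Map<Integer, List<Integer>> strata = new HashMap<>();
        for (int i = 0; i < category.length; i++) {
            strata.computeIfAbsent(category[i], label -> new ArrayList<>()).add(i);
        }

        // Within each stratum, draw as many rows as it holds, with replacement.
        int[] bag = new int[category.length];
        int next = 0;
        for (List<Integer> members : strata.values()) {
            for (int draw = 0; draw < members.size(); draw++) {
                bag[next++] = members.get(rng.nextInt(members.size()));
            }
        }
        return bag;
    }

    /** Counts how many sampled rows carry the given stratum label. */
    static int count(int[] category, int[] bag, int label) {
        int total = 0;
        for (int index : bag) {
            if (category[index] == label) total++;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] category = {0, 0, 0, 0, 1, 1, 2};
        int[] bag = sample(category, new Random(7L));
        for (int label = 0; label <= 2; label++) {
            System.out.println("label " + label + ": " + count(category, bag, label) + " rows");
        }
    }
}
```

Compared with plain resampling, this keeps rare classes from vanishing from a bootstrap dataset by chance.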
    • classification

      static <T, M extends Classifier<T>> ClassificationValidations<M> classification(int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)
      Runs classification bootstrap validation.
      Type Parameters:
      T - the data type of samples.
      M - the model type.
      Parameters:
      k - the number of rounds of bootstrap.
      x - the samples.
      y - the sample labels.
      trainer - the lambda to train a model.
      Returns:
      the validation results.
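A bootstrap validation loop with a trainer lambda might look roughly like the following. Several things here are assumptions made for the sake of a self-contained example: the model is scored on the rows left out of each bootstrap sample (a common convention, not confirmed by this page), `ToIntFunction<T>` stands in for the classifier type, and a plain `double[]` of per-round accuracies stands in for the validation result type. `BootstrapValidationSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.BiFunction;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration; not part of the documented API. */
final class BootstrapValidationSketch {
    /** Returns the out-of-bag accuracy of each of the k rounds (NaN if a round has no such rows). */
    static <T> double[] accuracy(int k, T[] x, int[] y,
            BiFunction<T[], int[], ToIntFunction<T>> trainer, long seed) {
        Random rng = new Random(seed);
        int n = x.length;
        double[] accuracy = new double[k];
        for (int round = 0; round < k; round++) {
            boolean[] inBag = new boolean[n];
            T[] trainX = Arrays.copyOf(x, n); // same runtime array type as x
            int[] trainY = new int[n];
            for (int i = 0; i < n; i++) {
                int j = rng.nextInt(n);       // draw with replacement
                inBag[j] = true;
                trainX[i] = x[j];
                trainY[i] = y[j];
            }

            ToIntFunction<T> model = trainer.apply(trainX, trainY);

            // Score only the rows that were never drawn in this round.
            int tested = 0;
            int correct = 0;
            for (int i = 0; i < n; i++) {
                if (!inBag[i]) {
                    tested++;
                    if (model.applyAsInt(x[i]) == y[i]) correct++;
                }
            }
            accuracy[round] = tested == 0 ? Double.NaN : (double) correct / tested;
        }
        return accuracy;
    }

    public static void main(String[] args) {
        Double[] x = {1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0};
        int[] y = {0, 0, 0, 0, 1, 1, 1, 1};

        // Toy trainer: threshold at the mean of the training inputs.
        BiFunction<Double[], int[], ToIntFunction<Double>> trainer = (trainX, trainY) -> {
            double sum = 0.0;
            for (Double v : trainX) sum += v;
            double threshold = sum / trainX.length;
            return v -> v > threshold ? 1 : 0;
        };

        System.out.println(Arrays.toString(accuracy(10, x, y, trainer, 1L)));
    }
}
```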
    • classification

      static <M extends DataFrameClassifier> ClassificationValidations<M> classification(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
      Runs classification bootstrap validation.
      Type Parameters:
      M - the model type.
      Parameters:
      k - the number of rounds of bootstrap.
      formula - the model specification.
      data - the training/validation data.
      trainer - the lambda to train a model.
      Returns:
      the validation results.
    • regression

      static <T, M extends Regression<T>> RegressionValidations<M> regression(int k, T[] x, double[] y, BiFunction<T[],double[],M> trainer)
      Runs regression bootstrap validation.
      Type Parameters:
      T - the data type of samples.
      M - the model type.
      Parameters:
      k - the number of rounds of bootstrap.
      x - the samples.
      y - the response variable.
      trainer - the lambda to train a model.
      Returns:
      the validation results.
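The regression case can be sketched the same way, with root mean squared error in place of accuracy. As before, this is an illustration under assumptions rather than this library's implementation: scoring on the rows left out of each bootstrap sample is assumed, `ToDoubleFunction<T>` stands in for the regression model type, and `BootstrapRegressionSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.BiFunction;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper for illustration; not part of the documented API. */
final class BootstrapRegressionSketch {
    /** Returns the out-of-bag RMSE of each of the k rounds (NaN if a round has no such rows). */
    static <T> double[] rmse(int k, T[] x, double[] y,
            BiFunction<T[], double[], ToDoubleFunction<T>> trainer, long seed) {
        Random rng = new Random(seed);
        int n = x.length;
        double[] errors = new double[k];
        for (int round = 0; round < k; round++) {
            boolean[] inBag = new boolean[n];
            T[] trainX = Arrays.copyOf(x, n);
            double[] trainY = new double[n];
            for (int i = 0; i < n; i++) {
                int j = rng.nextInt(n); // draw with replacement
                inBag[j] = true;
                trainX[i] = x[j];
                trainY[i] = y[j];
            }

            ToDoubleFunction<T> model = trainer.apply(trainX, trainY);

            // Accumulate squared error over the rows not drawn in this round.
            int tested = 0;
            double squaredError = 0.0;
            for (int i = 0; i < n; i++) {
                if (!inBag[i]) {
                    double residual = model.applyAsDouble(x[i]) - y[i];
                    squaredError += residual * residual;
                    tested++;
                }
            }
            errors[round] = tested == 0 ? Double.NaN : Math.sqrt(squaredError / tested);
        }
        return errors;
    }

    public static void main(String[] args) {
        Double[] x = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
        double[] y = {3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2};

        // Toy trainer: always predicts the mean response of the training rows.
        BiFunction<Double[], double[], ToDoubleFunction<Double>> trainer = (trainX, trainY) -> {
            double sum = 0.0;
            for (double v : trainY) sum += v;
            double meanResponse = sum / trainY.length;
            return v -> meanResponse;
        };

        System.out.println(Arrays.toString(rmse(10, x, y, trainer, 1L)));
    }
}
```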
    • regression

      static <M extends DataFrameRegression> RegressionValidations<M> regression(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
      Runs regression bootstrap validation.
      Type Parameters:
      M - the model type.
      Parameters:
      k - the number of rounds of bootstrap.
      formula - the model specification.
      data - the training/validation data.
      trainer - the lambda to train a model.
      Returns:
      the validation results.