public interface CrossValidation

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
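The procedure can be made concrete with a minimal sketch in plain Java. It uses no Smile types, and the class `CrossValidationSketch` and method `cvMeanSquaredError` are hypothetical. It assumes the simplest possible regression model, one that always predicts the mean of its training responses, and places sample i in fold i % k without shuffling. Each round trains on the complement of one fold and validates on that fold, and the k validation errors are then averaged.

```java
/**
 * Illustrative sketch only (not Smile's implementation): k-fold cross validation
 * of a "model" that predicts the mean of its training responses.
 */
final class CrossValidationSketch {

    /** Mean squared error averaged over k rounds; fold f holds samples with i % k == f. */
    static double cvMeanSquaredError(double[] y, int k) {
        int n = y.length;
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        double total = 0.0;
        for (int fold = 0; fold < k; fold++) {
            // "Train": fit the mean on the samples outside the current fold.
            double sum = 0.0;
            int count = 0;
            for (int i = 0; i < n; i++) {
                if (i % k != fold) { sum += y[i]; count++; }
            }
            double prediction = sum / count;
            // Validate: mean squared error on the held-out fold.
            double sse = 0.0;
            int m = 0;
            for (int i = fold; i < n; i += k) {
                double e = y[i] - prediction;
                sse += e * e;
                m++;
            }
            total += sse / m;
        }
        return total / k; // average of the per-round validation errors
    }
}
```

Practical implementations usually shuffle the samples before assigning folds; the fixed `i % k` assignment here only keeps the sketch deterministic.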

Method Summary

Static Methods

Modifier and Type

Method

Description

static <M extends DataFrameClassifier> ClassificationValidations<M>

classification(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Repeated cross validation of classification.

static <T, M extends Classifier<T>> ClassificationValidations<M>

classification(int round, int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)

Repeated cross validation of classification.

static <M extends DataFrameClassifier> ClassificationValidations<M>

classification(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Cross validation of classification.

static <T, M extends Classifier<T>> ClassificationValidations<M>

classification(int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)

Cross validation of classification.

static Bag[]

nonoverlap(int[] group, int k)

Cross validation with non-overlapping groups.

static Bag[]

of(int n, int k)

Creates the data splits for k-fold cross validation.

static <M extends DataFrameRegression> RegressionValidations<M>

regression(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Repeated cross validation of regression.

static <T, M extends Regression<T>> RegressionValidations<M>

regression(int round, int k, T[] x, double[] y, BiFunction<T[],double[],M> trainer)

Repeated cross validation of regression.

static <M extends DataFrameRegression> RegressionValidations<M>

regression(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Cross validation of regression.

static <T, M extends Regression<T>> RegressionValidations<M>

regression(int k, T[] x, double[] y, BiFunction<T[],double[],M> trainer)

Cross validation of regression.

static Bag[]

stratify(int[] category, int k)

Cross validation with stratified folds.

static <M extends DataFrameClassifier> ClassificationValidations<M>

stratify(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Repeated stratified cross validation of classification.

static <T, M extends Classifier<T>> ClassificationValidations<M>

stratify(int round, int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)

Repeated stratified cross validation of classification.

static <M extends DataFrameClassifier> ClassificationValidations<M>

stratify(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)

Stratified cross validation of classification.

static <T, M extends Classifier<T>> ClassificationValidations<M>

stratify(int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)

Stratified cross validation of classification.

Method Details
- of
  
  static Bag[] of(int n, int k)
  
  Creates the data splits for k-fold cross validation.
  
  Parameters:
  
  n - the number of samples.
  
  k - the number of folds.
  
  Returns:
  
  k-fold data splits.
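  The kind of splits returned by `of(n, k)` can be pictured with the following sketch of the general technique; it is an assumption, not Smile's code. It shuffles the indices 0..n-1 with a caller-supplied seed (a hypothetical parameter) and cuts them into k folds whose sizes differ by at most one. The hypothetical `Split` class, with `train` and `test` fields, stands in for `Bag`, whose members are not described on this page.

  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.Random;

  /** Hypothetical stand-in for a train/validation split; Smile's Bag may differ. */
  final class Split {
      final int[] train; // indices used to fit the model
      final int[] test;  // held-out indices used for validation
      Split(int[] train, int[] test) { this.train = train; this.test = test; }
  }

  final class KFold {
      /** Shuffles 0..n-1 and cuts it into k folds whose sizes differ by at most one. */
      static Split[] of(int n, int k, long seed) {
          if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
          List<Integer> index = new ArrayList<>();
          for (int i = 0; i < n; i++) index.add(i);
          Collections.shuffle(index, new Random(seed));

          Split[] splits = new Split[k];
          int start = 0;
          for (int fold = 0; fold < k; fold++) {
              // The first n % k folds receive one extra sample.
              int size = n / k + (fold < n % k ? 1 : 0);
              int[] test = new int[size];
              int[] train = new int[n - size];
              for (int i = 0, t = 0, r = 0; i < n; i++) {
                  if (i >= start && i < start + size) test[t++] = index.get(i);
                  else train[r++] = index.get(i);
              }
              splits[fold] = new Split(train, test);
              start += size;
          }
          return splits;
      }
  }
  ```

  Every index appears in exactly one `test` array, and each `train` array is the complement of its `test` array.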
- stratify
  
  static Bag[] stratify(int[] category, int k)
  
  Cross validation with stratified folds. The folds are constructed so that each one preserves the percentage of samples from each stratum (for example, each class).
  
  Parameters:
  
  category - the strata labels.
  
  k - the number of folds.
  
  Returns:
  
  k-fold data splits.
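  A minimal sketch of stratified fold assignment, assuming a round-robin scheme: the samples of each stratum are shuffled and dealt across the folds in turn, so each fold receives about 1/k of every stratum. The class `StratifiedFolds`, the `seed` parameter, and the one-fold-number-per-sample encoding are hypothetical, and Smile may construct its folds differently.

  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.Map;
  import java.util.Random;
  import java.util.TreeMap;

  final class StratifiedFolds {
      /** Returns fold[i] in 0..k-1 for each sample, dealing each stratum round-robin. */
      static int[] assign(int[] category, int k, long seed) {
          if (k < 2 || k > category.length) throw new IllegalArgumentException("need 2 <= k <= n");
          // Group sample indices by stratum label.
          Map<Integer, List<Integer>> strata = new TreeMap<>();
          for (int i = 0; i < category.length; i++) {
              strata.computeIfAbsent(category[i], c -> new ArrayList<>()).add(i);
          }
          Random rng = new Random(seed);
          int[] fold = new int[category.length];
          int next = 0; // carried across strata so overall fold sizes also stay balanced
          for (List<Integer> members : strata.values()) {
              Collections.shuffle(members, rng);
              for (int i : members) {
                  fold[i] = next;
                  next = (next + 1) % k;
              }
          }
          return fold;
      }
  }
  ```

  With 6 samples of class 0, 3 of class 1 and k = 3, every fold receives exactly two samples of class 0 and one of class 1, whatever the shuffle.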
- nonoverlap
  
  static Bag[] nonoverlap(int[] group, int k)
  
  Cross validation with non-overlapping groups. All samples of a group are placed in the same fold, so a group never appears in two different folds; consequently, the number of distinct groups must be at least the number of folds. The folds are approximately balanced in the sense that each fold contains approximately the same number of distinct groups.
  This is useful when the underlying process that generates the data is known to violate the i.i.d. assumption. For example, if the data contain several samples from the same user, grouping by user ensures that each fold is validated only on users the model did not see during training, so the score is not inflated by user-specific features that fail to generalize to new users.
  
  Parameters:
  
  group - the group labels of the samples.
  
  k - the number of folds.
  
  Returns:
  
  k-fold data splits.
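  The grouping constraint can be sketched as follows, assuming the distinct group labels are shuffled and dealt round-robin across the folds. `GroupFolds` and its `seed` parameter are hypothetical, not part of this interface. Because whole groups are assigned at once, the folds are balanced in the number of distinct groups, not necessarily in the number of samples.

  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.Random;
  import java.util.TreeSet;

  final class GroupFolds {
      /** Returns fold[i] in 0..k-1; all samples sharing a group label get the same fold. */
      static int[] assign(int[] group, int k, long seed) {
          TreeSet<Integer> distinct = new TreeSet<>();
          for (int g : group) distinct.add(g);
          if (k < 2 || distinct.size() < k) {
              throw new IllegalArgumentException("need k >= 2 and at least k distinct groups");
          }
          List<Integer> labels = new ArrayList<>(distinct); // sorted, then shuffled
          Collections.shuffle(labels, new Random(seed));
          // Deal whole groups round-robin so each fold gets about the same number of groups.
          Map<Integer, Integer> foldOf = new HashMap<>();
          for (int j = 0; j < labels.size(); j++) foldOf.put(labels.get(j), j % k);
          int[] fold = new int[group.length];
          for (int i = 0; i < group.length; i++) fold[i] = foldOf.get(group[i]);
          return fold;
      }
  }
  ```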
- classification
  
  static <T, M extends Classifier<T>> ClassificationValidations<M> classification(int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)
  
  Cross validation of classification.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  x - the samples.
  
  y - the sample labels.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
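  The sketch below shows how a `trainer` lambda of the shape `BiFunction<T[], int[], M>` fits into k-fold classification. `SimpleClassifier`, `ClassificationCV`, and the 1-nearest-neighbour trainer are hypothetical stand-ins, not Smile's `Classifier` API, and folds are assigned by `i % k` without shuffling. Accuracy is pooled over all held-out predictions; the real `ClassificationValidations` may report different or additional metrics.

  ```java
  import java.util.Arrays;
  import java.util.function.BiFunction;

  /** Hypothetical minimal classifier interface (not Smile's Classifier<T>). */
  interface SimpleClassifier<T> {
      int predict(T x);
  }

  final class ClassificationCV {
      /** k-fold accuracy; fold f holds samples with i % k == f (no shuffling). */
      static <T, M extends SimpleClassifier<T>> double accuracy(
              int k, T[] x, int[] y, BiFunction<T[], int[], M> trainer) {
          int n = x.length;
          if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
          int correct = 0;
          for (int fold = 0; fold < k; fold++) {
              int testSize = (n - fold + k - 1) / k;       // count of i in [0, n) with i % k == fold
              T[] trainX = Arrays.copyOf(x, n - testSize); // same runtime element type as x
              int[] trainY = new int[n - testSize];
              for (int i = 0, j = 0; i < n; i++) {
                  if (i % k != fold) { trainX[j] = x[i]; trainY[j] = y[i]; j++; }
              }
              M model = trainer.apply(trainX, trainY);     // train on the other k - 1 folds
              for (int i = fold; i < n; i += k) {
                  if (model.predict(x[i]) == y[i]) correct++;
              }
          }
          return (double) correct / n; // every sample is validated exactly once
      }

      /** Toy trainer: 1-nearest neighbour with squared Euclidean distance. */
      static BiFunction<double[][], int[], SimpleClassifier<double[]>> oneNearestNeighbor() {
          return (tx, ty) -> q -> {
              int best = 0;
              double bestDist = Double.POSITIVE_INFINITY;
              for (int i = 0; i < tx.length; i++) {
                  double d = 0.0;
                  for (int j = 0; j < q.length; j++) {
                      double diff = tx[i][j] - q[j];
                      d += diff * diff;
                  }
                  if (d < bestDist) { bestDist = d; best = i; }
              }
              return ty[best];
          };
      }
  }
  ```

  On two well-separated clusters, for example three points near (0, 0) and three near (10, 10) with k = 3, every held-out point's nearest training neighbour comes from its own cluster, so the sketch returns an accuracy of 1.0.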
- classification
  
  static <M extends DataFrameClassifier> ClassificationValidations<M> classification(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Cross validation of classification.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- classification
  
  static <T, M extends Classifier<T>> ClassificationValidations<M> classification(int round, int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)
  
  Repeated cross validation of classification.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  x - the samples.
  
  y - the sample labels.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
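  Repeated cross validation re-partitions the data in every round and summarizes the spread of the per-round scores. The sketch below assumes a majority-class baseline in place of a real trainer and reports the mean and sample standard deviation of accuracy over `round` rounds. `RepeatedCV`, `numClasses`, and `seed` are hypothetical, and the summary statistics actually returned by `ClassificationValidations` are not described on this page.

  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.Random;

  final class RepeatedCV {
      /** One round: shuffled k-fold accuracy of a majority-class baseline. */
      static double roundAccuracy(int[] y, int k, int numClasses, Random rng) {
          int n = y.length;
          List<Integer> index = new ArrayList<>();
          for (int i = 0; i < n; i++) index.add(i);
          Collections.shuffle(index, rng); // a new partition in every round
          int correct = 0;
          for (int fold = 0; fold < k; fold++) {
              // "Train": class frequencies at shuffled positions p with p % k != fold.
              int[] freq = new int[numClasses];
              for (int p = 0; p < n; p++) if (p % k != fold) freq[y[index.get(p)]]++;
              int majority = 0;
              for (int c = 1; c < numClasses; c++) if (freq[c] > freq[majority]) majority = c;
              // Validate on the held-out positions.
              for (int p = fold; p < n; p += k) if (y[index.get(p)] == majority) correct++;
          }
          return (double) correct / n;
      }

      /** Repeats k-fold CV `round` times; returns {mean, sample standard deviation}. */
      static double[] repeated(int round, int k, int[] y, int numClasses, long seed) {
          if (round < 1 || k < 2 || k > y.length) throw new IllegalArgumentException("bad round or k");
          Random rng = new Random(seed);
          double[] scores = new double[round];
          for (int r = 0; r < round; r++) scores[r] = roundAccuracy(y, k, numClasses, rng);
          double mean = 0.0;
          for (double s : scores) mean += s;
          mean /= round;
          double ss = 0.0;
          for (double s : scores) ss += (s - mean) * (s - mean);
          double sd = round > 1 ? Math.sqrt(ss / (round - 1)) : 0.0;
          return new double[] {mean, sd};
      }
  }
  ```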
- classification
  
  static <M extends DataFrameClassifier> ClassificationValidations<M> classification(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Repeated cross validation of classification.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- stratify
  
  static <T, M extends Classifier<T>> ClassificationValidations<M> stratify(int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)
  
  Stratified cross validation of classification.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  x - the samples.
  
  y - the sample labels.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- stratify
  
  static <M extends DataFrameClassifier> ClassificationValidations<M> stratify(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Stratified cross validation of classification.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- stratify
  
  static <T, M extends Classifier<T>> ClassificationValidations<M> stratify(int round, int k, T[] x, int[] y, BiFunction<T[],int[],M> trainer)
  
  Repeated stratified cross validation of classification.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  x - the samples.
  
  y - the sample labels.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- stratify
  
  static <M extends DataFrameClassifier> ClassificationValidations<M> stratify(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Repeated stratified cross validation of classification.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- regression
  
  static <T, M extends Regression<T>> RegressionValidations<M> regression(int k, T[] x, double[] y, BiFunction<T[],double[],M> trainer)
  
  Cross validation of regression.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  x - the samples.
  
  y - the response variable.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
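  A minimal sketch of k-fold cross validation for regression with a `BiFunction<T[], double[], M>` trainer. The `SimpleRegression` interface, the least-squares line trainer, and `RegressionCV` are hypothetical; folds use `i % k` without shuffling, and the score is the root mean squared error pooled over all held-out predictions, which may differ from the metrics reported by `RegressionValidations`.

  ```java
  import java.util.Arrays;
  import java.util.function.BiFunction;

  /** Hypothetical minimal regression interface (not Smile's Regression<T>). */
  interface SimpleRegression<T> {
      double predict(T x);
  }

  final class RegressionCV {
      /** k-fold root mean squared error; fold f holds samples with i % k == f. */
      static <T, M extends SimpleRegression<T>> double rmse(
              int k, T[] x, double[] y, BiFunction<T[], double[], M> trainer) {
          int n = x.length;
          if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
          double sse = 0.0;
          for (int fold = 0; fold < k; fold++) {
              int testSize = (n - fold + k - 1) / k;       // count of i in [0, n) with i % k == fold
              T[] trainX = Arrays.copyOf(x, n - testSize); // same runtime element type as x
              double[] trainY = new double[n - testSize];
              for (int i = 0, j = 0; i < n; i++) {
                  if (i % k != fold) { trainX[j] = x[i]; trainY[j] = y[i]; j++; }
              }
              M model = trainer.apply(trainX, trainY);
              for (int i = fold; i < n; i += k) {
                  double e = y[i] - model.predict(x[i]);
                  sse += e * e;
              }
          }
          return Math.sqrt(sse / n);
      }

      /** Toy trainer: least-squares line y = a + b * x (needs two distinct x values). */
      static BiFunction<Double[], double[], SimpleRegression<Double>> leastSquaresLine() {
          return (tx, ty) -> {
              int m = tx.length;
              double mx = 0.0, my = 0.0;
              for (int i = 0; i < m; i++) { mx += tx[i]; my += ty[i]; }
              mx /= m;
              my /= m;
              double sxy = 0.0, sxx = 0.0;
              for (int i = 0; i < m; i++) {
                  sxy += (tx[i] - mx) * (ty[i] - my);
                  sxx += (tx[i] - mx) * (tx[i] - mx);
              }
              double b = sxy / sxx, a = my - b * mx;
              return q -> a + b * q;
          };
      }
  }
  ```

  For data lying exactly on y = 2x + 1, every training set recovers that line, so the cross-validated RMSE is zero up to rounding.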
- regression
  
  static <M extends DataFrameRegression> RegressionValidations<M> regression(int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Cross validation of regression.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- regression
  
  static <T, M extends Regression<T>> RegressionValidations<M> regression(int round, int k, T[] x, double[] y, BiFunction<T[],double[],M> trainer)
  
  Repeated cross validation of regression.
  
  Type Parameters:
  
  T - the data type of samples.
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  x - the samples.
  
  y - the response variable.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
- regression
  
  static <M extends DataFrameRegression> RegressionValidations<M> regression(int round, int k, Formula formula, DataFrame data, BiFunction<Formula,DataFrame,M> trainer)
  
  Repeated cross validation of regression.
  
  Type Parameters:
  
  M - the model type.
  
  Parameters:
  
  round - the number of rounds of repeated cross validation.
  
  k - the number of folds.
  
  formula - the model specification.
  
  data - the training/validation data.
  
  trainer - the lambda to train a model.
  
  Returns:
  
  the validation results.
