Class GAFE

java.lang.Object
smile.feature.selection.GAFE

public class GAFE extends Object
Genetic algorithm based feature selection. This method uses a genetic algorithm (GA) to search for many (random) subsets of variables with good expected classification power. The "fitness" of each subset is its ability to classify the samples under a given classification method. Once many such subsets have been obtained, the one with the best performance may be used as the selected features. Alternatively, the frequencies with which variables are selected may be analyzed further: the most frequently selected variables may be presumed to be the most relevant to sample distinction and are then used for prediction. Although a GA avoids brute-force search, it is still much slower than univariate feature selection.
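The frequency analysis described above can be sketched in plain Java. This is a minimal illustration, not part of this class: it assumes each selected subset has already been converted to a byte array of 0/1 values (one entry per feature), and the class and method names (SelectionFrequency, counts, topK) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper (not part of the library): ranks features by selection frequency. */
final class SelectionFrequency {
    /** Counts, for each feature index, how many masks select it (value 1). */
    static int[] counts(byte[][] masks, int length) {
        int[] freq = new int[length];
        for (byte[] mask : masks) {
            for (int j = 0; j < length; j++) {
                if (mask[j] == 1) freq[j]++;
            }
        }
        return freq;
    }

    /** Returns the indices of the k most frequently selected features (ties: lower index first). */
    static int[] topK(int[] freq, int k) {
        Integer[] order = new Integer[freq.length];
        for (int j = 0; j < order.length; j++) order[j] = j;
        // Stable sort by descending frequency keeps lower indices first among ties.
        Arrays.sort(order, Comparator.comparingInt((Integer j) -> -freq[j]));
        int[] top = new int[Math.min(k, order.length)];
        for (int i = 0; i < top.length; i++) top[i] = order[i];
        return top;
    }

    public static void main(String[] args) {
        // Three toy subsets over four features.
        byte[][] masks = { {1, 0, 1, 0}, {1, 1, 0, 0}, {1, 0, 1, 0} };
        int[] freq = counts(masks, 4);
        System.out.println(Arrays.toString(freq));          // prints [3, 1, 2, 0]
        System.out.println(Arrays.toString(topK(freq, 2))); // prints [0, 2]
    }
}
```

Counting over the whole set of subsets, rather than trusting the single best one, makes the ranking less sensitive to any one lucky chromosome.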

References

  1. Leping Li and Clarice R. Weinberg. Gene Selection and Sample Classification Using a Genetic Algorithm/k-Nearest Neighbor Method.
  • Constructor Details

    • GAFE

      public GAFE()
      Constructor.
    • GAFE

      public GAFE(Selection selection, int elitism, Crossover crossover, double crossoverRate, double mutationRate)
      Constructor.
      Parameters:
      selection - the selection strategy.
      elitism - the number of best chromosomes to copy to new population.
      crossover - the strategy of crossover operation.
      crossoverRate - the crossover rate.
      mutationRate - the mutation rate.
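      To make the roles of the crossover and mutation parameters concrete, here is a minimal, self-contained sketch of the two operators on 0/1 byte arrays. It is an illustration under stated assumptions, not this library's implementation: the class GaOperators and its methods are hypothetical, the cut points are passed in explicitly rather than drawn at random, and the mutation rate is taken to be a per-bit flip probability.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative GA operators on 0/1 byte arrays; hypothetical names, not the library's code. */
final class GaOperators {
    /** Two-point crossover: the child copies parent a, with positions [from, to) taken from parent b. */
    static byte[] twoPointCrossover(byte[] a, byte[] b, int from, int to) {
        byte[] child = a.clone();
        System.arraycopy(b, from, child, from, to - from);
        return child;
    }

    /** Bit-flip mutation: each bit is flipped independently with probability mutationRate. */
    static byte[] mutate(byte[] bits, double mutationRate, Random rng) {
        byte[] out = bits.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < mutationRate) {
                out[i] = (byte) (1 - out[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {0, 0, 0, 0, 0};
        byte[] b = {1, 1, 1, 1, 1};
        // Positions 1 and 2 come from b, the rest from a.
        System.out.println(Arrays.toString(twoPointCrossover(a, b, 1, 3))); // prints [0, 1, 1, 0, 0]
        // A rate of 0.0 never flips a bit; the result depends on the seed for rates in between.
        System.out.println(Arrays.toString(mutate(a, 0.0, new Random(42)))); // prints [0, 0, 0, 0, 0]
    }
}
```

      For feature selection, a flipped bit adds or removes one feature from the candidate subset, so a high mutation rate pushes the search toward random sampling while a very low one risks premature convergence.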
  • Method Details

    • apply

      public BitString[] apply(int size, int generation, int length, Fitness<BitString> fitness)
      Genetic algorithm based feature selection for classification.
      Parameters:
      size - the population size of Genetic Algorithm.
      generation - the maximum number of iterations.
      length - the length of bit string, i.e. the number of features.
      fitness - the fitness function.
      Returns:
      bit strings of last generation.
    • fitness

      public static Fitness<BitString> fitness(double[][] x, int[] y, double[][] testx, int[] testy, ClassificationMetric metric, BiFunction<double[][],int[],Classifier<double[]>> trainer)
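      Returns a fitness function that scores a feature subset by training a classification model on the training samples and evaluating it on the testing samples.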
      Parameters:
      x - training samples.
      y - training labels.
      testx - testing samples.
      testy - testing labels.
      metric - classification metric.
      trainer - the lambda to train a model.
      Returns:
      the fitness function.
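      The idea behind such a fitness function can be sketched without the library: keep only the columns whose bit is 1, train on the reduced training set, and score the model on the reduced test set. Everything below is a hypothetical stand-in. HoldOutFitness, project, score and the toy 1-nearest-neighbor trainer are illustrative names, ToIntFunction replaces the library's classifier type, and accuracy replaces the configurable metric.

```java
import java.util.function.BiFunction;
import java.util.function.ToIntFunction;

/** Illustrative hold-out fitness for a 0/1 feature mask; not the library's implementation. */
final class HoldOutFitness {
    /** Keeps only the columns of x whose mask entry is 1. */
    static double[][] project(double[][] x, byte[] mask) {
        int p = 0;
        for (byte bit : mask) if (bit == 1) p++;
        double[][] out = new double[x.length][p];
        for (int i = 0; i < x.length; i++) {
            int k = 0;
            for (int j = 0; j < mask.length; j++) {
                if (mask[j] == 1) out[i][k++] = x[i][j];
            }
        }
        return out;
    }

    /** Trains on the projected training set and returns accuracy on the projected test set. */
    static double score(byte[] mask, double[][] x, int[] y, double[][] testx, int[] testy,
                        BiFunction<double[][], int[], ToIntFunction<double[]>> trainer) {
        ToIntFunction<double[]> model = trainer.apply(project(x, mask), y);
        double[][] tx = project(testx, mask);
        int correct = 0;
        for (int i = 0; i < tx.length; i++) {
            if (model.applyAsInt(tx[i]) == testy[i]) correct++;
        }
        return (double) correct / tx.length;
    }

    /** Toy trainer: 1-nearest neighbor with squared Euclidean distance (first match wins ties). */
    static ToIntFunction<double[]> nearestNeighbor(double[][] x, int[] y) {
        return q -> {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < x.length; i++) {
                double d = 0.0;
                for (int j = 0; j < q.length; j++) d += (x[i][j] - q[j]) * (x[i][j] - q[j]);
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return y[best];
        };
    }

    public static void main(String[] args) {
        // Feature 0 separates the classes; feature 1 is constant and carries no information.
        double[][] x = { {0, 5}, {1, 5}, {10, 5}, {11, 5} };
        int[] y = {0, 0, 1, 1};
        double[][] testx = { {0.5, 5}, {10.5, 5} };
        int[] testy = {0, 1};
        System.out.println(score(new byte[] {1, 0}, x, y, testx, testy, HoldOutFitness::nearestNeighbor)); // prints 1.0
        System.out.println(score(new byte[] {0, 1}, x, y, testx, testy, HoldOutFitness::nearestNeighbor)); // prints 0.5
    }
}
```

      The informative subset scores higher than the uninformative one, which is the signal the genetic algorithm needs to prefer it.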
    • fitness

      public static Fitness<BitString> fitness(double[][] x, double[] y, double[][] testx, double[] testy, RegressionMetric metric, BiFunction<double[][],double[],Regression<double[]>> trainer)
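      Returns a fitness function that scores a feature subset by training a regression model on the training samples and evaluating it on the testing samples.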
      Parameters:
      x - training samples.
      y - training response.
      testx - testing samples.
      testy - testing response.
      metric - regression metric.
      trainer - the lambda to train a model.
      Returns:
      the fitness function.
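      A genetic algorithm conventionally favors chromosomes with higher fitness, whereas an error metric such as mean squared error is better when lower. One common convention, assumed in the sketch below, is to negate the error so that better models get larger fitness values. Whether and how this method applies that convention internally is not stated here, and the class NegatedErrorFitness is hypothetical.

```java
/** Illustrative conversion of a lower-is-better error into a higher-is-better fitness. */
final class NegatedErrorFitness {
    /** Mean squared error between true responses and predictions. */
    static double mse(double[] truth, double[] prediction) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double r = truth[i] - prediction[i];
            sum += r * r;
        }
        return sum / truth.length;
    }

    /** Assumed convention: fitness is the negated error, so a smaller error ranks higher. */
    static double fitness(double[] truth, double[] prediction) {
        return -mse(truth, prediction);
    }

    public static void main(String[] args) {
        double[] truth = {1.0, 2.0, 3.0};
        double[] close = {1.0, 2.0, 3.5};
        double[] far = {1.0, 2.0, 5.0};
        // The closer prediction has the smaller error and therefore the larger fitness.
        System.out.println(fitness(truth, close) > fitness(truth, far)); // prints true
    }
}
```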
    • fitness

      public static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, ClassificationMetric metric, BiFunction<Formula,DataFrame,DataFrameClassifier> trainer)
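      Returns a fitness function that scores a feature subset by training a classification model on the training data and evaluating it on the testing data.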
      Parameters:
      y - the column name of class labels.
      train - training data.
      test - testing data.
      metric - classification metric.
      trainer - the lambda to train a model.
      Returns:
      the fitness function.
    • fitness

      public static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, RegressionMetric metric, BiFunction<Formula,DataFrame,DataFrameRegression> trainer)
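      Returns a fitness function that scores a feature subset by training a regression model on the training data and evaluating it on the testing data.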
      Parameters:
      y - the column name of response variable.
      train - training data.
      test - testing data.
      metric - regression metric.
      trainer - the lambda to train a model.
      Returns:
      the fitness function.