Package smile.feature.selection

Feature selection is the technique of selecting a subset of relevant features for building robust learning models. By removing irrelevant and redundant features from the data, feature selection improves the performance of learning models by alleviating the curse of dimensionality, enhancing generalization, and speeding up the learning process. More importantly, feature selection also helps researchers acquire a better understanding of the data.

Feature selection algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking scores each feature by a metric and eliminates the features that do not achieve an adequate score. Subset selection searches the space of possible feature subsets for the optimal one. An exhaustive search for the optimal subset is impractical when many features are available, so heuristic methods such as genetic algorithms are commonly employed for subset selection.
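
To make the ranking approach concrete, below is a minimal, self-contained Java sketch. The FeatureRanking class, its rank method, and the threshold parameter are illustrative assumptions, not part of this package: it scores each column with a caller-supplied univariate metric and returns the indices of the columns that clear the threshold, best first.

    import java.util.Comparator;
    import java.util.function.ToDoubleBiFunction;
    import java.util.stream.IntStream;

    // Illustrative feature ranking: score every column, keep those above a cutoff.
    public class FeatureRanking {
        // Returns the indices of columns whose metric score meets the threshold,
        // ordered from highest to lowest score.
        public static int[] rank(double[][] x, int[] y,
                                 ToDoubleBiFunction<double[], int[]> metric,
                                 double threshold) {
            int p = x[0].length;
            double[] score = new double[p];
            for (int j = 0; j < p; j++) {
                // Extract column j and score it against the class labels.
                double[] column = new double[x.length];
                for (int i = 0; i < x.length; i++) column[i] = x[i][j];
                score[j] = metric.applyAsDouble(column, y);
            }
            return IntStream.range(0, p)
                    .boxed()
                    .filter(j -> score[j] >= threshold)
                    .sorted(Comparator.comparingDouble((Integer j) -> -score[j]))
                    .mapToInt(Integer::intValue)
                    .toArray();
        }
    }

With a metric such as the signal-to-noise ratio sketched after the class list below, rank(x, y, RankingMetrics::s2n, 0.5) keeps the columns whose score is at least 0.5.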

  • Classes
    GAFE
      Genetic algorithm based feature selection.
    InformationValue
      Information Value (IV) measures the predictive strength of a feature for a binary dependent variable (sketched below).
    SignalNoiseRatio
      The signal-to-noise (S2N) ratio is a univariate feature ranking metric that can be used as a feature selection criterion for binary classification problems (sketched below).
    SumSquaresRatio
      The ratio of between-groups to within-groups sum of squares is a univariate feature ranking metric that can be used as a feature selection criterion for multi-class classification problems (sketched below).
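
The three ranking metrics listed above have standard definitions in the literature: S2N = |μ1 - μ0| / (σ1 + σ0) for two classes, the ratio BSS/WSS of between-groups to within-groups sum of squares for k classes, and IV = Σ (pct_good - pct_bad) * ln(pct_good / pct_bad) summed over the bins of a discretized feature. The sketch below implements these common definitions; it is not the package's own code, and the RankingMetrics class and its method names are hypothetical.

    // Common textbook definitions of the univariate ranking metrics above.
    // Illustrative sketch, not the package's implementation.
    public class RankingMetrics {
        // Signal-to-noise ratio |mu1 - mu0| / (sigma1 + sigma0) for labels y in {0, 1}.
        public static double s2n(double[] x, int[] y) {
            double n0 = 0, n1 = 0, mu0 = 0, mu1 = 0;
            for (int i = 0; i < x.length; i++) {
                if (y[i] == 0) { n0++; mu0 += x[i]; } else { n1++; mu1 += x[i]; }
            }
            mu0 /= n0; mu1 /= n1;
            double s0 = 0, s1 = 0;
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - (y[i] == 0 ? mu0 : mu1);
                if (y[i] == 0) s0 += d * d; else s1 += d * d;
            }
            return Math.abs(mu1 - mu0) / (Math.sqrt(s1 / (n1 - 1)) + Math.sqrt(s0 / (n0 - 1)));
        }

        // Ratio of between-groups to within-groups sum of squares for labels y in 0..k-1.
        public static double sumSquaresRatio(double[] x, int[] y, int k) {
            double grand = 0;
            for (double v : x) grand += v;
            grand /= x.length;

            double[] nk = new double[k];   // per-class sample counts
            double[] muk = new double[k];  // per-class means
            for (int i = 0; i < x.length; i++) { nk[y[i]]++; muk[y[i]] += x[i]; }
            for (int c = 0; c < k; c++) muk[c] /= nk[c];

            double bss = 0, wss = 0;
            for (int i = 0; i < x.length; i++) {
                double b = muk[y[i]] - grand;  // between-groups deviation
                double w = x[i] - muk[y[i]];   // within-groups deviation
                bss += b * b;
                wss += w * w;
            }
            return bss / wss;
        }

        // Information Value of an already-discretized feature:
        // bins[i] in 0..b-1, y[i] in {0, 1} (1 = "good", 0 = "bad").
        public static double informationValue(int[] bins, int[] y, int b) {
            double[] good = new double[b], bad = new double[b];
            double nGood = 0, nBad = 0;
            for (int i = 0; i < bins.length; i++) {
                if (y[i] == 1) { good[bins[i]]++; nGood++; }
                else { bad[bins[i]]++; nBad++; }
            }
            double iv = 0;
            for (int i = 0; i < b; i++) {
                double pg = good[i] / nGood;   // share of goods in this bin
                double pb = bad[i] / nBad;     // share of bads in this bin
                if (pg > 0 && pb > 0) iv += (pg - pb) * Math.log(pg / pb);
            }
            return iv;
        }
    }

A common rule of thumb in credit scoring is that an IV below 0.02 indicates no predictive power, while a value above 0.3 indicates a strongly predictive feature.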