Package smile.classification.resampling
In the real world, class-imbalanced datasets are far more common than class-balanced datasets. For example, in a dataset of credit card transactions, fraudulent purchases might make up less than 0.1% of the examples. Similarly, in a medical diagnosis dataset, the number of patients with a rare virus might be less than 0.01% of the total examples.
Training a machine learning model on an imbalanced dataset tends to bias it toward the majority class. The model can reach high overall accuracy while failing to detect the minority-class instances that matter most (e.g., fraud or disease).
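A short worked example makes the accuracy trap concrete. The numbers are hypothetical: 10,000 transactions, of which 10 (0.1%) are fraudulent, scored by a trivial "model" that always predicts the majority class. The class name `AccuracyParadox` and its helpers are illustrative only and are not part of Smile.

```java
/**
 * Hypothetical illustration: a model that always predicts the majority
 * class scores high accuracy but never detects a minority-class instance.
 */
public class AccuracyParadox {
    /** Fraction of predictions that match the true labels. */
    static double accuracy(int[] truth, int[] pred) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == pred[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    /** Recall (sensitivity) for the given positive class label. */
    static double recall(int[] truth, int[] pred, int positive) {
        int truePositives = 0, actualPositives = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == positive) {
                actualPositives++;
                if (pred[i] == positive) truePositives++;
            }
        }
        return actualPositives == 0 ? 0.0 : (double) truePositives / actualPositives;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int[] truth = new int[n];
        // Hypothetical data: the first 10 examples (0.1%) are fraud (label 1).
        for (int i = 0; i < 10; i++) truth[i] = 1;
        // A trivial "model" that always predicts the majority class (label 0).
        int[] pred = new int[n];
        System.out.printf("accuracy = %.3f%n", accuracy(truth, pred));        // prints accuracy = 0.999
        System.out.printf("fraud recall = %.3f%n", recall(truth, pred, 1));   // prints fraud recall = 0.000
    }
}
```

Accuracy alone hides the failure; recall on the minority class exposes it.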
Resampling techniques address class imbalance by rebalancing the class distribution of the training data. Oversampling adds minority-class samples (duplicates or synthetic examples), while undersampling removes majority-class samples; both deliberately bias selection toward one class over another. Oversampling is generally used more often than undersampling.
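The two simplest strategies, random oversampling and random undersampling, can be sketched as follows for a binary problem. This is a minimal, hypothetical helper (`RandomResampling`), not Smile's API; it returns row indices into the original data, so it works with any feature representation, and it assumes exactly two class labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random over- and under-sampling (not Smile API). */
public class RandomResampling {
    /** Indices of all rows whose label equals cls. */
    static List<Integer> indicesOf(int[] y, int cls) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < y.length; i++) if (y[i] == cls) idx.add(i);
        return idx;
    }

    /** Duplicates minority rows (sampled with replacement) until both classes have equal counts. */
    static List<Integer> oversample(int[] y, int minority, int majority, Random rng) {
        List<Integer> min = indicesOf(y, minority);
        List<Integer> maj = indicesOf(y, majority);
        List<Integer> out = new ArrayList<>(maj);
        out.addAll(min);
        if (min.isEmpty()) return out; // nothing to duplicate
        for (int k = min.size(); k < maj.size(); k++) {
            out.add(min.get(rng.nextInt(min.size())));
        }
        return out;
    }

    /** Keeps all minority rows plus an equally sized random subset of majority rows (without replacement). */
    static List<Integer> undersample(int[] y, int minority, int majority, Random rng) {
        List<Integer> min = indicesOf(y, minority);
        List<Integer> maj = indicesOf(y, majority);
        Collections.shuffle(maj, rng);
        List<Integer> out = new ArrayList<>(maj.subList(0, Math.min(min.size(), maj.size())));
        out.addAll(min);
        return out;
    }

    /** Number of selected rows whose label equals cls. */
    static int count(int[] y, List<Integer> rows, int cls) {
        int c = 0;
        for (int r : rows) if (y[r] == cls) c++;
        return c;
    }

    public static void main(String[] args) {
        // Hypothetical labels: 8 majority (0) and 2 minority (1) examples.
        int[] y = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1};
        Random rng = new Random(42);
        List<Integer> over = oversample(y, 1, 0, rng);
        List<Integer> under = undersample(y, 1, 0, rng);
        System.out.println("oversampled:  " + count(y, over, 0) + " vs " + count(y, over, 1));   // 8 vs 8
        System.out.println("undersampled: " + count(y, under, 0) + " vs " + count(y, under, 1)); // 2 vs 2
    }
}
```

Random oversampling keeps all information but can encourage overfitting to repeated points; random undersampling is cheap but discards majority data. The synthetic methods in this package aim to improve on plain duplication.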
Class summary
Adaptive Synthetic Sampling (ADASYN).
ADASYN hyperparameters.
Borderline Shifting Oversampling (BSO).
BSO hyperparameters.
BSO strategy controlling which minority samples are used as seeds.
Synthetic Minority Over-sampling Technique (SMOTE).
SMOTE hyperparameters.
SMOTETomek: combined over- and under-sampling.
SMOTETomek hyperparameters.
SVM-SMOTE: Support Vector Machine-guided Synthetic Minority Over-sampling.
SVM-SMOTE hyperparameters.
Tomek Links under-sampling.
TomekLinks hyperparameters.
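Several of the classes above are SMOTE variants. As a hedged illustration of the interpolation step SMOTE is built on, the sketch below creates each synthetic sample at a random point on the segment between a minority sample and one of its k nearest minority-class neighbors. `SmoteSketch` is a hypothetical standalone helper, not the signature of Smile's SMOTE class; it picks seed samples uniformly at random and uses brute-force neighbor search, both simplifications.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch of SMOTE-style interpolation (not Smile API). */
public class SmoteSketch {
    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int f = 0; f < a.length; f++) {
            double diff = a[f] - b[f];
            s += diff * diff;
        }
        return s;
    }

    /** Indices of the k nearest neighbors of x[i] among the other rows of x (brute force). */
    static int[] nearestNeighbors(double[][] x, int i, int k) {
        Integer[] others = new Integer[x.length - 1];
        for (int j = 0, m = 0; j < x.length; j++) if (j != i) others[m++] = j;
        Arrays.sort(others, Comparator.comparingDouble((Integer idx) -> squaredDistance(x[i], x[idx])));
        int kk = Math.min(k, others.length);
        int[] nn = new int[kk];
        for (int m = 0; m < kk; m++) nn[m] = others[m];
        return nn;
    }

    /** Generates n synthetic samples from the minority-class rows using k nearest neighbors. */
    static double[][] smote(double[][] minority, int n, int k, Random rng) {
        if (minority.length < 2) throw new IllegalArgumentException("need at least two minority samples");
        int d = minority[0].length;
        double[][] synthetic = new double[n][d];
        for (int s = 0; s < n; s++) {
            int i = rng.nextInt(minority.length);           // random seed sample
            int[] nn = nearestNeighbors(minority, i, k);
            int j = nn[rng.nextInt(nn.length)];             // random neighbor among its k nearest
            double gap = rng.nextDouble();                  // position along the segment, in [0, 1)
            for (int f = 0; f < d; f++) {
                synthetic[s][f] = minority[i][f] + gap * (minority[j][f] - minority[i][f]);
            }
        }
        return synthetic;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D minority-class samples.
        double[][] minority = {{1.0, 1.0}, {1.2, 0.9}, {0.8, 1.1}, {5.0, 5.0}};
        double[][] synth = smote(minority, 3, 2, new Random(0));
        for (double[] p : synth) System.out.println(Arrays.toString(p));
    }
}
```

Because every synthetic point is a convex combination of two real minority samples, new points stay inside the region the minority class already occupies rather than duplicating existing rows.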