Record Class SMOTE

java.lang.Object
java.lang.Record
smile.classification.resampling.SMOTE
Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).

public record SMOTE(double[][] data, int[] labels) extends Record
Synthetic Minority Over-sampling Technique (SMOTE).

SMOTE addresses class imbalance by synthetically generating new minority class samples rather than simply duplicating existing ones. For each minority sample, the algorithm selects one of its k nearest neighbors at random and places a new synthetic sample at a uniformly random position along the line segment connecting the two points in feature space.
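
The interpolation step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name SmoteInterpolationSketch and the method interpolate are hypothetical, and the neighbor is assumed to have been picked already from the k nearest neighbors.

```java
import java.util.Random;

/** Illustrative sketch only; names here are hypothetical, not part of the library. */
final class SmoteInterpolationSketch {
    /**
     * Returns a point at a uniformly random position on the line segment
     * between x and neighbor: x + u * (neighbor - x), with u drawn from [0, 1).
     */
    static double[] interpolate(double[] x, double[] neighbor, Random rng) {
        double u = rng.nextDouble(); // a single gap value shared by all features
        double[] synthetic = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            synthetic[j] = x[j] + u * (neighbor[j] - x[j]);
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};        // a minority sample
        double[] neighbor = {3.0, 6.0}; // one of its nearest neighbors (assumed given)
        double[] s = interpolate(x, neighbor, new Random(42));
        System.out.println(s[0] + ", " + s[1]);
    }
}
```

Because one value of u is applied to every feature, the synthetic point stays on the segment between the two samples rather than anywhere in their bounding box.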

The amount of over-sampling is controlled by the ratio parameter in SMOTE.Options. A value of 1.0 doubles the minority class (100% over-sampling), 2.0 triples it, and so on.
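
The arithmetic behind the ratio can be worked through with a small sketch. The helper syntheticCount is hypothetical, and rounding to the nearest integer is an assumption; the library may round a fractional count differently.

```java
/** Illustrative sketch only; names here are hypothetical, not part of the library. */
final class SmoteRatioSketch {
    /**
     * Number of synthetic samples for a given ratio.
     * Assumption: fractional counts are rounded to the nearest integer.
     */
    static int syntheticCount(int minorityCount, double ratio) {
        return (int) Math.round(ratio * minorityCount);
    }

    public static void main(String[] args) {
        int minority = 50;
        for (double ratio : new double[]{1.0, 2.0}) {
            int synthetic = syntheticCount(minority, ratio);
            // ratio 1.0: 50 synthetic, 100 total (doubled)
            // ratio 2.0: 100 synthetic, 150 total (tripled)
            System.out.println("ratio " + ratio + ": " + synthetic
                    + " synthetic, " + (minority + synthetic) + " total");
        }
    }
}
```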

Index selection
When the input dimensionality d <= highDimThreshold (default 20), a KDTree is used for exact k-nearest-neighbor search. For higher dimensionality, a RandomProjectionForest (approximate nearest-neighbor search) is used instead, because k-d trees suffer from the curse of dimensionality and become no faster than a linear scan in high dimensions.

Limitations

  • Feature spaces must be entirely continuous (no categorical features).
  • SMOTE can introduce noise when minority samples already overlap heavily with the majority class.

References

  1. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    SMOTE.Options
    SMOTE hyperparameters.
  • Constructor Summary

    Constructors
    Constructor
    Description
    SMOTE(double[][] data, int[] labels)
    Creates an instance of a SMOTE record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[][]
    data()
    Returns the value of the data record component.
    final boolean
    equals(Object o)
    Indicates whether some other object is "equal to" this one.
    static SMOTE
    fit(double[][] data, int[] labels)
    Applies SMOTE to the given dataset with default SMOTE.Options.
    static SMOTE
    fit(double[][] data, int[] labels, SMOTE.Options options)
    Applies SMOTE to the given dataset.
    final int
    hashCode()
    Returns a hash code value for this object.
    int[]
    labels()
    Returns the value of the labels record component.
    int
    size()
    Returns the total number of samples after resampling.
    final String
    toString()
    Returns a string representation of this record class.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • SMOTE

      public SMOTE(double[][] data, int[] labels)
      Creates an instance of a SMOTE record class.
      Parameters:
      data - the value for the data record component
      labels - the value for the labels record component
  • Method Details

    • size

      public int size()
      Returns the total number of samples after resampling.
      Returns:
      the number of rows in data.
    • fit

      public static SMOTE fit(double[][] data, int[] labels)
      Applies SMOTE to the given dataset with default SMOTE.Options.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      Returns:
      a SMOTE instance holding the augmented data and labels.
    • fit

      public static SMOTE fit(double[][] data, int[] labels, SMOTE.Options options)
      Applies SMOTE to the given dataset.

      The minority class (label with the fewest occurrences) is identified automatically. Synthetic samples are generated for it and appended to the original dataset.

      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      Returns:
      a SMOTE instance holding the augmented data and labels.
      Throws:
      IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
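
Minority-class identification and the first two input checks above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class MinorityClassSketch and the method minorityLabel are hypothetical, and ties are broken in favor of the smallest label, which the documentation does not specify.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; names here are hypothetical, not part of the library. */
final class MinorityClassSketch {
    /** Returns the label with the fewest occurrences. */
    static int minorityLabel(double[][] data, int[] labels) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (labels.length == 0) {
            throw new IllegalArgumentException("input is empty");
        }
        // Count occurrences per label; TreeMap iterates labels in ascending order.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        int best = 0;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            // Strict comparison: on a tie the smaller label wins (an assumption).
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 0};
        System.out.println(minorityLabel(new double[4][1], labels));
    }
}
```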
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
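
Both components of this record are arrays, and arrays inherit identity-based equals from Object, so comparing components with Objects::equals means two instances holding equal values in distinct arrays are not equal. The sketch below demonstrates this with a hypothetical stand-in record, Resampled, that has the same component types; the helper sameContent is likewise an assumption, shown as one way to compare by content.

```java
import java.util.Arrays;

/** Illustrative sketch only; names here are hypothetical, not part of the library. */
final class RecordArrayEqualitySketch {
    /** Stand-in record with the same component types as SMOTE. */
    record Resampled(double[][] data, int[] labels) {}

    /** Compares two records by array content rather than by array identity. */
    static boolean sameContent(Resampled a, Resampled b) {
        return Arrays.deepEquals(a.data(), b.data())
                && Arrays.equals(a.labels(), b.labels());
    }

    public static void main(String[] args) {
        Resampled a = new Resampled(new double[][]{{1.0, 2.0}}, new int[]{0});
        Resampled b = new Resampled(new double[][]{{1.0, 2.0}}, new int[]{0});
        System.out.println(a.equals(b));       // prints false: distinct array instances
        System.out.println(sameContent(a, b)); // prints true: same values
    }
}
```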
    • data

      public double[][] data()
      Returns the value of the data record component.
      Returns:
      the value of the data record component
    • labels

      public int[] labels()
      Returns the value of the labels record component.
      Returns:
      the value of the labels record component