Record Class SMOTETomek

java.lang.Object
java.lang.Record
smile.classification.resampling.SMOTETomek
Record Components:
data - the cleaned, balanced feature matrix.
labels - the corresponding class labels.

public record SMOTETomek(double[][] data, int[] labels) extends Record
SMOTETomek — combined over- and under-sampling.

SMOTETomek is a two-phase hybrid resampling strategy:

  1. Over-sampling (SMOTE): synthetic minority-class samples are generated using SMOTE, expanding the minority class and reducing the class imbalance ratio.
  2. Under-sampling (Tomek Links): TomekLinks cleaning is applied to the augmented dataset, removing the majority-class member of every Tomek link. This eliminates ambiguous and noisy boundary samples, whether they were present in the original dataset or introduced by the SMOTE synthesis step, resulting in a cleaner decision boundary.

The two phases are controlled independently through their respective SMOTE.Options and TomekLinks.Options.
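
To make the cleaning phase concrete, here is a minimal, self-contained sketch of Tomek link detection. It assumes the usual definition (two samples of different classes that are each other's nearest neighbor) and uses a brute-force Euclidean search. The class TomekLinkSketch and its methods are hypothetical names for illustration and are not part of the library; the actual TomekLinks implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the library's TomekLinks implementation. */
final class TomekLinkSketch {
    /** Squared Euclidean distance between two rows of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the row closest to row i, excluding i itself; -1 if there is none. */
    static int nearest(double[][] x, int i) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < x.length; j++) {
            if (j == i) continue;
            double d = squaredDistance(x[i], x[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Returns index pairs {i, j}, i < j, that are mutual nearest neighbors with different labels. */
    static List<int[]> tomekLinks(double[][] x, int[] y) {
        List<int[]> links = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            int j = nearest(x, i);
            // j > i reports each pair once; the second nearest() call checks mutuality.
            if (j > i && y[i] != y[j] && nearest(x, j) == i) {
                links.add(new int[] {i, j});
            }
        }
        return links;
    }
}
```

Dropping the majority-class index of each returned pair then gives the cleaned dataset described in step 2.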

Typical use

var smoteOpts = new SMOTE.Options(5, 1.0);
var tomekOpts = new TomekLinks.Options();
SMOTETomek result = SMOTETomek.fit(data, labels,
        new SMOTETomek.Options(smoteOpts, tomekOpts));

References

  1. G. E. A. P. A. Batista, R. C. Prati and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.
  2. N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16:321–357, 2002.
  3. I. Tomek. Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, 6:769–772, 1976.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    SMOTETomek.Options
    SMOTETomek hyperparameters.
  • Constructor Summary

    Constructors
    Constructor
    Description
    SMOTETomek(double[][] data, int[] labels)
    Creates an instance of a SMOTETomek record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[][]
    data()
    Returns the value of the data record component.
    final boolean
    equals(Object o)
    Indicates whether some other object is "equal to" this one.
    static SMOTETomek
    fit(double[][] data, int[] labels)
    Applies SMOTETomek with default SMOTETomek.Options.
    static SMOTETomek
    fit(double[][] data, int[] labels, SMOTETomek.Options options)
    Applies SMOTETomek to the given dataset.
    final int
    hashCode()
    Returns a hash code value for this object.
    int[]
    labels()
    Returns the value of the labels record component.
    int
    size()
    Returns the number of samples after resampling and cleaning.
    final String
    toString()
    Returns a string representation of this record class.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • SMOTETomek

      public SMOTETomek(double[][] data, int[] labels)
      Creates an instance of a SMOTETomek record class.
      Parameters:
      data - the value for the data record component
      labels - the value for the labels record component
  • Method Details

    • size

      public int size()
      Returns the number of samples after resampling and cleaning.
      Returns:
      the number of rows in data.
    • fit

      public static SMOTETomek fit(double[][] data, int[] labels)
      Applies SMOTETomek with default SMOTETomek.Options.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      Returns:
      a SMOTETomek record holding the balanced, cleaned dataset.
    • fit

      public static SMOTETomek fit(double[][] data, int[] labels, SMOTETomek.Options options)
      Applies SMOTETomek to the given dataset.

      Phase 1: SMOTE.fit(double[][], int[], SMOTE.Options) generates synthetic minority samples.
      Phase 2: TomekLinks.fit(double[][], int[], TomekLinks.Options) removes the majority-class member of every Tomek link in the augmented dataset.

      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      Returns:
      a SMOTETomek record holding the balanced, cleaned dataset.
      Throws:
      IllegalArgumentException - propagated from SMOTE or TomekLinks if the inputs are invalid.
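
Phase 1 rests on SMOTE's interpolation step as described by Chawla et al.: a synthetic sample is placed at a random point on the line segment between a minority sample and one of its minority-class nearest neighbors. The sketch below shows only that step. SmoteInterpolationSketch, interpolate and synthesize are hypothetical names, and neighbor selection is assumed to have happened already; this is not the library's SMOTE code.

```java
import java.util.Random;

/** Illustrative sketch of SMOTE's interpolation step; not the library's implementation. */
final class SmoteInterpolationSketch {
    /** Returns sample + gap * (neighbor - sample), computed per feature. */
    static double[] interpolate(double[] sample, double[] neighbor, double gap) {
        double[] synthetic = new double[sample.length];
        for (int k = 0; k < sample.length; k++) {
            synthetic[k] = sample[k] + gap * (neighbor[k] - sample[k]);
        }
        return synthetic;
    }

    /** Draws the gap uniformly from [0, 1), so the result lies between the two points. */
    static double[] synthesize(double[] sample, double[] neighbor, Random rng) {
        return interpolate(sample, neighbor, rng.nextDouble());
    }
}
```

With gap = 0.5, for example, the synthetic point is the midpoint of the two inputs.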
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
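
Because both components are arrays, comparing them with Objects::equals amounts to a reference comparison: two records holding distinct arrays with identical contents are not equal. The sketch below demonstrates this with a hypothetical stand-in record, ResampledSketch, that has the same component types; it is not the library class, and contentEquals is an illustrative helper for content-based comparison.

```java
import java.util.Arrays;

/** Hypothetical stand-in with the same component types as SMOTETomek; not the library class. */
record ResampledSketch(double[][] data, int[] labels) {
    /** Compares array contents, unlike the generated equals, which compares array references. */
    boolean contentEquals(ResampledSketch other) {
        return Arrays.deepEquals(data, other.data) && Arrays.equals(labels, other.labels);
    }
}
```

Code that needs to compare resampling results, for example in a unit test, would therefore compare data() and labels() explicitly instead of relying on equals.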
    • data

      public double[][] data()
      Returns the value of the data record component.
      Returns:
      the value of the data record component
    • labels

      public int[] labels()
      Returns the value of the labels record component.
      Returns:
      the value of the labels record component
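
Since the cleaning phase removes samples after over-sampling, the class counts in the returned labels may not be exactly equal. A quick tally is one way to check the resulting balance. LabelCountSketch and classCounts below are hypothetical helper names, not part of the library; the method accepts any int[] of labels, such as the array returned by labels().

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper; not part of the library. */
final class LabelCountSketch {
    /** Returns the number of samples per class label, ordered by label. */
    static Map<Integer, Integer> classCounts(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }
}
```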