Record Class SVMSMOTE

java.lang.Object
java.lang.Record
smile.classification.resampling.SVMSMOTE
Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).

public record SVMSMOTE(double[][] data, int[] labels) extends Record
SVM-SMOTE — Support Vector Machine guided Synthetic Minority Over-sampling.

SVM-SMOTE is a variant of SMOTE that uses an SVM classifier to identify the most informative minority class samples for synthesis. Rather than interpolating between arbitrary minority pairs, synthesis is restricted to support vectors — the minority samples closest to the decision boundary — which are the hardest to classify and the most likely to benefit from additional training data near the margin.

The algorithm proceeds as follows:

  1. Encode the minority class as +1 and all other classes as -1, then train a binary SVM on the full dataset.
  2. Identify minority support vectors: minority samples whose signed decision function value satisfies |score(x)| <= 1 + m_factor * (1 − 1/C), i.e. samples inside or close to the margin band. If no minority support vectors are found, all minority samples are used as seeds.
  3. For each selected seed, find its k nearest neighbors within the minority class. Then interpolate to produce a synthetic sample, choosing the direction depending on whether the randomly selected neighbor is a support vector:
    • If the neighbor is also a support vector, the synthetic sample is placed randomly between the seed and the neighbor (standard SMOTE interpolation).
    • If the neighbor is not a support vector, the synthetic sample is placed randomly between the seed and a point extrapolated away from the interior, which pushes synthesis toward the boundary. The gap is drawn from [0, 0.5) so that the sample stays within the safe zone.
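
Steps 2 and 3 can be sketched in plain Java as follows. This is an illustrative sketch, not the library's implementation: the class and method names are hypothetical, the gap range [0, 1) for the interpolation case is the usual SMOTE convention rather than something stated above, and the extrapolation formula seed + gap * (seed - neighbor) is an assumption about how "extrapolated away from the interior" is realized.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch of steps 2 and 3; names and formulas are assumptions. */
final class SvmSmoteSketch {
    /** Step 2: a sample is a seed when |score| <= 1 + mFactor * (1 - 1/C). */
    static boolean inMarginBand(double score, double mFactor, double c) {
        double threshold = 1.0 + mFactor * (1.0 - 1.0 / c);
        return Math.abs(score) <= threshold;
    }

    /** Step 3, neighbor is a support vector: seed + gap * (neighbor - seed), gap in [0, 1). */
    static double[] interpolate(double[] seed, double[] neighbor, Random rng) {
        double gap = rng.nextDouble();
        double[] x = new double[seed.length];
        for (int j = 0; j < seed.length; j++) {
            x[j] = seed[j] + gap * (neighbor[j] - seed[j]);
        }
        return x;
    }

    /**
     * Step 3, neighbor is not a support vector. ASSUMED form:
     * seed + gap * (seed - neighbor), gap in [0, 0.5), i.e. a step away from the neighbor.
     */
    static double[] extrapolate(double[] seed, double[] neighbor, Random rng) {
        double gap = 0.5 * rng.nextDouble();
        double[] x = new double[seed.length];
        for (int j = 0; j < seed.length; j++) {
            x[j] = seed[j] + gap * (seed[j] - neighbor[j]);
        }
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] seed = {1.0, 1.0};
        double[] neighbor = {3.0, 1.0};
        // With C = 10 and mFactor = 0.5 the band is |score| <= 1.45.
        System.out.println(inMarginBand(1.2, 0.5, 10.0));
        System.out.println(Arrays.toString(interpolate(seed, neighbor, rng)));
        System.out.println(Arrays.toString(extrapolate(seed, neighbor, rng)));
    }
}
```

In this sketch the interpolated point always lies on the segment from the seed to the neighbor, while the extrapolated point lies on the opposite side of the seed, at most half the seed-to-neighbor distance away.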

Index selection
When the input dimensionality d <= highDimThreshold (default 20), a KDTree is used for exact k-NN search; otherwise a RandomProjectionForest is used.

Limitations

  • All features must be continuous; categorical features are not supported.
  • Training an SVM adds non-trivial overhead compared to plain SMOTE.
  • SVM performance depends on the choice of kernel and its parameters.

References

  1. H. M. Nguyen, E. W. Cooper and K. Kamei. Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1), 2011.
  2. G. E. A. P. A. Batista, R. C. Prati and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    SVMSMOTE.Options
    SVM-SMOTE hyperparameters.
  • Constructor Summary

    Constructors
    Constructor
    Description
    SVMSMOTE(double[][] data, int[] labels)
    Creates an instance of an SVMSMOTE record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[][]
    data()
    Returns the value of the data record component.
    final boolean
    equals(Object o)
    Indicates whether some other object is "equal to" this one.
    static SVMSMOTE
    fit(double[][] data, int[] labels)
    Applies SVM-SMOTE to the given dataset with default SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
    static SVMSMOTE
    fit(double[][] data, int[] labels, SVMSMOTE.Options options)
    Applies SVM-SMOTE to the given dataset with the given SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
    static SVMSMOTE
    fit(double[][] data, int[] labels, SVMSMOTE.Options options, MercerKernel<double[]> kernel)
    Applies SVM-SMOTE to the given dataset.
    final int
    hashCode()
    Returns a hash code value for this object.
    int[]
    labels()
    Returns the value of the labels record component.
    int
    size()
    Returns the total number of samples after resampling.
    final String
    toString()
    Returns a string representation of this record class.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • SVMSMOTE

      public SVMSMOTE(double[][] data, int[] labels)
      Creates an instance of an SVMSMOTE record class.
      Parameters:
      data - the value for the data record component
      labels - the value for the labels record component
  • Method Details

    • size

      public int size()
      Returns the total number of samples after resampling.
      Returns:
      the number of rows in data.
    • fit

      public static SVMSMOTE fit(double[][] data, int[] labels)
      Applies SVM-SMOTE to the given dataset with default SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      Returns:
      an SVMSMOTE instance holding the augmented data and labels.
    • fit

      public static SVMSMOTE fit(double[][] data, int[] labels, SVMSMOTE.Options options)
      Applies SVM-SMOTE to the given dataset with the given SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      Returns:
      an SVMSMOTE instance holding the augmented data and labels.
    • fit

      public static SVMSMOTE fit(double[][] data, int[] labels, SVMSMOTE.Options options, MercerKernel<double[]> kernel)
      Applies SVM-SMOTE to the given dataset.

      The minority class (label with the fewest occurrences) is identified automatically. An SVM is trained with the minority class as +1 and all other classes as -1. Minority support vectors (samples near the decision boundary) are used as seeds for SMOTE interpolation.

      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      kernel - the SVM kernel function.
      Returns:
      an SVMSMOTE instance holding the augmented data and labels.
      Throws:
      IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
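
The preprocessing described for this method (finding the minority label, encoding it as +1 against -1, and rejecting invalid input) might look like the following. It is a self-contained sketch under stated assumptions, not the library's code: the class and method names are hypothetical, the parameter k stands in for options.k(), and breaking ties toward the smallest label is an assumption, since the documentation does not say how ties are resolved.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative preprocessing sketch; not the library's implementation. */
final class MinorityClassSketch {
    /** Counts occurrences of each label, keyed in ascending label order. */
    static Map<Integer, Integer> countLabels(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int y : labels) counts.merge(y, 1, Integer::sum);
        return counts;
    }

    /** Label with the fewest occurrences; ties go to the smallest label (assumption). */
    static int minorityLabel(int[] labels) {
        if (labels.length == 0) throw new IllegalArgumentException("empty input");
        int best = 0;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : countLabels(labels).entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Encodes the minority class as +1 and every other class as -1. */
    static int[] encode(int[] labels, int minority) {
        int[] y = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            y[i] = labels[i] == minority ? 1 : -1;
        }
        return y;
    }

    /** Mirrors the documented failure conditions; k stands in for options.k(). */
    static void checkInputs(double[][] data, int[] labels, int k) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int count = countLabels(labels).get(minorityLabel(labels));
        if (count < k) {
            throw new IllegalArgumentException("minority class has fewer samples than k");
        }
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 1, 1, 2, 2, 2};
        int minority = minorityLabel(labels);
        System.out.println(minority);                                   // 1
        System.out.println(Arrays.toString(encode(labels, minority))); // [-1, -1, -1, -1, 1, 1, -1, -1, -1]
    }
}
```

With the library on the classpath, none of this needs to be written by hand: per the signatures above, a call such as SVMSMOTE.fit(data, labels) returns a record whose data(), labels(), and size() accessors expose the augmented dataset.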
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
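
Because both components are arrays, comparison with Objects::equals(Object,Object) amounts to comparing array references, so two records holding identical values in different array instances are not equal. The sketch below illustrates this with a hypothetical stand-in record, Resampled, that has the same component types as SVMSMOTE; the contentEquals helper is likewise illustrative and not part of the library.

```java
import java.util.Arrays;

final class RecordEqualityDemo {
    /** Stand-in with the same component types as SVMSMOTE; not the library class. */
    record Resampled(double[][] data, int[] labels) {}

    /** Compares two results by array contents instead of by array reference. */
    static boolean contentEquals(Resampled a, Resampled b) {
        return Arrays.deepEquals(a.data(), b.data())
            && Arrays.equals(a.labels(), b.labels());
    }

    public static void main(String[] args) {
        Resampled a = new Resampled(new double[][] {{1.0, 2.0}}, new int[] {1});
        Resampled b = new Resampled(new double[][] {{1.0, 2.0}}, new int[] {1});
        System.out.println(a.equals(b));         // false: different array instances
        System.out.println(contentEquals(a, b)); // true: same contents
    }
}
```

The same caveat applies to hashCode(), which is derived from the arrays' identity-based hash codes, so such records are a poor fit as keys in hash-based collections.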
    • data

      public double[][] data()
      Returns the value of the data record component.
      Returns:
      the value of the data record component
    • labels

      public int[] labels()
      Returns the value of the labels record component.
      Returns:
      the value of the labels record component