smile.classification.resampling.SVMSMOTE

Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).

public record SVMSMOTE(double[][] data, int[] labels) extends Record
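
Both record components are arrays, and record classes compare components with Objects::equals, which for arrays means reference identity. The sketch below illustrates the consequence with a hypothetical stand-in record named Augmented and a hypothetical helper contentEquals; neither is part of the library.

```java
import java.util.Arrays;

final class RecordArrayEquality {
    // Hypothetical stand-in with the same component types as SVMSMOTE; not the library class.
    record Augmented(double[][] data, int[] labels) {
        // Hypothetical helper: compares array contents rather than array identity.
        boolean contentEquals(Augmented other) {
            return Arrays.deepEquals(data, other.data)
                && Arrays.equals(labels, other.labels);
        }
    }

    public static void main(String[] args) {
        Augmented a = new Augmented(new double[][] {{1.0, 2.0}}, new int[] {0});
        Augmented b = new Augmented(new double[][] {{1.0, 2.0}}, new int[] {0});
        System.out.println(a.equals(b));        // false: distinct array objects
        System.out.println(a.contentEquals(b)); // true: same contents
    }
}
```

When comparing two resampling results in a test, comparing data() with Arrays.deepEquals and labels() with Arrays.equals is likely the safer choice.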

SVM-SMOTE — Support Vector Machine guided Synthetic Minority Over-sampling.

SVM-SMOTE is a variant of SMOTE that uses an SVM classifier to identify the most informative minority class samples for synthesis. Rather than interpolating between arbitrary minority pairs, synthesis is restricted to support vectors — the minority samples closest to the decision boundary — which are the hardest to classify and the most likely to benefit from additional training data near the margin.

The algorithm proceeds as follows:

1. Encode the minority class as +1 and all other classes as -1, then train a binary SVM on the full dataset.
2. Identify minority support vectors: minority samples whose decision function value score(x) satisfies |score(x)| <= 1 + m_factor * (1 - 1/C), i.e. samples inside or close to the margin band. If no minority support vectors are found, all minority samples are used as seeds.
3. For each selected seed, find its k nearest neighbors within the minority class, pick one neighbor at random, and interpolate to produce a synthetic sample. The direction depends on whether the chosen neighbor is a support vector:
   - If the neighbor is also a support vector, the synthetic sample is placed randomly between the seed and the neighbor (standard SMOTE interpolation).
   - If the neighbor is not a support vector, the synthetic sample is placed randomly between the seed and a point extrapolated away from the interior, pushing synthesis toward the boundary. The gap is drawn from [0, 0.5), so the sample stays within the safe zone.
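
The seed selection and interpolation steps above can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: the class and method names are hypothetical, the SVM scores are supplied by the caller, and "extrapolated away from the interior" is read as stepping from the seed in the direction opposite to the neighbor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; names and details are assumptions, not the library's code.
final class SvmSmoteSketch {

    // Margin-band test: |score| <= 1 + mFactor * (1 - 1/C).
    static boolean inMarginBand(double score, double mFactor, double c) {
        return Math.abs(score) <= 1.0 + mFactor * (1.0 - 1.0 / c);
    }

    // Indices of minority samples inside the band; all indices if none qualify.
    static List<Integer> selectSeeds(double[] minorityScores, double mFactor, double c) {
        List<Integer> seeds = new ArrayList<>();
        for (int i = 0; i < minorityScores.length; i++) {
            if (inMarginBand(minorityScores[i], mFactor, c)) seeds.add(i);
        }
        if (seeds.isEmpty()) {
            for (int i = 0; i < minorityScores.length; i++) seeds.add(i);
        }
        return seeds;
    }

    // ASSUMPTION: for a non-support-vector neighbor, the sample is pushed away
    // from the neighbor by at most half the seed-to-neighbor distance.
    static double[] synthesize(double[] seed, double[] neighbor,
                               boolean neighborIsSupportVector, Random rng) {
        double gap = neighborIsSupportVector
                ? rng.nextDouble()          // [0, 1): between seed and neighbor
                : -0.5 * rng.nextDouble();  // (-0.5, 0]: away from the neighbor
        double[] x = new double[seed.length];
        for (int j = 0; j < seed.length; j++) {
            x[j] = seed[j] + gap * (neighbor[j] - seed[j]);
        }
        return x;
    }

    public static void main(String[] args) {
        double[] scores = {2.3, -0.4, 0.9, 1.7}; // hypothetical SVM scores of minority samples
        System.out.println(selectSeeds(scores, 1.0, 1.0)); // [1, 2]

        Random rng = new Random(42);
        double[] s = synthesize(new double[] {0.0, 0.0}, new double[] {1.0, 2.0}, true, rng);
        System.out.println(s[0] + ", " + s[1]); // a point on the segment from seed to neighbor
    }
}
```

With m_factor = 1 and C = 1 the band reduces to |score| <= 1, which is why only the second and third scores qualify in the example.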

Index selection
When the input dimensionality d <= highDimThreshold (default 20), a KDTree is used for exact k-NN search; otherwise a RandomProjectionForest is used.
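
Whichever index is chosen, it answers the same query: the k nearest minority samples to a given seed. The brute-force sketch below shows that query and the documented threshold rule. It is not how KDTree or RandomProjectionForest work internally; the class name, method names, and the string stand-ins for the index types are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only; a linear scan standing in for a real spatial index.
final class MinorityKnnSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    // Indices of the k rows closest to points[query], excluding the query row itself.
    static int[] nearest(double[][] points, int query, int k) {
        Integer[] order = new Integer[points.length - 1];
        int n = 0;
        for (int i = 0; i < points.length; i++) {
            if (i != query) order[n++] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(
                i -> squaredDistance(points[query], points[i])));
        int[] result = new int[Math.min(k, order.length)];
        for (int i = 0; i < result.length; i++) result[i] = order[i];
        return result;
    }

    // Mirrors the documented rule; strings stand in for the actual index classes.
    static String chooseIndex(int d, int highDimThreshold) {
        return d <= highDimThreshold ? "KDTree" : "RandomProjectionForest";
    }

    public static void main(String[] args) {
        double[][] minority = {{0, 0}, {1, 0}, {5, 5}, {0, 2}};
        System.out.println(Arrays.toString(nearest(minority, 0, 2))); // [1, 3]
        System.out.println(chooseIndex(20, 20)); // KDTree
        System.out.println(chooseIndex(21, 20)); // RandomProjectionForest
    }
}
```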

Limitations

- Feature spaces must be entirely continuous (no categorical features).
- Training an SVM adds non-trivial overhead compared to plain SMOTE.
- SVM performance depends on the choice of kernel and its parameters.
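
To make the kernel sensitivity concrete, the sketch below evaluates a Gaussian (RBF) kernel for one pair of points at several bandwidths. It assumes the common parameterization k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); whether the library's kernel uses exactly this form is an assumption here, and the class and method names are hypothetical.

```java
// Illustrative sketch only; assumes k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
final class GaussianKernelSketch {

    static double rbf(double[] x, double[] y, double sigma) {
        double sq = 0.0;
        for (int j = 0; j < x.length; j++) {
            double d = x[j] - y[j];
            sq += d * d;
        }
        return Math.exp(-sq / (2.0 * sigma * sigma));
    }

    public static void main(String[] args) {
        double[] x = {0.0, 0.0};
        double[] y = {1.0, 1.0}; // squared distance to x is 2
        System.out.println(rbf(x, y, 0.5)); // about 0.0183: points look unrelated
        System.out.println(rbf(x, y, 1.0)); // about 0.3679
        System.out.println(rbf(x, y, 5.0)); // about 0.9608: points look nearly identical
    }
}
```

Because the same pair of points can look unrelated or nearly identical depending on sigma, features on very different scales may need rescaling before a fixed sigma = 1 gives a useful margin.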

References

H. M. Nguyen, E. W. Cooper and K. Kamei. Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1), 2011.
G. E. A. P. A. Batista, R. C. Prati and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final record | SVMSMOTE.Options | SVM-SMOTE hyperparameters. |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| SVMSMOTE(double[][] data, int[] labels) | Creates an instance of an SVMSMOTE record class. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double[][] | data() | Returns the value of the data record component. |
| final boolean | equals(Object o) | Indicates whether some other object is "equal to" this one. |
| static SVMSMOTE | fit(double[][] data, int[] labels) | Applies SVM-SMOTE to the given dataset with default SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1. |
| static SVMSMOTE | fit(double[][] data, int[] labels, SVMSMOTE.Options options) | Applies SVM-SMOTE to the given dataset with the given SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1. |
| static SVMSMOTE | fit(double[][] data, int[] labels, SVMSMOTE.Options options, MercerKernel<double[]> kernel) | Applies SVM-SMOTE to the given dataset. |
| final int | hashCode() | Returns a hash code value for this object. |
| int[] | labels() | Returns the value of the labels record component. |
| int | size() | Returns the total number of samples after resampling. |
| final String | toString() | Returns a string representation of this record class. |

Methods inherited from class Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Details
- SVMSMOTE
  
  public SVMSMOTE(double[][] data, int[] labels)
  
  Creates an instance of an SVMSMOTE record class.
  
  Parameters:
  
  data - the value for the data record component
  
  labels - the value for the labels record component

Method Details
- size
  
  public int size()
  
  Returns the total number of samples after resampling.
  
  Returns:
  
  the number of rows in data.
- fit
  
  public static SVMSMOTE fit(double[][] data, int[] labels)
  
  Applies SVM-SMOTE to the given dataset with default SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
  
  Parameters:
  
  data - the input feature matrix; each row is an observation.
  
  labels - the class labels corresponding to each row of data.
  
  Returns:
  
  an SVMSMOTE instance holding the augmented data and labels.
- fit
  
  public static SVMSMOTE fit(double[][] data, int[] labels, SVMSMOTE.Options options)
  
  Applies SVM-SMOTE to the given dataset with the given SVMSMOTE.Options and a Gaussian (RBF) kernel with sigma = 1.
  
  Parameters:
  
  data - the input feature matrix; each row is an observation.
  
  labels - the class labels corresponding to each row of data.
  
  options - the hyperparameters.
  
  Returns:
  
  an SVMSMOTE instance holding the augmented data and labels.
- fit
  
  public static SVMSMOTE fit(double[][] data, int[] labels, SVMSMOTE.Options options, MercerKernel<double[]> kernel)
  
  Applies SVM-SMOTE to the given dataset.
  The minority class (label with the fewest occurrences) is identified automatically. An SVM is trained with the minority class as +1 and all other classes as -1. Minority support vectors (samples near the decision boundary) are used as seeds for SMOTE interpolation.
  
  Parameters:
  
  data - the input feature matrix; each row is an observation.
  
  labels - the class labels corresponding to each row of data.
  
  options - the hyperparameters.
  
  kernel - the SVM kernel function.
  
  Returns:
  
  an SVMSMOTE instance holding the augmented data and labels.
  
  Throws:
  
  IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
- toString
  
  public final String toString()
  
  Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
  
  Specified by:
  
  toString in class Record
  
  Returns:
  
  a string representation of this object
- hashCode
  
  public final int hashCode()
  
  Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
  
  Specified by:
  
  hashCode in class Record
  
  Returns:
  
  a hash code value for this object
- equals
  
  public final boolean equals(Object o)
  
  Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
  
  Specified by:
  
  equals in class Record
  
  Parameters:
  
  o - the object with which to compare
  
  Returns:
  
  true if this object is the same as the o argument; false otherwise.
- data
  
  public double[][] data()
  
  Returns the value of the data record component.
  
  Returns:
  
  the value of the data record component
- labels
  
  public int[] labels()
  
  Returns the value of the labels record component.
  
  Returns:
  
  the value of the labels record component
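
The preconditions and label encoding described for fit(data, labels, options, kernel) can be sketched as follows. This is an illustration only: the class and method names are hypothetical, k stands for options.k(), and breaking ties toward the smallest label is an assumption, since the documentation does not say how ties are resolved.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and tie-breaking are assumptions, not the library's code.
final class MinorityEncodingSketch {

    // Label with the fewest occurrences; ties go to the smallest label (assumption).
    static int minorityLabel(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int y : labels) counts.merge(y, 1, Integer::sum);
        int best = 0;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // One-vs-rest encoding: minority becomes +1, every other class becomes -1.
    static int[] encode(int[] labels, int minority) {
        int[] y = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            y[i] = labels[i] == minority ? +1 : -1;
        }
        return y;
    }

    // The three documented failure conditions, checked in order.
    static void checkInputs(double[][] data, int[] labels, int k) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("input is empty");
        }
        int minority = minorityLabel(labels);
        int count = 0;
        for (int y : labels) {
            if (y == minority) count++;
        }
        if (count < k) {
            throw new IllegalArgumentException("minority class has fewer samples than k");
        }
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 1, 1, 2, 2, 2};
        int minority = minorityLabel(labels);
        System.out.println(minority);                                   // 1
        System.out.println(Arrays.toString(encode(labels, minority))); // [-1, -1, -1, 1, 1, -1, -1, -1]
    }
}
```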
