Record Class BSO

java.lang.Object
java.lang.Record
smile.classification.resampling.BSO
Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).

public record BSO(double[][] data, int[] labels) extends Record
Borderline Shifting Oversampling (BSO).

BSO extends SMOTE by combining the borderline sample identification from Borderline-SMOTE (Han et al., 2005) with a shifting synthesis mechanism. Instead of interpolating between two minority samples, BSO shifts each borderline minority sample toward the local majority centroid, placing synthetic points directly in the critical region between the classes.

Borderline classification — for each minority sample xᵢ, its m nearest neighbors in the entire dataset are found and the number of majority neighbors is counted:

  • NOISE (mʼ = m): entirely surrounded by majority; never used as a seed.
  • DANGER (m/2 ≤ mʼ < m): on the class boundary; the primary seed candidates for synthesis in both DANGER and DANGER_AND_SAFE.
  • SAFE (mʼ < m/2): deep in the minority region; skipped in DANGER, used as secondary seeds at a reduced shift rate in DANGER_AND_SAFE.
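
The classification rule above can be illustrated with a minimal sketch. It assumes a brute-force neighbor search instead of the indexed search the class uses, and the names BorderlineSketch, Kind, and classify are hypothetical, not part of the library. The caller must ensure m is at most the number of samples minus one.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch only; not the library's implementation. */
final class BorderlineSketch {
    enum Kind { NOISE, DANGER, SAFE }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Classifies sample i by counting majority labels among its m nearest neighbors. */
    static Kind classify(double[][] data, int[] labels, int i, int minority, int m) {
        // Collect the indices of every sample except i itself.
        Integer[] others = new Integer[data.length - 1];
        for (int j = 0, k = 0; j < data.length; j++) {
            if (j != i) others[k++] = j;
        }
        // Brute-force search: sort all other samples by distance to sample i.
        Arrays.sort(others, Comparator.comparingDouble(j -> squaredDistance(data[i], data[j])));

        int majorityCount = 0; // this is m' in the text above
        for (int k = 0; k < m; k++) {
            if (labels[others[k]] != minority) majorityCount++;
        }
        if (majorityCount == m) return Kind.NOISE;        // m' = m
        if (2 * majorityCount >= m) return Kind.DANGER;   // m/2 <= m' < m
        return Kind.SAFE;                                 // m' < m/2
    }
}
```

The comparison 2 * majorityCount >= m avoids integer division, so the m/2 threshold is handled correctly when m is odd.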

Shifting mechanism — for a seed xᵢ, its kMaj nearest majority neighbors are found and their centroid cᵢ is computed. A synthetic sample is generated as:

  x_syn = xᵢ + α · (cᵢ − xᵢ) + jitter
where α ∈ (0, 1) is the shift factor and jitter is a small per-dimension noise scaled to a fraction of the shift magnitude to prevent all samples from collapsing to the same point. SAFE seeds in DANGER_AND_SAFE use α/2 to shift more conservatively.
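
A minimal sketch of the shifting formula follows. The exact jitter distribution is not specified here, so the sketch assumes uniform noise in each dimension bounded by a fraction of that dimension's shift; the names ShiftSketch, centroid, shift, and the jitterFraction parameter are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch only; not the library's implementation. */
final class ShiftSketch {
    /** Component-wise mean of the given rows (the centroid c_i of the majority neighbors). */
    static double[] centroid(double[][] rows) {
        double[] mean = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) mean[j] += row[j];
        }
        for (int j = 0; j < mean.length; j++) mean[j] /= rows.length;
        return mean;
    }

    /**
     * Computes x_syn = x_i + alpha * (c_i - x_i) + jitter.
     * Assumption: jitter in dimension j is uniform in
     * [-jitterFraction * |step_j|, +jitterFraction * |step_j|].
     */
    static double[] shift(double[] seed, double[] centroid, double alpha,
                          double jitterFraction, Random rng) {
        double[] synthetic = new double[seed.length];
        for (int j = 0; j < seed.length; j++) {
            double step = alpha * (centroid[j] - seed[j]);
            double jitter = (2.0 * rng.nextDouble() - 1.0) * jitterFraction * Math.abs(step);
            synthetic[j] = seed[j] + step + jitter;
        }
        return synthetic;
    }
}
```

For a SAFE seed under DANGER_AND_SAFE, the same routine would be called with alpha / 2.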

When kMaj = 0, the shifting mechanism is disabled and the algorithm falls back to standard Borderline-SMOTE interpolation: a synthetic sample is placed at a random position on the line segment between the seed and one of its randomly selected minority nearest neighbors.
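
The interpolation fallback can be sketched as below. This is the standard SMOTE step under the assumption that the random position is drawn uniformly from [0, 1); the names InterpolationSketch and interpolate are hypothetical.

```java
import java.util.Random;

/** Illustrative sketch only; not the library's implementation. */
final class InterpolationSketch {
    /** Returns a point on the line segment from seed toward a minority neighbor. */
    static double[] interpolate(double[] seed, double[] neighbor, Random rng) {
        double gap = rng.nextDouble(); // uniform in [0, 1); one value shared by all dimensions
        double[] synthetic = new double[seed.length];
        for (int j = 0; j < seed.length; j++) {
            synthetic[j] = seed[j] + gap * (neighbor[j] - seed[j]);
        }
        return synthetic;
    }
}
```

Using a single gap value for every dimension is what keeps the synthetic point on the segment; drawing a separate value per dimension would instead sample from the bounding box of the two points.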

If no DANGER samples exist, all minority samples are used as seeds. Seed selection then matches plain SMOTE, while synthesis still uses the selected strategy (shifting or interpolation).
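
Seed selection, including this fallback, might look like the following sketch. The strategy values DANGER and DANGER_AND_SAFE come from the description above; the type names SeedSelectionSketch, Kind, and Strategy and both method names are hypothetical. The kinds array is assumed to hold one entry per minority sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the library's implementation. */
final class SeedSelectionSketch {
    enum Kind { NOISE, DANGER, SAFE }
    enum Strategy { DANGER, DANGER_AND_SAFE }

    /** Returns positions (within the minority samples) that serve as seeds. */
    static List<Integer> selectSeeds(Kind[] kinds, Strategy strategy) {
        List<Integer> danger = new ArrayList<>();
        List<Integer> safe = new ArrayList<>();
        for (int i = 0; i < kinds.length; i++) {
            if (kinds[i] == Kind.DANGER) danger.add(i);
            else if (kinds[i] == Kind.SAFE) safe.add(i);
            // NOISE samples are never collected as seeds here.
        }
        if (danger.isEmpty()) {
            // Fallback described above: use every minority sample.
            List<Integer> all = new ArrayList<>();
            for (int i = 0; i < kinds.length; i++) all.add(i);
            return all;
        }
        List<Integer> seeds = new ArrayList<>(danger);
        if (strategy == Strategy.DANGER_AND_SAFE) seeds.addAll(safe);
        return seeds;
    }

    /** SAFE seeds shift at half the rate of DANGER seeds. */
    static double shiftFactor(Kind kind, double alpha) {
        return kind == Kind.SAFE ? alpha / 2.0 : alpha;
    }
}
```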

Index selection
When the dimensionality d ≤ highDimThreshold (default 20), a KDTree is used for nearest-neighbor search; otherwise a RandomProjectionForest is used.

Limitations

  • Feature spaces must be entirely continuous (no categorical features).
  • A large α may push synthetic samples deep into the majority region, potentially introducing ambiguous training examples.

References

  1. Malhat M.G., Elsobky A.M., Keshk A.E. et al. An approach for handling imbalanced datasets using borderline shifting. Sci Rep (2026).
  2. H. Han, W. Wang and B. Mao. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. ICIC, LNCS 3644, 878–887, 2005.
  3. N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16:321–357, 2002.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record
    BSO.Options
    BSO hyperparameters.
    static enum 
    BSO strategy controlling which minority samples are used as seeds.
  • Constructor Summary

    Constructors
    Constructor
    Description
    BSO(double[][] data, int[] labels)
    Creates an instance of a BSO record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[][]
    data()
    Returns the value of the data record component.
    final boolean
    equals(Object o)
    Indicates whether some other object is "equal to" this one.
    static BSO
    fit(double[][] data, int[] labels)
    Applies BSO to the given dataset with default BSO.Options.
    static BSO
    fit(double[][] data, int[] labels, BSO.Options options)
    Applies BSO to the given dataset.
    final int
    hashCode()
    Returns a hash code value for this object.
    int[]
    labels()
    Returns the value of the labels record component.
    int
    size()
    Returns the total number of samples after resampling.
    final String
    toString()
    Returns a string representation of this record class.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • BSO

      public BSO(double[][] data, int[] labels)
      Creates an instance of a BSO record class.
      Parameters:
      data - the value for the data record component
      labels - the value for the labels record component
  • Method Details

    • size

      public int size()
      Returns the total number of samples after resampling.
      Returns:
      the number of rows in data.
    • fit

      public static BSO fit(double[][] data, int[] labels)
      Applies BSO to the given dataset with default BSO.Options.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      Returns:
      a BSO instance holding the augmented data and labels.
    • fit

      public static BSO fit(double[][] data, int[] labels, BSO.Options options)
      Applies BSO to the given dataset.

      The minority class (label with the fewest occurrences) is identified automatically. Each minority sample is classified as NOISE, DANGER, or SAFE based on its neighborhood composition. Synthetic samples are then generated by shifting DANGER (and optionally SAFE) samples toward the local majority centroid.

      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      Returns:
      a BSO instance holding the augmented data and labels.
      Throws:
      IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.m() + 1.
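
      The minority detection and the documented preconditions can be sketched as follows. The names MinoritySketch, minorityLabel, count, and checkInputs are hypothetical, and the tie-breaking rule (the smallest label wins when counts are equal) is an assumption; the actual behavior on ties is not documented here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; not the library's implementation. */
final class MinoritySketch {
    /** Returns the label with the fewest occurrences (smallest label on ties, by assumption). */
    static int minorityLabel(int[] labels) {
        if (labels.length == 0) throw new IllegalArgumentException("empty input");
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted keys give a stable tie-break
        for (int label : labels) counts.merge(label, 1, Integer::sum);
        int best = labels[0];
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Counts how often the given label occurs. */
    static int count(int[] labels, int label) {
        int n = 0;
        for (int l : labels) if (l == label) n++;
        return n;
    }

    /** Mirrors the three documented failure conditions of fit. */
    static void checkInputs(double[][] data, int[] labels, int m) {
        if (data.length != labels.length)
            throw new IllegalArgumentException("data and labels have different lengths");
        if (data.length == 0)
            throw new IllegalArgumentException("empty input");
        if (count(labels, minorityLabel(labels)) < m + 1)
            throw new IllegalArgumentException("minority class has fewer than m + 1 samples");
    }
}
```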
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
    • data

      public double[][] data()
      Returns the value of the data record component.
      Returns:
      the value of the data record component
    • labels

      public int[] labels()
      Returns the value of the labels record component.
      Returns:
      the value of the labels record component