Record Class BSO
- Record Components:
  data - the augmented feature matrix (original + synthetic samples).
  labels - the augmented labels (original + synthetic sample labels).
BSO extends SMOTE by combining the borderline sample identification
from Borderline-SMOTE (Han et al., 2005) with a shifting synthesis
mechanism. Instead of interpolating between two minority samples, BSO shifts
each borderline minority sample toward the local majority centroid, placing
synthetic points directly in the critical region between the classes.
Borderline classification — for each minority sample xᵢ,
its m nearest neighbors in the entire dataset are found and
the number of majority neighbors mʼ is counted:
- mʼ = m → NOISE: entirely surrounded by majority samples; never used as a seed.
- m/2 ≤ mʼ < m → DANGER: on the class boundary; the primary seed candidates for synthesis in both the DANGER and DANGER_AND_SAFE strategies.
- mʼ < m/2 → SAFE: deep in the minority region; skipped in the DANGER strategy, used as secondary seeds at a reduced shift rate in the DANGER_AND_SAFE strategy.
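The classification rule can be sketched in a few lines of Java. This is a minimal illustration, not the library's code: the class `BorderlineSketch`, its method names, and the brute-force neighbor search are all hypothetical, and any label other than the minority label is assumed to count as majority.

```java
import java.util.Arrays;

/** Illustrative sketch of the NOISE / DANGER / SAFE rule; hypothetical, not the library's code. */
final class BorderlineSketch {
    enum Category { NOISE, DANGER, SAFE }

    /** Maps a majority-neighbor count m' (out of m neighbors) to a category. */
    static Category categorize(int majorityCount, int m) {
        if (majorityCount == m) return Category.NOISE;
        // 2 * m' >= m is the integer-safe form of m/2 <= m'.
        if (2 * majorityCount >= m) return Category.DANGER;
        return Category.SAFE;
    }

    /** Counts non-minority points among the m nearest neighbors of row i (brute force). */
    static int majorityNeighbors(double[][] data, int[] labels, int i, int m, int minorityLabel) {
        double[] dist = new double[data.length];
        Integer[] order = new Integer[data.length];
        for (int j = 0; j < data.length; j++) {
            order[j] = j;
            double sum = 0.0;
            for (int d = 0; d < data[i].length; d++) {
                double diff = data[i][d] - data[j][d];
                sum += diff * diff;
            }
            dist[j] = sum; // squared Euclidean distance preserves the ordering
        }
        Arrays.sort(order, (a, b) -> Double.compare(dist[a], dist[b]));
        int count = 0;
        int taken = 0;
        for (int idx = 0; idx < order.length && taken < m; idx++) {
            int j = order[idx];
            if (j == i) continue; // a sample is not its own neighbor
            if (labels[j] != minorityLabel) count++;
            taken++;
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {0.1}, {0.2}, {0.3}, {5.0}};
        int[] labels = {1, 0, 0, 1, 1}; // label 1 plays the minority role here
        int m = 3;
        int mPrime = majorityNeighbors(data, labels, 0, m, 1);
        System.out.println("m' = " + mPrime + " -> " + categorize(mPrime, m));
    }
}
```

The integer comparison `2 * m' >= m` avoids the truncation that `m / 2` would introduce for odd `m`.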
Shifting mechanism — for a seed xᵢ, its kMaj nearest
majority neighbors are found and their centroid cᵢ is
computed. A synthetic sample is generated as:
x_syn = xᵢ + α · (cᵢ − xᵢ) + jitter
where
α ∈ (0, 1) is the shift factor and jitter is a small
per-dimension noise scaled to a fraction of the shift magnitude to prevent
all samples from collapsing to the same point. SAFE seeds in
DANGER_AND_SAFE use α/2 to shift more conservatively.
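A minimal Java sketch of the shift formula follows. The class `ShiftSketch`, the `jitterFraction` parameter, and the choice of uniform noise bounded by a fraction of the Euclidean shift length are assumptions made for illustration; the text above only states that the jitter is a small per-dimension noise scaled to a fraction of the shift magnitude.

```java
import java.util.Random;

/** Illustrative sketch of x_syn = x + alpha * (c - x) + jitter; hypothetical, not the library's code. */
final class ShiftSketch {
    /** Mean of the kMaj nearest majority neighbors (the local centroid c). */
    static double[] centroid(double[][] neighbors) {
        double[] c = new double[neighbors[0].length];
        for (double[] row : neighbors) {
            for (int d = 0; d < c.length; d++) c[d] += row[d];
        }
        for (int d = 0; d < c.length; d++) c[d] /= neighbors.length;
        return c;
    }

    /** Shifts seed x toward centroid c by factor alpha and adds bounded uniform jitter. */
    static double[] shift(double[] x, double[] c, double alpha, double jitterFraction, Random rng) {
        double[] step = new double[x.length];
        double norm2 = 0.0;
        for (int d = 0; d < x.length; d++) {
            step[d] = alpha * (c[d] - x[d]);
            norm2 += step[d] * step[d];
        }
        // Assumption: jitter is uniform in [-scale, scale], scale = fraction of the shift length.
        double scale = jitterFraction * Math.sqrt(norm2);
        double[] syn = new double[x.length];
        for (int d = 0; d < x.length; d++) {
            double jitter = (2.0 * rng.nextDouble() - 1.0) * scale;
            syn[d] = x[d] + step[d] + jitter;
        }
        return syn;
    }

    public static void main(String[] args) {
        double[] seed = {0.0, 0.0};
        double[] c = centroid(new double[][] {{1.0, 3.0}, {3.0, 5.0}});
        double[] syn = shift(seed, c, 0.5, 0.05, new Random(42));
        System.out.println(syn[0] + ", " + syn[1]);
    }
}
```

Under the DANGER_AND_SAFE strategy, a SAFE seed would call the same routine with `alpha / 2`.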
When kMaj = 0, the shifting mechanism is disabled and the algorithm
falls back to standard Borderline-SMOTE interpolation: a synthetic
sample is placed at a random position on the line segment between the seed
and one of its randomly selected minority nearest neighbors.
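The interpolation fallback can be sketched as below. The class `InterpolationSketch` and its method are hypothetical, and the sketch assumes the minority nearest neighbors of the seed have already been found.

```java
import java.util.Random;

/** Illustrative sketch of SMOTE-style interpolation; hypothetical, not the library's code. */
final class InterpolationSketch {
    /** Places a point at a random position on the segment from seed to a random minority neighbor. */
    static double[] interpolate(double[] seed, double[][] minorityNeighbors, Random rng) {
        double[] neighbor = minorityNeighbors[rng.nextInt(minorityNeighbors.length)];
        double gap = rng.nextDouble(); // uniform in [0, 1)
        double[] syn = new double[seed.length];
        for (int d = 0; d < seed.length; d++) {
            syn[d] = seed[d] + gap * (neighbor[d] - seed[d]);
        }
        return syn;
    }

    public static void main(String[] args) {
        double[] seed = {0.0, 0.0};
        double[][] neighbors = {{2.0, 2.0}, {1.0, -1.0}};
        double[] syn = interpolate(seed, neighbors, new Random(7));
        System.out.println(syn[0] + ", " + syn[1]);
    }
}
```

Unlike the shifting mechanism, the result always stays between two minority samples, so it does not move toward the majority region.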
If no DANGER samples exist, all minority samples are used as seeds. Seed selection then matches plain SMOTE, while synthesis still follows the selected synthesis strategy.
Index selection
When the dimensionality d ≤ highDimThreshold (default 20) a
KDTree is used; otherwise a RandomProjectionForest is used.
Limitations
- Feature spaces must be entirely continuous (no categorical features).
- A large α may push synthetic samples deep into the majority region, potentially introducing ambiguous training examples.
References
- Malhat M.G., Elsobky A.M., Keshk A.E. et al. An approach for handling imbalanced datasets using borderline shifting. Sci Rep (2026).
- H. Han, W. Wang and B. Mao. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. ICIC, LNCS 3644, 878–887, 2005.
- N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16:321–357, 2002.
Nested Class Summary
Nested Classes
- static final record BSO.Options: BSO hyperparameters.
- static enum: BSO strategy controlling which minority samples are used as seeds.
Constructor Summary
Constructors
- BSO(double[][] data, int[] labels): Creates an instance of a BSO record class.
Method Summary
Methods
- double[][] data(): Returns the value of the data record component.
- final boolean equals(Object o): Indicates whether some other object is "equal to" this one.
- static BSO fit(double[][] data, int[] labels): Applies BSO to the given dataset with default BSO.Options.
- static BSO fit(double[][] data, int[] labels, BSO.Options options): Applies BSO to the given dataset.
- final int hashCode(): Returns a hash code value for this object.
- int[] labels(): Returns the value of the labels record component.
- int size(): Returns the total number of samples after resampling.
- final String toString(): Returns a string representation of this record class.
Constructor Details
BSO
public BSO(double[][] data, int[] labels)
Creates an instance of a BSO record class.
Method Details
size
public int size()
Returns the total number of samples after resampling.
Returns:
  the number of rows in data.
fit
public static BSO fit(double[][] data, int[] labels)
Applies BSO to the given dataset with default BSO.Options.
Parameters:
  data - the input feature matrix; each row is an observation.
  labels - the class labels corresponding to each row of data.
Returns:
  a BSO instance holding the augmented data and labels.
fit
public static BSO fit(double[][] data, int[] labels, BSO.Options options)
Applies BSO to the given dataset.
The minority class (label with the fewest occurrences) is identified automatically. Each minority sample is classified as NOISE, DANGER, or SAFE based on its neighborhood composition. Synthetic samples are then generated by shifting DANGER (and optionally SAFE) samples toward the local majority centroid.
Parameters:
  data - the input feature matrix; each row is an observation.
  labels - the class labels corresponding to each row of data.
  options - the hyperparameters.
Returns:
  a BSO instance holding the augmented data and labels.
Throws:
  IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.m() + 1.
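toString
public final String toString()
Returns a string representation of this record class.
hashCode
public final int hashCode()
Returns a hash code value for this object.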
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
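data
public double[][] data()
Returns the value of the data record component.
labels
public int[] labels()
Returns the value of the labels record component.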