Record Class SMOTETomek
java.lang.Object
java.lang.Record
smile.classification.resampling.SMOTETomek
- Record Components:
data - the cleaned, balanced feature matrix.
labels - the corresponding class labels.
SMOTETomek — combined over- and under-sampling.
SMOTETomek is a two-phase hybrid resampling strategy:
- Over-sampling (SMOTE): synthetic minority-class samples are generated using SMOTE, expanding the minority class and reducing the class imbalance ratio.
- Under-sampling (Tomek Links): TomekLinks cleaning is applied to the augmented dataset, removing the majority-class member of every Tomek link. This eliminates ambiguous and noisy boundary samples, whether they were present in the original dataset or introduced by the SMOTE synthesis step, resulting in a cleaner decision boundary.
The two phases are controlled independently through their respective
SMOTE.Options and TomekLinks.Options.
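The two phases can be illustrated with a small, self-contained sketch. This is not the smile implementation: the class SmoteTomekSketch and its methods are hypothetical names chosen for illustration, the nearest-neighbor search is brute force, and the majority label is passed in explicitly. The interpolate method shows only the core SMOTE step (a synthetic point on the segment between a minority sample and one of its minority-class neighbors); keptIndices shows Tomek-link cleaning, where a Tomek link is a pair of mutual nearest neighbors with different labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the smile API. */
final class SmoteTomekSketch {

    /** Core SMOTE step: a point on the segment from x to neighbor, with u in [0, 1]. */
    static double[] interpolate(double[] x, double[] neighbor, double u) {
        double[] synthetic = new double[x.length];
        for (int k = 0; k < x.length; k++) {
            synthetic[k] = x[k] + u * (neighbor[k] - x[k]);
        }
        return synthetic;
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the nearest other row to row i (brute force), or -1 if none. */
    static int nearestNeighbor(double[][] data, int i) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < data.length; j++) {
            if (j == i) continue;
            double d = squaredDistance(data[i], data[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Row indices kept after dropping the majority-class member of every Tomek link. */
    static List<Integer> keptIndices(double[][] data, int[] labels, int majorityLabel) {
        int n = data.length;
        int[] nn = new int[n];
        for (int i = 0; i < n; i++) {
            nn[i] = nearestNeighbor(data, i);
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int j = nn[i];
            // A Tomek link: i and j are each other's nearest neighbor and differ in label.
            boolean inTomekLink = j >= 0 && nn[j] == i && labels[i] != labels[j];
            if (!(inTomekLink && labels[i] == majorityLabel)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] data = { {0.0}, {0.1}, {1.0}, {1.05}, {3.0} };
        int[] labels = { 0, 0, 0, 1, 1 };
        // Rows 2 and 3 are mutual nearest neighbors with different labels,
        // so the majority-class row (index 2) is dropped.
        System.out.println(keptIndices(data, labels, 0)); // prints [0, 1, 3, 4]
    }
}
```

In this toy dataset only the majority-class point at 1.0 is removed; the minority point at 1.05 that it was paired with is retained, which matches the "remove the majority-class member" rule described above.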
Typical use
var smoteOpts = new SMOTE.Options(5, 1.0);
var tomekOpts = new TomekLinks.Options();
SMOTETomek result = SMOTETomek.fit(data, labels,
new SMOTETomek.Options(smoteOpts, tomekOpts));
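One way to check the effect of resampling is to compare per-class counts of the input labels with those of result.labels(). The helper below is a minimal sketch for that purpose; ClassCounts and count are hypothetical names, not part of smile, and the label array in main is made-up illustration data.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for inspecting class balance; not part of smile. */
final class ClassCounts {

    /** Counts how many times each label occurs, ordered by label value. */
    static Map<Integer, Integer> count(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] before = { 0, 0, 0, 0, 1 }; // imbalanced toy labels
        System.out.println(count(before)); // prints {0=4, 1=1}
    }
}
```

Applied to a real run, count(labels) and count(result.labels()) give the class distribution before and after SMOTETomek.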
References
- G. E. A. P. A. Batista, R. C. Prati and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations, 6(1):20–29, 2004.
- N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16:321–357, 2002.
- I. Tomek. Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, 6:769–772, 1976.
Nested Class Summary
Nested Classes
- static final record SMOTETomek.Options: SMOTETomek hyperparameters.
Constructor Summary
Constructors
- SMOTETomek(double[][] data, int[] labels): Creates an instance of a SMOTETomek record class.
Method Summary
- double[][] data(): Returns the value of the data record component.
- final boolean equals(Object o): Indicates whether some other object is "equal to" this one.
- static SMOTETomek fit(double[][] data, int[] labels): Applies SMOTETomek with default SMOTETomek.Options.
- static SMOTETomek fit(double[][] data, int[] labels, SMOTETomek.Options options): Applies SMOTETomek to the given dataset.
- final int hashCode(): Returns a hash code value for this object.
- int[] labels(): Returns the value of the labels record component.
- int size(): Returns the number of samples after resampling and cleaning.
- final String toString(): Returns a string representation of this record class.
Constructor Details
SMOTETomek
SMOTETomek(double[][] data, int[] labels)
Creates an instance of a SMOTETomek record class.
Method Details
size
public int size()
Returns the number of samples after resampling and cleaning.
- Returns:
the number of rows in data.
fit
public static SMOTETomek fit(double[][] data, int[] labels)
Applies SMOTETomek with default SMOTETomek.Options.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
- Returns:
a SMOTETomek record holding the balanced, cleaned dataset.
fit
public static SMOTETomek fit(double[][] data, int[] labels, SMOTETomek.Options options)
Applies SMOTETomek to the given dataset.
Phase 1: SMOTE.fit(double[][], int[], SMOTE.Options) generates synthetic minority samples.
Phase 2: TomekLinks.fit(double[][], int[], TomekLinks.Options) removes the majority-class member of every Tomek link in the augmented dataset.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
options - the hyperparameters.
- Returns:
a SMOTETomek record holding the balanced, cleaned dataset.
- Throws:
IllegalArgumentException - propagated from SMOTE or TomekLinks if the inputs are invalid.
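toString
public final String toString()
Returns a string representation of this record class.
hashCode
public final int hashCode()
Returns a hash code value for this object.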
equals
public final boolean equals(Object o)
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
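Because both record components are arrays, comparison with Objects::equals reduces to reference equality on the arrays: two records holding separate arrays with identical contents are not equal. The sketch below demonstrates this with a hypothetical stand-in record, Resampled, that has the same component types as SMOTETomek; the contentEquals helper is likewise an assumption for illustration, not a smile method.

```java
import java.util.Arrays;

/** Illustrates record equality with array components; names are hypothetical. */
final class RecordArrayEquality {

    /** Stand-in record with the same component types as SMOTETomek. */
    record Resampled(double[][] data, int[] labels) {}

    /** Compares two records by array contents rather than by array reference. */
    static boolean contentEquals(Resampled a, Resampled b) {
        return Arrays.deepEquals(a.data(), b.data())
            && Arrays.equals(a.labels(), b.labels());
    }

    public static void main(String[] args) {
        var a = new Resampled(new double[][] { {1.0, 2.0} }, new int[] { 0 });
        var b = new Resampled(new double[][] { {1.0, 2.0} }, new int[] { 0 });
        System.out.println(a.equals(b));         // prints false: different array objects
        System.out.println(contentEquals(a, b)); // prints true: same contents
    }
}
```

When comparing two resampling results, for example in a unit test, a content-based comparison of data() and labels() along these lines is therefore likely to be what is wanted.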
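data
public double[][] data()
Returns the value of the data record component.
labels
public int[] labels()
Returns the value of the labels record component.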