Record Class SMOTE
- Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).
SMOTE addresses class imbalance by synthetically generating new minority
class samples rather than simply duplicating existing ones. For each
minority sample, the algorithm selects one of its k nearest
neighbors at random and places a new synthetic sample at a uniformly
random position along the line segment connecting the two points in
feature space.
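The interpolation step described above can be sketched as follows. This is a minimal illustration, not the implementation behind this class: the class name SmoteInterpolationSketch and the method synthesize are hypothetical, and the k nearest neighbors are assumed to have been computed already.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration only; not part of the SMOTE record class.
final class SmoteInterpolationSketch {
    /**
     * Picks one neighbor at random and returns a point on the segment
     * between x and that neighbor: x + gap * (neighbor - x), gap in [0, 1).
     */
    static double[] synthesize(double[] x, double[][] neighbors, Random rng) {
        double[] neighbor = neighbors[rng.nextInt(neighbors.length)];
        double gap = rng.nextDouble(); // uniform position along the segment
        double[] synthetic = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            synthetic[j] = x[j] + gap * (neighbor[j] - x[j]);
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 1.0};
        // Assumed to be the precomputed k = 2 nearest minority neighbors of x.
        double[][] neighbors = {{2.0, 1.0}, {1.0, 3.0}};
        System.out.println(Arrays.toString(synthesize(x, neighbors, new Random(7))));
    }
}
```

Because the gap is drawn per sample, repeated calls for the same point and neighbor yield different synthetic samples along the same segment.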
The amount of over-sampling is controlled by the ratio parameter
in SMOTE.Options. A value of 1.0 doubles the minority class
(100% over-sampling), 2.0 triples it, and so on.
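As a worked example of the ratio arithmetic, the sketch below computes how many synthetic samples a given ratio implies. The helper syntheticCount is hypothetical, and rounding to the nearest integer is an assumption; the documentation above does not say how fractional counts are handled.

```java
// Hypothetical helper for illustration only; not part of SMOTE.Options.
final class SmoteRatioSketch {
    /** Number of synthetic samples implied by the ratio (rounding is assumed). */
    static int syntheticCount(int minorityCount, double ratio) {
        return (int) Math.round(ratio * minorityCount);
    }

    public static void main(String[] args) {
        int minority = 50;
        for (double ratio : new double[] {1.0, 2.0}) {
            int added = syntheticCount(minority, ratio);
            System.out.println("ratio " + ratio + ": " + added
                    + " synthetic, " + (minority + added) + " minority samples in total");
        }
    }
}
```

With 50 minority samples, a ratio of 1.0 adds 50 synthetic samples (100 in total) and a ratio of 2.0 adds 100 (150 in total).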
Index selection
When the input dimensionality d <= highDimThreshold (default 20),
a KDTree is used for exact k-nearest-neighbor search. For higher
dimensionality a RandomProjectionForest (approximate NN) is used
instead, because k-d trees suffer from the curse of dimensionality and
become no faster than a linear scan in high dimensions.
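The selection rule can be sketched as a single comparison. The enum and the method choose below are hypothetical stand-ins used only to make the boundary case explicit; they do not construct an actual KDTree or RandomProjectionForest.

```java
// Hypothetical illustration of the index selection rule; not the library code.
final class IndexChoiceSketch {
    enum Index { KD_TREE, RANDOM_PROJECTION_FOREST }

    /** Exact search up to and including the threshold, approximate search above it. */
    static Index choose(int d, int highDimThreshold) {
        return d <= highDimThreshold ? Index.KD_TREE : Index.RANDOM_PROJECTION_FOREST;
    }

    public static void main(String[] args) {
        int highDimThreshold = 20; // the documented default
        System.out.println(choose(20, highDimThreshold));
        System.out.println(choose(21, highDimThreshold));
    }
}
```

Note that the comparison is inclusive: with the default threshold, a 20-dimensional input still uses the exact k-d tree search.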
Limitations
- Feature spaces must be entirely continuous (no categorical features).
- SMOTE can introduce noise when minority samples already overlap heavily with the majority class.
References
- N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
- SMOTE(double[][] data, int[] labels): Creates an instance of a SMOTE record class.
Method Summary
Modifier and Type, Method: Description
- double[][] data(): Returns the value of the data record component.
- final boolean equals(Object): Indicates whether some other object is "equal to" this one.
- static SMOTE fit(double[][] data, int[] labels): Applies SMOTE to the given dataset with default SMOTE.Options.
- static SMOTE fit(double[][] data, int[] labels, SMOTE.Options options): Applies SMOTE to the given dataset.
- final int hashCode(): Returns a hash code value for this object.
- int[] labels(): Returns the value of the labels record component.
- int size(): Returns the total number of samples after resampling.
- final String toString(): Returns a string representation of this record class.
-
Constructor Details
-
SMOTE
-
-
Method Details
-
size
public int size()
Returns the total number of samples after resampling.
- Returns:
the number of rows in data.
-
fit
Applies SMOTE to the given dataset with default SMOTE.Options.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
- Returns:
a SMOTE instance holding the augmented data and labels.
-
fit
Applies SMOTE to the given dataset.
The minority class (label with the fewest occurrences) is identified automatically. Synthetic samples are generated for it and appended to the original dataset.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
options - the hyperparameters.
- Returns:
a SMOTE instance holding the augmented data and labels.
- Throws:
IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
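The minority-class detection and the documented preconditions can be sketched as below. This is a minimal sketch under stated assumptions, not the code behind fit: the class SmoteInputCheckSketch and the method minorityLabel are hypothetical, the plain int k stands in for options.k(), and the tie-breaking between equally rare labels is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the implementation of fit.
final class SmoteInputCheckSketch {
    /** Validates the input and returns the label with the fewest occurrences. */
    static int minorityLabel(double[][] data, int[] labels, int k) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("input is empty");
        }
        // Count the occurrences of each label.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        // Pick the rarest label; how ties are broken here is an assumption.
        int minority = labels[0];
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < counts.get(minority)) {
                minority = e.getKey();
            }
        }
        if (counts.get(minority) < k) {
            throw new IllegalArgumentException("minority class has fewer samples than k");
        }
        return minority;
    }

    public static void main(String[] args) {
        double[][] data = new double[6][2];
        int[] labels = {0, 0, 0, 0, 1, 1};
        System.out.println(minorityLabel(data, labels, 2));
    }
}
```

In this sketch, four samples of label 0 and two of label 1 make label 1 the minority class; asking for k = 3 neighbors would be rejected because only two minority samples exist.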
-
toString
-
hashCode
-
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
data
-
labels
-