Record Class ADASYN
- Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).
ADASYN is an extension of SMOTE that adaptively generates synthetic
minority class samples. Rather than producing the same number of synthetic
samples for every minority instance, ADASYN concentrates synthesis where the
local distribution is hardest to learn — i.e. minority instances that are
surrounded by many majority neighbors receive more synthetic samples than
those in denser minority regions.
The algorithm proceeds as follows:
- For each minority instance x_i, find its k nearest neighbors in the entire dataset (majority and minority).
- Compute the density ratio r_i = Δ_i / k, where Δ_i is the number of those neighbors that belong to the majority class.
- Normalize r_i so that the weights sum to 1: r̂_i = r_i / Σ r_i.
- For each minority instance x_i, generate g_i = round(r̂_i * G) synthetic samples, where G = (|majority| − |minority|) * ratio is the total number of samples to generate.
- Each synthetic sample is placed on the line segment between x_i and one of its minority-only nearest neighbors, at a uniformly random position (identical interpolation to SMOTE).
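The interpolation in the last step can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name InterpolationSketch and the method interpolate are hypothetical, and the neighbor is assumed to have been selected already from the minority-only nearest neighbors.

```java
import java.util.Random;

// Illustrative sketch only; the class and method names are hypothetical,
// not part of the documented ADASYN API.
final class InterpolationSketch {

    /**
     * Returns a point on the line segment between x and neighbor:
     * s = x + u * (neighbor - x), with u drawn uniformly from [0, 1).
     */
    static double[] interpolate(double[] x, double[] neighbor, Random rng) {
        double u = rng.nextDouble();
        double[] s = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            s[j] = x[j] + u * (neighbor[j] - x[j]);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};        // a minority instance
        double[] neighbor = {3.0, 6.0}; // one of its minority-only neighbors
        double[] s = interpolate(x, neighbor, new Random(42));
        System.out.println(s[0] + ", " + s[1]);
    }
}
```

The same random factor u is applied to every coordinate, which is what keeps the synthetic sample on the segment rather than merely inside the bounding box of the two points.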
Instances whose density ratio r_i = 0 (entirely surrounded by
minority neighbors) contribute no synthetic samples, so synthesis is
automatically focused on the class boundary.
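The density-ratio, normalization, and allocation steps can be sketched as below, including the case where an instance has r_i = 0. This is a hedged illustration under stated assumptions: AllocationSketch and allocate are hypothetical names, the majority-neighbor counts Δ_i are assumed to have been computed already, and G is passed in as a plain integer.

```java
// Illustrative sketch only; names are hypothetical, not the library's API.
final class AllocationSketch {

    /**
     * @param delta delta[i] is the number of majority-class points among the
     *              k nearest neighbors of minority instance i
     * @param k     the neighborhood size
     * @param total G, the total number of synthetic samples requested
     * @return g[i] = round(rHat_i * G) for each minority instance
     */
    static int[] allocate(int[] delta, int k, int total) {
        int n = delta.length;
        double[] r = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            r[i] = (double) delta[i] / k; // density ratio r_i
            sum += r[i];
        }
        int[] g = new int[n];
        if (sum == 0.0) {
            return g; // every r_i is 0: nothing to generate
        }
        for (int i = 0; i < n; i++) {
            // Because of per-instance rounding, the g[i] need not sum exactly to total.
            g[i] = (int) Math.round(r[i] / sum * total);
        }
        return g;
    }

    public static void main(String[] args) {
        // Three minority instances with 0, 1, and 4 majority neighbors out of k = 5.
        int[] g = allocate(new int[] {0, 1, 4}, 5, 10);
        System.out.println(java.util.Arrays.toString(g));
    }
}
```

With Δ = {0, 1, 4}, k = 5, and G = 10, the ratios are 0, 0.2, and 0.8, so the instance with no majority neighbors receives no synthetic samples and the one closest to the boundary receives most of them.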
Index selection
When the input dimensionality d <= highDimThreshold (default 20),
a KDTree is used for exact k-NN search. For higher dimensionality,
a RandomProjectionForest (approximate NN) is used instead, because
k-d tree queries degrade toward a linear scan as dimensionality grows (the curse of dimensionality).
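The selection rule amounts to a single inclusive comparison against the threshold. The sketch below only illustrates that rule: IndexChoiceSketch, IndexKind, and choose are hypothetical stand-ins and do not construct the actual index classes.

```java
// Illustrative sketch only; the enum and method are hypothetical stand-ins
// for the index classes named in the text.
final class IndexChoiceSketch {

    enum IndexKind { KD_TREE, RANDOM_PROJECTION_FOREST }

    /** Exact k-d tree search up to and including the threshold, approximate search above it. */
    static IndexKind choose(int d, int highDimThreshold) {
        return d <= highDimThreshold ? IndexKind.KD_TREE : IndexKind.RANDOM_PROJECTION_FOREST;
    }

    public static void main(String[] args) {
        System.out.println(choose(20, 20)); // at the threshold
        System.out.println(choose(21, 20)); // just above it
    }
}
```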
Limitations
- Feature spaces must be entirely continuous (no categorical features).
- When all minority instances have r_i = 0 (perfectly separated classes), no synthetic samples are generated and a warning is logged.
References
- H. He, Y. Bai, E. A. Garcia, and S. Li. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. IJCNN, 2008.
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| ADASYN(double[][] data, int[] labels) | Creates an instance of an ADASYN record class. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| double[][] | data() | Returns the value of the data record component. |
| final boolean | equals(Object) | Indicates whether some other object is "equal to" this one. |
| static ADASYN | fit(double[][] data, int[] labels) | Applies ADASYN to the given dataset with default ADASYN.Options. |
| static ADASYN | fit(double[][] data, int[] labels, ADASYN.Options options) | Applies ADASYN to the given dataset. |
| final int | hashCode() | Returns a hash code value for this object. |
| int[] | labels() | Returns the value of the labels record component. |
| int | size() | Returns the total number of samples after resampling. |
| final String | toString() | Returns a string representation of this record class. |
-
Constructor Details
-
ADASYN
-
-
Method Details
-
size
public int size()
Returns the total number of samples after resampling.
- Returns:
- the number of rows in data.
-
fit
Applies ADASYN to the given dataset with default ADASYN.Options.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
- Returns:
- an ADASYN instance holding the augmented data and labels.
-
fit
Applies ADASYN to the given dataset.
The minority class (label with the fewest occurrences) is identified automatically. Adaptive synthetic samples are generated and appended to the original dataset.
- Parameters:
data - the input feature matrix; each row is an observation.
labels - the class labels corresponding to each row of data.
options - the hyperparameters.
- Returns:
- an ADASYN instance holding the augmented data and labels.
- Throws:
IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
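The minority-class detection and the documented preconditions can be sketched as below. This is an assumption-laden illustration rather than the actual fit logic: FitPreconditionsSketch, minorityLabel, and check are hypothetical names, k stands in for options.k(), and breaking ties toward the smallest label is an assumption that the documentation does not specify.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names are hypothetical, not the library's API.
final class FitPreconditionsSketch {

    /** Returns the label with the fewest occurrences (ties go to the smallest label, by assumption). */
    static int minorityLabel(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int y : labels) {
            counts.merge(y, 1, Integer::sum);
        }
        int best = labels[0];
        int bestCount = Integer.MAX_VALUE;
        // TreeMap iterates keys in ascending order, so strict '<' keeps the smallest label on ties.
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Mirrors the three documented failure conditions; k stands in for options.k(). */
    static void check(double[][] data, int[] labels, int k) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("input is empty");
        }
        int minority = minorityLabel(labels);
        int count = 0;
        for (int y : labels) {
            if (y == minority) count++;
        }
        if (count < k) {
            throw new IllegalArgumentException("minority class has fewer samples than k");
        }
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 1, 1};
        System.out.println(minorityLabel(labels));
        check(new double[5][2], labels, 2); // passes: 2 minority samples, k = 2
    }
}
```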
-
toString
-
hashCode
-
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
-
data
-
labels
-