Record Class ADASYN

java.lang.Object
java.lang.Record
smile.classification.resampling.ADASYN
Record Components:
data - the augmented feature matrix (original + synthetic samples).
labels - the augmented labels (original + synthetic sample labels).

public record ADASYN(double[][] data, int[] labels) extends Record
Adaptive Synthetic Sampling (ADASYN).

ADASYN extends SMOTE by generating synthetic minority-class samples adaptively. Rather than producing the same number of synthetic samples for every minority instance, it concentrates synthesis where the local distribution is hardest to learn: minority instances surrounded by many majority neighbors receive more synthetic samples than those in regions dominated by the minority class.

The algorithm proceeds as follows:

  1. For each minority instance x_i, find its k nearest neighbors in the entire dataset (majority and minority).
  2. Compute the density ratio r_i = Δ_i / k, where Δ_i is the number of those neighbors that belong to the majority class.
  3. Normalize r_i so that the weights sum to 1: r̂_i = r_i / Σ r_i.
  4. For each minority instance x_i, generate g_i = round(r̂_i * G) synthetic samples, where G = (|majority| − |minority|) * ratio is the total number of samples to generate.
  5. Each synthetic sample is placed on the line segment between x_i and one of its minority-only nearest neighbors, at a uniformly random position (identical interpolation to SMOTE).
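The interpolation in step 5 can be sketched as follows. This is a minimal illustration, not Smile's implementation: the class `InterpolationSketch` and the method `interpolate` are hypothetical names, and the neighbor is assumed to have already been picked from the minority-only neighbor list.

```java
import java.util.Arrays;
import java.util.Random;

final class InterpolationSketch {
    /**
     * Returns x + u * (neighbor - x) with u drawn uniformly from [0, 1),
     * i.e. a random point on the segment between x and neighbor.
     */
    static double[] interpolate(double[] x, double[] neighbor, Random rng) {
        double u = rng.nextDouble();
        double[] synthetic = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            synthetic[j] = x[j] + u * (neighbor[j] - x[j]);
        }
        return synthetic;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};        // a minority instance
        double[] neighbor = {3.0, 6.0}; // one of its minority neighbors
        double[] s = interpolate(x, neighbor, new Random(42));
        // The result lies between the two points in every coordinate.
        System.out.println(Arrays.toString(s));
    }
}
```

Because the same `u` is applied to every coordinate, the synthetic point stays on the straight line between the two minority instances, which is why this step requires continuous features.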

Instances whose density ratio r_i = 0 (entirely surrounded by minority neighbors) contribute no synthetic samples, so synthesis is automatically focused on the class boundary.
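Steps 1 to 4, including the zero-ratio case, can be illustrated with a small brute-force sketch. It is not Smile's implementation: the names `AdasynAllocationSketch`, `densityRatios` and `allocate` are hypothetical, neighbors are found by exhaustive search instead of a tree index, each instance is assumed to be excluded from its own neighbor list, and `ratio` is passed as a plain argument.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

final class AdasynAllocationSketch {
    /** Squared Euclidean distance between two points. */
    static double dist2(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /** Steps 1-2: r_i = (majority rows among the k nearest neighbors of x_i) / k. */
    static double[] densityRatios(double[][] data, int[] labels, int minorityLabel, int k) {
        int m = (int) Arrays.stream(labels).filter(l -> l == minorityLabel).count();
        double[] r = new double[m];
        int next = 0;
        for (int i = 0; i < data.length; i++) {
            if (labels[i] != minorityLabel) continue;
            final int self = i;
            // Assumption: x_i itself is excluded from its own neighbor list.
            long majorityNeighbors = IntStream.range(0, data.length)
                    .filter(j -> j != self)
                    .boxed()
                    .sorted(Comparator.comparingDouble(j -> dist2(data[self], data[j])))
                    .limit(k)
                    .filter(j -> labels[j] != minorityLabel)
                    .count();
            r[next++] = (double) majorityNeighbors / k;
        }
        return r;
    }

    /** Steps 3-4: normalize r and allocate g_i = round(r̂_i * G). */
    static int[] allocate(double[] r, int majorityCount, int minorityCount, double ratio) {
        int[] g = new int[r.length];
        double total = Arrays.stream(r).sum();
        if (total == 0.0) return g; // perfectly separated classes: nothing to generate
        double G = (majorityCount - minorityCount) * ratio;
        for (int i = 0; i < r.length; i++) {
            g[i] = (int) Math.round(r[i] / total * G);
        }
        return g;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1}, {2, 0}, {2, 1}, // majority (label 0)
            {1.1, 0.5},                                      // minority, inside the majority region
            {10, 10}, {10, 11}, {11, 10}                     // minority, isolated cluster
        };
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        double[] r = densityRatios(data, labels, 1, 2);
        int[] g = allocate(r, 6, 4, 1.0);
        System.out.println(Arrays.toString(r)); // prints [1.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(g)); // prints [2, 0, 0, 0]
    }
}
```

In this toy dataset only the minority point sitting among majority points receives synthetic samples; the three points in the isolated minority cluster have r_i = 0 and receive none. Because each g_i is rounded independently, the counts need not sum exactly to G in general.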

Index selection
When the input dimensionality d <= highDimThreshold (default 20), a KDTree is used for exact k-NN search. For higher dimensionality a RandomProjectionForest (approximate NN) is used instead, because k-d trees suffer from the curse of dimensionality.
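The selection rule amounts to a single comparison, sketched below. The enum `IndexKind` and the method `choose` are hypothetical stand-ins used only to show the threshold logic; they are not part of Smile's API, and the sketch does not construct either index.

```java
final class IndexChoiceSketch {
    /** Hypothetical labels for the two index types named in the text. */
    enum IndexKind { KD_TREE, RANDOM_PROJECTION_FOREST }

    /** Exact k-d tree search up to and including the threshold, approximate search above it. */
    static IndexKind choose(int d, int highDimThreshold) {
        return d <= highDimThreshold ? IndexKind.KD_TREE : IndexKind.RANDOM_PROJECTION_FOREST;
    }

    public static void main(String[] args) {
        int highDimThreshold = 20; // the documented default
        System.out.println(choose(8, highDimThreshold));   // prints KD_TREE
        System.out.println(choose(20, highDimThreshold));  // prints KD_TREE
        System.out.println(choose(128, highDimThreshold)); // prints RANDOM_PROJECTION_FOREST
    }
}
```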

Limitations

  • Feature spaces must be entirely continuous (no categorical features).
  • When all minority instances have r_i = 0 (perfectly separated classes), no synthetic samples are generated and a warning is logged.

References

  1. H. He, Y. Bai, E. A. Garcia, and S. Li. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. IJCNN, 2008.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    ADASYN.Options
    ADASYN hyperparameters.
  • Constructor Summary

    Constructors
    Constructor
    Description
    ADASYN(double[][] data, int[] labels)
    Creates an instance of an ADASYN record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    double[][]
    data()
    Returns the value of the data record component.
    final boolean
    equals(Object o)
    Indicates whether some other object is "equal to" this one.
    static ADASYN
    fit(double[][] data, int[] labels)
    Applies ADASYN to the given dataset with default ADASYN.Options.
    static ADASYN
    fit(double[][] data, int[] labels, ADASYN.Options options)
    Applies ADASYN to the given dataset.
    final int
    hashCode()
    Returns a hash code value for this object.
    int[]
    labels()
    Returns the value of the labels record component.
    int
    size()
    Returns the total number of samples after resampling.
    final String
    toString()
    Returns a string representation of this record class.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • ADASYN

      public ADASYN(double[][] data, int[] labels)
      Creates an instance of an ADASYN record class.
      Parameters:
      data - the value for the data record component
      labels - the value for the labels record component
  • Method Details

    • size

      public int size()
      Returns the total number of samples after resampling.
      Returns:
      the number of rows in data.
    • fit

      public static ADASYN fit(double[][] data, int[] labels)
      Applies ADASYN to the given dataset with default ADASYN.Options.
      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      Returns:
      an ADASYN instance holding the augmented data and labels.
    • fit

      public static ADASYN fit(double[][] data, int[] labels, ADASYN.Options options)
      Applies ADASYN to the given dataset.

      The minority class (label with the fewest occurrences) is identified automatically. Adaptive synthetic samples are generated and appended to the original dataset.

      Parameters:
      data - the input feature matrix; each row is an observation.
      labels - the class labels corresponding to each row of data.
      options - the hyperparameters.
      Returns:
      an ADASYN instance holding the augmented data and labels.
      Throws:
      IllegalArgumentException - if data and labels have different lengths, if the input is empty, or if the minority class has fewer samples than options.k().
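The documented preconditions and the minority-class rule can be mirrored in a standalone sketch. This is an illustration only: `FitPreconditionsSketch` and `minorityLabel` are hypothetical names, `k` stands in for `options.k()`, and breaking ties in favor of the smallest label is an assumption that the documentation does not specify.

```java
import java.util.Map;
import java.util.TreeMap;

final class FitPreconditionsSketch {
    /** Validates the input as documented and returns the label with the fewest occurrences. */
    static int minorityLabel(double[][] data, int[] labels, int k) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels have different lengths");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("the input is empty");
        }
        // Count occurrences per label; TreeMap iterates labels in ascending order.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        int minority = labels[0];
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < fewest) { // strict '<': ties go to the smallest label (assumption)
                fewest = e.getValue();
                minority = e.getKey();
            }
        }
        if (fewest < k) {
            throw new IllegalArgumentException("minority class has fewer samples than k");
        }
        return minority;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {0.1}, {0.2}, {5.0}, {5.1}};
        int[] labels = {0, 0, 0, 1, 1};
        System.out.println(minorityLabel(data, labels, 2)); // prints 1
    }
}
```

With Smile on the classpath, the documented entry point is `ADASYN.fit(data, labels)` or `ADASYN.fit(data, labels, options)`; the returned record exposes the augmented dataset through `data()`, `labels()` and `size()`.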
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
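One consequence of comparing components with Objects::equals is worth illustrating: Java arrays do not override equals, so array-typed components are compared by reference, not by content. The sketch below uses a hypothetical record `Resampled` with the same component types and a hypothetical helper `contentEquals`; it assumes, as the generated description suggests, that the default record equality applies.

```java
import java.util.Arrays;

final class RecordArrayEqualitySketch {
    /** Hypothetical record with the same component types as the documented one. */
    record Resampled(double[][] data, int[] labels) {}

    /** Content-based comparison for callers that need it. */
    static boolean contentEquals(Resampled a, Resampled b) {
        return Arrays.deepEquals(a.data(), b.data()) && Arrays.equals(a.labels(), b.labels());
    }

    public static void main(String[] args) {
        Resampled a = new Resampled(new double[][] {{1.0, 2.0}}, new int[] {0});
        Resampled b = new Resampled(new double[][] {{1.0, 2.0}}, new int[] {0});
        System.out.println(a.equals(b));         // prints false: distinct array objects
        System.out.println(contentEquals(a, b)); // prints true: same contents
        System.out.println(a.equals(new Resampled(a.data(), a.labels()))); // prints true: same arrays
    }
}
```

The same caveat applies to hashCode, which for array components is derived from the array's identity hash code and not from its contents.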
    • data

      public double[][] data()
      Returns the value of the data record component.
      Returns:
      the value of the data record component
    • labels

      public int[] labels()
      Returns the value of the labels record component.
      Returns:
      the value of the labels record component