Interface Standardizer


public interface Standardizer
Standardizes numeric features to zero mean and unit variance. Standardization implicitly assumes that the data follow a Gaussian distribution, and it is not robust when outliers are present, because both the mean and the standard deviation are pulled toward extreme values. A robust alternative, implemented by RobustStandardizer, subtracts the median and divides by the interquartile range (IQR).
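In formula form, with N rows and the sample (N−1) standard deviation, standardization and the robust alternative map each value x_i as follows (Q_1 and Q_3 denote the first and third quartiles):

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}, \qquad
z_i = \frac{x_i - \bar{x}}{s}
\\[6pt]
z_i^{\text{robust}} = \frac{x_i - \operatorname{median}(x)}{\operatorname{IQR}(x)}, \qquad
\operatorname{IQR}(x) = Q_3(x) - Q_1(x)
```

For example, x = (1, 2, 3, 4, 100) has mean 22 and s ≈ 43.62, so the four inliers are squeezed into the narrow band z ≈ −0.48 to −0.41 while the outlier maps to z ≈ 1.79. The median (3) is unaffected by the outlier.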

The standard deviation is computed with the sample formula (N−1 denominator). For a constant column (stdev = 0), the scale falls back to 1.0, so the output is simply x − mean (all zeros for the training data). A single-row data frame, whose sample standard deviation is undefined, is treated the same way.
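The per-column rule above can be sketched in plain Java without Smile's DataFrame API. This is a minimal illustration, not the library's implementation: the class and method names (StandardizeColumn, fitColumn, apply) are hypothetical, and the tolerance used to decide that the standard deviation is zero (ZERO_TOL = 1e-12) is an assumption, since the text only says stdev = 0.

```java
import java.util.Arrays;

// Hypothetical sketch, not Smile's implementation: standardizes one column
// using the sample standard deviation and a scale fallback of 1.0.
class StandardizeColumn {
    // Assumed tolerance for treating the standard deviation as zero.
    static final double ZERO_TOL = 1e-12;

    /** Returns {mean, scale} for the column x (x must be non-empty). */
    static double[] fitColumn(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("Empty column");
        }
        double mean = Arrays.stream(x).average().getAsDouble();
        double scale = 1.0; // fallback for constant or single-row columns
        if (x.length > 1) {
            double ss = 0.0;
            for (double v : x) {
                ss += (v - mean) * (v - mean);
            }
            double sd = Math.sqrt(ss / (x.length - 1)); // N-1 denominator
            if (sd > ZERO_TOL) {
                scale = sd;
            }
        }
        return new double[] {mean, scale};
    }

    /** Maps each value to (x - mean) / scale. */
    static double[] apply(double[] x, double mean, double scale) {
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = (x[i] - mean) / scale;
        }
        return z;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] p = fitColumn(x);
        System.out.println(Arrays.toString(apply(x, p[0], p[1]))); // [-1.0, 0.0, 1.0]

        double[] c = {5.0, 5.0, 5.0};
        double[] q = fitColumn(c); // stdev is 0, so scale falls back to 1.0
        System.out.println(Arrays.toString(apply(c, q[0], q[1]))); // [0.0, 0.0, 0.0]
    }
}
```

Falling back to 1.0 rather than dividing by zero keeps the output finite: a constant training column maps to all zeros instead of NaN.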

  • Method Details

    • fit

      static InvertibleColumnTransform fit(DataFrame data, String... columns)
      Fits the data transformation.
      Parameters:
      data - the training data.
      columns - the columns to transform. If empty, transform all the numeric columns.
      Returns:
      the transform.
      Throws:
      IllegalArgumentException - if the data frame is empty or a specified column is non-numeric.
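The fit contract above (default to all numeric columns, reject an empty data frame or a non-numeric column) can be sketched without Smile's DataFrame. In this hypothetical version a LinkedHashMap<String, Object> stands in for the data frame, with double[] values as numeric columns and any other array as non-numeric; StandardizerFitSketch, Params, apply and invert are illustrative names, not Smile's API. Each fitted pair is invertible because x = z * scale + mean.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of fit(data, columns...): a map of column name -> array
// stands in for a DataFrame. Not Smile's implementation.
class StandardizerFitSketch {
    /** Fitted per-column parameters; invertible since x = z * scale + mean. */
    static final class Params {
        final double mean;
        final double scale;

        Params(double mean, double scale) {
            this.mean = mean;
            this.scale = scale;
        }

        double apply(double x) { return (x - mean) / scale; }

        double invert(double z) { return z * scale + mean; }
    }

    static Map<String, Params> fit(Map<String, Object> data, String... columns) {
        if (data.isEmpty() || rowCount(data) == 0) {
            throw new IllegalArgumentException("Empty data frame");
        }
        if (columns.length == 0) {
            // No columns given: select every numeric (double[]) column.
            List<String> numeric = new ArrayList<>();
            for (Map.Entry<String, Object> e : data.entrySet()) {
                if (e.getValue() instanceof double[]) {
                    numeric.add(e.getKey());
                }
            }
            columns = numeric.toArray(new String[0]);
        }
        Map<String, Params> params = new LinkedHashMap<>();
        for (String name : columns) {
            Object col = data.get(name);
            if (!(col instanceof double[])) {
                throw new IllegalArgumentException(name + " is not a numeric column");
            }
            double[] x = (double[]) col;
            double mean = 0.0;
            for (double v : x) mean += v;
            mean /= x.length;
            double ss = 0.0;
            for (double v : x) ss += (v - mean) * (v - mean);
            // Sample stdev (N-1); fall back to 1.0 for constant or single-row data.
            double sd = x.length > 1 ? Math.sqrt(ss / (x.length - 1)) : 0.0;
            params.put(name, new Params(mean, sd > 1e-12 ? sd : 1.0));
        }
        return params;
    }

    /** Number of rows, taken from the first column. */
    static int rowCount(Map<String, Object> data) {
        Object first = data.values().iterator().next();
        if (first instanceof double[]) return ((double[]) first).length;
        if (first instanceof Object[]) return ((Object[]) first).length;
        throw new IllegalArgumentException("Unsupported column type");
    }

    public static void main(String[] args) {
        Map<String, Object> df = new LinkedHashMap<>();
        df.put("age", new double[] {20.0, 30.0, 40.0});
        df.put("name", new String[] {"a", "b", "c"});

        Map<String, Params> t = fit(df); // no columns given: only "age" is numeric
        System.out.println(t.keySet());                 // [age]
        System.out.println(t.get("age").apply(40.0));   // 1.0
        System.out.println(t.get("age").invert(1.0));   // 40.0

        try {
            fit(df, "name");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());         // name is not a numeric column
        }
    }
}
```

Passing the column explicitly, as in fit(df, "age"), produces the same parameters here; naming the String[] column instead throws, mirroring the documented IllegalArgumentException.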