Interface WinsorScaler
public interface WinsorScaler
Scales all numeric variables into the range [0, 1].
If the dataset has outliers, normalization compresses the "normal"
data into a very small interval, because the extreme values set the
minimum and maximum used for scaling. In this case, the Winsorization
procedure should be applied: values greater than the specified upper
limit are replaced with the upper limit, and values below the lower
limit are replaced with the lower limit. The limits are often
specified as percentiles of the original distribution (for example,
the 5th and 95th percentiles).
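The clip-then-scale idea can be sketched in plain Java. This is a minimal sketch, not the library's implementation: the class name WinsorSketch and its methods are hypothetical, and the percentiles are computed exactly from a sorted copy with linear interpolation between order statistics (an assumption; the library computes approximate quantiles).

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: winsorize a column at the given percentiles, then
 * scale the clipped values into [0, 1]. Percentiles here are exact and
 * sort-based (linear interpolation), not approximate.
 */
final class WinsorSketch {
    final double lo; // value at the lower percentile
    final double hi; // value at the upper percentile

    WinsorSketch(double[] x, double lower, double upper) {
        double[] s = x.clone();
        Arrays.sort(s);
        this.lo = quantile(s, lower);
        this.hi = quantile(s, upper);
    }

    /** Linear interpolation between order statistics of the sorted array s. */
    static double quantile(double[] s, double p) {
        double pos = p * (s.length - 1);
        int i = (int) Math.floor(pos);
        int j = Math.min(i + 1, s.length - 1);
        return s[i] + (pos - i) * (s[j] - s[i]);
    }

    /** Clips x to [lo, hi], then maps [lo, hi] linearly onto [0, 1]. */
    double apply(double x) {
        double clipped = Math.max(lo, Math.min(hi, x));
        return hi == lo ? 0.0 : (clipped - lo) / (hi - lo);
    }

    /** Maps a scaled value back into [lo, hi]; clipped outliers cannot be recovered. */
    double invert(double y) {
        return lo + y * (hi - lo);
    }
}
```

With the values 0 through 19 plus an outlier of 1000, the 5th and 95th percentiles are 1 and 19, so apply(10) returns 0.5 and apply(1000) returns 1.0. Plain min-max scaling over [0, 1000] would instead map 10 to 0.01. The inverse only recovers values inside [lo, hi]: anything clipped in the forward pass comes back as the limit.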
Method Summary
Static Methods
Modifier and Type                 | Method                            | Description
static InvertibleColumnTransform  | fit(data, lower, upper, columns)  | Fits the data transformation.
static InvertibleColumnTransform  | fit(data, columns)                | Fits the data transformation with 5% lower limit and 95% upper limit.
Method Details
fit
Fits the data transformation with 5% lower limit and 95% upper limit.
Parameters:
data - the training data.
columns - the columns to transform. If empty, transform all the numeric columns.
Returns:
the transform.
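The default overload is presumably equivalent to the general one with lower = 0.05 and upper = 0.95. The sketch below models that, together with the "empty columns means all numeric columns" rule, under stated assumptions: the table is a hypothetical ordered map from column name to column data in which numeric columns are double[], the class and method names are illustrative, and the limits are exact sort-based percentiles rather than the library's approximate ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a "table" is an ordered map from column name to column data. */
final class DefaultWinsorFit {
    static final double DEFAULT_LOWER = 0.05; // 5th percentile
    static final double DEFAULT_UPPER = 0.95; // 95th percentile

    /** Selects all numeric (double[]) columns if none are given; rejects non-numeric ones. */
    static List<String> resolveColumns(Map<String, Object> table, String... columns) {
        List<String> selected = new ArrayList<>();
        if (columns.length == 0) {
            for (Map.Entry<String, Object> e : table.entrySet()) {
                if (e.getValue() instanceof double[]) {
                    selected.add(e.getKey());
                }
            }
        } else {
            for (String name : columns) {
                if (!(table.get(name) instanceof double[])) {
                    throw new IllegalArgumentException("Not a numeric column: " + name);
                }
                selected.add(name);
            }
        }
        return selected;
    }

    /** Exact sort-based percentile with linear interpolation between order statistics. */
    static double percentile(double[] x, double p) {
        double[] s = x.clone();
        Arrays.sort(s);
        double pos = p * (s.length - 1);
        int i = (int) Math.floor(pos);
        int j = Math.min(i + 1, s.length - 1);
        return s[i] + (pos - i) * (s[j] - s[i]);
    }

    /** Returns {lo, hi} per selected column, using the default 5%/95% limits. */
    static Map<String, double[]> fit(Map<String, Object> table, String... columns) {
        Map<String, double[]> limits = new LinkedHashMap<>();
        for (String name : resolveColumns(table, columns)) {
            double[] x = (double[]) table.get(name);
            limits.put(name, new double[] {percentile(x, DEFAULT_LOWER), percentile(x, DEFAULT_UPPER)});
        }
        return limits;
    }
}
```

Calling fit(table) with no column names skips a String[] column and fits only the double[] ones, while naming the String[] column explicitly throws IllegalArgumentException.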
fit
Fits the data transformation.
Note: Quantiles are computed via IQAgent, which produces approximate results; on very small datasets (< 20 rows) the approximation may deviate from exact sort-based quantiles.
Parameters:
data - the training data.
lower - the lower limit in terms of percentiles of the original distribution (e.g. 0.05 for the 5th percentile). Must be in [0, 1) and strictly less than upper.
upper - the upper limit in terms of percentiles of the original distribution (e.g. 0.95 for the 95th percentile). Must be in (0, 1] and strictly greater than lower.
columns - the columns to transform. If empty, transform all the numeric columns.
Returns:
the transform.
Throws:
IllegalArgumentException - if the data frame is empty; if lower < 0, upper > 1, or lower >= upper; or if a specified column is non-numeric.
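The constraints listed under Parameters and Throws can be checked before any quantile is computed. The following is a hedged sketch for a single column only: CheckedWinsorFit and its methods are hypothetical, an empty array stands in for the empty-data-frame case, the non-numeric column check is omitted, and the quantiles are exact sort-based values, which the IQAgent-based computation described in the note only approximates on very small datasets.

```java
import java.util.Arrays;

/** Hypothetical sketch of the documented argument checks, followed by the limit fit. */
final class CheckedWinsorFit {
    /** Returns {lo, hi} for one column after validating the percentile limits. */
    static double[] fit(double[] column, double lower, double upper) {
        if (column.length == 0) {
            throw new IllegalArgumentException("Empty data");
        }
        // Mirrors the Throws clause: lower < 0, upper > 1, or lower >= upper.
        if (lower < 0 || upper > 1 || lower >= upper) {
            throw new IllegalArgumentException(
                "Invalid percentile limits: lower=" + lower + ", upper=" + upper);
        }
        double[] s = column.clone();
        Arrays.sort(s);
        return new double[] {quantile(s, lower), quantile(s, upper)};
    }

    /** Exact quantile of the sorted array s, interpolating between order statistics. */
    static double quantile(double[] s, double p) {
        double pos = p * (s.length - 1);
        int i = (int) Math.floor(pos);
        int j = Math.min(i + 1, s.length - 1);
        return s[i] + (pos - i) * (s[j] - s[i]);
    }
}
```

For example, fit(new double[] {3, 1, 2, 5, 4}, 0.25, 0.75) returns {2.0, 4.0}, while limits of (0.5, 0.5) or (-0.1, 0.9) throw IllegalArgumentException.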