Class SimpleImputer

java.lang.Object
smile.feature.imputation.SimpleImputer
All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform

public class SimpleImputer extends Object implements Transform
Simple algorithm replaces missing values with the constant value along each column.
See Also:
  • Constructor Details

    • SimpleImputer

      public SimpleImputer(Map<String,Object> values)
      Constructor.
      Parameters:
      values - the map of column name to the constant value.
  • Method Details

    • hasMissing

      public static boolean hasMissing(Tuple x)
      Return true if the tuple x has missing values.
      Parameters:
      x - a tuple.
      Returns:
      true if the tuple x has missing values.
    • apply

      public Tuple apply(Tuple x)
      Specified by:
      apply in interface Function<Tuple,Tuple>
    • apply

      public DataFrame apply(DataFrame data)
      Description copied from interface: Transform
      Applies this transform to the given argument.
      Specified by:
      apply in interface Transform
      Parameters:
      data - the input data frame.
      Returns:
      the transformed data frame.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • fit

      public static SimpleImputer fit(DataFrame data, String... columns)
      Fits the missing value imputation values. Impute all the numeric columns with median, boolean/nominal columns with mode, and text columns with empty string.
      Parameters:
      data - the training data.
      columns - the columns to impute. If empty, impute all the applicable columns.
      Returns:
      the imputer.
    • fit

      public static SimpleImputer fit(DataFrame data, double lower, double upper, String... columns)
      Fits the missing value imputation values. Impute all the numeric columns with the mean of values in the range [lower, upper], boolean/nominal columns with mode, and text columns with empty string.
      Parameters:
      data - the training data.
      lower - the lower limit in terms of percentiles of the original distribution (e.g. 5th percentile).
      upper - the upper limit in terms of percentiles of the original distribution (e.g. 95th percentile).
      columns - the columns to impute. If empty, impute all the applicable columns.
      Returns:
      the imputer.
    • impute

      public static double[][] impute(double[][] data)
      Impute the missing values with column averages.
      Parameters:
      data - data with missing values.
      Returns:
      the imputed data.
      Throws:
      IllegalArgumentException - when the whole row or column is missing.