Class SimpleImputer

java.lang.Object
smile.feature.imputation.SimpleImputer
All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform

public class SimpleImputer extends Object implements Transform
A simple imputation algorithm that replaces missing values with a constant value for each column.
  • Constructor Details

    • SimpleImputer

      public SimpleImputer(Map<String,Object> values)
      Constructor.
      Parameters:
      values - the map of column name to the constant value.
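The idea behind the constructor argument can be sketched in plain Java without any Smile types. This is only an illustration under stated assumptions: a row is modeled as a Map from column name to value, null marks a missing value, and the class name ConstantImputeSketch is hypothetical. It is not how SimpleImputer is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in, not part of Smile: a row is a Map and null marks a missing value. */
final class ConstantImputeSketch {
    /** Returns a copy of the row in which each missing value is replaced by the constant for its column. */
    static Map<String, Object> impute(Map<String, Object> row, Map<String, Object> values) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Object> e : values.entrySet()) {
            String column = e.getKey();
            // Only fill columns that exist in the row and are currently missing.
            if (out.containsKey(column) && out.get(column) == null) {
                out.put(column, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("age", null);      // missing, will be filled
        row.put("city", "Oslo");   // present, left untouched
        Map<String, Object> values = Map.of("age", 30.0, "city", "unknown");
        System.out.println(impute(row, values));
    }
}
```

The sketch copies the row instead of modifying it in place, so the input stays usable after imputation.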
  • Method Details

    • hasMissing

      public static boolean hasMissing(Tuple x)
      Returns true if the tuple x has missing values.
      Parameters:
      x - a tuple.
      Returns:
      true if the tuple x has missing values.
    • apply

      public Tuple apply(Tuple x)
      Specified by:
      apply in interface Function<Tuple,Tuple>
    • apply

      public DataFrame apply(DataFrame data)
      Description copied from interface: Transform
      Applies this transform to the given argument.
      Specified by:
      apply in interface Transform
      Parameters:
      data - the input data frame.
      Returns:
      the transformed data frame.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • fit

      public static SimpleImputer fit(DataFrame data, String... columns)
      Fits the imputation values from the training data. Numeric columns are imputed with the median, boolean/nominal columns with the mode, and text columns with the empty string.
      Parameters:
      data - the training data.
      columns - the columns to impute. If empty, impute all the applicable columns.
      Returns:
      the imputer.
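The two statistics named above can be sketched in plain Java. This is a minimal illustration, not Smile's implementation: the class MedianModeSketch is hypothetical, NaN and null stand in for missing values, and averaging the two middle values for an even count is an assumption of this sketch that the library may not share.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Smile. */
final class MedianModeSketch {
    /** Median of the non-NaN entries; NaN marks a missing value in this sketch. */
    static double median(double[] column) {
        double[] present = Arrays.stream(column).filter(v -> !Double.isNaN(v)).sorted().toArray();
        if (present.length == 0) {
            throw new IllegalArgumentException("column has no observed values");
        }
        int mid = present.length / 2;
        // Assumption: for an even count, average the two middle values.
        return present.length % 2 == 1 ? present[mid] : (present[mid - 1] + present[mid]) / 2.0;
    }

    /** Most frequent non-null entry; on a tie, the value that reaches the top count first wins. */
    static String mode(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : column) {
            if (v == null) continue; // skip missing entries
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("column has no observed values");
        }
        return best;
    }
}
```

The median is a common default for numeric columns because a few extreme values do not shift it the way they shift the mean.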
    • fit

      public static SimpleImputer fit(DataFrame data, double lower, double upper, String... columns)
      Fits the imputation values from the training data. Numeric columns are imputed with the mean of the values in the range [lower, upper], where the limits are percentiles of the column's distribution; boolean/nominal columns are imputed with the mode, and text columns with the empty string.
      Parameters:
      data - the training data.
      lower - the lower limit in terms of percentiles of the original distribution (e.g. 5th percentile).
      upper - the upper limit in terms of percentiles of the original distribution (e.g. 95th percentile).
      columns - the columns to impute. If empty, impute all the applicable columns.
      Returns:
      the imputer.
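A mean restricted to a percentile range can be sketched as follows. Everything here is an assumption made for illustration: the class TrimmedMeanSketch is hypothetical, lower and upper are taken as fractions in [0, 1] (0.05 for the 5th percentile), NaN marks a missing value, and the cutoffs use a simple rank rule on the sorted values. The library's own percentile convention and argument scale may differ.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Smile. */
final class TrimmedMeanSketch {
    /** Mean of the observed values lying between the lower and upper percentile cutoffs. */
    static double trimmedMean(double[] column, double lower, double upper) {
        if (lower < 0.0 || upper > 1.0 || lower >= upper) {
            throw new IllegalArgumentException("require 0 <= lower < upper <= 1");
        }
        double[] sorted = Arrays.stream(column).filter(v -> !Double.isNaN(v)).sorted().toArray();
        if (sorted.length == 0) {
            throw new IllegalArgumentException("column has no observed values");
        }
        // Assumption: cutoffs are picked by rank, rounding outward so the range is never empty.
        double lo = sorted[(int) Math.floor(lower * (sorted.length - 1))];
        double hi = sorted[(int) Math.ceil(upper * (sorted.length - 1))];
        double sum = 0.0;
        int n = 0;
        for (double v : sorted) {
            if (v >= lo && v <= hi) {
                sum += v;
                n++;
            }
        }
        return sum / n;
    }
}
```

Restricting the mean to a percentile range keeps outliers in the training data from dominating the imputed value.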
    • impute

      public static double[][] impute(double[][] data)
      Imputes the missing values with column averages.
      Parameters:
      data - data with missing values.
      Returns:
      the imputed data.
      Throws:
      IllegalArgumentException - when the whole row or column is missing.
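Column-average imputation on a raw matrix can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the library code: the class ColumnMeanSketch is hypothetical, NaN is taken to mark a missing value, and the matrix is assumed to be rectangular. The sketch mirrors the documented contract by rejecting a row or column that is entirely missing.

```java
/** Hypothetical helper, not part of Smile. */
final class ColumnMeanSketch {
    /** Returns a copy of data in which each NaN is replaced by the mean of its column. */
    static double[][] impute(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;

        // Column means over the observed (non-NaN) values only.
        double[] mean = new double[cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            int n = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[j])) {
                    sum += row[j];
                    n++;
                }
            }
            if (n == 0) {
                throw new IllegalArgumentException("column " + j + " is entirely missing");
            }
            mean[j] = sum / n;
        }

        double[][] out = new double[rows][];
        for (int i = 0; i < rows; i++) {
            out[i] = data[i].clone(); // leave the input untouched
            boolean anyPresent = false;
            for (int j = 0; j < cols; j++) {
                if (Double.isNaN(out[i][j])) {
                    out[i][j] = mean[j];
                } else {
                    anyPresent = true;
                }
            }
            if (cols > 0 && !anyPresent) {
                throw new IllegalArgumentException("row " + i + " is entirely missing");
            }
        }
        return out;
    }
}
```

A row with no observed values carries no information of its own, which is one reason to reject it instead of filling it entirely with column means.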