Record Class InformationValue

java.lang.Object
java.lang.Record
smile.feature.selection.InformationValue
Record Components:
feature - The feature name.
iv - The information value.
woe - The weight of evidence.
breaks - The breakpoints of intervals for numerical variables.
All Implemented Interfaces:
Comparable<InformationValue>

public record InformationValue(String feature, double iv, double[] woe, double[] breaks) extends Record implements Comparable<InformationValue>
Information Value (IV) measures the predictive strength of a feature for a binary dependent variable. IV is essentially a weighted sum of the Weight of Evidence (WoE) values of the individual bins, where each weight is the difference between the numerator and the denominator of the WoE ratio, that is, the percentage of events minus the percentage of non-events (WoE itself captures the relative difference). Because the weight always has the same sign as the WoE, every term of the sum is non-negative, so IV is never negative.

IV is a good measure of the predictive power of a feature. It also helps flag suspicious features. However, unlike some other feature selection methods, the features selected using IV might not be the best feature set for building a non-linear model.

Interpretation of Information Value
Information Value    Predictive power
< 0.02               Useless
0.02 to 0.1          Weak predictors
0.1 to 0.3           Medium predictors
0.3 to 0.5           Strong predictors
> 0.5                Suspicious
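The table above can be applied programmatically with a small lookup. The following is only a sketch: `IvInterpretation` and `interpret` are hypothetical names, not part of Smile, and the handling of the exact boundary values 0.1 and 0.3 (assigned here to the stronger category) is an assumption, since the table does not say which side they belong to.

```java
/** Hypothetical helper, not part of Smile: maps an IV to the label in the table. */
final class IvInterpretation {
    private IvInterpretation() {}

    static String interpret(double iv) {
        // Assumption: each lower bound is inclusive, except that 0.5 itself
        // still counts as "Strong" because the table lists "> 0.5" as suspicious.
        if (iv < 0.02) return "Useless";
        if (iv < 0.1) return "Weak";
        if (iv < 0.3) return "Medium";
        if (iv <= 0.5) return "Strong";
        return "Suspicious";
    }

    public static void main(String[] args) {
        for (double iv : new double[] {0.01, 0.05, 0.2, 0.4, 0.8}) {
            System.out.println(iv + " -> " + interpret(iv));
        }
    }
}
```

In practice the input would be the `iv()` component of each record returned by `fit`.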
Weight of Evidence (WoE) measures the predictive power of every bin/category of a feature for a binary dependent variable. WoE is calculated as
 WoE = ln (percentage of events / percentage of non-events).
 
Note that WoE equals the conditional log odds within a bin minus the overall log odds, and the conditional log odds is exactly what a logistic regression model tries to predict.
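The WoE formula, together with the description of IV as a sum of WoE values weighted by the difference between the two percentages, can be sketched in plain Java. This is an illustrative sketch rather than Smile's implementation: the class `WoeIvExample`, its method names, and the counts in `main` are hypothetical, and the code assumes every bin contains at least one event and one non-event, so that no logarithm of zero is taken.

```java
import java.util.Arrays;

/** Hypothetical sketch of WoE and IV from per-bin counts; not Smile's code. */
final class WoeIvExample {
    private WoeIvExample() {}

    private static double sum(int[] counts) {
        double total = 0.0;
        for (int c : counts) total += c;
        return total;
    }

    /** WoE per bin: ln(percentage of events / percentage of non-events). */
    static double[] woe(int[] events, int[] nonEvents) {
        double totalEvents = sum(events);
        double totalNonEvents = sum(nonEvents);
        double[] woe = new double[events.length];
        for (int i = 0; i < events.length; i++) {
            double pctEvents = events[i] / totalEvents;
            double pctNonEvents = nonEvents[i] / totalNonEvents;
            woe[i] = Math.log(pctEvents / pctNonEvents);
        }
        return woe;
    }

    /** IV: sum over bins of (pctEvents - pctNonEvents) * WoE. */
    static double iv(int[] events, int[] nonEvents) {
        double totalEvents = sum(events);
        double totalNonEvents = sum(nonEvents);
        double[] woe = woe(events, nonEvents);
        double iv = 0.0;
        for (int i = 0; i < events.length; i++) {
            double pctEvents = events[i] / totalEvents;
            double pctNonEvents = nonEvents[i] / totalNonEvents;
            // The weight and the WoE share a sign, so each term is >= 0.
            iv += (pctEvents - pctNonEvents) * woe[i];
        }
        return iv;
    }

    public static void main(String[] args) {
        int[] events = {10, 30, 60};     // made-up counts for three bins
        int[] nonEvents = {40, 40, 20};
        System.out.println("WoE = " + Arrays.toString(woe(events, nonEvents)));
        System.out.println("IV  = " + iv(events, nonEvents));
    }
}
```

Real data often contains bins with no events or no non-events; a production implementation needs some policy for those bins, which this sketch deliberately leaves out.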

WoE values of a categorical variable can be used to convert a categorical feature to a numerical feature. If a continuous feature does not have a linear relationship with the log odds, the feature can be binned into groups and a new feature created by replacing each bin with its WoE value. Therefore, WoE is a good variable transformation method for logistic regression.
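Replacing each bin of a continuous feature with its WoE value can be sketched as follows. The bin layout used here is an assumption made for illustration and may differ from how Smile interprets `breaks`: the hypothetical `cuts` array holds ascending interior cut points, so there are `cuts.length + 1` bins, and a value equal to a cut point falls into the higher bin. `WoeEncoder` and its methods are not part of Smile.

```java
/** Hypothetical WoE encoder for a binned numeric feature; not Smile's code. */
final class WoeEncoder {
    private WoeEncoder() {}

    /** Returns the bin index of x: the number of cut points that are <= x. */
    static int bin(double x, double[] cuts) {
        int i = 0;
        while (i < cuts.length && x >= cuts[i]) i++;
        return i;
    }

    /** Replaces every value with the WoE of the bin it falls into. */
    static double[] encode(double[] x, double[] cuts, double[] woe) {
        if (woe.length != cuts.length + 1) {
            throw new IllegalArgumentException("need one WoE value per bin");
        }
        double[] encoded = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            encoded[i] = woe[bin(x[i], cuts)];
        }
        return encoded;
    }

    public static void main(String[] args) {
        double[] cuts = {30.0, 50.0};      // made-up cut points
        double[] woe = {-1.0, 0.2, 0.9};   // made-up WoE values, one per bin
        double[] ages = {18.0, 30.0, 49.9, 70.0};
        System.out.println(java.util.Arrays.toString(encode(ages, cuts, woe)));
    }
}
```

Within Smile itself, the `toTransform` method documented on this page is the way to obtain such a transformation from fitted `InformationValue` objects.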

When the bins of a numerical feature are arranged in ascending order and their WoE values follow a linear trend, the feature has the linear relation with the target that logistic regression requires. However, if the feature's WoE trend is non-linear, we should either discard the feature or consider some other variable transformation to ensure linearity. Hence, WoE helps check whether a feature has a linear relationship with the dependent variable before it is used in the model. Though WoE and IV are highly useful, always ensure that they are only used with logistic regression.

WoE is better than one-hot encoding as it does not increase the complexity of the model.

  • Constructor Details

    • InformationValue

      public InformationValue(String feature, double iv, double[] woe, double[] breaks)
      Creates an instance of an InformationValue record class.
      Parameters:
      feature - the value for the feature record component
      iv - the value for the iv record component
      woe - the value for the woe record component
      breaks - the value for the breaks record component
  • Method Details

    • compareTo

      public int compareTo(InformationValue other)
      Specified by:
      compareTo in interface Comparable<InformationValue>
    • toString

      public String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • toString

      public static String toString(InformationValue[] ivs)
      Returns a string representation of the array of information values.
      Parameters:
      ivs - the array of information values.
      Returns:
      a string representation of information values
    • toTransform

      public static ColumnTransform toTransform(InformationValue[] values)
      Returns the data transformation that converts feature values to their weight of evidence.
      Parameters:
      values - the information value objects of features.
      Returns:
      the transform.
    • fit

      public static InformationValue[] fit(DataFrame data, String clazz)
      Calculates the information value of the features.
      Parameters:
      data - the data frame of the explanatory and response variables.
      clazz - the column name of binary class labels.
      Returns:
      the information values of the features.
    • fit

      public static InformationValue[] fit(DataFrame data, String clazz, int nbins)
      Calculates the information value of the features.
      Parameters:
      data - the data frame of the explanatory and response variables.
      clazz - the column name of binary class labels.
      nbins - the number of bins used to discretize numeric variables in the WoE calculation.
      Returns:
      the information values of the features.
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. Reference components are compared with Objects::equals(Object,Object); primitive components are compared with '=='.
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
    • feature

      public String feature()
      Returns the value of the feature record component.
      Returns:
      the value of the feature record component
    • iv

      public double iv()
      Returns the value of the iv record component.
      Returns:
      the value of the iv record component
    • woe

      public double[] woe()
      Returns the value of the woe record component.
      Returns:
      the value of the woe record component
    • breaks

      public double[] breaks()
      Returns the value of the breaks record component.
      Returns:
      the value of the breaks record component
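
One consequence of the equals description above is worth illustrating: woe and breaks are array components, and Objects.equals compares arrays by reference, so two records holding equal but distinct arrays are not equal. The sketch below uses a hypothetical stand-in record, `IvLike`, with the same component types as InformationValue; it is not the Smile class, and it assumes the generated record equals is in effect. The `sameContent` helper is likewise hypothetical.

```java
import java.util.Arrays;

/** Demonstrates record equality with array components using a stand-in record. */
final class RecordArrayEquality {
    private RecordArrayEquality() {}

    /** Stand-in with the same component types as InformationValue; not Smile's class. */
    record IvLike(String feature, double iv, double[] woe, double[] breaks) {}

    /** Hypothetical content-based comparison that looks inside the arrays. */
    static boolean sameContent(IvLike a, IvLike b) {
        return a.feature().equals(b.feature())
            && Double.compare(a.iv(), b.iv()) == 0
            && Arrays.equals(a.woe(), b.woe())
            && Arrays.equals(a.breaks(), b.breaks());
    }

    public static void main(String[] args) {
        double[] woe = {-0.5, 0.7};
        double[] breaks = {40.0};
        IvLike a = new IvLike("age", 0.25, woe, breaks);
        IvLike b = new IvLike("age", 0.25, woe.clone(), breaks.clone());
        System.out.println("equals:      " + a.equals(b));      // arrays compared by reference
        System.out.println("sameContent: " + sameContent(a, b)); // arrays compared by content
    }
}
```

For the same reason, code that needs to compare or deduplicate results by content would have to compare the arrays explicitly rather than rely on equals or hashCode.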