Class AdjustedMutualInformation

java.lang.Object
smile.validation.metric.AdjustedMutualInformation
All Implemented Interfaces:
Serializable, ClusteringMetric

public class AdjustedMutualInformation extends Object implements ClusteringMetric
Adjusted Mutual Information (AMI) for comparing clusterings. As with the Rand index, the baseline value of mutual information (MI) between two random clusterings is not constant: it tends to be larger when the two partitions have more clusters (for a fixed number of observations). AMI adopts a hypergeometric model of randomness to adjust for chance. AMI equals 1 when the two partitions are identical and 0 when the MI between them equals the value expected by chance alone. It can be negative when the MI falls below that expectation.
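The computation described above can be sketched from scratch as follows. This is an illustration written for this page, not Smile's source: the class `AmiSketch` and its `Norm` enum are hypothetical names. It builds the contingency table, computes the entropies and MI with natural logarithms, evaluates the expected MI under the hypergeometric model of Vinh et al. (reference 1), and applies one of the four normalizations documented below. The log base does not matter, because MI, E(MI) and the entropies all scale by the same constant and AMI is a ratio. The switch expression requires Java 14 or newer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of AMI; not Smile's implementation.
final class AmiSketch {
    /** The four normalizations listed on this page (names chosen for this sketch). */
    enum Norm { MAX, MIN, SQRT, SUM }

    static double ami(int[] y1, int[] y2, Norm norm) {
        if (y1.length != y2.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int n = y1.length;
        // Map arbitrary integer labels to dense indices 0..k-1.
        Map<Integer, Integer> id1 = new HashMap<>();
        Map<Integer, Integer> id2 = new HashMap<>();
        for (int v : y1) id1.putIfAbsent(v, id1.size());
        for (int v : y2) id2.putIfAbsent(v, id2.size());

        // Contingency table n_ij plus cluster sizes a_i (rows) and b_j (columns).
        int[][] nij = new int[id1.size()][id2.size()];
        int[] a = new int[id1.size()];
        int[] b = new int[id2.size()];
        for (int k = 0; k < n; k++) {
            int i = id1.get(y1[k]);
            int j = id2.get(y2[k]);
            nij[i][j]++;
            a[i]++;
            b[j]++;
        }

        double h1 = entropy(a, n);
        double h2 = entropy(b, n);
        double mi = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                if (nij[i][j] > 0) {
                    mi += (double) nij[i][j] / n
                        * Math.log((double) n * nij[i][j] / ((double) a[i] * b[j]));
                }
            }
        }
        double emi = expectedMI(a, b, n);
        double denom = switch (norm) {
            case MAX -> Math.max(h1, h2);
            case MIN -> Math.min(h1, h2);
            case SQRT -> Math.sqrt(h1 * h2);
            case SUM -> 0.5 * (h1 + h2);
        };
        // Undefined (NaN) when denom == emi, e.g. both labelings have a single cluster.
        return (mi - emi) / (denom - emi);
    }

    // Shannon entropy (natural log) of a labeling given its cluster sizes.
    private static double entropy(int[] counts, int n) {
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    // Expected MI under the hypergeometric model; factorials handled in log space.
    private static double expectedMI(int[] a, int[] b, int n) {
        double[] lf = new double[n + 1]; // lf[k] = ln(k!)
        for (int k = 2; k <= n; k++) lf[k] = lf[k - 1] + Math.log(k);
        double emi = 0.0;
        for (int ai : a) {
            for (int bj : b) {
                int lo = Math.max(1, ai + bj - n);
                int hi = Math.min(ai, bj);
                for (int x = lo; x <= hi; x++) {
                    double logP = lf[ai] + lf[bj] + lf[n - ai] + lf[n - bj]
                        - lf[n] - lf[x] - lf[ai - x] - lf[bj - x] - lf[n - ai - bj + x];
                    emi += (double) x / n * Math.log((double) n * x / ((double) ai * bj))
                        * Math.exp(logP);
                }
            }
        }
        return emi;
    }
}
```

With Smile on the classpath, the documented static method `AdjustedMutualInformation.sum(y1, y2)` computes the same quantity as `AmiSketch.ami(y1, y2, AmiSketch.Norm.SUM)` by definition. Whether the two agree to the last digit depends on implementation details this page does not show.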

WARNING: Computing the adjustment (the expected mutual information under the hypergeometric model) is very slow.
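For reference, the expected mutual information behind the adjustment has the following standard form, as given by Vinh et al. (reference 1). Here n_ij is the number of observations shared by cluster i of y1 and cluster j of y2, a_i and b_j are the cluster sizes, R and C are the numbers of clusters, and N is the number of observations. This page does not describe how Smile evaluates the sum, so read this as the definition rather than the library's code path.

```latex
E[\mathrm{MI}] = \sum_{i=1}^{R} \sum_{j=1}^{C}
  \sum_{n_{ij}=\max(1,\, a_i + b_j - N)}^{\min(a_i,\, b_j)}
  \frac{n_{ij}}{N} \log\!\left(\frac{N\, n_{ij}}{a_i\, b_j}\right)
  \frac{a_i!\; b_j!\; (N - a_i)!\; (N - b_j)!}
       {N!\; n_{ij}!\; (a_i - n_{ij})!\; (b_j - n_{ij})!\; (N - a_i - b_j + n_{ij})!}
```

The innermost sum has up to min(a_i, b_j) terms for every one of the R·C cluster pairs. Each term needs nine factorials, which in practice are evaluated in log space to avoid overflow. This is why the adjustment dominates the running time.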

References

  1. N. X. Vinh, J. Epps, J. Bailey. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. JMLR, 2010.
  • Nested Class Summary

    Nested Classes
    Modifier and Type | Class | Description
    static enum | AdjustedMutualInformation.Method | The normalization method.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    Default instance with max normalization.
    Default instance with min normalization.
    Default instance with sqrt normalization.
    Default instance with sum normalization.
  • Constructor Summary

    Constructors
    Constructor | Description
    AdjustedMutualInformation(AdjustedMutualInformation.Method method) | Constructor.
  • Method Summary

    Modifier and Type | Method | Description
    static double | max(int[] y1, int[] y2) | Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (max(H(y1), H(y2)) - E(MI)).
    static double | min(int[] y1, int[] y2) | Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (min(H(y1), H(y2)) - E(MI)).
    double | score(int[] y1, int[] y2) | Returns a score to measure the quality of clustering.
    static double | sqrt(int[] y1, int[] y2) | Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (sqrt(H(y1) * H(y2)) - E(MI)).
    static double | sum(int[] y1, int[] y2) | Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (0.5 * (H(y1) + H(y2)) - E(MI)).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

    • AdjustedMutualInformation

      public AdjustedMutualInformation(AdjustedMutualInformation.Method method)
      Constructor.
      Parameters:
      method - the normalization method.
  • Method Details

    • score

      public double score(int[] y1, int[] y2)
      Description copied from interface: ClusteringMetric
      Returns a score to measure the quality of clustering.
      Specified by:
      score in interface ClusteringMetric
      Parameters:
      y1 - the ground truth (or simply another set of cluster labels).
      y2 - the alternative cluster labels.
      Returns:
      the metric.
    • max

      public static double max(int[] y1, int[] y2)
      Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (max(H(y1), H(y2)) - E(MI)).
      Parameters:
      y1 - the clustering labels.
      y2 - the alternative cluster labels.
      Returns:
      the metric.
    • sum

      public static double sum(int[] y1, int[] y2)
      Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (0.5 * (H(y1) + H(y2)) - E(MI)).
      Parameters:
      y1 - the clustering labels.
      y2 - the alternative cluster labels.
      Returns:
      the metric.
    • sqrt

      public static double sqrt(int[] y1, int[] y2)
      Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (sqrt(H(y1) * H(y2)) - E(MI)).
      Parameters:
      y1 - the clustering labels.
      y2 - the alternative cluster labels.
      Returns:
      the metric.
    • min

      public static double min(int[] y1, int[] y2)
      Calculates the adjusted mutual information as (I(y1, y2) - E(MI)) / (min(H(y1), H(y2)) - E(MI)).
      Parameters:
      y1 - the clustering labels.
      y2 - the alternative cluster labels.
      Returns:
      the metric.
    • toString

      public String toString()
      Overrides:
      toString in class Object
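The four static methods max, sum, sqrt and min differ only in the denominator. The hypothetical helper below is not part of Smile. It applies each documented normalization to precomputed values of MI, E(MI), H(y1) and H(y2), and its inputs are illustrative numbers, not values derived from real labelings. Because min(H1, H2) ≤ sqrt(H1·H2) ≤ 0.5·(H1 + H2) ≤ max(H1, H2), the variants are ordered max ≤ sum ≤ sqrt ≤ min whenever MI exceeds E(MI).

```java
// Hypothetical helper, not part of Smile: applies the four normalizations
// documented above to precomputed MI, E(MI), H(y1) and H(y2).
final class AmiNormalizations {
    // (MI - E(MI)) / (denominator - E(MI))
    static double adjust(double mi, double emi, double denom) {
        return (mi - emi) / (denom - emi);
    }

    static double max(double mi, double emi, double h1, double h2) {
        return adjust(mi, emi, Math.max(h1, h2));
    }

    static double sum(double mi, double emi, double h1, double h2) {
        return adjust(mi, emi, 0.5 * (h1 + h2));
    }

    static double sqrt(double mi, double emi, double h1, double h2) {
        return adjust(mi, emi, Math.sqrt(h1 * h2));
    }

    static double min(double mi, double emi, double h1, double h2) {
        return adjust(mi, emi, Math.min(h1, h2));
    }

    public static void main(String[] args) {
        // Illustrative values in nats, chosen so that E(MI) < MI <= min(H(y1), H(y2)).
        double h1 = Math.log(2), h2 = Math.log(4), mi = 0.5, emi = 0.1;
        System.out.printf("max=%.4f sum=%.4f sqrt=%.4f min=%.4f%n",
            max(mi, emi, h1, h2), sum(mi, emi, h1, h2),
            sqrt(mi, emi, h1, h2), min(mi, emi, h1, h2));
    }
}
```

The choice of normalization therefore matters when comparing AMI values reported by different tools. The ordering reverses for below-chance agreement (MI < E(MI)), where all four scores are negative and the max variant is the one closest to zero.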