Distance

The edge-counting distance between two concepts a and b is the number of edges on the shortest path through their lowest common ancestor (LCA):

    d(a, b) = depth(a) + depth(b) − 2 × depth(LCA(a, b))
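
The formula can be illustrated with a small self-contained sketch. This is not the library's implementation: the class EdgeDistanceSketch, its parent map, and the toy taxonomy are all hypothetical, and the sketch assumes a single-rooted tree in which the root has depth 0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a taxonomy: each concept name maps to its parent's name.
final class EdgeDistanceSketch {
    private final Map<String, String> parent = new HashMap<>();

    // Registers a concept; pass null as the parent for the root.
    EdgeDistanceSketch add(String concept, String parentName) {
        parent.put(concept, parentName);
        return this;
    }

    // Number of edges from the concept up to the root (assumption: root depth is 0).
    int depth(String concept) {
        int depth = 0;
        for (String p = parent.get(concept); p != null; p = parent.get(p)) {
            depth++;
        }
        return depth;
    }

    // Lowest common ancestor: the first node on b's path to the root
    // that also lies on a's path to the root (a concept counts as its own ancestor).
    String lca(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String c = a; c != null; c = parent.get(c)) {
            ancestorsOfA.add(c);
        }
        for (String c = b; c != null; c = parent.get(c)) {
            if (ancestorsOfA.contains(c)) {
                return c;
            }
        }
        throw new IllegalArgumentException("no common ancestor: " + a + ", " + b);
    }

    // d(a, b) = depth(a) + depth(b) - 2 * depth(LCA(a, b))
    int distance(String a, String b) {
        return depth(a) + depth(b) - 2 * depth(lca(a, b));
    }

    // Toy taxonomy: animal -> {mammal -> {dog, cat}, bird -> {sparrow}}
    static EdgeDistanceSketch example() {
        return new EdgeDistanceSketch()
                .add("animal", null)
                .add("mammal", "animal")
                .add("bird", "animal")
                .add("dog", "mammal")
                .add("cat", "mammal")
                .add("sparrow", "bird");
    }

    public static void main(String[] args) {
        EdgeDistanceSketch t = example();
        System.out.println(t.distance("dog", "cat"));     // 2 (via mammal)
        System.out.println(t.distance("dog", "sparrow")); // 4 (via animal)
        System.out.println(t.distance("dog", "mammal"));  // 1 (mammal is the LCA)
    }
}
```

In this toy tree, dog and cat share the ancestor mammal at depth 1, so the distance is 2 + 2 − 2 × 1 = 2.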

Semantic Similarity

Three widely used similarity measures from the computational linguistics literature are provided. All return values in [0, 1], where 1 means the two concepts are identical.

Wu-Palmer (wup)

Based on the depth of the LCA relative to the depths of the two concepts:

    sim(a, b) = 2 × depth(LCA(a, b)) / (depth(a) + depth(b))
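
As a rough illustration, the Wu-Palmer formula can be evaluated directly from three depths. The sketch below is not the library's implementation: the class and method names are hypothetical, it assumes the root has depth 0, and it treats the 0/0 case (both concepts at the root) as similarity 1.

```java
// Hypothetical helper that evaluates the Wu-Palmer formula from precomputed depths.
final class WuPalmerSketch {
    // sim = 2 * depth(LCA) / (depth(a) + depth(b))
    static double wuPalmer(int depthA, int depthB, int depthLca) {
        int sum = depthA + depthB;
        if (sum == 0) {
            // Assumption: both concepts are the root (depth 0), i.e. the same concept.
            return 1.0;
        }
        return 2.0 * depthLca / sum;
    }

    public static void main(String[] args) {
        // Two siblings at depth 2 whose parent is at depth 1.
        System.out.println(wuPalmer(2, 2, 1)); // 0.5
        // The same concept at depth 2: the LCA is the concept itself.
        System.out.println(wuPalmer(2, 2, 2)); // 1.0
    }
}
```

Under the root-depth-0 assumption, two concepts whose only common ancestor is the root score 0; a depth convention that starts at 1 would instead give a small positive value.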

Leacock-Chodorow (lch)

Combines edge-counting distance with the overall depth of the taxonomy:

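    sim(a, b) = −log(d(a, b) / (2 × H))

where H is the height of the taxonomy. For distinct concepts, d(a, b) ranges from 1 to 2 × H, so the raw value lies in [0, log(2H)]; it is normalized to [0, 1] by dividing by log(2H). Identical concepts, for which d(a, b) = 0 and the logarithm is undefined, have similarity 1.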
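
The normalization can be sketched as a function of the edge-counting distance and the taxonomy height. This is an illustrative sketch only, not the library's code: the class and method names are hypothetical, and it assumes height ≥ 1 whenever the two concepts differ.

```java
// Hypothetical helper that evaluates the normalized Leacock-Chodorow formula.
final class LeacockChodorowSketch {
    // distance: edge-counting distance d(a, b); height: taxonomy height H.
    static double normalized(int distance, int height) {
        if (distance == 0) {
            // Identical concepts: log(0) is undefined, so return 1 directly.
            return 1.0;
        }
        double twoH = 2.0 * height;
        double raw = -Math.log(distance / twoH); // lies in [0, log(2H)] for 1 <= d <= 2H
        return raw / Math.log(twoH);             // normalized to [0, 1]
    }

    public static void main(String[] args) {
        // Height 2: distances range from 1 (parent and child) to 4 (leaves in different branches).
        for (int d = 0; d <= 4; d++) {
            System.out.println("d=" + d + " sim=" + normalized(d, 2));
        }
    }
}
```

One consequence of the formula as written is that a parent and its child (d = 1) also reach the maximum value 1, because −log(1 / 2H) equals log(2H).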

Lin

An information-content-based measure. When no external corpus is available, depth in the taxonomy serves as a proxy for information content:

    IC(c) = −log((depth(c) + 1) / (H + 1))

where H is the height of the taxonomy. The similarity is then:

    sim(a, b) = 2 × IC(LCA(a, b)) / (IC(a) + IC(b))

Returns 1 when a == b.

References

Z. Wu and M. Palmer. Verb semantics and lexical selection. ACL, 1994.
C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database, 1998.
D. Lin. An information-theoretic definition of similarity. ICML, 1998.

Constructor Summary

| Constructor | Description |
| --- | --- |
| TaxonomicDistance(Taxonomy taxonomy) | Creates a distance measure over the given taxonomy. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double | d(String x, String y) | Computes the edge-counting distance between two concepts identified by their keywords. |
| double | d(Concept x, Concept y) | Computes the edge-counting distance between two concepts. |
| double | leacockChodorow(String x, String y) | Computes the Leacock-Chodorow semantic similarity between two concepts, normalized to [0, 1]. |
| double | leacockChodorow(Concept x, Concept y) | Computes the Leacock-Chodorow semantic similarity between two concept nodes. |
| double | lin(String x, String y) | Computes the Lin semantic similarity between two concepts using depth as a proxy for information content. |
| double | lin(Concept x, Concept y) | Computes the Lin semantic similarity between two concept nodes. |
| double | normalizedDistance(String x, String y) | Returns the normalized edge-counting distance in [0, 1]. |
| String | toString() | |
| double | wuPalmer(String x, String y) | Computes the Wu-Palmer semantic similarity between two concepts. |
| double | wuPalmer(Concept x, Concept y) | Computes the Wu-Palmer semantic similarity between two concept nodes. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Distance
apply, applyAsDouble, pdist, pdist

Constructor Details
- TaxonomicDistance
  
  public TaxonomicDistance(Taxonomy taxonomy)
  
  Creates a taxonomic distance measure over the given taxonomy.
  
  Parameters:
  
  taxonomy - the taxonomy that this distance is associated with.
Method Details
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- d
  
  public double d(String x, String y)
  
  Computes the edge-counting distance between two concepts identified by their keywords.
  
  Parameters:
  
  x - a concept keyword.
  
  y - the other concept keyword.
  
  Returns:
  
  the edge-counting distance.
  
  Throws:
  
  IllegalArgumentException - if either keyword is not in the taxonomy.
- d
  public double d(Concept x, Concept y)
  
  Computes the edge-counting distance between two concepts:
  
      d(a, b) = depth(a) + depth(b) − 2 × depth(LCA(a, b))
  
  Specified by:
  
  d in interface Distance<Concept>
  
  Parameters:
  
  x - a concept.
  
  y - the other concept.
  
  Returns:
  
  the edge-counting distance.
- normalizedDistance
  
  public double normalizedDistance(String x, String y)
  
  Returns the edge-counting distance normalized to [0, 1]. The raw distance is divided by the diameter of the taxonomy, that is, the maximum possible distance between any two concepts, which is 2 × height. The result is 0 when the two concepts are identical and 1 when they are maximally far apart.
  
  Parameters:
  
  x - a concept keyword.
  
  y - the other concept keyword.
  
  Returns:
  
  the normalized distance in [0, 1].
- wuPalmer
  public double wuPalmer(String x, String y)
  
  Computes the Wu-Palmer semantic similarity between two concepts:
  
      sim(a, b) = 2 × depth(LCA) / (depth(a) + depth(b))
  
  Returns 1 when the two concepts are the same, and approaches 0 as they become more distantly related.
  
  Parameters:
  
  x - a concept keyword.
  
  y - the other concept keyword.
  
  Returns:
  
  the Wu-Palmer similarity in (0, 1].
- wuPalmer
  
  public double wuPalmer(Concept x, Concept y)
  
  Computes the Wu-Palmer semantic similarity between two concept nodes.
  
  Parameters:
  
  x - a concept.
  
  y - the other concept.
  
  Returns:
  
  the Wu-Palmer similarity in (0, 1].
- leacockChodorow
  public double leacockChodorow(String x, String y)
  
  Computes the Leacock-Chodorow semantic similarity between two concepts, normalized to [0, 1]:
  
      raw  = −log(d(a, b) / (2 × H))
      norm = raw / log(2 × H) ∈ [0, 1]
  
  where H is the height of the taxonomy. Returns 1 when the two concepts are identical.
  
  Parameters:
  
  x - a concept keyword.
  
  y - the other concept keyword.
  
  Returns:
  
  the Leacock-Chodorow similarity in [0, 1].
- leacockChodorow
  
  public double leacockChodorow(Concept x, Concept y)
  
  Computes the Leacock-Chodorow semantic similarity between two concept nodes.
  
  Parameters:
  
  x - a concept.
  
  y - the other concept.
  
  Returns:
  
  the Leacock-Chodorow similarity in [0, 1].
- lin
  public double lin(String x, String y)
  
  Computes the Lin semantic similarity between two concepts using depth as a proxy for information content:
  
      IC(c)     = −log((depth(c) + 1) / (H + 1))
      sim(a, b) = 2 × IC(LCA) / (IC(a) + IC(b))
  
  Returns 1 when the two concepts are the same, and 0 when IC(a) + IC(b) == 0 (both at the root with H == 0).
  
  Parameters:
  
  x - a concept keyword.
  
  y - the other concept keyword.
  
  Returns:
  
  the Lin similarity in [0, 1].
- lin
  
  public double lin(Concept x, Concept y)
  
  Computes the Lin semantic similarity between two concept nodes.
  
  Parameters:
  
  x - a concept.
  
  y - the other concept.
  
  Returns:
  
  the Lin similarity in [0, 1].
