Class Taxonomy

java.lang.Object
smile.nlp.taxonomy.Taxonomy

public class Taxonomy extends Object
A taxonomy is a tree of terms (also known as concepts) in which leaves must be named but intermediate nodes can be anonymous.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Taxonomy(String... rootConcept)
    Constructs a taxonomy with the given root concept.
  • Method Summary

    Modifier and Type
    Method
    Description
    List<Concept>
    bfs()
    Returns all concept nodes in breadth-first order starting from the root.
    boolean
    contains(String keyword)
    Returns true if the given keyword exists in the taxonomy.
    int
    depth(String keyword)
    Returns the depth of a concept in the taxonomy (root has depth 0).
    void
    forEach(Consumer<Concept> visitor)
    Visits every concept node in depth-first pre-order, passing each node to the given consumer.
    Concept
    getConcept(String keyword)
    Returns the concept node whose synset contains the keyword.
    List<String>
    getConcepts()
    Returns all named concepts in the taxonomy.
    Concept
    getRoot()
    Returns the root node of the taxonomy tree.
    int
    height()
    Returns the height of the taxonomy tree, i.e. the maximum depth of any concept node.
    boolean
    isAncestor(String ancestor, String descendant)
    Returns true if concept ancestor is an ancestor of concept descendant in the taxonomy.
    boolean
    isDescendant(String descendant, String ancestor)
    Returns true if concept descendant is a descendant of concept ancestor in the taxonomy.
    List<Concept>
    leaves()
    Returns all leaf concept nodes in the taxonomy (nodes with no children).
    List<Concept>
    level(int level)
    Returns all concept nodes at the given depth level (root is level 0).
    Concept
    lowestCommonAncestor(String v, String w)
    Returns the lowest common ancestor (LCA) of concepts v and w.
    Concept
    lowestCommonAncestor(Concept v, Concept w)
    Returns the lowest common ancestor (LCA) of concepts v and w.
    int
    nodeCount()
    Returns the total number of concept nodes (anonymous or named) in the taxonomy tree, including the root.
    static Taxonomy
    of(String text)
    Parses a taxonomy from a simple indented-text format.
    List<Concept>
    shortestPath(String v, String w)
    Returns the shortest path between two concepts as an ordered list of concept nodes, from v to w (both endpoints inclusive).
    List<Concept>
    shortestPath(Concept v, Concept w)
    Returns the shortest path between two concept nodes as an ordered list, from v to w (both endpoints inclusive).
    int
    size()
    Returns the total number of named concepts (keywords) in the taxonomy.
    List<String>
    subtree(String keyword)
    Returns all keywords in the sub-tree rooted at the given concept (i.e. the concept itself and all its descendants).
    String
    toString()
    Returns a multi-line string that prints the taxonomy tree using Unicode box-drawing characters.

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • Taxonomy

      public Taxonomy(String... rootConcept)
      Constructs a taxonomy with the given root concept.
      Parameters:
      rootConcept - the keywords of the root concept.
  • Method Details

    • getRoot

      public Concept getRoot()
      Returns the root node of the taxonomy tree.
      Returns:
      the root node.
    • getConcept

      public Concept getConcept(String keyword)
      Returns the concept node whose synset contains the keyword.
      Parameters:
      keyword - the keyword.
      Returns:
      the concept node whose synset contains the keyword.
    • getConcepts

      public List<String> getConcepts()
      Returns all named concepts in the taxonomy.
      Returns:
      all named concepts.
    • lowestCommonAncestor

      public Concept lowestCommonAncestor(String v, String w)
      Returns the lowest common ancestor (LCA) of concepts v and w. The lowest common ancestor is defined between two nodes v and w as the lowest node that has both v and w as descendants (where we allow a node to be a descendant of itself).
      Parameters:
      v - a concept keyword.
      w - the other concept keyword.
      Returns:
      the lowest common ancestor.
      Throws:
      IllegalArgumentException - if either keyword is not in the taxonomy.
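The definition above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce Smile's implementation: `LcaSketch` and its `Node` class are hypothetical stand-ins with only a name and a parent link, and the approach shown (collect the ancestors of v, then walk up from w) is just one simple way to find the LCA.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustration only: a hypothetical parent-linked tree, not Smile's Concept class. */
final class LcaSketch {
    static final class Node {
        final String name;
        final Node parent; // null for the root

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Returns the lowest node that has both v and w as descendants (a node counts as its own descendant). */
    static Node lowestCommonAncestor(Node v, Node w) {
        // Collect v together with all of its ancestors.
        Set<Node> ancestorsOfV = new HashSet<>();
        for (Node n = v; n != null; n = n.parent) {
            ancestorsOfV.add(n);
        }
        // Walk up from w; the first node that is also on v's chain is the LCA.
        for (Node n = w; n != null; n = n.parent) {
            if (ancestorsOfV.contains(n)) {
                return n;
            }
        }
        return null; // only reached if v and w belong to different trees
    }

    public static void main(String[] args) {
        Node animal = new Node("animal", null);
        Node mammal = new Node("mammal", animal);
        Node dog = new Node("dog", mammal);
        Node cat = new Node("cat", mammal);
        Node snake = new Node("snake", animal);

        System.out.println(lowestCommonAncestor(dog, cat).name);    // mammal
        System.out.println(lowestCommonAncestor(dog, snake).name);  // animal
        System.out.println(lowestCommonAncestor(dog, mammal).name); // mammal (a node is its own descendant)
    }
}
```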
    • depth

      public int depth(String keyword)
      Returns the depth of a concept in the taxonomy (root has depth 0).
      Parameters:
      keyword - the concept keyword.
      Returns:
      the depth, or -1 if the keyword is not in the taxonomy.
    • size

      public int size()
      Returns the total number of named concepts (keywords) in the taxonomy.
      Returns:
      the number of keywords.
    • lowestCommonAncestor

      public Concept lowestCommonAncestor(Concept v, Concept w)
      Returns the lowest common ancestor (LCA) of concepts v and w. The lowest common ancestor is defined between two nodes v and w as the lowest node that has both v and w as descendants (where we allow a node to be a descendant of itself).
      Parameters:
      v - a concept.
      w - the other concept.
      Returns:
      the lowest common ancestor.
    • height

      public int height()
      Returns the height of the taxonomy tree, i.e. the maximum depth of any concept node (root has depth 0, so a single-node tree has height 0).
      Returns:
      the tree height.
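To make the height convention concrete (a single-node tree has height 0), here is a small recursive sketch. The `HeightSketch` class and its `Node` type are hypothetical and exist only for illustration; they are not part of Smile.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: computes tree height on a hypothetical child-linked node type. */
final class HeightSketch {
    static final class Node {
        final List<Node> children = new ArrayList<>();

        /** Adds a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Height is the maximum depth of any node below this one; a node without children has height 0. */
    static int height(Node node) {
        int max = 0;
        for (Node child : node.children) {
            max = Math.max(max, 1 + height(child));
        }
        return max;
    }

    public static void main(String[] args) {
        Node single = new Node();
        // root has two children; the first child has one child of its own.
        Node root = new Node()
                .add(new Node().add(new Node()))
                .add(new Node());

        System.out.println(height(single)); // 0
        System.out.println(height(root));   // 2
    }
}
```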
    • isAncestor

      public boolean isAncestor(String ancestor, String descendant)
      Returns true if concept ancestor is an ancestor of concept descendant in the taxonomy.
      Parameters:
      ancestor - the potential ancestor keyword.
      descendant - the potential descendant keyword.
      Returns:
      true if ancestor is an ancestor of descendant.
      Throws:
      IllegalArgumentException - if either keyword is not in the taxonomy.
    • isDescendant

      public boolean isDescendant(String descendant, String ancestor)
      Returns true if concept descendant is a descendant of concept ancestor in the taxonomy.
      Parameters:
      descendant - the potential descendant keyword.
      ancestor - the potential ancestor keyword.
      Returns:
      true if descendant is a descendant of ancestor.
      Throws:
      IllegalArgumentException - if either keyword is not in the taxonomy.
    • shortestPath

      public List<Concept> shortestPath(String v, String w)
      Returns the shortest path between two concepts as an ordered list of concept nodes, from v to w (both endpoints inclusive). The path goes up from v to their lowest common ancestor, then down to w.
      Parameters:
      v - a concept keyword.
      w - the other concept keyword.
      Returns:
      the ordered list of concept nodes on the shortest path.
      Throws:
      IllegalArgumentException - if either keyword is not in the taxonomy.
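The "up to the lowest common ancestor, then down" description can be sketched as follows. This is an illustration under stated assumptions, not Smile's code: `PathSketch`, its `Node` type, and the helper `pathToRoot` are hypothetical names, and the sketch assumes both nodes belong to the same tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: shortest path in a tree via the lowest common ancestor. */
final class PathSketch {
    static final class Node {
        final String name;
        final Node parent; // null for the root

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Returns the chain n, parent(n), ..., root. */
    static List<Node> pathToRoot(Node n) {
        List<Node> path = new ArrayList<>();
        for (; n != null; n = n.parent) {
            path.add(n);
        }
        return path;
    }

    /** Returns the nodes from v up to the LCA and then down to w, both endpoints inclusive. */
    static List<Node> shortestPath(Node v, Node w) {
        List<Node> up = pathToRoot(v);
        List<Node> down = pathToRoot(w);
        // Skip the shared tail of both chains; index i + 1 in `up` is then the LCA.
        int i = up.size() - 1;
        int j = down.size() - 1;
        while (i >= 0 && j >= 0 && up.get(i) == down.get(j)) {
            i--;
            j--;
        }
        if (i == up.size() - 1) {
            throw new IllegalArgumentException("Nodes are not in the same tree");
        }
        List<Node> path = new ArrayList<>(up.subList(0, i + 2)); // v ... LCA
        for (int k = j; k >= 0; k--) {                            // below the LCA ... w
            path.add(down.get(k));
        }
        return path;
    }

    /** Maps a path to node names for easy printing. */
    static List<String> names(List<Node> path) {
        List<String> result = new ArrayList<>();
        for (Node n : path) {
            result.add(n.name);
        }
        return result;
    }

    public static void main(String[] args) {
        Node animal = new Node("animal", null);
        Node mammal = new Node("mammal", animal);
        Node dog = new Node("dog", mammal);
        Node snake = new Node("snake", animal);

        System.out.println(names(shortestPath(dog, snake)));  // [dog, mammal, animal, snake]
        System.out.println(names(shortestPath(dog, mammal))); // [dog, mammal]
        System.out.println(names(shortestPath(dog, dog)));    // [dog]
    }
}
```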
    • shortestPath

      public List<Concept> shortestPath(Concept v, Concept w)
      Returns the shortest path between two concept nodes as an ordered list, from v to w (both endpoints inclusive).
      Parameters:
      v - a concept node.
      w - the other concept node.
      Returns:
      the ordered list of concept nodes on the shortest path.
    • subtree

      public List<String> subtree(String keyword)
      Returns all keywords in the sub-tree rooted at the given concept (i.e. the concept itself and all its descendants).
      Parameters:
      keyword - the root of the sub-tree.
      Returns:
      list of all keywords in the sub-tree.
      Throws:
      IllegalArgumentException - if the keyword is not in the taxonomy.
    • bfs

      public List<Concept> bfs()
      Returns all concept nodes in breadth-first order starting from the root.
      Returns:
      the BFS-ordered list of all concept nodes.
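For readers unfamiliar with the ordering, breadth-first order visits the root, then all nodes at depth 1, then depth 2, and so on. The sketch below shows this on a hypothetical `Node` type using a queue; `BfsSketch` and `sample()` are illustrative names only and say nothing about how Smile implements `bfs()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Illustration only: breadth-first traversal of a hypothetical tree. */
final class BfsSketch {
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Returns node names level by level, starting from the root. */
    static List<String> bfs(Node root) {
        List<String> order = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.poll();     // take the oldest queued node
            order.add(n.name);
            queue.addAll(n.children);  // children wait behind everything already queued
        }
        return order;
    }

    /** Builds a small example tree. */
    static Node sample() {
        return new Node("animal",
                new Node("mammal", new Node("dog"), new Node("cat")),
                new Node("reptile", new Node("snake")));
    }

    public static void main(String[] args) {
        System.out.println(bfs(sample())); // [animal, mammal, reptile, dog, cat, snake]
    }
}
```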
    • leaves

      public List<Concept> leaves()
      Returns all leaf concept nodes in the taxonomy (nodes with no children).
      Returns:
      the list of leaf concept nodes.
    • contains

      public boolean contains(String keyword)
      Returns true if the given keyword exists in the taxonomy.
      Parameters:
      keyword - the keyword to look up.
      Returns:
      true if the keyword is registered.
    • nodeCount

      public int nodeCount()
      Returns the total number of concept nodes (anonymous or named) in the taxonomy tree, including the root.
      Returns:
      the node count.
    • level

      public List<Concept> level(int level)
      Returns all concept nodes at the given depth level (root is level 0).
      Parameters:
      level - the depth level.
      Returns:
      all concept nodes at that depth.
    • forEach

      public void forEach(Consumer<Concept> visitor)
      Visits every concept node in depth-first pre-order, passing each node to the given consumer.
      Parameters:
      visitor - the consumer to call for each node.
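Depth-first pre-order means each node is passed to the consumer before any of its children. A minimal sketch of that ordering follows; `PreOrderSketch`, its `Node` type, and `sample()` are hypothetical and are not Smile classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Illustration only: depth-first pre-order visit of a hypothetical tree. */
final class PreOrderSketch {
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Calls the visitor on the node first, then recurses into each child in order. */
    static void forEach(Node node, Consumer<Node> visitor) {
        visitor.accept(node);
        for (Node child : node.children) {
            forEach(child, visitor);
        }
    }

    /** Builds a small example tree. */
    static Node sample() {
        return new Node("animal",
                new Node("mammal", new Node("dog"), new Node("cat")),
                new Node("reptile", new Node("snake")));
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        forEach(sample(), n -> visited.add(n.name));
        System.out.println(visited); // [animal, mammal, dog, cat, reptile, snake]
    }
}
```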
    • toString

      public String toString()
      Returns a multi-line string that prints the taxonomy tree using Unicode box-drawing characters, e.g.:
      root
      ├── [A]
      │   ├── [B]
      │   └── [C]
      └── [D]
      
      Anonymous nodes are shown as (anon).
      Overrides:
      toString in class Object
      Returns:
      the tree representation.
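A layout like the one shown above can be produced with a short recursive routine that carries a line prefix down the tree. The sketch below is an assumption about how such output might be generated, not Smile's `toString()` code; `TreePrintSketch`, `Node`, and `sample()` are hypothetical. Box-drawing characters are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: renders a hypothetical tree with box-drawing characters. */
final class TreePrintSketch {
    static final String TEE = "\u251C\u2500\u2500 ";    // branch with more siblings below
    static final String ELBOW = "\u2514\u2500\u2500 ";  // last branch at this level
    static final String PIPE = "\u2502   ";             // continues a vertical line
    static final String GAP = "    ";                   // nothing to continue

    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** Returns the root label followed by one line per descendant. */
    static String render(Node root) {
        StringBuilder sb = new StringBuilder(root.label).append('\n');
        renderChildren(root, "", sb);
        return sb.toString();
    }

    private static void renderChildren(Node node, String prefix, StringBuilder sb) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = i == node.children.size() - 1;
            sb.append(prefix).append(last ? ELBOW : TEE).append(child.label).append('\n');
            // Descendants of a non-last child keep the vertical line going.
            renderChildren(child, prefix + (last ? GAP : PIPE), sb);
        }
    }

    /** Builds the same shape as the example above: root with children [A] (holding [B], [C]) and [D]. */
    static Node sample() {
        return new Node("root",
                new Node("[A]", new Node("[B]"), new Node("[C]")),
                new Node("[D]"));
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```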
    • of

      public static Taxonomy of(String text)
      Parses a taxonomy from a simple indented-text format. Each line has the form indent keyword[, keyword2, ...], where indent is a multiple of 4 spaces or tabs. Indent level 0 is the root concept. Multiple keywords on the same line become synonyms of the same concept. Lines starting with # are comments and are ignored. Example:
      # animal taxonomy
      animal
          mammal, warm-blooded
              dog, canine
              cat, feline
          reptile
              snake
      
      Parameters:
      text - the indented text.
      Returns:
      the parsed taxonomy.
      Throws:
      IllegalArgumentException - if the text is malformed.
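To show how the indented format maps to a tree, here is a minimal parsing sketch. It is not Smile's parser: `IndentParseSketch`, `Node`, `indentLevel`, and `parse` are hypothetical names, and several details are assumptions made for the example (one tab counts as one level, blank lines are skipped, indented comment lines are also ignored, and an indent that jumps more than one level is rejected).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: parses "indent keyword[, keyword2, ...]" lines into a hypothetical tree. */
final class IndentParseSketch {
    static final class Node {
        final List<String> keywords;
        final List<Node> children = new ArrayList<>();

        Node(List<String> keywords) {
            this.keywords = keywords;
        }
    }

    /** Counts leading indentation: 4 spaces or 1 tab per level (the tab rule is an assumption). */
    static int indentLevel(String line) {
        int spaces = 0;
        int tabs = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') spaces++;
            else if (c == '\t') tabs++;
            else break;
        }
        if (spaces % 4 != 0) {
            throw new IllegalArgumentException("Indent is not a multiple of 4 spaces: " + line);
        }
        return tabs + spaces / 4;
    }

    static Node parse(String text) {
        // open.get(k) is the most recently seen node at level k.
        List<Node> open = new ArrayList<>();
        Node root = null;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue; // blank line or comment
            int level = indentLevel(line);
            List<String> keywords = new ArrayList<>();
            for (String k : trimmed.split(",")) keywords.add(k.trim()); // synonyms share one node
            Node node = new Node(keywords);
            if (level == 0) {
                if (root != null) throw new IllegalArgumentException("More than one root: " + trimmed);
                root = node;
            } else {
                if (level > open.size()) throw new IllegalArgumentException("Indent skips a level: " + trimmed);
                open.get(level - 1).children.add(node);
            }
            while (open.size() > level) open.remove(open.size() - 1); // close deeper levels
            open.add(node);
        }
        if (root == null) throw new IllegalArgumentException("No root concept found");
        return root;
    }

    public static void main(String[] args) {
        String text = "# animal taxonomy\n"
                + "animal\n"
                + "    mammal, warm-blooded\n"
                + "        dog, canine\n"
                + "        cat, feline\n"
                + "    reptile\n"
                + "        snake\n";
        Node root = parse(text);
        System.out.println(root.keywords);                 // [animal]
        System.out.println(root.children.get(0).keywords); // [mammal, warm-blooded]
        System.out.println(root.children.get(1).children.get(0).keywords); // [snake]
    }
}
```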