Class BKTree<K,V>

java.lang.Object
smile.neighbor.BKTree<K,V>
Type Parameters:
K - the type of keys.
V - the type of associated objects.
All Implemented Interfaces:
Serializable, RNNSearch<K,V>

public class BKTree<K,V> extends Object implements RNNSearch<K,V>, Serializable
A BK-tree is a metric tree specifically adapted to discrete metric spaces. For simplicity, let us consider integer discrete metric d(x,y). Then, BK-tree is defined in the following way. An arbitrary element a is selected as root. Root may have zero or more subtrees. The k-th subtree is recursively built of all elements a such that d(a,b) = k. BK-trees can be used for approximate string matching in a dictionary.

By default, the query object (reference equality) is excluded from the neighborhood. You may change this behavior with setIdenticalExcluded. Note that you may observe weird behavior with String objects. JVM will pool the string literal objects. So the below variables String a = "ABC"; String b = "ABC"; String c = "AB" + "C"; are actually equal in reference test a == b == c. With toy data that you type explicitly in the code, this will cause problems. Fortunately, the data would be generally read from secondary storage in production.

References

  1. W. Burkhard and R. Keller. Some approaches to best-match file searching. CACM, 1973.
See Also:
  • Constructor Details

    • BKTree

      public BKTree(Metric<K> distance)
      Constructor.
      Parameters:
      distance - the metric used to build BK-tree. Note that the metric must be a discrete distance, e.g. edit distance, Hamming distance, Lee distance, Jaccard distance, and taxonomic distance, etc.
  • Method Details

    • of

      public static <T> BKTree<T,T> of(T[] data, Metric<T> distance)
      Return a BK-tree of the data.
      Type Parameters:
      T - the type of keys and values.
      Parameters:
      data - the data objects, which are also used as key.
      distance - the metric used to build BK-tree. Note that the metric must be a discrete distance, e.g. edit distance, Hamming distance, Lee distance, Jaccard distance, and taxonomic distance, etc.
      Returns:
      BK-tree.
    • of

      public static <T> BKTree<T,T> of(List<T> data, Metric<T> distance)
      Return a BK-tree of the data.
      Type Parameters:
      T - the type of keys and values.
      Parameters:
      data - the data objects, which are also used as key.
      distance - the metric used to build BK-tree. Note that the metric must be a discrete distance, e.g. edit distance, Hamming distance, Lee distance, Jaccard distance, and taxonomic distance, etc.
      Returns:
      BK-tree.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • add

      public void add(K key, V value)
      Adds a datum into the BK-tree.
      Parameters:
      key - the data key.
      value - the data object.
    • add

      public void add(Map<K,V> data)
      Adds a dataset into BK-tree.
      Parameters:
      data - the dataset to insert into the BK-tree.
    • search

      public void search(K q, double radius, List<Neighbor<K,V>> neighbors)
      Description copied from interface: RNNSearch
      Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
      Specified by:
      search in interface RNNSearch<K,V>
      Parameters:
      q - the query key.
      radius - the radius of search range from target.
      neighbors - the list to store found neighbors on output.
    • search

      public void search(K q, int radius, List<Neighbor<K,V>> neighbors)
      Search the neighbors in the given radius of query object, i.e. d(q, v) <= radius.
      Parameters:
      q - the query object.
      radius - the radius of search range from target.
      neighbors - the list to store found neighbors in the given range on output.