Class KDTree<E>

java.lang.Object
smile.neighbor.KDTree<E>
Type Parameters:
E - the type of data objects in the tree.
All Implemented Interfaces:
Serializable, KNNSearch<double[],E>, RNNSearch<double[],E>

public class KDTree<E> extends Object implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable
A KD-tree (short for k-dimensional tree) is a space-partitioning dataset structure for organizing points in a k-dimensional space. KD-trees are a useful dataset structure for nearest neighbor searches. The kd-tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node generates a splitting hyperplane that divides the space into two subspaces. Points left to the hyperplane represent the left subtree of that node and the points right to the hyperplane by the right subtree. The hyperplane direction is chosen in the following way: every node split to subtrees is associated with one of the k-dimensions, such that the hyperplane is perpendicular to that dimension vector. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree.

KD-trees are not suitable for efficiently finding the nearest neighbor in high dimensional spaces. As a general rule, if the dimensionality is D, then number of points in the dataset, N, should be N >> 2D. Otherwise, when kd-trees are used with high-dimensional dataset, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search, and approximate nearest-neighbor methods should be used instead.

By default, the query object (reference equality) is excluded from the neighborhood.

See Also:
  • Constructor Summary

    Constructors
    Constructor
    Description
    KDTree(double[][] key, E[] data)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    Neighbor<double[],E>
    nearest(double[] q)
    Returns the nearest neighbor.
    static KDTree<double[]>
    of(double[][] data)
    Return a KD-tree of the data.
    void
    search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
    Retrieves the neighbors in a fixed radius of query object, i.e.
    Neighbor<double[],E>[]
    search(double[] q, int k)
    Retrieves the k nearest neighbors to the query key.
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • KDTree

      public KDTree(double[][] key, E[] data)
      Constructor.
      Parameters:
      key - the object keys.
      data - the data objects.
  • Method Details

    • of

      public static KDTree<double[]> of(double[][] data)
      Return a KD-tree of the data.
      Parameters:
      data - the data objects, which are also used as key.
      Returns:
      KD-tree.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • nearest

      public Neighbor<double[],E> nearest(double[] q)
      Description copied from interface: KNNSearch
      Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results.
      Specified by:
      nearest in interface KNNSearch<double[],E>
      Parameters:
      q - the query key.
      Returns:
      the nearest neighbor
    • search

      public Neighbor<double[],E>[] search(double[] q, int k)
      Description copied from interface: KNNSearch
      Retrieves the k nearest neighbors to the query key.
      Specified by:
      search in interface KNNSearch<double[],E>
      Parameters:
      q - the query key.
      k - the number of nearest neighbors to search for.
      Returns:
      the k nearest neighbors
    • search

      public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
      Description copied from interface: RNNSearch
      Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
      Specified by:
      search in interface RNNSearch<double[],E>
      Parameters:
      q - the query key.
      radius - the radius of search range from target.
      neighbors - the list to store found neighbors on output.