Class LSH<E>

java.lang.Object
smile.neighbor.LSH<E>
Type Parameters:
E - the type of data objects in the hash table.
All Implemented Interfaces:
Serializable, KNNSearch<double[],E>, RNNSearch<double[],E>
Direct Known Subclasses:
MPLSH, MutableLSH

public class LSH<E> extends Object implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable
Locality-Sensitive Hashing. LSH is an efficient algorithm for approximate nearest neighbor search in high dimensional spaces by performing probabilistic dimension reduction of data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).

By default, the query object (reference equality) is excluded from the neighborhood.

References

  1. Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions. FOCS, 2006.
  2. Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. 2004.
See Also:
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected ArrayList<E>
    The data objects.
    protected int
    The size of hash table.
    protected List<Hash>
    Hash functions.
    protected int
    The number of random projections per hash value.
    protected ArrayList<double[]>
    The object keys.
    protected double
    The width of projection.
  • Constructor Summary

    Constructors
    Constructor
    Description
    LSH(double[][] keys, E[] data, double w)
    Constructor.
    LSH(double[][] keys, E[] data, double w, int H)
    Constructor.
    LSH(int d, int L, int k, double w)
    Constructor.
    LSH(int d, int L, int k, double w, int H)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    initHashTable(int d, int L, int k, double w, int H)
    Initialize the hash tables.
    Neighbor<double[],E>
    nearest(double[] q)
    Returns the nearest neighbor.
    void
    put(double[] key, E value)
    Insert an item into the hash table.
    void
    search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
    Retrieves the neighbors in a fixed radius of query object, i.e.
    Neighbor<double[],E>[]
    search(double[] q, int k)
    Retrieves the k nearest neighbors to the query key.
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • keys

      protected ArrayList<double[]> keys
      The object keys.
    • data

      protected ArrayList<E> data
      The data objects.
    • hash

      protected List<Hash> hash
      Hash functions.
    • H

      protected int H
      The size of hash table.
    • k

      protected int k
      The number of random projections per hash value.
    • w

      protected double w
      The width of projection. The hash function is defined as floor((a * x + b) / w). The value of w determines the bucket interval.
  • Constructor Details

    • LSH

      public LSH(double[][] keys, E[] data, double w)
      Constructor.
      Parameters:
      keys - the object keys.
      data - the data objects.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
    • LSH

      public LSH(double[][] keys, E[] data, double w, int H)
      Constructor.
      Parameters:
      keys - the object keys.
      data - the data objects.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
      H - the size of universal hash tables.
    • LSH

      public LSH(int d, int L, int k, double w)
      Constructor.
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
    • LSH

      public LSH(int d, int L, int k, double w, int H)
      Constructor.
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
      H - the size of universal hash tables.
  • Method Details

    • initHashTable

      protected void initHashTable(int d, int L, int k, double w, int H)
      Initialize the hash tables.
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
      H - the size of universal hash tables.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • put

      public void put(double[] key, E value)
      Insert an item into the hash table.
      Parameters:
      key - the key.
      value - the value.
    • nearest

      public Neighbor<double[],E> nearest(double[] q)
      Description copied from interface: KNNSearch
      Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results.
      Specified by:
      nearest in interface KNNSearch<double[],E>
      Parameters:
      q - the query key.
      Returns:
      the nearest neighbor
    • search

      public Neighbor<double[],E>[] search(double[] q, int k)
      Description copied from interface: KNNSearch
      Retrieves the k nearest neighbors to the query key.
      Specified by:
      search in interface KNNSearch<double[],E>
      Parameters:
      q - the query key.
      k - the number of nearest neighbors to search for.
      Returns:
      the k nearest neighbors
    • search

      public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
      Description copied from interface: RNNSearch
      Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
      Specified by:
      search in interface RNNSearch<double[],E>
      Parameters:
      q - the query key.
      radius - the radius of search range from target.
      neighbors - the list to store found neighbors on output.