Class MPLSH<E>

java.lang.Object
smile.neighbor.LSH<E>
smile.neighbor.MPLSH<E>
Type Parameters:
E - the type of data objects in the hash table.
All Implemented Interfaces:
Serializable, KNNSearch<double[],E>, RNNSearch<double[],E>

public class MPLSH<E> extends LSH<E>
Multi-Probe Locality-Sensitive Hashing. LSH is an efficient algorithm for approximate nearest neighbor search in high dimensional spaces by performing probabilistic dimension reduction of data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items). A drawback of LSH is the requirement for a large number of hash tables in order to achieve good search quality. Multi-probe LSH is designed to overcome this drawback. Multi-probe LSH intelligently probes multiple buckets that are likely to contain query results in a hash table.

By default, the query object (reference equality) is excluded from the neighborhood.

References

  1. Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. VLDB, 2007.
  2. Alexis Joly and Olivier Buisson. A posteriori multi-probe locality sensitive hashing. ACM international conference on Multimedia, 2008.
See Also:
  • Field Summary

    Fields inherited from class smile.neighbor.LSH

    data, H, hash, k, keys, w
  • Constructor Summary

    Constructors
    Constructor
    Description
    MPLSH(int d, int L, int k, double w)
    Constructor.
    MPLSH(int d, int L, int k, double w, int H)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    fit(RNNSearch<double[],double[]> range, double[][] samples, double radius)
    Fits the posteriori multiple probe algorithm.
    void
    fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz)
    Fits the posteriori multiple probe algorithm.
    void
    fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz, double sigma)
    Train the posteriori multiple probe algorithm.
    protected void
    initHashTable(int d, int L, int k, double w, int H)
    Initialize the hash tables.
    Neighbor<double[],E>
    nearest(double[] q)
    Returns the nearest neighbor.
    Neighbor<double[],E>
    nearest(double[] q, double recall, int T)
    Returns the approximate nearest neighbor.
    void
    search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
    Retrieves the neighbors in a fixed radius of query object, i.e.
    void
    search(double[] q, double radius, List<Neighbor<double[],E>> neighbors, double recall, int T)
    Search the neighbors in the given radius of query object, i.e.
    Neighbor<double[],E>[]
    search(double[] q, int k)
    Retrieves the k nearest neighbors to the query key.
    Neighbor<double[],E>[]
    search(double[] q, int k, double recall, int T)
    Returns the approximate k-nearest neighbors.
     

    Methods inherited from class smile.neighbor.LSH

    put

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • MPLSH

      public MPLSH(int d, int L, int k, double w)
      Constructor.
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
    • MPLSH

      public MPLSH(int d, int L, int k, double w, int H)
      Constructor.
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose an r value that is too large, which will increase the query time.
      H - the number of buckets of hash tables.
  • Method Details

    • initHashTable

      protected void initHashTable(int d, int L, int k, double w, int H)
      Description copied from class: LSH
      Initialize the hash tables.
      Overrides:
      initHashTable in class LSH<E>
      Parameters:
      d - the dimensionality of data.
      L - the number of hash tables.
      k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
      w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
      H - the size of universal hash tables.
    • toString

      public String toString()
      Overrides:
      toString in class LSH<E>
    • fit

      public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius)
      Fits the posteriori multiple probe algorithm.
      Parameters:
      range - the neighborhood search data structure.
      samples - the training samples.
      radius - the radius for range search.
    • fit

      public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz)
      Fits the posteriori multiple probe algorithm.
      Parameters:
      range - the neighborhood search data structure.
      samples - the training samples.
      radius - the radius for range search.
      Nz - the number of quantized values.
    • fit

      public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz, double sigma)
      Train the posteriori multiple probe algorithm.
      Parameters:
      range - the neighborhood search data structure.
      samples - the training samples.
      radius - the radius for range search.
      Nz - the number of quantized values.
      sigma - the Parzen window width.
    • nearest

      public Neighbor<double[],E> nearest(double[] q)
      Description copied from interface: KNNSearch
      Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results.
      Specified by:
      nearest in interface KNNSearch<double[],E>
      Overrides:
      nearest in class LSH<E>
      Parameters:
      q - the query key.
      Returns:
      the nearest neighbor
    • nearest

      public Neighbor<double[],E> nearest(double[] q, double recall, int T)
      Returns the approximate nearest neighbor. A posteriori multiple probe model has to be trained already.
      Parameters:
      q - the query object.
      recall - the expected recall rate.
      T - the maximum number of probes.
      Returns:
      the approximate nearest neighbor.
    • search

      public Neighbor<double[],E>[] search(double[] q, int k)
      Description copied from interface: KNNSearch
      Retrieves the k nearest neighbors to the query key.
      Specified by:
      search in interface KNNSearch<double[],E>
      Overrides:
      search in class LSH<E>
      Parameters:
      q - the query key.
      k - the number of nearest neighbors to search for.
      Returns:
      the k nearest neighbors
    • search

      public Neighbor<double[],E>[] search(double[] q, int k, double recall, int T)
      Returns the approximate k-nearest neighbors. A posteriori multiple probe model has to be trained already.
      Parameters:
      q - the query object.
      k - the number of nearest neighbors to search for.
      recall - the expected recall rate.
      T - the maximum number of probes.
      Returns:
      the approximate k-nearest neighbors.
    • search

      public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
      Description copied from interface: RNNSearch
      Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
      Specified by:
      search in interface RNNSearch<double[],E>
      Overrides:
      search in class LSH<E>
      Parameters:
      q - the query key.
      radius - the radius of search range from target.
      neighbors - the list to store found neighbors on output.
    • search

      public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors, double recall, int T)
      Search the neighbors in the given radius of query object, i.e. d(q, v) <= radius.
      Parameters:
      q - the query object.
      radius - the radius of search range.
      neighbors - the list to store found neighbors in the given range on output.
      recall - the expected recall rate.
      T - the maximum number of probes.