smile.neighbor.MPLSH<E>

Type Parameters:: E - the type of data objects in the hash table.

All Implemented Interfaces:: Serializable, KNNSearch<double[],E>, RNNSearch<double[],E>

public class MPLSH<E> extends LSH<E>

Multi-Probe Locality-Sensitive Hashing. LSH is an efficient algorithm for approximate nearest neighbor search in high dimensional spaces by performing probabilistic dimension reduction of data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items). A drawback of LSH is the requirement for a large number of hash tables in order to achieve good search quality. Multi-probe LSH is designed to overcome this drawback. Multi-probe LSH intelligently probes multiple buckets that are likely to contain query results in a hash table.

By default, the query object (reference equality) is excluded from the neighborhood.

References

Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. VLDB, 2007.
Alexis Joly and Olivier Buisson. A posteriori multi-probe locality sensitive hashing. ACM international conference on Multimedia, 2008.

See Also:

Field Summary

Fields inherited from class smile.neighbor.LSH
data, H, hash, k, keys, w
Constructor Summary

Constructors

Constructor

Description

MPLSH(int d, int L, int k, double w)

Constructor.

MPLSH(int d, int L, int k, double w, int H)

Constructor.
Method Summary

Modifier and Type

Method

Description

void

fit(RNNSearch<double[],double[]> range, double[][] samples, double radius)

Fits the posteriori multiple probe algorithm.

void

fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz)

Fits the posteriori multiple probe algorithm.

void

fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz, double sigma)

Train the posteriori multiple probe algorithm.

protected void

initHashTable(int d, int L, int k, double w, int H)

Initialize the hash tables.

Neighbor<double[],E>

nearest(double[] q)

Returns the nearest neighbor.

Neighbor<double[],E>

nearest(double[] q, double recall, int T)

Returns the approximate nearest neighbor.

void

search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)

Retrieves the neighbors in a fixed radius of query object, i.e.

void

search(double[] q, double radius, List<Neighbor<double[],E>> neighbors, double recall, int T)

Search the neighbors in the given radius of query object, i.e.

Neighbor<double[],E>[]

search(double[] q, int k)

Retrieves the k nearest neighbors to the query key.

Neighbor<double[],E>[]

search(double[] q, int k, double recall, int T)

Returns the approximate k-nearest neighbors.

String

toString()

Methods inherited from class smile.neighbor.LSH
put

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- MPLSH
  
  public MPLSH(int d, int L, int k, double w)
  
  Constructor.
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
- MPLSH
  
  public MPLSH(int d, int L, int k, double w, int H)
  
  Constructor.
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently away from 0. But we should not choose an r value that is too large, which will increase the query time.
  
  H - the number of buckets of hash tables.
Method Details
- initHashTable
  
  protected void initHashTable(int d, int L, int k, double w, int H)
  
  Description copied from class: LSH
  
  Initialize the hash tables.
  
  Overrides:
  
  initHashTable in class LSH<E>
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
  
  H - the size of universal hash tables.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class LSH<E>
- fit
  
  public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius)
  
  Fits the posteriori multiple probe algorithm.
  
  Parameters:
  
  range - the neighborhood search data structure.
  
  samples - the training samples.
  
  radius - the radius for range search.
- fit
  
  public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz)
  
  Fits the posteriori multiple probe algorithm.
  
  Parameters:
  
  range - the neighborhood search data structure.
  
  samples - the training samples.
  
  radius - the radius for range search.
  
  Nz - the number of quantized values.
- fit
  
  public void fit(RNNSearch<double[],double[]> range, double[][] samples, double radius, int Nz, double sigma)
  
  Train the posteriori multiple probe algorithm.
  
  Parameters:
  
  range - the neighborhood search data structure.
  
  samples - the training samples.
  
  radius - the radius for range search.
  
  Nz - the number of quantized values.
  
  sigma - the Parzen window width.
- nearest
  
  public Neighbor<double[],E> nearest(double[] q)
  
  Description copied from interface: KNNSearch
  
  Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results.
  
  Specified by:
  
  nearest in interface KNNSearch<double[],E>
  
  Overrides:
  
  nearest in class LSH<E>
  
  Parameters:
  
  q - the query key.
  
  Returns:
  
  the nearest neighbor
- nearest
  
  public Neighbor<double[],E> nearest(double[] q, double recall, int T)
  
  Returns the approximate nearest neighbor. A posteriori multiple probe model has to be trained already.
  
  Parameters:
  
  q - the query object.
  
  recall - the expected recall rate.
  
  T - the maximum number of probes.
  
  Returns:
  
  the approximate nearest neighbor.
- search
  
  public Neighbor<double[],E>[] search(double[] q, int k)
  
  Description copied from interface: KNNSearch
  
  Retrieves the k nearest neighbors to the query key.
  
  Specified by:
  
  search in interface KNNSearch<double[],E>
  
  Overrides:
  
  search in class LSH<E>
  
  Parameters:
  
  q - the query key.
  
  k - the number of nearest neighbors to search for.
  
  Returns:
  
  the k nearest neighbors
- search
  
  public Neighbor<double[],E>[] search(double[] q, int k, double recall, int T)
  
  Returns the approximate k-nearest neighbors. A posteriori multiple probe model has to be trained already.
  
  Parameters:
  
  q - the query object.
  
  k - the number of nearest neighbors to search for.
  
  recall - the expected recall rate.
  
  T - the maximum number of probes.
  
  Returns:
  
  the approximate k-nearest neighbors.
- search
  
  public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
  
  Description copied from interface: RNNSearch
  
  Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
  
  Specified by:
  
  search in interface RNNSearch<double[],E>
  
  Overrides:
  
  search in class LSH<E>
  
  Parameters:
  
  q - the query key.
  
  radius - the radius of search range from target.
  
  neighbors - the list to store found neighbors on output.
- search
  
  public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors, double recall, int T)
  
  Search the neighbors in the given radius of query object, i.e. d(q, v) <= radius.
  
  Parameters:
  
  q - the query object.
  
  radius - the radius of search range.
  
  neighbors - the list to store found neighbors in the given range on output.
  
  recall - the expected recall rate.
  
  T - the maximum number of probes.

Class MPLSH<E>

References

Field Summary

Fields inherited from class smile.neighbor.LSH

Constructor Summary

Method Summary

Methods inherited from class smile.neighbor.LSH

Methods inherited from class java.lang.Object

Constructor Details

MPLSH

MPLSH

Method Details

initHashTable

toString

fit

fit

fit

nearest

nearest

search

search

search

search