Package smile.neighbor
Class MPLSH<E>
java.lang.Object
smile.neighbor.LSH<E>
smile.neighbor.MPLSH<E>
 Type Parameters:
E
 the type of data objects in the hash table.
 All Implemented Interfaces:
Serializable
,KNNSearch<double[],
,E> RNNSearch<double[],
E>
MultiProbe LocalitySensitive Hashing. LSH is an efficient algorithm for
approximate nearest neighbor search in high dimensional spaces
by performing probabilistic dimension reduction of data. The basic idea
is to hash the input items so that similar items are mapped to the same
buckets with high probability (the number of buckets being much smaller
than the universe of possible input items). A drawback of LSH is the
requirement for a large number of hash tables in order to achieve good
search quality. Multiprobe LSH is designed to overcome this drawback.
Multiprobe LSH intelligently probes multiple buckets that are likely to
contain query results in a hash table.
By default, the query object (reference equality) is excluded from the neighborhood.
References
 Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multiprobe LSH: efficient indexing for highdimensional similarity search. VLDB, 2007.
 Alexis Joly and Olivier Buisson. A posteriori multiprobe locality sensitive hashing. ACM international conference on Multimedia, 2008.
 See Also:

Field Summary

Constructor Summary

Method Summary
Modifier and TypeMethodDescriptionvoid
Fits the posteriori multiple probe algorithm.void
Fits the posteriori multiple probe algorithm.void
Train the posteriori multiple probe algorithm.protected void
initHashTable
(int d, int L, int k, double w, int H) Initialize the hash tables.nearest
(double[] q) Returns the nearest neighbor.nearest
(double[] q, double recall, int T) Returns the approximate nearest neighbor.void
Retrieves the neighbors in a fixed radius of query object, i.e.void
Search the neighbors in the given radius of query object, i.e.search
(double[] q, int k) Retrieves the k nearest neighbors to the query key.search
(double[] q, int k, double recall, int T) Returns the approximate knearest neighbors.toString()

Constructor Details

MPLSH
public MPLSH(int d, int L, int k, double w) Constructor. Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.

MPLSH
public MPLSH(int d, int L, int k, double w, int H) Constructor. Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose an r value that is too large, which will increase the query time.H
 the number of buckets of hash tables.


Method Details

initHashTable
protected void initHashTable(int d, int L, int k, double w, int H) Description copied from class:LSH
Initialize the hash tables. Overrides:
initHashTable
in classLSH<E>
 Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
 the size of universal hash tables.

toString

fit
Fits the posteriori multiple probe algorithm. Parameters:
range
 the neighborhood search data structure.samples
 the training samples.radius
 the radius for range search.

fit
Fits the posteriori multiple probe algorithm. Parameters:
range
 the neighborhood search data structure.samples
 the training samples.radius
 the radius for range search.Nz
 the number of quantized values.

fit
public void fit(RNNSearch<double[], double[]> range, double[][] samples, double radius, int Nz, double sigma) Train the posteriori multiple probe algorithm. Parameters:
range
 the neighborhood search data structure.samples
 the training samples.radius
 the radius for range search.Nz
 the number of quantized values.sigma
 the Parzen window width.

nearest
Description copied from interface:KNNSearch
Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results. 
nearest
Returns the approximate nearest neighbor. A posteriori multiple probe model has to be trained already. Parameters:
q
 the query object.recall
 the expected recall rate.T
 the maximum number of probes. Returns:
 the approximate nearest neighbor.

search
Description copied from interface:KNNSearch
Retrieves the k nearest neighbors to the query key. 
search
Returns the approximate knearest neighbors. A posteriori multiple probe model has to be trained already. Parameters:
q
 the query object.k
 the number of nearest neighbors to search for.recall
 the expected recall rate.T
 the maximum number of probes. Returns:
 the approximate knearest neighbors.

search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
. 
search
public void search(double[] q, double radius, List<Neighbor<double[], E>> neighbors, double recall, int T) Search the neighbors in the given radius of query object, i.e.d(q, v) <= radius
. Parameters:
q
 the query object.radius
 the radius of search range.neighbors
 the list to store found neighbors in the given range on output.recall
 the expected recall rate.T
 the maximum number of probes.
