Package smile.neighbor
Class LSH<E>
java.lang.Object
smile.neighbor.LSH<E>
 Type Parameters:
E
 the type of data objects in the hash table.
 All Implemented Interfaces:
Serializable
,KNNSearch<double[],
,E> RNNSearch<double[],
E>
 Direct Known Subclasses:
MPLSH
,MutableLSH
public class LSH<E>
extends Object
implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable
LocalitySensitive Hashing. LSH is an efficient algorithm for approximate
nearest neighbor search in high dimensional spaces by performing
probabilistic dimension reduction of data. The basic idea is to hash the
input items so that similar items are mapped to the same buckets with high
probability (the number of buckets being much smaller than the universe
of possible input items).
By default, the query object (reference equality) is excluded from the neighborhood.
References
 Alexandr Andoni and Piotr Indyk. NearOptimal Hashing Algorithms for Near Neighbor Problem in High Dimensions. FOCS, 2006.
 Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. LocalitySensitive Hashing Scheme Based on pStable Distributions. 2004.
 See Also:

Field Summary
Modifier and TypeFieldDescriptionThe data objects.protected int
The size of hash table.Hash functions.protected int
The number of random projections per hash value.protected ArrayList
<double[]> The object keys.protected double
The width of projection. 
Constructor Summary

Method Summary
Modifier and TypeMethodDescriptionprotected void
initHashTable
(int d, int L, int k, double w, int H) Initialize the hash tables.nearest
(double[] q) Returns the nearest neighbor.void
Insert an item into the hash table.void
Retrieves the neighbors in a fixed radius of query object, i.e.search
(double[] q, int k) Retrieves the k nearest neighbors to the query key.toString()

Field Details

keys
The object keys. 
data
The data objects. 
hash
Hash functions. 
H
protected int HThe size of hash table. 
k
protected int kThe number of random projections per hash value. 
w
protected double wThe width of projection. The hash function is defined as floor((a * x + b) / w). The value of w determines the bucket interval.


Constructor Details

LSH
Constructor. Parameters:
keys
 the object keys.data
 the data objects.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.

LSH
Constructor. Parameters:
keys
 the object keys.data
 the data objects.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
 the size of universal hash tables.

LSH
public LSH(int d, int L, int k, double w) Constructor. Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.

LSH
public LSH(int d, int L, int k, double w, int H) Constructor. Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
 the size of universal hash tables.


Method Details

initHashTable
protected void initHashTable(int d, int L, int k, double w, int H) Initialize the hash tables. Parameters:
d
 the dimensionality of data.L
 the number of hash tables.k
 the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
 the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
 the size of universal hash tables.

toString

put
Insert an item into the hash table. Parameters:
key
 the key.value
 the value.

nearest
Description copied from interface:KNNSearch
Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results. 
search
Description copied from interface:KNNSearch
Retrieves the k nearest neighbors to the query key. 
search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
.
