Class LSH<E>
java.lang.Object
smile.neighbor.LSH<E>
- Type Parameters:
E- the type of data objects in the hash table.
- All Implemented Interfaces:
Serializable, KNNSearch<double[],E>, RNNSearch<double[], E>
- Direct Known Subclasses:
MPLSH, MutableLSH
public class LSH<E>
extends Object
implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable
Locality-Sensitive Hashing. LSH is an efficient algorithm for approximate
nearest neighbor search in high dimensional spaces by performing
probabilistic dimension reduction of data. The basic idea is to hash the
input items so that similar items are mapped to the same buckets with high
probability (the number of buckets being much smaller than the universe
of possible input items).
By default, the query object (reference equality) is excluded from the neighborhood.
References
- Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions. FOCS, 2006.
- Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. 2004.
- See Also:
-
Field Summary
FieldsModifier and TypeFieldDescriptionThe data objects.protected intThe size of hash table.Hash functions.protected intThe number of random projections per hash value.protected ArrayList<double[]> The object keys.protected doubleThe width of projection. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidinitHashTable(int d, int L, int k, double w, int H) Initialize the hash tables.nearest(double[] q) Returns the nearest neighbor.voidInsert an item into the hash table.voidRetrieves the neighbors in a fixed radius of query object, i.e.search(double[] q, int k) Retrieves the k nearest neighbors to the query key.toString()
-
Field Details
-
keys
The object keys. -
data
-
hash
-
H
protected int HThe size of hash table. -
k
protected int kThe number of random projections per hash value. -
w
protected double wThe width of projection. The hash function is defined as floor((a * x + b) / w). The value of w determines the bucket interval.
-
-
Constructor Details
-
LSH
Constructor.- Parameters:
keys- the object keys.data- the data objects.w- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
-
LSH
Constructor.- Parameters:
keys- the object keys.data- the data objects.w- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H- the size of universal hash tables.
-
LSH
public LSH(int d, int L, int k, double w) Constructor.- Parameters:
d- the dimensionality of data.L- the number of hash tables.k- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
-
LSH
public LSH(int d, int L, int k, double w, int H) Constructor.- Parameters:
d- the dimensionality of data.L- the number of hash tables.k- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H- the size of universal hash tables.
-
-
Method Details
-
initHashTable
protected void initHashTable(int d, int L, int k, double w, int H) Initialize the hash tables.- Parameters:
d- the dimensionality of data.L- the number of hash tables.k- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H- the size of universal hash tables.
-
toString
-
put
Insert an item into the hash table.- Parameters:
key- the key.value- the value.
-
nearest
Description copied from interface:KNNSearchReturns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results. -
search
Description copied from interface:KNNSearchRetrieves the k nearest neighbors to the query key. -
search
Description copied from interface:RNNSearchRetrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius.
-