Package smile.neighbor
Class LSH<E>
java.lang.Object
smile.neighbor.LSH<E>
- Type Parameters:
E
- the type of data objects in the hash table.
- All Implemented Interfaces:
Serializable
,KNNSearch<double[],
,E> RNNSearch<double[],
E>
- Direct Known Subclasses:
MPLSH
,MutableLSH
public class LSH<E>
extends Object
implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable
Locality-Sensitive Hashing. LSH is an efficient algorithm for approximate
nearest neighbor search in high dimensional spaces by performing
probabilistic dimension reduction of data. The basic idea is to hash the
input items so that similar items are mapped to the same buckets with high
probability (the number of buckets being much smaller than the universe
of possible input items).
By default, the query object (reference equality) is excluded from the neighborhood.
References
- Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions. FOCS, 2006.
- Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. 2004.
- See Also:
-
Field Summary
Modifier and TypeFieldDescriptionThe data objects.protected int
The size of hash table.Hash functions.protected int
The number of random projections per hash value.protected ArrayList
<double[]> The object keys.protected double
The width of projection. -
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionprotected void
initHashTable
(int d, int L, int k, double w, int H) Initialize the hash tables.nearest
(double[] q) Returns the nearest neighbor.void
Insert an item into the hash table.void
Retrieves the neighbors in a fixed radius of query object, i.e.search
(double[] q, int k) Retrieves the k nearest neighbors to the query key.toString()
-
Field Details
-
keys
The object keys. -
data
The data objects. -
hash
Hash functions. -
H
protected int HThe size of hash table. -
k
protected int kThe number of random projections per hash value. -
w
protected double wThe width of projection. The hash function is defined as floor((a * x + b) / w). The value of w determines the bucket interval.
-
-
Constructor Details
-
LSH
Constructor.- Parameters:
keys
- the object keys.data
- the data objects.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
-
LSH
Constructor.- Parameters:
keys
- the object keys.data
- the data objects.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
- the size of universal hash tables.
-
LSH
public LSH(int d, int L, int k, double w) Constructor.- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
-
LSH
public LSH(int d, int L, int k, double w, int H) Constructor.- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
- the size of universal hash tables.
-
-
Method Details
-
initHashTable
protected void initHashTable(int d, int L, int k, double w, int H) Initialize the hash tables.- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
- the size of universal hash tables.
-
toString
-
put
Insert an item into the hash table.- Parameters:
key
- the key.value
- the value.
-
nearest
Description copied from interface:KNNSearch
Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results. -
search
Description copied from interface:KNNSearch
Retrieves the k nearest neighbors to the query key. -
search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
.
-