Package smile.neighbor
Class SNLSH<K,V>
java.lang.Object
smile.neighbor.SNLSH<K,V>
 Type Parameters:
K
 the type of keys.V
 the type of associated objects.
 All Implemented Interfaces:
Serializable
,RNNSearch<K,
V>
LocalitySensitive Hashing for Signatures.
LSH is an efficient algorithm for approximate nearest neighbor search
in high dimensional spaces by performing probabilistic dimension reduction
of data. The basic idea is to hash the input items so that similar items
are mapped to the same buckets with high probability (the number of buckets
being much smaller than the universe of possible input items).
If we are given signatures for the sets, we may divide them into bands,
and only measure the similarity of a pair of sets if they are identical
in at least one band. By choosing the size of bands appropriately, we can
eliminate most of the pairs that do not meet our threshold of similarity.
By default, the query object (reference equality) is excluded from the
neighborhood. Note that you may observe weird behavior with String objects.
JVM will pool the string literal objects. So the below variables
String a = "ABC";
String b = "ABC";
String c = "AB" + "C";
are actually equal in reference test a == b == c
. With toy data
that you type explicitly in the code, this will cause problems. Fortunately,
the data would be generally read from secondary storage in production.
References
 Moses S. Charikar. Similarity Estimation Techniques from Rounding Algorithms
 See Also:

Constructor Summary

Method Summary

Constructor Details

SNLSH
Constructor. Parameters:
L
 the number of bands/hash tables.hash
 simhash function.


Method Details

put
Adds a new item. Parameters:
key
 the key.value
 the value.

search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
.
