smile.neighbor.LSH<E>

Type Parameters: E - the type of data objects in the hash table.

All Implemented Interfaces: Serializable, KNNSearch<double[],E>, RNNSearch<double[],E>

Direct Known Subclasses: MPLSH, MutableLSH

public class LSH<E> extends Object implements KNNSearch<double[],E>, RNNSearch<double[],E>, Serializable

Locality-Sensitive Hashing (LSH) is an efficient algorithm for approximate nearest neighbor search in high-dimensional spaces. It works by probabilistic dimension reduction of the data. The basic idea is to hash the input items so that similar items map to the same bucket with high probability (the number of buckets being much smaller than the universe of possible input items).

By default, the query object is excluded from the neighborhood. The check uses reference equality, so a stored key is skipped only when it is the same object as the query.
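
The reference-equality rule can be illustrated with a small sketch. This is not Smile's implementation: it is a plain linear scan, and the class name `ReferenceExclusionSketch` and its methods are hypothetical names introduced only for illustration. It shows why querying with a stored array skips that array, while querying with a copy does not.

```java
import java.util.List;

/** Hypothetical helper, not part of Smile: a linear scan that skips the query by reference. */
final class ReferenceExclusionSketch {

    /** Returns the index of the key closest to q, skipping any key that is the same object as q; -1 if none. */
    static int nearestIndex(List<double[]> keys, double[] q) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < keys.size(); i++) {
            double[] key = keys.get(i);
            if (key == q) {
                continue; // reference equality, not Arrays.equals
            }
            double dist = squaredDistance(key, q);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {1.0, 0.0};
        double[] c = {5.0, 5.0};
        List<double[]> keys = List.of(a, b, c);
        // Querying with the stored array itself skips index 0.
        System.out.println(nearestIndex(keys, a));
        // A copy has equal contents but is a different object, so index 0 is eligible.
        System.out.println(nearestIndex(keys, a.clone()));
    }
}
```

A caller who wants the query itself to be eligible as a result could therefore pass a copy of the key rather than the stored array.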

References

Alexandr Andoni and Piotr Indyk. Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions. FOCS, 2006.
Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. 2004.

Field Summary

- protected ArrayList<E> data: The data objects.
- protected int H: The size of hash table.
- protected List<Hash> hash: Hash functions.
- protected int k: The number of random projections per hash value.
- protected ArrayList<double[]> keys: The object keys.
- protected double w: The width of projection.

Constructor Summary

- LSH(double[][] keys, E[] data, double w): Constructor.
- LSH(double[][] keys, E[] data, double w, int H): Constructor.
- LSH(int d, int L, int k, double w): Constructor.
- LSH(int d, int L, int k, double w, int H): Constructor.

Method Summary

- protected void initHashTable(int d, int L, int k, double w, int H): Initializes the hash tables.
- Neighbor<double[],E> nearest(double[] q): Returns the nearest neighbor.
- void put(double[] key, E value): Inserts an item into the hash table.
- void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors): Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
- Neighbor<double[],E>[] search(double[] q, int k): Retrieves the k nearest neighbors to the query key.
- String toString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- keys
  
  protected ArrayList<double[]> keys
  
  The object keys.
- data
  
  protected ArrayList<E> data
  
  The data objects.
- hash
  
  protected List<Hash> hash
  
  Hash functions.
- H
  
  protected int H
  
  The size of hash table.
- k
  
  protected int k
  
  The number of random projections per hash value.
- w
  
  protected double w
  
  The width of the projection. The hash function is defined as floor((a * x + b) / w), so the value of w determines the bucket interval.
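
The formula floor((a * x + b) / w) can be sketched for a single projection as follows. This is an illustration only, not Smile's code: the class `ProjectionHashSketch` and the method `bucket` are hypothetical, `a * x` is assumed to be a dot product, and `a` and `b` are passed in as fixed values here, whereas in p-stable LSH schemes they are drawn at random.

```java
/** Hypothetical helper, not part of Smile: one projection hash floor((a . x + b) / w). */
final class ProjectionHashSketch {

    /** Projects x onto a, shifts by b, and returns the index of the width-w interval it falls in. */
    static int bucket(double[] a, double b, double w, double[] x) {
        if (w <= 0.0) {
            throw new IllegalArgumentException("w must be positive");
        }
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += a[i] * x[i];
        }
        // Math.floor rounds toward negative infinity, so negative projections
        // land in negative buckets rather than collapsing into bucket 0.
        return (int) Math.floor((dot + b) / w);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0}; // fixed projection vector (assumption for the demo)
        double b = 0.5;          // fixed offset (assumption for the demo)
        double[] x1 = {3.0, 7.0};
        double[] x2 = {3.3, -1.0};

        // With a wide interval the two nearby projections share a bucket.
        System.out.println(bucket(a, b, 2.0, x1) + " " + bucket(a, b, 2.0, x2));
        // With a narrow interval they are separated.
        System.out.println(bucket(a, b, 0.25, x1) + " " + bucket(a, b, 0.25, x2));
    }
}
```

The two calls in `main` show the trade-off behind the choice of w: a wider interval puts more points in each bucket, so more candidates are examined per query, while a narrow interval separates points that are close together.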
Constructor Details
- LSH
  
  public LSH(double[][] keys, E[] data, double w)
  
  Constructor.
  
  Parameters:
  
  keys - the object keys.
  
  data - the data objects.
  
  w - the width of random projections. It should be sufficiently far from 0, but not too large, because a large w increases the query time.
- LSH
  
  public LSH(double[][] keys, E[] data, double w, int H)
  
  Constructor.
  
  Parameters:
  
  keys - the object keys.
  
  data - the data objects.
  
  w - the width of random projections. It should be sufficiently far from 0, but not too large, because a large w increases the query time.
  
  H - the size of universal hash tables.
- LSH
  
  public LSH(int d, int L, int k, double w)
  
  Constructor.
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, usually set to log(N), where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently far from 0, but not too large, because a large w increases the query time.
- LSH
  
  public LSH(int d, int L, int k, double w, int H)
  
  Constructor.
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, usually set to log(N), where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently far from 0, but not too large, because a large w increases the query time.
  
  H - the size of universal hash tables.
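
The guideline that k is usually set to log(N) can be turned into a small helper when choosing constructor arguments. The sketch below is an assumption-laden illustration: the class `LshParameterSketch` and the method `suggestK` are hypothetical, and the natural logarithm is assumed because the documentation does not state the base.

```java
/** Hypothetical helper, not part of Smile: picks k from the dataset size. */
final class LshParameterSketch {

    /** Returns round(ln(n)), but at least 1, as a starting value for k. */
    static int suggestK(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("dataset size must be positive");
        }
        // Natural log is an assumption; the source only says log(N).
        return Math.max(1, (int) Math.round(Math.log(n)));
    }

    public static void main(String[] args) {
        System.out.println(suggestK(1_000));
        System.out.println(suggestK(1_000_000));
    }
}
```

The result is only a starting point; the values of L and w would still need to be tuned for the data at hand.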
Method Details
- initHashTable
  
  protected void initHashTable(int d, int L, int k, double w, int H)
  
  Initialize the hash tables.
  
  Parameters:
  
  d - the dimensionality of data.
  
  L - the number of hash tables.
  
  k - the number of random projection hash functions, usually set to log(N), where N is the dataset size.
  
  w - the width of random projections. It should be sufficiently far from 0, but not too large, because a large w increases the query time.
  
  H - the size of universal hash tables.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- put
  
  public void put(double[] key, E value)
  
  Insert an item into the hash table.
  
  Parameters:
  
  key - the key.
  
  value - the value.
- nearest
  
  public Neighbor<double[],E> nearest(double[] q)
  
  Description copied from interface: KNNSearch
  
  Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure and then query it with objects from the same dataset. The object itself is of course the nearest one, at distance 0. Since this is generally useless, the search checks references and excludes the query object from the results.
  
  Specified by:
  
  nearest in interface KNNSearch<double[],E>
  
  Parameters:
  
  q - the query key.
  
  Returns:
  
  the nearest neighbor
- search
  
  public Neighbor<double[],E>[] search(double[] q, int k)
  
  Description copied from interface: KNNSearch
  
  Retrieves the k nearest neighbors to the query key.
  
  Specified by:
  
  search in interface KNNSearch<double[],E>
  
  Parameters:
  
  q - the query key.
  
  k - the number of nearest neighbors to search for.
  
  Returns:
  
  the k nearest neighbors
- search
  
  public void search(double[] q, double radius, List<Neighbor<double[],E>> neighbors)
  
  Description copied from interface: RNNSearch
  
  Retrieves the neighbors in a fixed radius of query object, i.e. d(q, v) <= radius.
  
  Specified by:
  
  search in interface RNNSearch<double[],E>
  
  Parameters:
  
  q - the query key.
  
  radius - the radius of the search range around the query.
  
  neighbors - the list to store found neighbors on output.
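
The condition d(q, v) <= radius and the output-list calling convention can be sketched as below. This is not Smile's implementation: the class `RadiusFilterSketch` is hypothetical, Euclidean distance is assumed for d, a plain list of candidate keys stands in for the buckets that LSH would probe, and indices are collected instead of `Neighbor` objects to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Smile: keeps candidates with d(q, v) <= radius. */
final class RadiusFilterSketch {

    /** Appends to neighbors the index of every candidate within radius of q (bound inclusive). */
    static void search(List<double[]> candidates, double[] q, double radius, List<Integer> neighbors) {
        for (int i = 0; i < candidates.size(); i++) {
            if (distance(q, candidates.get(i)) <= radius) {
                neighbors.add(i);
            }
        }
    }

    /** Euclidean distance between two vectors of equal length (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<double[]> candidates = List.of(
                new double[] {0.0, 0.0},
                new double[] {3.0, 4.0},
                new double[] {6.0, 8.0});
        List<Integer> neighbors = new ArrayList<>();
        search(candidates, new double[] {0.0, 0.0}, 5.0, neighbors);
        System.out.println(neighbors);
    }
}
```

Because results are appended to the caller's list, the same list can be reused across queries if it is cleared between calls.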
