Interface SimHash<T>
- Type Parameters:
T - the data type of set objects.
public interface SimHash<T>
SimHash is a technique for quickly estimating how similar two sets are.
The algorithm is used by the Google Crawler to find near-duplicate pages.
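The general idea can be illustrated with a minimal, self-contained sketch. It is not this library's implementation: the class name SimHashSketch, the 64-bit fingerprint width, the unweighted features, and the FNV-1a feature hash are all assumptions chosen for illustration. Each feature is hashed to 64 bits, every bit position collects a +1 or -1 vote per feature, and the sign of each tally becomes one bit of the fingerprint. Similar sets tend to produce fingerprints that differ in few bits, so the Hamming distance between two fingerprints serves as the similarity estimate.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names and hash choice are assumptions,
// not the API documented on this page.
public class SimHashSketch {

    // 64-bit FNV-1a, used here as a stand-in feature hash.
    public static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;           // FNV prime
        }
        return h;
    }

    // Builds a 64-bit fingerprint from a set of string features.
    public static long simHash(String[] tokens) {
        int[] votes = new int[64];
        for (String token : tokens) {
            long h = fnv1a64(token.getBytes(StandardCharsets.UTF_8));
            for (int i = 0; i < 64; i++) {
                // A set bit votes +1 for this position, a clear bit votes -1.
                if (((h >>> i) & 1L) == 1L) {
                    votes[i]++;
                } else {
                    votes[i]--;
                }
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) {
                fingerprint |= (1L << i);
            }
        }
        return fingerprint;
    }

    // Number of bit positions in which two fingerprints differ.
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        String[] doc1 = {"the", "quick", "brown", "fox", "jumps"};
        String[] doc2 = {"the", "quick", "brown", "fox", "leaps"};
        long h1 = simHash(doc1);
        long h2 = simHash(doc2);
        System.out.println(Long.toHexString(h1));
        System.out.println(Long.toHexString(h2));
        System.out.println("Hamming distance: " + hammingDistance(h1, h2));
    }
}
```

Because the votes are simple sums, the fingerprint does not depend on the order of the features. A caller would typically treat two items as near duplicates when the Hamming distance falls below a threshold chosen for the application.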
-
Method Summary
-
Method Details
-
hash
-
of
Returns the SimHash for a set of generic features (represented as byte[]).
- Parameters:
features - the generic features.
- Returns:
the SimHash.
-
text