Class PerfectHash
java.lang.Object
smile.hash.PerfectHash
- All Implemented Interfaces:
Serializable
A perfect hash of an array of strings to their index in the array.
What is a perfect hash?
A perfect hash function for a setS is a hash function that maps
distinct elements in S to a set of integers with no collisions.
In mathematical terms, it is an injective function. Perfect hash functions
may be used to implement a lookup table with constant worst-case access time.
Hash function
Given a keywordw of length L, the hash value is computed as:
h(w) = (L + sum_i kvals[w[i] - min]) mod tsizewhere
w[i]is thei-th character ofw(or only the characters at the user-suppliedselectpositions when provided),minis the smallest character value that appears in any keyword,kvalsis an integer array indexed by(character - min), andtsizeis the size of the hash table.
Construction algorithm
The construction uses a randomized character-value approach:- Character frequency analysis. The frequency of every character across all keywords is counted. Keywords are sorted in descending order of their aggregate character frequency so that the most-constrained keys are placed first.
- Table-size selection. The initial table size
tsizeis set toceil(1.5 * n), wherenis the number of keywords, giving a load factor of about 2/3. - Random assignment. Each entry of
kvalsis assigned a uniformly random integer in[0, tsize). Allnhash valuesh(w) mod tsizeare then checked for collisions. - Retry. If any two keywords collide, a new random
kvalsassignment is drawn and step 3 is repeated. At load factor 2/3 the probability that a random assignment is collision-free is at leaste^{-n^2/(2*tsize)} ≈ e^{-3n/4}, so the expected number of trials is O(1) per construction. - Table growth. If no collision-free assignment is found within
1 000 attempts,
tsizeis incremented by one and the random search restarts from scratch. This ensures guaranteed termination. - Table construction. Once a collision-free
kvalsis found, the final lookup table of sizetsizeis filled with keyword indices. Slots that correspond to no keyword are left as-1.
O(n * |alphabet|).- See Also:
-
Constructor Summary
ConstructorsConstructorDescriptionPerfectHash(int[] select, String... keywords) Constructor.PerfectHash(String... keywords) Constructor. -
Method Summary
-
Constructor Details
-
PerfectHash
Constructor.- Parameters:
keywords- the fixed set of keywords.
-
PerfectHash
Constructor.- Parameters:
select- The character positions in keywords used to calculate the hash.keywords- the fixed set of keywords.
-
-
Method Details
-
get
Returns the index of a keyword. If the string is not in the set, returns -1.- Parameters:
key- the keyword.- Returns:
- the index of keyword or -1.
-