Hash
Hashing is less a data structure than an algorithmic technique: it provides expected $O(1)$-time access to elements. The goal is to find an efficient function that maps keys to storage locations.
Ingredients to make hash table:
- a universe $U$: the set of all possible keys. It is very large; otherwise we could use direct addressing, i.e., a plain array indexed by key
- a table $T[0..m-1]$: an array whose size $m$ we choose as needed
- a hash function $h: U \to \{0, 1, \dots, m-1\}$
- a load factor $\alpha = n/m$: if $\alpha$ gets too big, we rehash. The load factor is the average number of elements in a chain.
That is, we use $O(m)$ space and support the following operations:
search(k):
look for $k$ in $T[h(k)]$
insert(k):
add $k$ to $T[h(k)]$
delete(k):
remove $k$ from $T[h(k)]$
Normally, we build a hash function with the division method or the multiplication method:
- division method: $h(k) = k \bmod m$
- multiplication method: $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$, where $A = (\sqrt{5}-1)/2 \approx 0.618$ is the golden-ratio conjugate, $m$ is the size of the table, and $kA \bmod 1$ is the fractional part of $kA$.
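A minimal Python sketch of the two methods (the function names are my own, not standard library APIs):

```python
import math

def hash_division(k, m):
    """Division method: h(k) = k mod m. Works best when m is prime."""
    return k % m

def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)).

    A defaults to the golden-ratio conjugate, a common choice because it
    spreads consecutive keys well. (k*A) % 1.0 is the fractional part."""
    frac = (k * A) % 1.0
    return int(m * frac)
```

A nice property of the multiplication method is that $m$ need not be prime; a power of two works fine.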
For a hash table we can also do chaining: each slot of the table holds a linked list that stores all elements with the same hash value. The alternative is open addressing, where all elements occupy slots of the hash table itself.
Hashing with Chaining
We define the load factor $\alpha = n/m$, where $n$ is the number of elements in the table and $m$ is the size of the table; equivalently, $\alpha$ is the average number of elements in a chain.
Operation | Worst-case Running Time | Average-case Running Time |
---|---|---|
Search | $O(n)$ | $O(1+\alpha)$ |
Insert | $O(1)$ | $O(1)$ |
Delete | $O(n)$ | $O(1+\alpha)$ |
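A compact sketch of chaining, with Python lists standing in for the linked lists (class and method names are my own):

```python
class ChainingHashTable:
    """Hash table with separate chaining: each slot holds a list of keys."""

    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]
        self.n = 0  # number of stored elements, so load factor = n / m

    def _slot(self, k):
        return hash(k) % self.m

    def insert(self, k):
        chain = self.table[self._slot(k)]
        if k not in chain:  # keep keys unique
            chain.append(k)
            self.n += 1

    def search(self, k):
        # Scans one chain: average O(1 + alpha), worst case O(n).
        return k in self.table[self._slot(k)]

    def delete(self, k):
        chain = self.table[self._slot(k)]
        if k in chain:
            chain.remove(k)
            self.n -= 1
```

A fuller implementation would rehash into a larger table once `self.n / self.m` exceeds some threshold.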
Open Addressing
In open addressing, each table entry contains either an element or NIL. We have three methods to deal with collisions:
- linear probing: $h(k, i) = (h'(k) + i) \bmod m$
- quadratic probing: $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
- double hashing: $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$
- $i$ is the probe number: it starts at $0$ and is incremented after each collision
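The three probe sequences can be sketched directly from the formulas above, using $h'(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod (m-1))$ as illustrative choices:

```python
def linear_probe(k, i, m):
    # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
    return (k % m + i) % m

def quadratic_probe(k, i, m, c1=1, c2=1):
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m
    return (k % m + c1 * i + c2 * i * i) % m

def double_hash_probe(k, i, m):
    # h(k, i) = (h1(k) + i * h2(k)) mod m
    # h2 is chosen so it is never 0, otherwise probing would stall.
    h1 = k % m
    h2 = 1 + (k % (m - 1))
    return (h1 + i * h2) % m
```

For example, with $m = 7$ and $k = 10$, linear probing visits slots 3, 4, 5, ... while double hashing jumps by $h_2(10) = 5$ each step.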
Linear probing suffers from the primary clustering problem: occupied slots pile up into long runs, and whenever a probe hits a non-empty slot we must keep increasing $i$ until the run ends, which increases the average search time.
Quadratic probing is better than linear probing, but it still suffers from the secondary clustering problem: if two keys have the same initial probe position, their entire probe sequences are the same.
Double hashing is the best of the three methods; it requires that $h_2(k)$ be relatively prime to the table size $m$ so that the whole table can be probed.
Double hashing yields $\Theta(m^2)$ distinct probe sequences, which is better than the $\Theta(m)$ probe sequences of linear and quadratic probing.
In open addressing, the load factor is bounded by $\alpha \le 1$, since $n \le m$.
Operation | Average-case Running Time |
---|---|
Search | $O\left(\frac{1}{1-\alpha}\right)$ |
Insert | $O\left(\frac{1}{1-\alpha}\right)$ |
Delete | $O\left(\frac{1}{1-\alpha}\right)$ |
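An open-addressing table with linear probing can be sketched as below (a simplified illustration with names of my own choosing). One subtlety the tables above hide: deletion cannot simply set a slot back to NIL, or later searches would stop too early; a common fix is a tombstone marker.

```python
DELETED = object()  # tombstone: lets search() probe past deleted slots

class LinearProbingTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [None] * m  # None plays the role of NIL

    def _probe(self, k):
        # Linear probing: h(k, i) = (h'(k) + i) mod m
        for i in range(self.m):
            yield (hash(k) + i) % self.m

    def insert(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is None or self.slots[idx] is DELETED:
                self.slots[idx] = k
                return
            if self.slots[idx] == k:
                return  # already present
        raise OverflowError("hash table is full")

    def search(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is None:
                return False  # a true empty slot ends the probe sequence
            if self.slots[idx] == k:
                return True
        return False

    def delete(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is None:
                return
            if self.slots[idx] == k:
                self.slots[idx] = DELETED
                return
```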
something more
Even though the worst-case running times of hash table operations are worse than those of AVL trees ($O(n)$ vs. $O(\log n)$), the average-case running times of hash table operations are much better ($O(1)$ vs. $O(\log n)$).
You are asked to implement your own hash map. What are good things to do?
- Use a prime number modulus in your hash function
- Design a hash function that spreads hashes evenly
- Implement some method to handle collisions (e.g., chaining, linear probing, etc.)
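The first two points can be combined in one small sketch: a polynomial rolling hash for strings with a prime modulus (the constants 31 and 1009 are illustrative choices, not requirements):

```python
def string_hash(s, m=1009):
    """Polynomial rolling hash with a prime modulus m (1009 is prime).

    Multiplying the running value by 31 before adding each character
    spreads anagrams and similar strings across the whole range [0, m)."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % m
    return h
```

With a prime modulus, patterns in the keys (e.g., all keys even, or sharing a common factor) are less likely to map many keys to the same slot.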