Problem: Obviously, a mapping from a potentially huge set of
strings to a small set of integers
will not be unique.
The hash function maps keys into indices in many-to-one fashion.
Having a second key into a previously used slot is called a collision.
Collision resolution: deals with keys that are
mapped to same addresses.
Invented by H. P. Luhn, an IBM engineer, in January 1953.
Idea: Keys hashing to same address are kept in lists attached to that address
For each table address, a linked list of the records whose keys hash to that address,
This method is useful for highly dynamic situations,
where the number of the search keys cannot be predicted in advance.
Records: (each character is a key):
Key: A S E A R C H I N G E X A M P L E
Hash (M = 11): 1 8 5 1 7 3 8 9 3 7 5 2 1 2 5 1 5
0 1 2 3 4 5 6 7 8 9 10
= L M N = E = G H I =
A X C P R S =
A = = E = =
Each column is a linked list. Thus we maintain M lists
with M list header nodes.
We can use the list search and insert procedures for
Complexity of separate chaining
The time to compute the index of a given key is a constant.
Then we have to search in a list for the record. Therefore
the time depends on the length of the lists. It has been shown
empirically that on average the list length is N/M
(the load factor L),
provided M is prime and we use a function that gives
Unsuccessful searches go to the end of some list, hence we have
Successful searches are expected to go half the way
down some list. On average the number of comparisons
in successful search is L/2. Therefore we can say that runtime
complexity of separate chaining is O(L).
Note, that what really matters is the load factor rather
than the size of the table or the number of records, taken
How to choose M in separate chaining?
Since the method is used in cases when we cannot predict the
number of records in advance, the choice of M basically
depends on other factors such as available memory.
Typically M is chosen relatively small so as not to use up a
large area of contiguous memory, but enough large
so that the lists are short for more efficient sequential search.
Recommendations in the literature vary form M to be about one
tenth of N - the number of the records to M
to be equal (or close to) N.
Other methods of chaining:
- Keep the lists ordered: useful if there are much more searches than inserts,
and if most of the searches are unsuccessful.
- Represent the chains as binary search tree. Extra effort
needed – not efficient.
Advantages of separate chaining– used when memory is of
concern, easily implemented.
Disadvantages – unevenly distributed keys – long
lists and many empty spaces in the table.