The table is T[0..m-1] where m is the table size, and the input is x_1, ..., x_n. The hash function h is given.
We insert x_i into slot h(key[x_i]) of T. However, if that slot is occupied, we try the next slot, and continue until an empty slot is found.
Example: Say we are given a table with 10 slots. Let our hash function be the last digit of the key. Now we insert the values (in order) 19, 33, 43, 53, 21.
We apply our hash function to the first value: the last digit is 9, so 19 goes in slot 9. Next is 33: this goes in slot 3. Now we come to a problem: the next number, 43, should also go into slot 3. We have no linked lists, so where does it go? In open addressing, a value slated for an occupied slot goes into the next empty slot. So 43 goes into slot 4. Then 53: slot 3 is filled, go to slot 4; slot 4 is filled, go to slot 5; slot 5 is empty, so 53 goes into slot 5.
The table is usually made circular, so that a value meant for the last slot (when it is occupied) wraps around to the front of the table. So if we were to insert 99 into the table, we see that slot 9 is occupied, and 99 is sent to slot 0, which is empty.
Note that if we have more values entered than slots, we can run out of room in the table and be unable to make any more insertions. For this reason, the load factor a = n/m in open addressing can never exceed 1.0.
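The insertion procedure above can be sketched in Python. The table size, the insertion order, and the last-digit hash function are taken from the example; the EMPTY sentinel is an assumption of this sketch:

```python
EMPTY = None  # marker for a never-used slot (an assumption of this sketch)

def h(key):
    return key % 10  # hash function: the last digit of the key

def insert(T, key):
    """Insert key by open addressing: probe forward, wrapping circularly."""
    m = len(T)
    for i in range(m):
        j = (h(key) + i) % m      # next slot, mod m for the circular table
        if T[j] is EMPTY:
            T[j] = key
            return j
    raise RuntimeError("table full: the load factor cannot exceed 1.0")

T = [EMPTY] * 10
for x in (19, 33, 43, 53, 21, 99):
    insert(T, x)
# 19 -> slot 9, 33 -> slot 3, 43 -> slot 4, 53 -> slot 5,
# 21 -> slot 1, and 99 wraps around from slot 9 to slot 0
```

Trying to insert an 11th value into this table raises the "table full" error, matching the load-factor remark above.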
Say we want to delete 43. First we go to slot 3: it's occupied, but not by 43. So we go on to the next slot. We continue until we find either the number we're looking for (success), an empty slot, or we arrive back where we started (failure). In this case, the very next slot, slot 4, is occupied by 43, so we remove it and mark the slot as "deleted". (We'll see why in a moment.)
Now we will search for 53. We check the first slot indicated, slot 3. It's occupied, but not by 53, so we move on. Now we come to slot 4, which is empty. So we should stop, right? If we hit a truly empty slot, we know 53 can't be beyond it. This is why we marked slot 4 "deleted": we know there was something here once, so 53 could still be beyond this point. We move on to the next slot, and there we locate 53.
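The role of the "deleted" marker can be sketched as follows, continuing the same example. The DELETED tombstone value is a hypothetical choice for this sketch:

```python
EMPTY, DELETED = None, "DELETED"  # hypothetical sentinels for this sketch

def h(key):
    return key % 10

def search(T, key):
    """Return the slot holding key, or None. Stops at EMPTY, not at DELETED."""
    m = len(T)
    for i in range(m):
        j = (h(key) + i) % m
        if T[j] is EMPTY:        # a truly empty slot ends the search
            return None
        if T[j] == key:
            return j
        # slot holds another key, or is DELETED: keep probing
    return None                  # probed the whole table without success

def delete(T, key):
    j = search(T, key)
    if j is not None:
        T[j] = DELETED           # tombstone, so later searches continue past it

# Table after inserting 19, 33, 43, 53, 21:
T = [EMPTY, 21, EMPTY, 33, 43, 53, EMPTY, EMPTY, EMPTY, 19]
delete(T, 43)                    # slot 4 becomes DELETED
# search(T, 53) now probes slot 3 (33), slot 4 (DELETED), slot 5: found
```

If delete had set slot 4 back to EMPTY instead, the search for 53 would have stopped at slot 4 and wrongly reported 53 as absent.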
The probe sequence is the permutation of 0, 1, ..., m-1 used for searching for an available slot when inserting an element. For a key k the sequence is denoted by:
h(k,0), h(k,1), ..., h(k,m-1)
Thus, h(k,0) is what we used to call h(k). The probe sequence used in the simple introductory example is simply
h(k), h(k) + 1, ..., h(k) + m - 1
all mod m (to force the circular inspection of the array). Clearly, this is a permutation, but not a very good one (we'll see why in a moment).
This is more or less self-explanatory. In linear probing, the slots in the hash table are probed in a linear fashion, in order (i.e. check slot 4, then slot 5, then 6, then 7, etc.). We express linear probing as
h(k,i) = [h(k) + c*i] mod m
where c is an integer. While the slots are probed in a linear fashion, they are not necessarily probed in exact order: for c > 1 the probe jumps c slots at a time, and c must be relatively prime to m for the sequence to visit every slot. Note that we use mod m to account for the (usually) circular nature of the table. One of the problems with linear probing is called primary clustering: "blocks" of occupied slots tend to accumulate, and the larger a block is, the faster it grows, by the principles of probability. This increases the average search time in the table.
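As a quick sanity check, we can print the linear probe sequence for a few values of c, using m = 10 and a hypothetical key with h(k) = 3:

```python
def probes(hk, c, m):
    """Linear probe sequence h(k,i) = (h(k) + c*i) mod m, for i = 0..m-1."""
    return [(hk + c * i) % m for i in range(m)]

print(probes(3, 1, 10))  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2] -- every slot, in order
print(probes(3, 3, 10))  # [3, 6, 9, 2, 5, 8, 1, 4, 7, 0] -- every slot, jumping by 3
print(probes(3, 2, 10))  # [3, 5, 7, 9, 1, 3, 5, 7, 9, 1] -- gcd(2,10) != 1: only half the table
```

The last line shows why c must be relatively prime to m: with gcd(c, m) > 1 the probe sequence is not a permutation, and some slots are never examined.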
Random probing makes use of a pseudo-random function to "jump around" in the hash table. The random probe sequence is
h(k,i) = [h(k) + d(i)] mod m
where d(0), ..., d(m-1) is a permutation of 0, ..., m-1. Here, d(i) is an integer, generated by a recursive function for each i. The recursive definition of d is
d(0) = 0
d(i+1) = [a*d(i) + 1] mod m
where a is a carefully picked integer. The way a is chosen is to ensure each number 0, ..., m-1 eventually appears as some d(i). It is known that such a recursively defined sequence is a permutation of 0, 1, ..., m-1 if and only if a - 1 is a multiple of every prime divisor of m, where 4 is considered as prime (that is, if 4 divides m, then a - 1 must also be a multiple of 4).
Example 1: m = 10, a = 3. We get the sequence d = 0 1 4 3 0 1 4 3 0 1 4 3 0 ..., which is not a permutation. The prime divisors of 10 are 2 and 5, so a - 1 must be a multiple of 2*5 = 10. If we choose a = 11, we get the sequence 0 1 2 3 4 5 6 7 8 9 0 1 2 ..., so a = 11 is a reasonable choice.
Example 2: m = 100. The prime divisors of m are 2 and 5, and since 4 divides 100, we count 4 as a prime divisor as well. To have a permutation, we must pick a - 1 as a multiple of 4 * 5 = 20. Thus, a could be 1, 21, 41, 61, 81, 101, and so forth.
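Both examples can be checked mechanically. This sketch generates d(0), ..., d(m-1) from the recursion above and tests whether the result is a permutation:

```python
def d_sequence(a, m):
    """Generate d(0), ..., d(m-1) with d(0) = 0, d(i+1) = (a*d(i) + 1) mod m."""
    d, seq = 0, []
    for _ in range(m):
        seq.append(d)
        d = (a * d + 1) % m
    return seq

def is_permutation(seq):
    return sorted(seq) == list(range(len(seq)))

print(d_sequence(3, 10))                  # cycles 0 1 4 3 0 1 4 3 ...
print(is_permutation(d_sequence(3, 10)))  # False: a - 1 = 2 is not a multiple of 5
print(is_permutation(d_sequence(11, 10))) # True:  a - 1 = 10
print(is_permutation(d_sequence(21, 100)))# True:  a - 1 = 20 (Example 2)
```

This matches the criterion: the sequence is a permutation exactly when a - 1 is a multiple of every prime divisor of m (with 4 counted when 4 divides m).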
Note: The name "random probing" is a bit of a misnomer, since the sequence is a deterministic one. But to an untrained eye, the sequence appears random, so the name stuck.
Double hashing makes use of two hash functions to probe the table. This is usually the best method, since it has many of the characteristics of a random permutation. We express double hashing as
h(k,i) = [h(k) + i*h2(k)] mod m
where h2 is an auxiliary hash function. Here we must make sure that h2(k) and m are relatively prime (i.e. gcd = 1), or the probe will only examine part of the table (for the same reasons as in random probing, above).
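One common concrete choice, assumed here rather than taken from the notes, is a prime table size m with h2(k) = 1 + (k mod (m - 1)); this keeps h2(k) in 1..m-1, so it is automatically relatively prime to m:

```python
M = 11  # a prime table size (a choice made for this sketch)

def h1(k):
    return k % M

def h2(k):
    return 1 + (k % (M - 1))   # always in 1..M-1, hence coprime with prime M

def probe_sequence(k):
    """Double-hashing probe sequence h(k,i) = (h1(k) + i*h2(k)) mod M."""
    return [(h1(k) + i * h2(k)) % M for i in range(M)]

print(probe_sequence(27))  # visits all 11 slots exactly once
```

Because h2 depends on the key, two keys that collide on h1 usually follow different probe sequences, which is what breaks up the clustering seen with linear probing.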
Implementation: an algorithm for searching a hash table (from Cormen, Leiserson and Rivest's Introduction to Algorithms).
HASH-SEARCH(T,k)
  i <- 0
  repeat
    j <- h(k,i)
    if T[j] = k
      then return j
    i <- i + 1
  until T[j] = NIL or i = m
  return NIL
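A direct Python transcription of HASH-SEARCH, using None for NIL and plugging in linear probing for h(k,i) (both choices are assumptions of this sketch, not part of the pseudocode):

```python
def hash_search(T, k):
    """Return the index of key k in table T, or None if k is absent."""
    m = len(T)

    def h(k, i):
        return (k % m + i) % m   # linear probing stands in for h(k,i)

    i = 0
    while True:
        j = h(k, i)
        if T[j] == k:
            return j
        i += 1
        if T[j] is None or i == m:   # hit a NIL slot, or probed every slot
            return None

T = [None, 21, None, 33, 43, 53, None, None, None, 19]
print(hash_search(T, 53))  # 5
print(hash_search(T, 99))  # None
```

As in the pseudocode, the loop terminates either on success, on reaching an empty (NIL) slot, or after m probes, so it never loops forever even on a full table.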
We assume that:
1. No deletions have been made from the table (no "deleted" markers are present).
2. Uniform hashing: each key's probe sequence is equally likely to be any permutation of 0, 1, ..., m-1.
Unsuccessful search: The time taken for an unsuccessful search is 1 + the number of occupied slots probed before we find an empty slot.
Probability(>= 1 occupied slot probed) = n/m = a
Probability(>= 2 occupied slots probed) = (n/m) * ([n-1]/[m-1]) <= a^2
and in general, Probability(>= i occupied slots probed) <= a^i. Then,
Expected Time = 1 + 1*P(exactly 1 slot) + 2*P(exactly 2) + 3*P(exactly 3) + ...
which can be rewritten as:
= 1 + [P(1) + P(2) + P(3) + ...]   = 1 + P(>= 1 slot)  <= 1 + a
    + [P(2) + P(3) + ...]              + P(>= 2 slots)     + a^2
    + [P(3) + ...]                     + P(>= 3 slots)     + a^3
    + ...                              + ...                + ...
So we have the sum 1 + a + a^2 + a^3 + ..., which we know equals 1/(1-a). Just as with chaining, unsuccessful search in open addressing takes O(1) expected time under the same computational model (assumption 2).
Since 1/(1-a) > 1 + a, open addressing is slightly less efficient than hashing with chaining. For small a, the difference is negligible. Open addressing is more efficient storage-wise.
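The two expected costs, 1/(1-a) for open addressing and 1 + a for chaining, can be tabulated for a few load factors (a small numeric sketch):

```python
for a in (0.1, 0.5, 0.9):
    open_addr = 1 / (1 - a)   # expected probes, open addressing (upper bound)
    chaining = 1 + a          # expected cost, chaining
    print(f"a = {a}: open addressing <= {open_addr:.2f}, chaining = {chaining:.2f}")
```

At a = 0.1 the two are nearly identical (1.11 vs 1.10); at a = 0.9 open addressing needs about 10 probes while chaining needs about 1.9, which is why open addressing tables are typically kept well below full.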
The linear probe function is: h(k,i) = [h(k) + i] mod 10
The random probe function is: h(k,i) = [h(k) + d(i)] mod 10, where d(0) = 0 and d(i+1) = [7*d(i) + 1] mod 10