McGill University:
Department of Computer Science
Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

Topic #12: HASHING BY CHAINING


There are three main implementations of abstract data type dictionnary.
  1. Binary Search Trees (Pointer-Based)
  2. Tries and Suffix trees.
  3. Hashing

Implementation of Hashing

  1. Direct addressing
  2. Hashing with chaining
  3. Open addressing

Hashing with chaining

Hashing with chaining is an application of linked lists and gives an approach to collision resolution. In hashing with chaining, the hash table contains linked lists of elements or pointers to elements (Figure 12.1). The lists are referred to as chains, and the technique is called chaining. This is a common technique where a straightforward implementation is desired and maximum efficiency isn't required. Each linked list contains all the elements whose keys hash to the same index. Using chains minimizes search by dividing the set to be searched into lots of smaller pieces. There's nothing inherently wrong with linear search with small sequences; it's just that it gets slower as the sequences to be searched get longer. In this approach to resolving collisions, each sequence of elements whose keys hash to the same value will stay relatively short, so linear search is adequate.

This Hash Table is an array [0 .. m-1] of linked_list.
The table entries are called buckets or slots, and the linked lists are called chains.

x is placed in linked_list #h[x] (number of hash function).

Choice of Hash Function

Choice of h: h[x]

Choice of m: m approximates n (= 1 item/linked list) n = input size.

e.g.
  1. MOD FUNCTION.
  2. Choice of three digit for school phone number e.g.398-3738

    x is an interger value.
    H[x] = x mod m.

    Choising last three digit(738) is more appropriate than the first three digits (398) as they distribute the data more evenly.
    To do this take mod function:

  3. MIDDLE DIGITS OF AN INTEGER
  4. This often yields unpredictable (and thus good) distributions of the data.
    Assume that you wish to take the two digits three positions from the right of x.

    If x = 539 872 178
    then h[x] = 72

    This is obtained by
    H[x] = (x/1000) mod 100
    Where (/1000) drops three digits and (x/1000) mod 100 keeps two digits.

  5. ORDER - PRESERVING HASH FUNCTIO
  6. PERFECT HASH FUNCTION

Analysis

Here we analyse the number of probes in a table for doing "search". A probe is a check of a table entry. All other components of the algorithm are ignored, but as they are collectively bounded by the number of probes, it suffices to just consider probes.

n = input size, m= table size, and a= LOAD FACTOR = n /m

-WORST CASE : (Number of probes for search /insert)

-AVERAGE CASE : Assume h[key[x]] spreads input independently and randomly over m buckets.
  1. "UNSUCCESSFUL SEARCH"

  2. When the element searched is not in the table.

  3. SPACE

  4. Space is relative to input size.

  5. "SUCCESSFUL SEARCH"
  6. When the element searched figures in the table