McGill University:
School of Computer Science
Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

Topic #16: HEAPS

Last modified: March 14, 1997



Introduction

Heaps are implicit data structures that implement the A.D.T. Priority Queue, supporting fast insertion of new elements and fast deletion of the minimum element. In this respect, a heap is not an A.D.T. per se, but an implementation of the priority queue.

The structure of a heap is that of a complete binary tree that satisfies the heap ordering principle. The ordering principle can be one of two types:

1) The HEAP property:
The value of each node is greater than or equal to the value of its parent, with the minimum-value element at the root. (Heap ordering principle given in lecture and in Weiss, 1993)
2) The (max)HEAP property:
The value of each node is less than or equal to the value of its parent, with the maximum-value element at the root. This is the heap ordering principle in Cormen et al., 1990. By calling it a (max)HEAP, Weiss, 1993 distinguishes it from the normal HEAP property.

The difference between the HEAP property and the (max)HEAP property does not change the overall characteristics of heaps; it merely fixes the direction of the ordering. A heap uses only one of the two ordering principles, and because the choice affects how the heap operations are coded, it must be selected in advance, as is appropriate for the task.

For example, if low-value elements (say, job IDs) have a higher priority than high-value elements, then the HEAP property is used. On the other hand, if high values reflect greater importance/priority, then the (max)HEAP property is used.

We will be consistent with the lectures and use the HEAP property in this discussion.

Heaps as Complete Binary Trees:

Because heaps are complete binary trees, each level of the tree is filled completely, except for the bottom level, which may be filled completely or filled partially from the left. Thus, while a Binary Search Tree is ordered left to right and vertically, heaps are only ordered vertically:

Heap depicted as a complete binary tree


If the heap is a full binary tree, about 50% of the elements are in the bottom-most level. Even if it is not full, it is still complete, so about 50% of the elements are leaves and approximately 75% of the elements are in the bottom two levels of the heap. Another way of looking at such a structure is:

Heap represented as a pyramid diagram
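
To make these percentages concrete, here is a small supporting calculation (ours, not from the original notes) for a full binary tree of height h:

	\begin{align*}
	n &= 2^{h+1} - 1 && \text{(total nodes)} \\
	2^{h} &= \tfrac{n+1}{2} && \text{(leaves: about 50\% of the nodes)} \\
	2^{h} + 2^{h-1} &= \tfrac{3(n+1)}{4} && \text{(bottom two levels: about 75\%)}
	\end{align*}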




Typical Implementation

Because heaps are implicit data structures, they can be implemented using an array. The position of an element in the array relates it to other elements of the HEAP.

Given an element at position i:

	Parent of i:      floor(i/2)
	Left child of i:  2i
	Right child of i: 2i + 1

These relationships can be verified in the following example:

Heap depicted as a complete binary tree along with the corresponding array
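
The index arithmetic can also be checked in code. The following Java sketch is ours (it is not part of the original notes); it assumes the heap is stored in a 1-based int array with slot 0 unused:

	// Sketch: 1-based array indexing for a heap; slot 0 is unused.
	class HeapIndexDemo {
	    static int parent(int i) { return i / 2; }    // integer division = floor
	    static int left(int i)   { return 2 * i; }
	    static int right(int i)  { return 2 * i + 1; }

	    public static void main(String[] args) {
	        // A valid HEAP of size 6 stored in A[1..6].
	        int[] A = {0, 1, 3, 2, 6, 4, 5};
	        // Check the HEAP property: every element >= its parent.
	        for (int i = 2; i <= 6; i++) {
	            if (A[i] < A[parent(i)]) throw new AssertionError("not a heap");
	        }
	        System.out.println(parent(5));                  // prints 2
	        System.out.println(left(2) + " " + right(2));   // prints "4 5"
	    }
	}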




Procedures

Procedures on Heaps:

Atomic Heap Operations
	PARENT(i)       Given index i, returns the index of the parent.
	LEFT(i)         Given index i, returns the index of the left child.
	RIGHT(i)        Given index i, returns the index of the right child.

Basic Heap Operations
	HEAPIFY(i, A)   Given index i of HEAP A, make the element at i and all its descendants a heap. Used in several other procedures below.
	INSERT(x, A)    Given value x and HEAP A, put x in the heap. This procedure is necessary for adding new elements to a priority queue. It could also be used to build a heap; however, a better method is described below.
	DELETE_MIN(A)   Given HEAP A, delete the element with the smallest value and return that value. This procedure is necessary for implementing a priority queue using a heap.

Extra Heap Operations
	BUILD_HEAP(A)   Given any array A, turn it into a heap.
	HEAP_SORT(A)    Given HEAP A, create a fully sorted/ordered list.

Bad Heap Operations
	SEARCH(k, A)    Given value k and HEAP A, find k in the heap, if present.
	DELETE(k, A)    Given value k and HEAP A, find and then delete k from the heap.

Note that HEAP_SIZE(A) is a variable containing the array index of the last element of the heap. Elements in the array beyond this index are not part of the heap. Thus the last element of the heap is referenced by A[HEAP_SIZE(A)].


Atomic Heap Operations

These operations allow traversal of heaps and are used in many of the other heap operations listed below. They are typically one- or two-instruction computations.
PARENT(i)
Given index i, returns index of the parent of i.

PARENT(i)
	return i/2


LEFT(i)
Given index i, returns index of the left child of i.

LEFT(i)
	if 2i > HEAP_SIZE(A) 
	   then return i (i is a leaf)
	   else return 2i


RIGHT(i)
Given index i, returns index of the right child of i.

RIGHT(i)
	if 2i+1 > HEAP_SIZE(A)
	   then return i (i is a leaf)
	   else return 2i+1




Basic Heap Operations

These operations allow the heap data structure to implement a basic priority queue.
They typically run in O(log2n) worst-case time.
HEAPIFY(i, A)
Given index i in HEAP A, makes the subtree rooted at i into a heap. Other procedures that modify the heap may disrupt the heap structure in doing so; they use HEAPIFY(i, A) to restore it. HEAPIFY(i, A) works by comparing the value at i with the children of i and swapping positions with the smaller child; if the value at i is the smallest of the three, the procedure stops. This is sometimes called "percolating down". It is assumed that the subtrees rooted at the left and right children of i are already heaps. When coding this procedure, one must be careful with nodes that have a left child but no right child (this happens when the heap has an even number of elements), using HEAP_SIZE(A) to avoid stepping past the last element of the heap.

The "percolating down" effect of HEAPIFY(i, A) turns the subtree rooted at index i into a heap:

The effect of HEAPIFY is to turn the subtree rooted at i into a heap.


HEAPIFY(i, A)
   while TRUE
         do l <- index of min(A[i], A[LEFT(i)], A[RIGHT(i)])
            if l = i 
               then EXIT
               else exchange A[i] <-> A[l]
	               i <- l


Worst-Case # of Comparisons:
It takes 2 comparisons to move down 1 level (one iteration of the loop): the value at i is compared with its left child, and the smaller of those with the right child. The maximum number of times the procedure makes these comparisons is the height of the heap, log2n. Therefore the cost of HEAPIFY(i, A) is approximately 2log2n, where n is the number of elements of the heap rooted at i. Thus, the worst-case number of comparisons is O(log2n).
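
As a concrete illustration, the percolate-down loop might be coded in Java as in the sketch below. This is our sketch, not the course code; it assumes the heap occupies A[1..heapSize] of an int array (slot 0 unused):

	// Sketch: percolate-down (HEAPIFY) on a 1-based array heap.
	static void heapify(int[] A, int heapSize, int i) {
	    while (true) {
	        int l = 2 * i, r = 2 * i + 1, smallest = i;
	        if (l <= heapSize && A[l] < A[smallest]) smallest = l;  // compare with left child
	        if (r <= heapSize && A[r] < A[smallest]) smallest = r;  // then with right child
	        if (smallest == i) break;                 // heap property holds; stop
	        int tmp = A[i]; A[i] = A[smallest]; A[smallest] = tmp;  // swap down
	        i = smallest;                             // continue one level lower
	    }
	}
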
INSERT(k, A)
Given a new element with value k, this adds the element to HEAP A in the appropriate location. New elements are initially appended to the end of the heap at position A[HEAP_SIZE(A)+1], and the size of the heap grows by one. Any disruption of the heap structure is repaired by comparing the added element with its parent and moving the added element up a level (swapping positions with the parent) if its value is smaller than that of the parent. This process is sometimes called "bubbling up", "floating up", or "percolating up" (must be important with all these sexy names...). The comparison is repeated until the parent is less than or equal to the percolating element, or until the root of the heap is reached.

INSERT(k, A)
	HEAP_SIZE(A) <- HEAP_SIZE(A) + 1
	i <- HEAP_SIZE(A)
	while i > 1 and A[PARENT(i)] > k
	      do A[i] <- A[PARENT(i)]
	         i <- PARENT(i)
	A[i] <- k


Worst-Case # of Comparisons:
Each iteration of the while loop performs one comparison, and the added element may have to be moved upwards a maximum of log2n times (the height of the heap), so the worst-case running time is O(log2n), where n is the number of elements in the heap.

However, since approximately 50% of the elements in the heap are leaves and 75% are in the bottom two levels, a new element is likely to move up only 1 or 2 levels before the heap property is restored (the heap is only vertically ordered). Therefore, log2n is pessimistic: the average-case running time of INSERT(k, A) is O(1).
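
A Java version of the pseudocode might look like the sketch below (ours, using the same 1-based array layout as the heapify sketch; the caller is assumed to track the heap size and to have room left in the array):

	// Sketch: INSERT by percolating a "hole" up from the end of the heap.
	static int insert(int[] A, int heapSize, int k) {
	    heapSize = heapSize + 1;            // grow the heap by one slot
	    int i = heapSize;
	    while (i > 1 && A[i / 2] > k) {     // parent is larger: pull it down
	        A[i] = A[i / 2];
	        i = i / 2;
	    }
	    A[i] = k;                           // drop k into its final position
	    return heapSize;                    // new heap size for the caller
	}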

DELETE_MIN(A)
Given HEAP A, this removes the element with the smallest value from the heap and returns that value. The minimum element is found at the root of the heap, which is the first element of the array. The last element of the heap is moved to the root, and any disruption of the heap structure is repaired.

DELETE_MIN(A)
	min <- A[1]
	A[1] <- A[HEAP_SIZE(A)]
	HEAP_SIZE(A) <- HEAP_SIZE(A) - 1
	HEAPIFY(1, A)
	return min


Worst-Case # of Comparisons:
Since the assignments are all O(1), HEAPIFY(1, A) is the main determinant of the running time of this procedure. Therefore, the cost of DELETE_MIN(A) is O(log2n), where n is the number of elements in the heap.
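
In Java, again as a sketch under the same assumptions (1-based array, heapify as sketched above, new heap size tracked by the caller):

	// Sketch: DELETE_MIN removes and returns the root of the heap.
	static int deleteMin(int[] A, int heapSize) {
	    int min = A[1];             // the minimum sits at the root
	    A[1] = A[heapSize];         // move the last element to the root
	    heapSize = heapSize - 1;    // caller records the new size (heapSize - 1)
	    heapify(A, heapSize, 1);    // percolate the new root down
	    return min;
	}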


Extra Heap Operations

These operations provide additional functionality for heaps, but aren't strictly necessary for the heap to implement a priority queue. Note that BUILD_HEAP(A) is better than using INSERT(k, A) repeatedly to build a heap.
BUILD_HEAP(A)
Given an array A, this turns it into a heap. This procedure constructs the heap in situ and has a better running time than repeated calls to INSERT(k, A). BUILD_HEAP(A) uses HEAPIFY(i, A) to construct the heap from the array.

BUILD_HEAP(A)
	for i <- floor(n/2) downto 1
	    do HEAPIFY(i, A)

Here n is HEAP_SIZE(A); floor(n/2) is the index of the last node in the heap that has at least one child. Every element after that position is a leaf, and thus already a one-element heap.

Worst-Case # of Comparisons:
The elements in the last level of the heap are already single-element heaps, so BUILD_HEAP(A) starts at the last element of the second-last level and works its way back one element at a time until it reaches the first element of the array. Each call to HEAPIFY(i, A) uses 2 comparisons per level that the element percolates down. The second-last level contains about n/4 elements, each of which can percolate down at most 1 level, so the cost at that level is (n/4) x 2. The third-last level contains about n/8 elements, each of which can percolate down at most 2 levels, so its cost is (n/8) x 4. Summing the cost at each level, where h is the number of levels an element can percolate down, gives:

	\[ \sum_{h=1}^{\lg n} \frac{n}{2^{h+1}} \cdot 2h \]

and the number of terms in the sum is the height, lg n. The equation can be rewritten:

	\[ n \sum_{h=1}^{\lg n} \frac{h}{2^h} \]

The summation term can be written as a sum of convergent geometric series with known sums:

	\[ \sum_{h=1}^{\infty} \frac{h}{2^h} = \sum_{j=1}^{\infty} \sum_{h=j}^{\infty} \frac{1}{2^h} = \sum_{j=1}^{\infty} \frac{1}{2^{j-1}} = 2 \]

Therefore, the cost of BUILD_HEAP(A) is at most:

	\[ 2n = O(n) \]

This is much better than using INSERT(k, A) n times, which would be O(n log2n).
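
In Java, reusing the heapify sketch from above (our sketch, same 1-based layout):

	// Sketch: BUILD_HEAP calls heapify bottom-up. floor(n/2) is the index of
	// the last node with at least one child; everything after it is a leaf.
	static void buildHeap(int[] A, int heapSize) {
	    for (int i = heapSize / 2; i >= 1; i--) {
	        heapify(A, heapSize, i);
	    }
	}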

HEAP_SORT(A)
Given HEAP A, this procedure totally orders the array storing the heap. It has the advantage of sorting in situ (in place) and requires only a constant amount of storage outside the array.

Before giving the sorting algorithm, a few comments about the degree of order of a heap are warranted. Unlike a Binary Search Tree (BST), which requires only an inorder traversal to produce a total ordering of the tree elements, heaps require more work to sort. A heap is closer to a random ordering than a BST. If sorting the stored values is an important and frequently used operation in your application, a more appropriate structure is recommended, such as a Binary Search Tree, which can be traversed in sorted order in linear time. A comparison between the degree of order of a BST and a heap is depicted below:

Heaps are closer to chaos than to order when compared with BSTs


Note that the distance between 'chaos' and a heap is less than 2n comparisons, which arises from the BUILD_HEAP(A) analysis above.

The heap sorting algorithm removes the minimum element from the heap and adds it to the sorted list (i.e., exchanges it with the last element of the heap and decrements HEAP_SIZE(A)). It then restores the heap structure of the remaining heap, which places the next smallest element at the root. This is repeated, sequentially adding increasing values to the sorted list. The decrement of HEAP_SIZE(A) during each iteration protects the sorted-list part of the array from the heap part. Note that with the HEAP (minimum-at-root) property, the array ends up sorted in descending order; the (max)HEAP property yields ascending order.

HEAP_SORT sorts in place


HEAP_SORT(A)
	for i <- HEAP_SIZE(A) downto 2
	    do exchange A[i] <-> A[1]
	       HEAP_SIZE(A) <- HEAP_SIZE(A) - 1
	       HEAPIFY(1, A)


Worst-Case # of Comparisons:
HEAPIFY(1, A) is the main determinant of the running time of this procedure. It is called once per iteration of the loop, and acts on a heap whose size decreases by one in each iteration; an iteration on a heap of m elements contributes about 2log2m comparisons. Summing over the iterations, the total

2(log2(n-1) + log2(n-2) + log2(n-3) + ...)

describes the contribution of HEAPIFY(1, A) to HEAP_SORT(A). There are n-1 terms in this summation (the last of which is log2(1) = 0). This quantity is strictly less than 2n log2n (recall that tournament sort required n log2n comparisons). Therefore the running time of HEAP_SORT(A) is O(n log2n).

HEAP_SORT(A) thus has a running time similar to that of merge sort, but unlike merge sort, it sorts in situ.

Note that if the array to be sorted is not already a heap, it must first be passed to BUILD_HEAP(A) to form the heap in linear time, and then sorted with HEAP_SORT(A), increasing the total cost by about 2n comparisons (as explained above).
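
Putting the pieces together in Java (our sketch, reusing the buildHeap and heapify sketches above); note that with the minimum-at-root HEAP property the result is in descending order:

	// Sketch: in-place HEAP_SORT. With a (min-)HEAP the result is descending;
	// the same code on a (max)HEAP would sort ascending.
	static void heapSort(int[] A, int n) {
	    buildHeap(A, n);                // O(n); harmless if A is already a heap
	    for (int i = n; i >= 2; i--) {
	        int tmp = A[1]; A[1] = A[i]; A[i] = tmp;  // move current min to slot i
	        heapify(A, i - 1, 1);       // restore the heap on A[1..i-1]
	    }
	}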



Bad Heap Operations

These are functions for which heaps are not well suited and for which other data structures should be used.

SEARCH(k, A) looks for value k in HEAP A. Heaps don't use any left-to-right ordering, so when looking for a value k, the ordering principle only tells us whether k can lie below a specific node. If it can, the heap ordering does not tell us whether the value is reached via the left or the right child of the node, so both branches must be checked. In the worst case, every value stored in the heap must be examined when searching for a specific k. Therefore, the time to search a heap is Theta(n).
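
One small refinement is possible: since every descendant of a node is at least as large as the node itself, a subtree whose root already exceeds k can be skipped. The Java sketch below (ours, 1-based array layout as before) does this, but the worst case remains Theta(n):

	// Sketch: SEARCH with subtree pruning. If A[i] > k, no descendant of i
	// can equal k, so the whole subtree is skipped. Call as search(A, n, 1, k).
	static boolean search(int[] A, int heapSize, int i, int k) {
	    if (i > heapSize || A[i] > k) return false;   // out of bounds, or prunable
	    if (A[i] == k) return true;
	    return search(A, heapSize, 2 * i, k)          // check the left subtree,
	        || search(A, heapSize, 2 * i + 1, k);     // then the right subtree
	}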

DELETE(k, A) must search for k in the heap and then delete it, using HEAPIFY(i, A) to restore the heap structure. Thus, the running time of DELETE(k, A) is Theta(n) + O(log2n) = Theta(n).

There are other data structures, such as the Binary Search Tree, that provide much better search times.



Dynamically-Allocated Heaps

One problem with using an array to implement a heap is that the memory for the array must be allocated before the maximum HEAP_SIZE(A) is known. This means that the size of the heap is limited by the space allocated for the array. If the programmer wishes to have a heap of "infinite" size, the heap should be implemented using pointers.

Each node of a Dynamically-Allocated Heap contains:

	- the value of the element
	- a pointer to its left child
	- a pointer to its right child

In addition, each node could contain a pointer to its parent, with the root pointing to itself. Finally, the HEAP_SIZE(A) variable is maintained. A sketch of such a node follows.
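
In Java, such a node might be declared as follows (a sketch; the field names are ours):

	// Sketch of a node for a pointer-based (dynamically-allocated) heap.
	class HeapNode {
	    int value;          // the element stored at this node
	    HeapNode left;      // left child, or null
	    HeapNode right;     // right child, or null
	    HeapNode parent;    // optional; the root may point to itself
	}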

However, the problem of quickly accessing specific nodes arises, especially when considering the storage cost of extra pointers to facilitate Heap traversal. For the array implementation, if we know the index, we can navigate around the tree implicitly. For the Dynamically-Allocated Heap, one could store a pointer back to the parent at each node, as suggested, and use that when "percolating up" during an insert procedure. However, there is another method that allows access to any node given its index:

The Advantages of Binary-Type Indices

Recall that we always know the index of the last element of a heap...it is HEAP_SIZE(A). We will now describe a method that uses the size of the heap to quickly locate where an INSERTed element should initially be placed.

New elements are INSERTed in the bottom level, at the left-most free position. Naively, in the worst-case we would have to examine every branch of the tree to search for the left-most pointer to NULL. This does not allow for quick INSERTs.

However, imagine that node indices are in binary, even though these indices are not stored anywhere in the Dynamically-Allocated Heap. If they were, though, the heap might be depicted like this:

Heap depicted with binary indices

Note the binary format of the indices.

Now remove all the leading 1-bits from every node-index (you noticed that they all had a leading 1-bit, right?). The resulting diagrammatic representation of this would be:

Heap depicted with binary indices stripped of the leading 1

This should be a familiar-looking structure. It is a TRIE, where going to the left child is indicated by a '0' and going to the right child by a '1'. Therefore, each index is a map from the root to that node.

For example, to get to node-index 5:
  1. Convert 5 into binary: 101
  2. Remove the leading 1: 01
  3. Use this sequence as a map, starting at the root: '0' means go to the left child, '1' means go to the right child.
  4. Arrived at node-index 5!

As mentioned above, these indices are not stored in the Dynamically-Allocated Heap; we only know (and can modify when appropriate) the size of the heap. When we need to add an element, we convert its index, HEAP_SIZE(A) + 1, to binary, drop the leading 1, and follow the map to where it should be INSERTed initially!

In the heap depicted above, a new element would be inserted at node eleven = 1011. To get there we follow 0,1,1, or left, right, right. If we need to get to the parent of this node (to "percolate up", for example), we know that the parent is at floor(11/2) = 5 = 101 = map 01, and so we go left, then right, to access the parent of this node.

Alternatively, we could just drop the leading 1 and the trailing value (be it 0 or 1) off the binary-index to get the map to the parent.

The Bottom Line

The upshot of all this is that we can navigate the heap by bit-shifting. This is very fast and doesn't involve checking to see if one should round up or round down when finding the index of the parent.

In terms of comparisons, the running time of INSERT(x, A) can even be reduced to O(log2log2n): the values along the path from the root to the new leaf appear in sorted order, so the correct position for the new element can be found by binary search over the log2n nodes on that path.
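
As a sketch of the bit-shifting navigation (ours, using the HeapNode class above and a modern Java library call to find the leading 1-bit):

	// Sketch: walk from the root to the node with (1-based) index i by reading
	// the bits of i after the leading 1: a 0-bit goes left, a 1-bit goes right.
	static HeapNode nodeAt(HeapNode root, int i) {
	    int top = 31 - Integer.numberOfLeadingZeros(i);  // position of the leading 1-bit
	    HeapNode node = root;
	    for (int b = top - 1; b >= 0; b--) {
	        node = ((i >> b) & 1) == 0 ? node.left : node.right;
	    }
	    return node;
	}
	// Example: nodeAt(root, 11) reads the bits 011 and goes left, right, right.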



Java Heap Applet

To help you better appreciate the heap data structure and its associated operations, we have included an interactive Java Applet that implements a heap and the most prominent heap operations.

You will be able to see the following heap operations in action: INSERT, HEAPIFY, BUILD_HEAP, and DELETE_MIN.

How to Get the Most Out of the Applet:

There are two ways of proceeding:
  1. Generate the heap from a random array:
    1. Click 'Generate Numbers' to get a random array of 12 integers.
    2. You may add up to 3 more integers by entering a value in the box following the 'Insert' button and pressing 'Insert'.
    3. You can also simulate the actions of BUILD_HEAP(A):
      • Start at the 2nd-from-the-bottom level and highlight any node by clicking on it.
      • Click 'Heapify'. This checks the 2 immediate children and swaps the one with the smallest value into the node. If the value at the node is smallest, nothing changes. 'Heapify' each node with at least one child, from the bottom up to the root. This process is equivalent to operation "BUILD_HEAP(A)".
    4. Alternatively you can click on 'Build Heap' and turn the whole array into a heap at once.
  2. You can also create a heap by using the 'Insert' button and box to place integers into the heap one at a time. 'Heapify' and 'Build Heap' will be of little use in this case.
At any time you can click 'DeleteMin' to remove and return the minimum value of the heap, i.e., the root. The heap structure is maintained.

To see the results of a HEAP_SORT(A), use the 'DeleteMin' button repeatedly to empty the entire heap. Notice that the 'output' is in order.

The 'Reset' button clears the heap.

The heap will hold a maximum of 15 values.

We have made available the Java code for this Applet. If you examine the code, you will see that the heap is implemented via an array. To see this, look at the "class heaptree" section of the code. Within that section are Java versions of many of the heap procedures described above.



Links to Related Material & References


Links

Another explanation of heaps (Page authored by Ian Craw)
Want a second opinion? This page also explains the heap structure and the main heap operations. It's mostly text and less organized than this one though. However, there are lots of links to explanations of related topics.

Heap Sort (Authored by Yin-So Chen)
Outlines the HEAP_SORT algorithm, including near-English pseudocode, and includes a Java Applet demonstrating HEAP_SORT.

Comparison of Sorting Algorithms (Authored by Bear Giles)
If you want a "hands on" comparison of Heap Sort to Quick Sort, Bubble Sort, Bi-directional Bubble Sort, Shell Sort, and Selection Sort, check out the Applet on this page. Click the sorting window once to activate that sorting algorithm and then quickly click another so that you can see which "wins". You can compare the use of the algorithms on different sized n.

A Heap Class for Java (Authored by Walter Korman)
Are you looking for a more in-depth implementation of heaps/priority queues for a Java applet or application? (What's wrong with ours??) You could check out Walter's Java Heap Class.

Heap Implementation for C (from Windbase Software Inc's Memory Structures Library Version 3.0 site)
If you are looking for some C code that handles heaps ("priority heaps" at this site), you can look at the one in this company's memory structure library. The code is provided, but may require some deciphering to use. They use an array to implement the priority heap.

Heap Implementation for LISP
Like LISP, do you? Well, if you want some ideas on coding heaps in LISP, this file should do the trick. It's by a fellow named G. D. Weber.

A Totally Different View on the HEAp
You have been warned.

Prolog Heap Operations from SICS (Based on DECsystem-10 Prolog User's Manual)
If you're into Prolog, then check out this page by the Swedish Institute of Computer Science.

Memory Management in C++ Using Heaps (Authored by Nathan Myers)
This article on memory management was first published in C++ Report in 1993.

University of Ottawa's Heap Page
This is an excellent source for heap information. Highly recommended.

Some graphic representations of heaps (Authored by Donna Reese)
This page, located at Mississippi State University, has some helpful illustrations of heaps and related operations.



References

  1. Adams RA (1996) Calculus of Several Variables (3rd ed.). Addison-Wesley Publishers Ltd., New York. pp. 37-38.

  2. Cormen TH, Leiserson CE, Rivest RL (1990) Introduction to Algorithms. McGraw-Hill, New York. pp. 140-152.

  3. Weiss MA (1993) Data Structures and Algorithm Analysis in C. The Benjamin/Cummings Publishing Company, Inc. pp. 179-189, 224-226.



Web page creators

If you have any questions or comments about this web page, please contact one of the authors:

Andrew R. Jackson (ajackson@cs.mcgill.ca)
--wrote the HTML, did some of the graphics, found a bunch of links.
Kareem Mayan (lew@cs.mcgill.ca)
--found Links
Kathleen Botter (angel@cs.mcgill.ca)
--did most of the graphics
Kim Thye Chua (thye@cs.mcgill.ca)
--wrote the Java Applet



What is the 'BandWidth Crisis'? In Ecology, we would classify it as a "selection pressure". In Economics, we would classify it as a "market demand". From either point of view, it is a force that drives change, innovation and mutation. In conclusion, the 'BandWidth Crisis' is a highly useful state, and a good excuse for government- and industry-sponsored research. Please use more BandWidth to accentuate the 'crisis'. (ARJ)

Last updated March 14, 1997 by Andrew R. Jackson.

Copyright © 1997, Andrew R. Jackson, Kareem Mayan, Kathleen Botter, & Kim Thye Chua. All rights reserved.

Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as a substitute for attendance in the actual class.
