Last modified: March 14, 1997
Heaps are implicit data structures for the A.D.T. Priority Queue and allow for fast insertions of new elements and fast deletions of the minimum element. In this respect, a heap is not an A.D.T. per se, but an implementation of the priority queue.
The structure of a heap is that of a complete binary tree in which satisfies the heap ordering principle. The ordering principle can be one of two types:
The difference between the HEAP property and the (max)HEAP property does not translate into differences in the overall characteristics of heaps, but merely defines the ordering principle. The ordering principle of the heap must be kept in mind when writing the code for heap operations. A heap only uses one of the ordering principles and because it affects the coding of the heap operations, one of the two ordering principles must be selected in advance, as is appropriate for the task.
For example, if low-value elements (say, job IDs) have a higher priority than high-value elements (job IDs), then the HEAP property is used. On the other hand, if high-values reflect greater importance/priority, then the (max)HEAP property is used.
We will be consistent with the lectures and use the HEAP property in this discussion.
Because heaps are complete binary trees, each level of the tree is filled completely, except for the bottom level, which may be filled completely or filled partially from the left. Thus, while a Binary Search Tree is ordered left to right and vertically, heaps are only ordered vertically:
If the heap is a full binary tree, about 50% of the elements are in the bottom-most level. If it is not full, it is still complete and about 50% of the elements are leaves and approximately 75% of the elements are in the bottom 2 levels of the heap. Another way of looking at such a structure is:
Because heaps are implicit data structures, they can be implemented using an array. The position of an element in the array relates it to other elements of the HEAP:
Given an element at position i: | ||
---|---|---|
Parent of i | ||
Left child of i | 2i | |
Right child of i | 2i+1 |
These relationships can be verified in the following example:
Procedures on Heaps | ||
---|---|---|
Atomic Heap Operations | PARENT(i) | Given index i, returns index of the parent. |
LEFT(i) | Given index i, returns index of the left child. | |
RIGHT(i) | Given index i, returns index of the right child. | |
Basic Heap Operations | HEAPIFY(i, A) | Given index i of HEAP A, make element at i and all its descendants a heap. Used in several other procedures below. |
INSERT(x, A) | Given value x and HEAP A, put x in the heap. This procedure is necessary for adding new elements to a priority queue. Also, it could be used to build a heap; however, a better method is described below. | |
DELETE_MIN(A) | Given HEAP A, delete the element with the smallest value. Return the value of the deleted element. This procedure is necessary for implementing a priority queue using a heap. | |
Extra Heap Operations | BUILD_HEAP(A) | Given any array A, turn it into a heap. |
HEAP_SORT(A) | Given HEAP A, create a fully sorted/ordered list. | |
Bad Heap Operations | SEARCH(k, A) | Given value k and HEAP A, find k in the heap, if present. |
DELETE(k, A) | Given value k and HEAP A, find and then delete k from the heap. |
Note that HEAP_SIZE(A) is a variable containing the array index of the last element of the heap. Elements in the array that are beyond this index are not part of the heap. Thus the last element of the heap is referenced by: A[HEAP_SIZE(A)]).
PARENT(i) return |
LEFT(i) if 2i > HEAP_SIZE(A) then return i (i is a leaf) else return 2i |
RIGHT(i) if 2i+1 > HEAP_SIZE(A) then return i (i is a leaf) else return 2i+1 |
The "percolating down" effect of HEAPIFY(i, A) turns the subtree rooted at index i into a heap:
HEAPIFY(i, A) while TRUE do l <- index of min(A[i], A[LEFT(i)], A[RIGHT(i)]) if l = i then EXIT else exchange A[i] <-> A[l] i <- l |
INSERT(i, A) HEAP_SIZE(A) <- HEAP_SIZE(A) + 1 i <- HEAP_SIZE(A) while i > 1 and A[PARENT(i)] > k do A[i] <- A[PARENT(i)] i <- PARENT(i) A[i] <- k |
However, since approximately 50% of the elements in the heap are leaves and 75% of elements are in the bottom two levels, it is likely that new elements will not have to move more than 1 or 2 levels upwards to maintain the heap (the heap is vertically ordered). Therefore, log_{2}n is pessimistic. The average time analysis for INSERT(k, A) is about O(1).
DELETE_MIN(i, A) min <- A[1] A[1] <- A[HEAP_SIZE(A)] HEAP_SIZE(A) <- HEAP_SIZE(A)- 1 HEAPIFY(1, A) return min |
BUILD_HEAP(i, A) for i <- downto 1 do HEAPIFY(i, A) |
The n above should be the size of the heap if it were a full binary tree with the same number of levels
This is much better than using INSERT(k, A) n times, which would be
Before giving the sorting algorithm a few comments about the degree of order of a heap are warranted. Unlike a Binary Search Tree (BST), which only requires an inorder traversal to give a total ordering of the tree elements, heaps require more work to sort. A heap is closer to a random ordering than a BST. If the sorting of stored values is an important and often used procedure in your application, it is recommended that you use a more appropriate structure, like a Binary Search Tree which sorts in linear time. A comparison between the degree of order of a BST and a heap is depicted below:
Note that the distance between 'chaos' and a heap is , which arises from the BUILD_HEAP(A) algorithm.
The heap sorting algorithm removes the minimum element from the heap and adds it to the sorted list (i.e. exchanges it with the last element of the heap and decrements HEAP_SIZE). Then it restores the heap structure of the heap from which the minimum element was removed, which results in the placement of the next smallest at the root of the heap. It repeats this, sequentially adding increasing values to the sorted list. The change in HEAP_SIZE(A) during each iteration protects the sort list part of the array from the heap part.
HEAP_SORT(A) for i <- HEAP_SIZE(A) downto 2 do exchange A[i] <-> A[1] HEAP_SIZE(A) <- HEAP_SIZE(A) - 1 HEAPIFY(1, A) |
describes the contribution of HEAPIFY(1, A) to HEAP_SORT(A). There are n-2 terms in this summation. This equation is strictly less than
HEAP_SORT(A) thus has a running time similar to that of merge sort, but unlike merge sort, it sorts in situ.
Note that if the array to be sorted by this algorithm is not already a heap, it needs to be passed to BUILD_HEAP(A) to first form the heap in linear times and then passed to HEAP_SORT(A), increasing the running time by 2n (as explanined above).
SEARCH(k, A) looks for value k in the HEAP A. Heaps don't use any left-to-right ordering, so when looking for a value k in a heap, one will know whether the value lies above or below a specific. If it lies below, the heap ordering principle does not tell us whther the value is accessed via the right or left child of the node. Therefore, both branches of the node must be checked. This means we will need to check every value stored in the heap for a match when searching for a specific k. Therefore, the time to search a heap is .
DELETE(k, A) must search for k in the heap and then delete it, using HEAPIFY(i, A) to restore the heap structure. Thus, the running time of DELETE(k, A) is
There are other data structures, such as the Binary Search Tree, that provide much better search times.
One problem with using an array to implement a heap is that the memory for the array must be allocated before the HEAP_SIZE(A) is known. This means that the size of the heap will be limited by the allocated space for the array . If the programmer wishes to have a heap of "infinite" size, the heap should be implemented using pointers.
Each node of a Dynamically-Allocated Heap contains:In addition, the nodes could contain a pointer to its parent, with the root pointing to itself. And finally, the HEAP_SIZE(A) variable is maintained.
However, the problem of quickly accessing specific nodes arises, especially when considering the storage cost of extra pointers to facilitate Heap traversal. For the array implementation, if we know the index, we can navigate around the tree implicitly. For the Dynamically-Allocated Heap, one could store a pointer back to the parent at each node, as suggested, and use that when "percolating up" during an insert procedure. However, there is another method that allows access to any node given its index:
Recall that we always know the index of the last element of a heap...it is at index HEAP_SIZE(A). We will now describe a method that uses the size of the heap to quickly locate where a INSERTed element should initially be places.
New elements are INSERTed in the bottom level, at the left-most free position. Naively, in the worst-case we would have to examine every branch of the tree to search for the left-most pointer to NULL. This does not allow for quick INSERTs.
However, imagine that node indices are in binary, even though these indices are not stored anywhere in the Dynamically-Allocated Heap. If they were, though, the heap might be depicted like this:
Note the binary format of the indices.
Now remove all the leading 1-bits from every node-index (you noticed that they all had a leading 1-bit, right?). The resulting diagrammatic representation of this would be:
This should be a familiar-looking structure. It is a TRIE, where going to the left child is indicated by a '0' and going to the right child by a '1'. Therefore, each index is a map from the root to that node.
As mentioned above, these indices are not stored in the Dynamically-allocated heap. We only know (and can modify when appropriate) the size of the heap. When we need to add on an element, we convert it to binary, drop the leading 1, and follow the map to where it should be INSERTed initially!
In the heap depicted above, a new element would be inserted at
Alternatively, we could just drop the leading 1 and the trailing value (be it 0 or 1) off the binary-index to get the map to the parent.
The Bottom LineThe upshot of all this is that we can navigate the heap by bit-shifting. This is very fast and doesn't involve checking to see if one should round up or round down when finding the index of the parent.
In terms of comparisons, the running time of INSERT(X, A) is O(log_{2}log_{2}n).To help you better appreciate the heap data structure and its associated operations, we have included an interactive Java Applet that implements a heap and the most prominent heap operations.
You will be able to see the following heap operations in action:To see the results of a HEAP_SORT(A), use the 'DeleteMin' repeatedly to empty the entire heap. Notice that the 'output' is in order.
The 'Reset' clears the heap.
The heap will hold a maximum of 15 values.
We have made available the java code for this Applet. If you examine the code, you will see that the heap is implemented via an array. To see this, look at the "class heaptree
" section of the code. Within that section are Java-versions of many of the heap procedures described above.
If you have any questions or comments about this web page, please contact one of the authors:
What is the 'BandWidth Crisis'? In Ecology, we would classify it as a "selection pressure". In Economics, we would classify it as a "market demand". From either point of view, it is a force that drives change, innovation and mutation. In conclusion, the 'BandWidth Crisis' is a highly useful state, and a good excuse for government- and industry-sponsored research. Please use more BandWidth to accentuate the 'crisis'. (ARJ)
Last updated March 14, 1997 by Andrew R. Jackson.
Copyright © 1997, Andrew R. Jackson, Kareem Mayan, Kathleen Botter, & Kim Thye Chua. All rights reserved.
Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as a substitute for attendance in the actual class.