McGill University:
School of Computer Science
Winter 1997 Class Notes for 308-251

DATA STRUCTURES AND ALGORITHMS

Topic #9: BINARY SEARCH TREES


Abstract Data Type Dictionary

An Abstract Data Type (ADT) is a mathematical model together with operations defined on this model. Abstract data types that can be modified by some of the ADT operations such as insert, delete, and makenull are called dynamic. For the ADT dictionary, the mathematical model used is a set of elements that can be ordered. Here are several operations on general sets that may be required by applications:

Note that the ADT dictionary is mainly based on the MAKENULL, INSERT, DELETE, and SEARCH operations. The other ones in the list are of secondary importance.

Below are some of the good implementations of Dictionary operations:


Binary Search Trees

A tree is defined as a nonempty finite set of labeled nodes such that there is only one node called the root of the tree, and the remaining nodes are partitioned into subtrees. If the tree is either empty or each of its nodes has not more than two subtrees, it is called a binary tree. Hence each node in a binary tree has either no children, one left child, one right child, or a left child and a right child, each child being the root of a binary tree called a subtree.

Every node (object) in a binary tree contains information divided into two parts. The first one is proper to the structure of the tree, that is, it contains a key field (the part of information used to order the elements), a parent field, a leftchild field, and a rightchild field. The second part is the object data itself. It can be endogenous (that is, data resides inside the tree) or exogenous (this means that nodes only contains a references to the object's data). The root node of the tree has its parent field set to nil. Whenever a node does not have a right child or a left child, then the corresponding field is set to nil.

A binary search tree is a binary tree with more constraints. If x is a node with key value key[x] and it is not the root of the tree, then the node can have a left child (denoted by left[x]), a right child (right[x]) and a parent (p[x]). If each node of a tree has the following Binary Search Tree properties:

  1. for all nodes y in left subtree of x, key[y] < key[x]
  2. for all nodes y in right subtree of x, key[y] > key[x]
Fig.1

then this binary tree is called a BINARY SEARCH TREE.

Here is an example:

Notice that if we draw the nodes at positions (x,y), where x is an increasing function of the key of the node, then the vertical projections of the nodes result in a sorted set of keys (see figure 2).


Properties of Binary Search Trees

Inorder Traversal is an algorithm which visits each node of a tree after its left subtree and before its right subtree. Therefore, inorder traversal shows the values stored in a binary search tree in order. The following is the procedure to print all the elements of a binary search tree in order:

The inorder procedure is called recursively twice for each element (once for the left child and once for its right child), and the element is visited right between them. Therefore, the construction time is equal to (n).

Given a set of unordered elements, the following method can be used to sort the elements: construct a binary search tree whose keys are those elements, and then perform an inorder traversal of this tree. If the data is already stored in binary search tree, only a traversal is needed in the construction. Binary tree traversal takes (n) time, and sorting an unordered set of n elements takes at least (n log n) time (and effort) in the worst case. If we loosely identify unordered sets with "chaos", and ordered sets with "order", then the distance (measured in units of effort) from chaos to order is about n log n. Yet, the distance from a binary search tree (BST) to order is only n. So, constructing a binary search tree is basically equivalent to sorting. Incidentally, a heap (to be studied later) can be constructed from an unordered set in time (n), and is thus much closer to chaos than to order. The following diagram shows the scale of effort needed to go from one data structure to another:

Fig. 3

Operations

Minimum and Maximum

The minimum element of a binary search tree is the last node of the left roof, and its maximum element is the last node of the right roof. Therefore, the minimum and the maximum can always be found by tracking on the left child and the right child, respectively, until an empty subtree is reached.

Fig. 4

Search

The following is the pseudocode to search for the element k in a binary search tree whose root is x:

If k is not an element of the binary search tree x, the function returns NIL; otherwise, it returns the pointer to the node with the key value equal to k. This algorithm follows the binary search tree property. If the value of k is smaller than that of x, then it should be in the left subtree of x; if the value of k is larger than that of x, then it should be in the right subtree of x; otherwise k is not an element of the binary search tree. The running time of the search operation is O(h), where h is the height of the tree. Therefore, if the tree is balanced and in its minimum height (i.e. if the tree is almost complete), then the worst-case run time, being directly proportional to the height of the tree, is log n. For example, for a binary search tree with n elements, with n = 1000, it needs about 10 comparisons for the search operation; with n = 1,000,000, it needs about 20 comparisons. However, if the binary search tree is unbalanced and is elongated, the run time of a search operation is longer.

Insert

Both insertion and deletion operations cause changes in the data structure of the dynamic set represented by a binary search tree. For a standard insertion operation, we only need to search for the proper position that the element should be put in and replace NIL by the element. If there are n elements in the binary search tree, then the tree contains (n+1) NIL pointers. Therefore, for the (n+1)th element to be inserted, there are (n+1) possible places available. In the insertion algorithm, we start by using two pointers: p and q, where q is always the parent of p.

Starting from the top of the tree, the two pointers are moved following the algorithm of searching the element to be inserted. Consequently, we end with p=NIL (assuming that the key is not already in the tree). The insertion operation is then finished by replacing the NIL key with the element to be inserted.

Fig.5

Delete

To delete a given element from a binary search tree, we use two pointers: q and p such that q is always the parent of p.

There are several cases in a binary search tree for the deletion operation:

  1. A node with no children:

    Fig. 6.1 Fig. 6.2

    We replace the value of the parent pointer with NIL.

  2. A node with one child. Here we have four cases for the four symmetric situations:

    In these cases, we replace p by its only child. The diagram below shows an example of the second case:

    Fig. 7.1 Fig. 7.2

    Here, p is the node to be deleted. After making the right child of p equal to the left child of q, the delete operation is done.

  3. A node with two children:

    To replace the node to be deleted (and preserve the binary search tree property), we must choose a node which is larger than any node in p's left subtree and smaller than any node in p's right subtree. There are only two candidates for minimizing the changes that the operation will cause. One of them is the element in p's left subtree with the largest key value; the other one is the element in p's right subtree with the smallest key value. Since both of them have at most one child, we can delete them more easily. Suppose the smallest element in the right subtree of p is the one chosen (note that always choosing that element will lead to an unbalanced tree, since the repeated use of the algorithm will move nodes "to the left"). The following is the procedure to delete an element with two children:

      Step 1: delete min[right[p]]

      Step 2: store key of min[right[p]] in p

    Fig. 8

    Successor and predecessor

    To find the successor of a node x in a binary search tree is the same as finding the smallest element greater than the key value of x. Similarly, to find the predecessor of a node x is the same as finding the largest element smaller than the key value of x. Consider the following procedure in finding the predecessor of the node x in a binary search tree:

    Following the binary search tree properties, if the left subtree of x is not empty, then the predecessor must be the largest element of the left subtree of x. Otherwise, we have to climb up the tree to find the predecessor. If y is a node in the same tree such that it is always the parent of x, the two pointers, x and y, can be moved upward one after the other starting from the original node x. If y is equal to NIL, then x is without a predecessor, otherwise, y itself is a predecessor of x. If the left subtree of x is empty and x is the left child of y, then y is the highest descent of x. Therefore, the procedure above returns y in any case. The algorithm for successor is symmetric to the predecessor operation.


    Treap

    Treaps were invented by several researchers. The name was coined by Aragon and Seidel. In the standard data structure of trees, the object of the node contains a key field and pointer fields. In treap, the node's object contains a key field and a time index (the time at which the node is inserted), and the pointer fields. Fig. 9 displays the treap structure of the BST shown in Fig. 2.

    Properties of treap

    1. Binary search tree property with respect to key: a node must be larger than its left child and smaller than its right child. Therefore, the keys are ordered from left to right.
    2. Heap property with respect to time index: the insertion time of a node is always smaller than the one of its children.

    Insert

    One can replace the time indices by independent random numbers. This will have the effect of randomly shuffling the nodes, which is always a good thing for binary search trees (see next lecture). So, it may happen that upon insertion of a node with a random time index, this node becomes the new root. In some cases (as in splay trees), the last node accessed is systematically made a root. This is the principle of a cache: place the most frequently used or accessed objects nearest to the user (in this case, the root).

    A general insertion of a node (k,t) (where k is the key and t is the time index) proceeds in two stages:

    1. Insert (k, ) (so, (k, ) is a leaf: this is our standard insert operation).

      Fig. 10
    2. Let the node be rotated upwards towards the root. If x is the key of the node to be inserted and t is its time index, the following is the algorithm of the insert operation:
      1. Insert (x, t)
      2. Standard insert of (x, ) into tree
      3. Element moves up:
        • p: pointer to the current node holding (x, t)
        • while (time index of parent of p > t)
          • do rotation

    A rotation is an operation that preserves the binary search tree property (ordering of keys). Figure 11 shows a right rotation: we need to interchange the node (x,t) and its parent (with time index > t) but we want the subtrees to stay "inorder". The point is that the downward projection should not change, so we keep the same left-to-right order for the subtrees, rearranging them as children of (x,t) and its ex-parent (which is now the right child of (x,t)).

    Fig. 11
  4. The total time of insert (in treap) is the time to insert (in a binary search tree) the node and push it up in the proper position. The worst-case time is thus equal to a constant times the path length, which is O(h) (where h is the height of the tree).

Delete

Recall that the ordinary delete operation leads to unbalanced trees. Here, the time index allows us to fix that: we remove the element, and then "rebuild" the tree as if the element had never existed. Consider a binary tree with p as the element to be deleted. Note that one of the advantages of using this operation is that the only part of the tree to be affected by the operation is the subtree of p (the shaded triangle), which consists of all the nodes inserted after p, falling between the two ropes of p.

Fig. 12

fig. 13

fig. 14

Now, consider the rotated representation of the tree (Fig. 13). That is, the representation wherein the right and left roof-nodes are given primary consideration. If the root of the tree p is to be deleted, then there is only one choice in choosing a new root, that is the element with the smallest time index (Fig. 14).

After replacing the original root with the node (5), the tree is then sorted according to the time indices. Note that the subtrees of each node remain unchanged. This is called the "zipper" algorithm: you zip the left and right roofs together (following the black arrows in figure 14).

fig. 15


Question

Write a simple rotation-based recursive algorithm for the deleting the root of a treap.

Java Applet

To use this applet, just enter the key value of the node in the little box and click on one of the three binary search trees operations. The newest manipulated node will be in the color blue with a blue track showing the routine of the operation from the root to the node. A brief explanation of the insert or delete operation will be displayed in the big box beside the tree diagram.


Your browser does not support Java, hence the BSTApplet is not currently available to you. Try again using a Java enabled browser.

The source code of the Binary Search Tree applet.

The source code of the Binary Search Tree class.


Links to Related Material

Lecture notes

Data structures course DSS T26 Prof. robert C. Holte and Dr Denys Duchier, University of Ottawa;
Data structures course CP2001 by Olivier de Vel and Curtis Dyreson, James Cook University of North Queensland; go see week 11 for basic introduction to binary search trees;
Data structure course CMSC 420 by David Mount, University of Maryland; see lectures 3, 4 and 5;
Analysis of Algorithms, course CSE 373/578 by Steven S. Skiena, State University of New York at Stony Brook; see lecture notes 8;
Algorithms and Data Structures, course CS2001 by Mike Atkinson, University of St. Andrews;
Analysis of Algorithms, course 570 by Prof. Doug Ierardi, University of Southern California (consult problem set 4)
Data Structures and Advanced Programming, course CS61B by Kathy Yelick, University of California at Berkeley.

Java applet

Rotations (and others) by Doug Ierardi and Ta-Wei Li;
Applet by Michael Bahl, University of Hartford;
Gamelan's applets
Splay tree applet by Darell Kindred

Code

Java Code by Chuck McManis;
Java Code by Chuck McManis;
Java Code by Chuck McManis;
Java Code by MegaSyte
Pascal Code from the course CS 122 by F.D.Lewis, University of Kentucky.


References

Most of the books about Data Structures and Algorithms contain a chapter or a section on binary search trees; however, you can consult Data Structures and their implementation by Robert J. Baron and Linda G. Shapiro, Van Nostrand Reinhold, 1980 for non-recursive algorithms (using stacks) and further references (over 50 articles on trees and graphs are listed.)


Web page creators

  • Ethan Holda - eholda@cs.mcgill.ca (graphics)
  • Eunice Wong - eunice@cs.mcgill.ca (HTML page)
  • Evelyne Robidoux - evelyne@cs.mcgill.ca (links and references)
  • Francois Rivest - frives@po-box.mcgill.ca (Java applet)

  • Copyright © 1997, by the Fab Four (Ethan Holda, Eunice Wong, Evelyne Robidoux, Francois Rivest). All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as - is as a substitute for attendance in the actual class.

    This web page was last updated Feb 17, 1997 by Evelyne Robidoux.