McGill University:
School of Computer Science
Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

TOPIC #10: Random binary search trees and Quicksort


RANDOM BINARY SEARCH TREES

All basic operations on a binary search tree run in O(h) time, where h is the height (or depth) of the tree. If you are unfamiliar with this fact or with binary search trees in general, it would be wise to get acquainted with Binary search trees before proceeding any further.

A randomly built binary search tree on n distinct keys is a binary search tree that arises from inserting the keys in random order into an initially empty tree, where each of the n! permutations of the input keys is equally likely. Therefore, the construction of the actual binary search tree is done by the standard "insertion" method. The only difference is that the input is entered at random.

The depth of a particular node is defined to be the length of the path from the root to the node in question. Since we reach a node by walking down from the root, the time to access a node is proportional to its depth. Also, the height of a particular node is the maximal distance from that node to any of the leaves in its subtree. The height of a tree is the height of the root of the tree, and it is equal to the maximum depth of any node in the tree; this is the connection between height and depth. Now, if D_n is the depth of the nth inserted node, we will show that the expected value of D_n is not more than 2 + 2 log(n), where log is the natural logarithm.
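As a concrete illustration, here is a minimal Java sketch (our own, not part of the notes' applet; all class and method names are ours) that builds a randomly built binary search tree by shuffling the keys and inserting them with the standard insertion method, then reports the depth D_n of the last node inserted:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RandomBST {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Standard binary-search-tree insertion; returns the depth at
    // which the new key was placed (the root has depth 0).
    int insert(int key) {
        if (root == null) { root = new Node(key); return 0; }
        Node cur = root;
        int depth = 0;
        while (true) {
            depth++;
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return depth; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return depth; }
                cur = cur.right;
            }
        }
    }

    public static void main(String[] args) {
        // Shuffling makes each of the n! insertion orders equally likely.
        List<Integer> keys = new ArrayList<Integer>();
        for (int k = 1; k <= 11; k++) keys.add(k);
        Collections.shuffle(keys);

        RandomBST tree = new RandomBST();
        int depthOfLast = 0;
        for (int k : keys) depthOfLast = tree.insert(k);
        System.out.println("D_n for n = 11: " + depthOfLast);
    }
}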

Let us start by constructing a binary search tree with the following, randomly chosen, list of integers:

3 2 14 15 12 5 6 11 9 10 8.

Using this input sequence, we want to determine which keys (integers) are on the path from the root to the last node inserted. In this example, the input was chosen so that the key 8 is the last node. We will denote the last node (8) as x_n.

First, observe that every key smaller than x_n ends up to the left of x_n and every key larger than x_n ends up to the right of x_n. This property must hold because it is the very definition of a binary search tree. Therefore, let's divide the random list into two sublists: the keys smaller than x_n and the keys larger than x_n.

Although this may seem obvious, it is important to note that a subsequence of a random permutation is, in itself, in random order. Table 1 (below) shows the result of the division.

RANDOM LIST    3   2   14   15   12   5   6   11   9   10   8
LEFT OF x_n    3   2   -    -    -    5   6   -    -   -    -
RIGHT OF x_n   -   -   14   15   12   -   -   11   9   10   -

Table 1. Separation of keys into two sublists about the key x_n.

Now, let's go through the list one key at a time, inserting each key into the tree with the standard insertion method.

This process is continued until we get to the end of the list. Therefore, the above sequence of random numbers corresponds to the following binary search tree:
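(The tree below is reconstructed here from the insertion sequence; the path from the root to 8 is 3, 14, 12, 5, 6, 11, 9.)

            3
           / \
          2   14
             /  \
            12   15
           /
          5
           \
            6
             \
              11
             /
            9
           / \
          8   10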

Keys on the path that are smaller than x_n make up the up-records to x_n, and keys on the path that are larger than x_n make up the down-records to x_n. A key in the left sublist lies on the path to x_n exactly when it is larger than every key before it in that sublist; likewise, a key in the right sublist lies on the path exactly when it is smaller than every key before it in that sublist.

In this case, it is easy to find the depth of x_n just by looking at the sublists in Table 1 (above). Analyze the left sublist first: go through it and circle every key that is larger than all the keys before it. These keys are 3, 5, and 6. Do the same with the right sublist, circling every key that is smaller than all the keys before it. These keys are 14, 12, 11, and 9. We now have the following, with the circled keys in parentheses:

LEFT OF x_n:   (3)  2  (5)  (6)
RIGHT OF x_n:  (14) 15 (12) (11) (9) 10

If we count the number of circled keys, this tells us the depth of our node (the node containing the key 8). In this case, we have 7 circled keys. Therefore, the depth of our node is D_11 = 7 (the key 8 was the 11th key inserted).
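This count can also be done mechanically in one pass: keep the largest key seen so far that is smaller than x_n and the smallest key seen so far that is larger. A minimal Java sketch of the count (our own illustration; the names are ours, not from the notes):

public class Records {
    public static void main(String[] args) {
        int[] list = {3, 2, 14, 15, 12, 5, 6, 11, 9, 10};  // keys inserted before x_n = 8
        int xn = 8;
        int upRecords = 0, downRecords = 0;
        int maxBelow = Integer.MIN_VALUE;  // largest key < x_n seen so far
        int minAbove = Integer.MAX_VALUE;  // smallest key > x_n seen so far
        for (int key : list) {
            if (key < xn && key > maxBelow) { maxBelow = key; upRecords++; }   // up-records: 3, 5, 6
            if (key > xn && key < minAbove) { minAbove = key; downRecords++; } // down-records: 14, 12, 11, 9
        }
        // The depth of x_n is the number of up-records plus down-records.
        System.out.println("D_n = " + (upRecords + downRecords));  // prints D_n = 7
    }
}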

Mathematically, the expected number of records in a random permutation X_1, X_2, ..., X_n is found by summing, over each position, the probability that it is a record:

Prob( X_1 is a record ) + Prob( X_2 is a record ) + . . . + Prob( X_n is a record )

= 1 + 1/2 + 1/3 + 1/4 + . . . + 1/n,

because X_i is a record exactly when it is the largest of the first i keys, and each of these i keys is equally likely to be the largest, so Prob( X_i is a record ) = 1/i.

* Notice that this series is the harmonic series!

The harmonic series is bounded: 1 + 1/2 + . . . + 1/n < 1 + log(n) (this is the natural log).

Therefore, the expected number of up-records in the sublist "LEFT OF x_n" is not more than 1 + log(n). Similarly, the expected number of down-records in the sublist "RIGHT OF x_n" is not more than 1 + log(n).

Since the depth of x_n is the number of up-records plus the number of down-records, this implies that the expected value of D_n is less than 2 (1 + log(n)).

Thus, E[D_n] < 2 + 2 log(n).

* The 1 + log(n) bound above was obtained by a method of summation called "comparison with an integral". This method is described in the "Summations" section of Analysis of O, Theta, Omega and Summations.
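For completeness, here is that comparison with an integral, using the fact that 1/i is at most the area under the curve 1/x between i-1 and i for i >= 2:

\[
\sum_{i=1}^{n} \frac{1}{i} \;=\; 1 + \sum_{i=2}^{n} \frac{1}{i}
\;\le\; 1 + \int_{1}^{n} \frac{dx}{x} \;=\; 1 + \log(n).
\]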


QUICKSORT

Quicksort is one of the fastest known sorting algorithms in practice. Despite its inconveniently slow worst-case running time of Θ(n²), quicksort remains a popular choice among sorting algorithms because of its "expected" (or average) running time of Θ(n lg(n)).

Quicksort is a divide-and-conquer recursive algorithm. The basic algorithm one must follow in order to sort a subarray X[i..j] consists of the following steps:

1. Choose an element of X[i..j] as the pivot.
2. Partition X[i..j] about the pivot, so that all smaller elements end up to its left and all larger elements to its right.
3. Recursively quicksort the two resulting subarrays.

Therefore, assuming that our input values (x_1, x_2, ..., x_n) are in random order, the following procedure specifies an implementation of quicksort:

Quicksort(X, i, j)
1.    if i < j  
2.       pivot = X[i] 
3.       k = Partition(X, i, j, pivot) 
4.       Quicksort(X, i, k-1) 
5.       Quicksort(X, k+1, j) 
                                                                                           
Notice: since the input is random, we chose the pivot in line #2 to be the first element of the subarray.

Let's examine this algorithm step by step. First of all, Quicksort takes an array X[] along with the integers i and j, which correspond to the ith and jth elements of the array. What we want to do is sort the elements x_i through x_j in array X[]. (See figure directly below)

Line #1 in the algorithm checks that i is less than j. If i is equal to j, we are sorting a sublist that contains only one element, but a sublist of one element is already sorted! (If i is greater than j, the sublist is empty.) Therefore, we want to execute quicksort only when i is less than j.

Line #2 assigns the element X[i] to be the pivot. It is safe to choose the first element because, as was stated earlier, the input elements are in random order!

Line #3 partitions the elements of X[i..j] into two groups about the pivot element. All elements less than the pivot are stored in the subarray X[i..k-1] (to the left of the pivot) and all elements greater than the pivot are stored in the subarray X[k+1..j] (to the right of the pivot), where k is defined to be the final position of the pivot in array X[]. (See figure directly below)

Lines #4 and #5 call Quicksort in order to sort the two subarrays created by the Partition procedure of line #3. (See figure directly below)

Clearly, the split may be performed with j - i comparisons between elements: every element of the subarray other than the pivot is compared to the pivot exactly once. Moreover, the algorithm does not require extra storage; by carefully swapping elements of the original subarray X[i..j], we can achieve the partition in place.
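The notes do not spell out the Partition procedure itself. Here is one possible in-place version in Java: a sketch of our own using a Lomuto-style scan adapted to the first-element pivot of line #2 (the pivot is computed inside partition rather than passed in, and all names are ours):

import java.util.Arrays;

public class Quicksort {
    // Sorts X[i..j] in place (inclusive bounds, as in the notes).
    static void quicksort(int[] X, int i, int j) {
        if (i < j) {
            int k = partition(X, i, j);   // pivot = X[i], as in line #2
            quicksort(X, i, k - 1);
            quicksort(X, k + 1, j);
        }
    }

    // Partitions X[i..j] about the pivot X[i] using j - i comparisons;
    // returns the final position k of the pivot.
    static int partition(int[] X, int i, int j) {
        int pivot = X[i];
        int k = i;                        // boundary of the "< pivot" region
        for (int m = i + 1; m <= j; m++) {
            if (X[m] < pivot) {           // one comparison per element
                k++;
                int tmp = X[k]; X[k] = X[m]; X[m] = tmp;
            }
        }
        int tmp = X[i]; X[i] = X[k]; X[k] = tmp;  // move the pivot into place
        return k;
    }

    public static void main(String[] args) {
        int[] X = {3, 2, 14, 15, 12, 5, 6, 11, 9, 10, 8};
        quicksort(X, 0, X.length - 1);
        System.out.println(Arrays.toString(X));
    }
}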

ANALYSIS OF QUICKSORT

The key to the quicksort algorithm is the Partition procedure (line #3) in which every element is compared to the pivot element once. We consider a computational model in which one comparison between two data values is equal to one time unit. Therefore, line #3 is the only line in the algorithm that actually "costs" and since everything is compared to the pivot once, the number of comparisons in Partition is (j-i). Notice, also, that line #2 is not considered a comparison.

Now, if we construct a binary search tree from the same input values as were given above, we will see that there is a relation between quicksort and random binary search trees. Observe:

Notice that this is a randomly built binary search tree and that the diagram is similar in form to that of Figure #4! Obviously, a relation between random binary search trees and quicksort exists.

Therefore, let us now look at an arbitrary example of a binary search tree which can be derived from the same input as that entered into a quicksort procedure:

Notice that the node labeled x was compared once to every node on its path from the root: each time it enters another subtree, there is a comparison! Therefore, the number of comparisons in quicksort is equivalent to

Σ (i = 1 to n) D_i

(where D_i is the depth of the ith node in the binary search tree that we constructed in our analysis)

and the expected number of comparisons is equal to

E[ Σ (i = 1 to n) D_i ].
Now, from the random binary search tree results stated above, we get:

E[ Σ (i = 1 to n) D_i ]
   ≤ Σ (i = 1 to n) (2 + 2 log(i))
   ≤ 2n (1 + log(n))
   ≈ 1.386 n log2(n),

since log(n) = log(2) · log2(n) and 2 log(2) ≈ 1.386 (log is the natural log).

This expected complexity is only about 39% worse than the best possible complexity under the same computational model; no comparison-based algorithm can hope to sort in expected time less than roughly n log2(n) comparisons.


K-D Trees
Quad Trees
A quad tree is a tree that is either empty or consists of a parent node together with four (possibly empty) subtrees, one for each quadrant of the plane around the node's point.

If we take a random set of points in the (x, y) plane, we can organize them in this fashion: select a point as the origin of a Cartesian coordinate system, so that all the other points fall in one of its four quadrants. Then, within a quadrant, take another point and make it the origin of its own coordinate system. Dividing the remaining points recursively in this fashion builds a quad tree.
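As an illustration only (the class name and quadrant labels are ours, not from the notes), a minimal point quad tree in Java, where each node's point acts as the origin for its four quadrants:

public class QuadTree {
    double x, y;                 // the point stored at this node (the "origin")
    QuadTree nw, ne, sw, se;     // one subtree per quadrant

    QuadTree(double x, double y) { this.x = x; this.y = y; }

    // Insert a point by recursing into the quadrant it falls in,
    // relative to this node's point.
    void insert(double px, double py) {
        if (px < x) {
            if (py < y) { if (sw == null) sw = new QuadTree(px, py); else sw.insert(px, py); }
            else        { if (nw == null) nw = new QuadTree(px, py); else nw.insert(px, py); }
        } else {
            if (py < y) { if (se == null) se = new QuadTree(px, py); else se.insert(px, py); }
            else        { if (ne == null) ne = new QuadTree(px, py); else ne.insert(px, py); }
        }
    }

    public static void main(String[] args) {
        QuadTree root = new QuadTree(5, 5);   // the first origin
        root.insert(2, 7);                    // falls in the north-west quadrant
        root.insert(8, 1);                    // falls in the south-east quadrant
    }
}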

We could use this type of tree in a database where one needs to search the data using different keys, e.g., a name and an ID number. A range search looks for a selection of the data: we pass the range down through the nodes of the tree, deciding which paths to take, until we come to the bottom of the tree.


K-D Binary Trees
A binary tree is either a null tree or consists of a parent node, a left subtree, and a right subtree. These subtrees can, in turn, either be null themselves or be composed of their own left and right subtrees. Thus we have a recursive definition of a binary tree.

In k dimensions we can organize a tree in a similar fashion. Take a point and give it either a vertical line or a horizontal line; e.g., take the point (x, y) and give it a vertical line. Then all the other points are either on the left side of the line or on the right side. Selecting another point within one of these halves, we give it a horizontal line, which divides its space into an upper and a lower part. If we then select one more point within one of those parts, we give it a vertical line again. Continuing in this fashion yields a binary tree, called a k-d tree (here k = 2). A sketch follows below.
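A minimal Java sketch of the two-dimensional case (our own illustration; the names are ours): each level alternates between splitting on the vertical line through a point (compare x) and the horizontal line (compare y):

public class KDTree {
    double x, y;
    boolean vertical;        // true: split by a vertical line (compare x)
    KDTree left, right;

    KDTree(double x, double y, boolean vertical) {
        this.x = x; this.y = y; this.vertical = vertical;
    }

    // Insert a point, alternating vertical and horizontal splits by level.
    void insert(double px, double py) {
        boolean goLeft = vertical ? (px < x) : (py < y);
        if (goLeft) {
            if (left == null) left = new KDTree(px, py, !vertical);
            else left.insert(px, py);
        } else {
            if (right == null) right = new KDTree(px, py, !vertical);
            else right.insert(px, py);
        }
    }

    public static void main(String[] args) {
        KDTree root = new KDTree(5, 5, true);  // first split: the vertical line x = 5
        root.insert(2, 7);   // x < 5: goes left; the next split there is horizontal
        root.insert(8, 1);   // x >= 5: goes right
    }
}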


Binary Space Partition Tree
A Binary Space Partition tree, or BSP tree for short, is often used in computer graphics, for example to determine the position of each image in a picture. A BSP tree is a binary tree composed the following way: take a collection of images and divide them in half with a line, and continue this dividing until each image is in its own section. Label the lines 1 to N. We can then compose a binary tree of the picture, taking each image as a leaf of the tree and each line as an internal node.

If a light source (or viewer) were to strike the picture, we could use the "painter's algorithm" to draw it: travel down through the tree, deciding at each node which side of the dividing line was hit by the light last, and indicating that side with a + sign. With this data we can then easily draw the image, working from the rear to the front.
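A hedged Java sketch of that back-to-front traversal (the node layout, the line representation a*x + b*y + c = 0, and all names are our assumptions, not the notes' code):

// Hypothetical node types for illustration: an internal node holds a
// dividing line; a leaf holds an image to draw.
abstract class BSPNode { abstract void draw(double vx, double vy); }

class Leaf extends BSPNode {
    String image;
    Leaf(String image) { this.image = image; }
    void draw(double vx, double vy) { System.out.println("draw " + image); }
}

class Split extends BSPNode {
    // Dividing line a*x + b*y + c = 0; the "front" side is where the
    // expression is positive.
    double a, b, c;
    BSPNode front, back;
    Split(double a, double b, double c, BSPNode front, BSPNode back) {
        this.a = a; this.b = b; this.c = c; this.front = front; this.back = back;
    }
    // Painter's algorithm: draw the side farther from the viewer at
    // (vx, vy) first, then the nearer side on top of it.
    void draw(double vx, double vy) {
        boolean viewerInFront = a * vx + b * vy + c > 0;
        (viewerInFront ? back : front).draw(vx, vy);
        (viewerInFront ? front : back).draw(vx, vy);
    }
}

public class PainterDemo {
    public static void main(String[] args) {
        // Scene: the line x = 5 separates image A (x < 5) from image B (x > 5).
        BSPNode scene = new Split(1, 0, -5, new Leaf("B"), new Leaf("A"));
        scene.draw(10, 0);   // viewer at x = 10: draws A (far) before B (near)
    }
}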


Java Applet

This applet generates a list of up to ten randomly chosen numbers (the user decides how many numbers the applet generates). It illustrates the quicksort algorithm by constructing the corresponding random binary search tree. The user must click on the "next" button for the applet to illustrate the next step in the process. NOTICE: if your browser does not support Java, you will not be able to run this demonstration!


Links to related material

Histogram of RBST: a histogram of a random binary search tree.
Sorting Illustrations: comparisons between quicksort and other sorting algorithms. The speed of quicksort becomes obvious after viewing this site.
Quicksort Notes: complete notes on quicksort (including code) by Indiana University students.
BSP Trees: notes (including a Java applet) on Binary Space Partition trees.



Web page creators


Be gentle -- there were some hard acts to follow!

Last updated February 25, 1997 by Gaetano Zullo , Chris Foote , Gabriel Kevorkian .

Copyright © 1997, Gaetano Zullo, Gabriel Kevorkian, Chris Foote. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental and does not serve as-is as a substitute for attendance in the actual class.