McGill University:
School of Computer Science
Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

Topic #4: RECURSION AND RECURRENCES

Last modified: February 7, 1997


Recursion and Recurrence

In this section, we will describe the two steps to recursion which include setting up the recursion and then solving the recurrence. In a program, a recursion is a method that is defined in terms of itself, in other words it is another way to perform a loop whereas a recurrence is a mathematical relationship which deals with summations which was dealt with in topic #3. We will present examples of recursion hopefully some of which you have not seen before!

First we define: Tn = worst case time (model of computation) for any input of size n. Here we intend to show and describe methods for analyzing recursions and computing bounds on Tn of the type Tn= O(?).


Examples of Recurrence

These are examples of recurrence equations and how to set up the recurrence and how to derive them.

1. Chip Testing: The worst-case time taken for the chip testing (see topic#3) algorithm follows the recurrence:

Also, T1=0. Recall that this time was obtained under a special computational model.

2. Binary Search: We are searching for a value in a sorted table of size n. We examine the middle value. Either this is the value we are looking for and we are done, or we must search for the value in either the left or the right half of the table. Thus we obtain a smaller problem of the same type as we had before (which we can solve recursively). The recurrence we obtain is

under the computational model 1 comparison = 1 time unit. Note that in the recurrence there are two contributing terms, one related to recursive calls ( Tn/2 ), and the other related to the other work done in one iteration ( 1 ). This is true of recurrences in general.

3. MergeSort: We wish to sort a set A of n elements. We use the principle of divide and conquer once again. By recursively breaking the set into half its original size, we arrive at a base case from which we are able to sort the set which takes time of O(n) with a computational model of 1 comparison = 1 time unit. However if the size of the set to sort has a size of 1, no sorting is needed and therefore T1 = 0

Algorithm
Mergesort(A) ( |A| = n )

    if |A|<=1, return A
    B <--Mergesort of first half of A
    C <--Mergesort of second half of A
    Merge (B,C) and return merged list BUC

Counting the time it takes to merge the set (i.e. n-1) and the time it takes to sort each half of the two parts recursively (i.e. Tn/2 ), the resulting equation can easily be deduced:

4. Convex Hull: A convex hull is the smallest convex polygon in which the set of points are contained within that polygon. It is similar to placing a rubber band around the set and letting go! The following algorithm to find a convex hull uses a divide and conquer method similar to mergesort (We use Ch as a shorthad for Convex hull)


A Convex Hull

In this case, the lists in mergesort are replaced with two sets of points kept in clock wise order starting from the north. Two convex hulls (kept in this manner) can be merged in linear time (O(n) time), where n is the number of points involved in the merge (see Cormen et al). Thus, the recurrence is exactly that of mergesort with Cn instead of n-1 for some constant C

5. Multiplication: In the RAM model (see topic #2), any operation on two numbers is said to require 1 time unit. It is more realistic, however, to assume that operations on large numbers take longer to perform than operations on small numbers. So to see how good a multiplication algorithm is, we can use the computational model: 1 bit operation requires 1 time unit.

Suppose that we are given two arrays of size 2n bits (i.e. a and b as shown in the figure below) containing numbers that we wish to multiply together.

we could multiply them naively (each bit with each bit ) giving us a time of O(n2). How can we do better than this? We can apply divide-and-conquer, and some arithmetic.

We divide each array into two equal parts (i.e. size n shown in the above figure ). We can write the product of the two numbers in terms of products of the smaller parts as follows:

a* b=(a1 * b1) * 22n + a2* b2 + (a1* b2 + a2* b1) * 2n

Each addition would take linear time and each multipliction would necessitate a smaller version of the previous problem where we would have four recursions with the computational model would be 1 bit operation = 1 time unit which would make the time proportional n. The following running time equation would result:

T2n= 4Tn + n

However, solving this recurrence (see Master Theorem below) will give a time of

Tn = O(n2)

which is no better than what we had before. However, by clever manipulation, we can rewrite the coefficient of 2n and come up with the following:

a1 * b2 + a2 * b1 = (a1 - a2)*(b2 - b1) + (a1 * b1 )+ ( a2 * b2)

With this formula, computing this coefficient requires just one multiplication since the last two terms are already computed. So our algorithm now uses the recurrence T2n = 3Tn+ n which has solution

(see master theorem below) and therefore gives us a much better performance since log23 is between 1 and 2.

Suppose now that we cut the integer of 3n bits into 3 parts of length n, then naive multiplication would give a recurrence T3n = 9Tn + n or something equivalent. Show by rearranging the terms as done above, so that the recurrence becomes T3n = 5Tn + n by replacing the 9 multiplications with only 5. The master theorem (to follow) would tell us the n that

Tn = O(nlog35)

which is better than Tn = O(nlog23). Can you find such a rearrangement? Also see if you can get Tn = O(n1+e) (where e is an arbitrary small positive number). This is extremely hard and beyond the scope of this course, but it is nevertheless nice to know that a multiplication can be done marginally slower (because of the e) than an addition.

6. Guessing 3 cards: Here is a game. A volunteer selects 3 cards from a deck. We can then show cards in k piles to the volunteer, who will point to all the piles containing one or more of the three cards. The computational model we will use is that it takes 1 time unit to show 1 pile of cards to the volunteer. How can we determine which three cards the volunteer picked? If we show the volunteer 2 piles, and they both have mystery cards, then we are no further ahead. But, if we show 4 piles, then at least one of them must not contain one of the cards, since the 3 cards can be found in at most 3 piles. All the cards in a pile not pointed to are no longer candidates for the 3 cards to guess, so they can be put aside. Thus, by showing 4 piles, we are sure to eliminate at least 1/4 of the remaining cards. So, an algorithm for determining the 3 mystery cards is to repeat showing 4 piles and eliminate the cards in piles not pointed to, reducing the size of the problem by at least 1/4 each time, until each pile contains only 1 card. The recurrence for this algorithm will be

The 3 in the equation is the greatest possible number of piles that could contain a mystery card, that is, the number of cards chosen. The n/4 signifies that once the cards have been divided into 4 piles, each pile contains n/4 cards. The 1 represents a constant amount of time to mark the piles that have a card in them.

Exercise: Design an algorithm to solve the problem in time O(log n).


Solving Recurrences

Many of the recurrences that we have encountered are of the same basic form. There are others, but this is the most important one. Solving recurrences of this type is facilitated by the use of the Master Theorem (below). This theorem applies to recurrences of the form

Tn= a * Tn/b + f(n)

where a>=1 is an integer and b>1 is a real number. To actually solve the recurrences, we must know what the value of the base case (T1 or T0 ). For computational simplicity, the precise equation is not important.

Master Theorem

This is a method for solving recurrences of the above form. f(n) competes with nlogba where the bigger between the two wins. There are three possibilities

Remarks about the master theorem:

If we were to reconstruct a recursion tree using the problem from the card guessing game, the formula Tn = 3 Tn/4 + n can be interpreted in the following manner. The coefficient "a" ( from the general equation ) is the number of cards to be chosen therefore it is the number of piles the deck is to be split with each recursion call.. The "b" would refer to how much the problem would be reduced by with each recursion call.

To get an idea of why the master theorem works, we can consider recursion trees.


Recursion Trees

A discussion on recursion trees will intuitively illustrate why the master theorem is valid. It will also give a second method that may be used when the master theorem fails to apply.

A recursion tree is a tree that depicts the entire recursion process.

To illustrate, we take as the prototype examples

Tn = aT3n/4 + n

with a=3, a=4 and a=5. With each choice of a, the recurrence has a totally different solution!

In each node, we write the size of the problem. Above each node, we write the work done (the f(n)) during an iteration on a problem of size written in that node.

Case a=3
Case a=4
Case a=5

At each level, we write the work associated with that level on the right. In each case, we obtain a sequence showing the totals for each level. They go like:

Case a=3: n, (3/4)n, (3/4)2n, ...
Case a=4: n, (4/4)n, (4/4)2n, ...
Case a=5: n, (5/4)n, (5/4)2n, ...

Notice that these sequences are decreasing, level, and increasing respectively. In the first case, most of the work is done in the first iteration, while in the last case, the bottleneck is at the bottom, when we exit the recursion by "bottoming out" (such as when we take care of cases like n=1 or n=0).

To find the total time, we must add up all the terms of the sequence. We see that the sum is a geometric series, and we can therefore calculate the total.

Case a=3

The infinite sum is 4n so Tn <= 4n. Thus, the time is of the same order of magnitude at the time at the top level.

Case a=4

This is the case in which the extra log n factor appears (the 3rd case of the master theorem).

Let L = # levels, then


Therefore the rest of the tree is the same as that for the first part i.e. first recursion call!!

Case a=5

5log 4 n = nlog 4 5

Each value of a corresponds to a different case of the Master Theorem. a=3 is case 2, a=4 is case 3 and a=5 is case 1.


Our Examples Revisited

Example

Time

f(n)

nlogba

Chip Testing

Tn = Tn/2 + 3n/2

3n/2

n0

3n/2

Binary Search

Tn = Tn/2 + 1

1

n0

log n

Merge Sort

Tn = 2Tn/2 + n

n

n1

n (log n)

Convex Hull

Tn = 2Tn/2 + Cn

Cn

n1

n (log n)

Multiplying Integers

Tn = 3Tn/2 + n

n

nlog23

nlog23

Guessing 3 Cards

Tn = 3Tn/4 + 1

1

nlog43

nlog43


Ulam's problem

A volunteer picks a number from a set of n elements. We must guess the number by presenting a subset of any size to the volunteer and then asking the question: "is x in the set?" to which we are replied either "yes" or "no". We can figure out what x is by asking log2n questions (ex. 20 questions for a set of about 1 million) if we use a naive halving strategy. Ulam's problem is to find x supposing that the volunteer is allowed to lie once.

First attempt: A naive solution would be to ask each question twice where upon finding the lie, we must ask the question an extra time. Using this method, at most 41 questions will identify x for a set of 1 million. However, it may be possible to solve the problem by asking fewer questions.

Second attempt: By sub-dividing the group into four equal parts (each of size n/4), we can eliminate 1/4 of the group by asking two questions in the following manner. We ask one question concerning 1/2 of the group (Ask: "is x in groups [1] and [2]?" Suppose the answer is "yes." That is a strike against [3] and [4]. ) and another question concerning 1/2 of that group along with 1/2 of the rest (Ask: "is x in groups [1] and [3]?" Suppose the answer is "no." That is a strike against [1] and [3]. ). In the diagram, X beside a group represents a "strike" against that group, indicating that the group was declared NOT to contain x.

[1] (X) [2]
[3] (XX) [4] (X)

This way, we make sure that we have been told x is not in 1 of the four groups twice (group [3] in the example). Since there can only be one lie, we know that x cannot be in that group. Thus we can now eliminate the group with two strikes against it! The recurrence for this method of solving the problem is Tn = T3n/4 + 2 => T (3/4)2n + 4, under the computational model: 1 question = 1 time unit. The solution is about (not exactly)

2 log4/3 n

This means that for a set of 1 million, about 96 questions will be required. But this is even worse than before. So what can we do?

Best Solution Pelc (UQAH, Hull) showed that with a set of 1 million, 25 questions suffice. It is known that if 2 lies are allowed, 29 questions will do, and with 3 lies, 33 questions. Here is how one should proceed with one lie: keep a list A of elements that have no strikes against them, and a list B of elements with just 1 strike. The elements with 2 strikes are junked. The sizes of the lists are as follows:

A B garbage
n 0 0 (at start)
n/2 n/2 0 (after 1 question)
n/4 2n/4 rest (after 2 questions)
n/8 3n/8 rest (after 3 questions)
n/16 4n/16 rest (after 4 questions)
.
.
.
.
.
.
.
.
.
.
.
.

This assumes that n is a power of 2. Also, each question involves one half of A and one half of B. Very quickly, A and B will converge to 0 or 1.

Problems:

  1. Set up a recurrence for the size of B, and show why this formula for the size of B is correct: after k questions, it is of size kn/2k.

  2. Work out the details on how to solve the problem with n=1,000,000 in 25 questions.
    (Hint: things get tricky when |A|=1.)


Java applet

This java applet explains how the Mergesort algorithm works. The "first" button starts the applet over. The arrows indidcate a "back" or "forward" action.

Sorting a List Using Mergesort


Links to related material

David Eppstein, Department of Information and Computer science
Edward S. Blurock, Johannes Kepler University, Austria
Li Zhong, Southeast University, China
Rhys Price Jones, Fritz Ruehr, Richard Salter, Alexei Barchenkov, Dan Hutchings, Michael Klingbei.
Copyright 1996, David Neto. Professor Al Borodin aided in the design of the applet.
Connie Peng


References


Web page creators

  • Brian Chan
  • bchan1@po-box.mcgill.ca
  • Brian Fenandes
  • bferna@po-box.mcgill.ca
  • Benjamin Rich
  • ben@cs.mcgill.ca

    Brian Chan wrote the document based on the class notes using Netcape Gold Editor
    Brian Fernandes provided the drawings and the some of the equations using Corel Draw, Adobe Photoshop and WordPerfect
    Ben Rich programmed the Java applet and made the drawings with Adobe Photoshop and helped with HTML.


    This page was a great learning experience and taught new ideas and ways of thinking! Last updated February 2, 1997 by Brian Chan.

    Copyright © 1997, Benjamin Rich, Brian Chan, Brian Fernandes. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.