Here is cryptic collection of answers for
the 1997 final of CS251B. Refer to the
final, which is posted in the main page
for the course.

Question 1.

Part A.

This part has nothing to do with caterpillars.
The adjacency list format for a graph
uses an array[1..n] of pointers to linked lists,
where each linked list consists of
something like 
     struct{ whatever key;
             list  *next}  list

Part B.

First of all, it is easy to verify in O(|V|)
time that 
  (1) all degrees are <= 3;
  (2) the graph is connected.
Most students checked then for acyclicity,
but that is not enough. Indeed, the complete
binary tree with 4 levels of nodes is acyclic,
but it is not a caterpillar!

There are two solutions:
 A. (Very clever, but not noted by the students):
    a connected graph is a caterpillar if and
    only if the number of degree-1 nodes is equal
    to the number of degree-3 nodes plus 2.
    Clearly, this can be checked in O(|V|) time.
 B. Define an appendix as a degree-1 node
    followed by any number of degree-2 nodes.
    By a careful DFS, all appendices may be colored red.
    Then check that the remaining uncolored
    nodes form a chain.

Question 2.

Part A.

Many possible solutions.
Here is one: call the following function as
postorder(root); it returns the postorder
number of the subtree at root;
set counter=0 where counter is external.

function postorder (root) (returns index);
 nrright := nrleft := 0;  (0 plays role of nil)
 if l[root] != nil then nrleft := postorder(l[root]);
 if r[root] != nil then nrright := postorder(r[root]);
 counter += 1;
 key[counter] := k[root];
 left[counter] := nrleft;
 right[counter] := nrright;
 return (counter);

Part B.

This is easy, as we already have a postorder
sequence. So we just go:

for i=1 to n do
  if (left[i]>0) or (right[i]>0)  (there is at least one child)
    then height[i] = max (height[left[i]], height[right[i]]) +1
    else height[i] := 0;

Part C.

Note that node n is the root. So call diameter(n) where
we have:

function diameter(i): returns diameter for subtree at i;
  if i has two children:
    D = max( diameter(left[i]), diameter(right[i]),
	               height[left[i]]+height[right[i]+2 )
  if i has no children: D = 0
  if i has only a left child:
    D = max( diameter(left[i]), height[left[i]]+1 )
  if i has only a right child:
    D = max( diameter(right[i]), height[right[i]]+1 )
  return D

Question 3.

I was looking for a table size m which had to
be a prime number in the rough range 5000...20000.

I also needed a primary hash function,
something like (rg + gb + br) mod m. There
were many possible answers, but I needed at least
a suggestion to be able to give you marks.

The secondary hash function for double hashing could
have been anything. For example, f mod m would have
been OK if f mod m was not zero.

Question 4.

Part A: direct addressing; in other words, an array of size N.
Part B: a balanced binary search tree.

The answers "heap" ad "linked list" are not good here,
because duplicate checking in both structures is very
costly.

Question 5.

I asked you to do one of two things. So, let's
answer (ii). Let us merge all lists pairwise (not
the smallest first!). So, after one round of merges,
we are left with d/2 lists. We repeat this for
about log_2 d rounds. Each round takes time n if we
count comparisons. The overall cost is thus about
n log_2 d.

Now, we can do (i). Indeed, the Huffman strategy
(merge the two smallest remaining lists) is optimal
with respect to the number of comparisons. So, it must
beat the one I explained in the previous paragraph!
Thus, the number of comparisons is bounded from above
by about n log_2 d. 

Note: the Huffman tree for merging the files does
not necessarily have height O(log d), so any argument based
on the height of this tree was wrong.

Question 6.

To convince you I have a median of 5 elements, I must
exhibit to you the median and the two smaller elements.
So, how many possible answers can I give you? Back-of-the-
envelope calculations show that I can give you 30 answers.
Now, if we use comparisons between elements, the
decision tree argument would give us a lower bound of
ceiling(log_2 30) = 5. I gave maximum marks for this
answer.

Remark:
In fact, a better lower bound is 6. The way to get this
is by performing one conmparison, and noting that 
each subtree of the decision tree has 18 possible
answers. So the lower bound is one plus ceiling(log_2 18) = 6. 

Question 7.

Let (a,b) and (c,d) be the crossing edges.
We argue by contradiction. We assume that the EMST
passes through (a,b) and (c,d). Note then that
since it is a spanning tree, then removing both
(a,b) and (c,d) leaves 3 disconnected trees.
One of these trees thus has two of our four nodes,
either (a,c), (a,d), (b,c) or (b,d) (but not (a,b) or (c,d)).
Assume that it is (a,c).
So, I will now reconnect the pieces by
adding (a,d) and (b,c). This makes a spanning tree 
(note that I have to be careful to avoid cycles).
But length(a,d) + lenght(b,c) < length(a,b) + length(c,d)
by geometrical considerations (sum of diagonals of a
quadrilateral is < sum of two opposite sides).
So, I am able to construct a better spanning tree.
Contradiction!

Question 8.

This was a midterm question.
The simplest counterexample is 0^n 1^n, n zeros
followed by n ones.
The responses "n zeros", "n ones", and
"consider a string that makes the suffix trie
a complete binary tree" are all wrong.

Question 9.

The worst-case time of 2^n occurs
when V is much larger than all elements, so that
the algorithm will have to try all possible (2^n)
subsets of n elements.

Question 10.

A. n-1  (n also accepted)
B. 1    (2 also accepted)
C. 2n-1  (2n also accepted)
D. 2    (3 also accepted)
E. (n-1)(n-2)/2  (n(n-1)/2 also accepted)

Question 11.

The Huffman tree was not unique. The correct
answer was 172 bits, but I gave partial credits
if I could see that you knew how to construct
the Huffman tree and how to derive a Huffman code.