Computer Science 308-251B April 19, 2000 Winter 2000 -- Final examination
 Question 1 Searching. Given is a hash table T[0], ..., T[m-1] and a hash function h. Open addressing with linear probing is used for storing data. Give the algorithm for searching for a key k in this table.
 Answer We assume that linear probing with jump "1" is used. With jump "c", the code below is easily modified, but make sure that gcd(c,m)=1. ``` hashsearch(T,k) i=0 repeat j=h(k)+i mod m if T[j]=k then return j else i=i+1 until i=m or T[j]=nil return "not found" ```
 Question 2 Mathematical exactness. When I say that an algorithm A takes worst-case time Omega (n^2), what do I mean?
 Answer It means that there exist finite constants c > 0 and N such that for all n > N, we can find an input (x_1,...,x_n) (possibly dependent upon n) such that the time taken by the algorithm on that input is at least c n^2.
 Question 3 Binary search trees. Give an algorithm that takes a binary search tree rooted at t (not a red-black tree, only a tree with left[.], right[.] and key[.] attributes), and leaves a tree rooted at s consisting of all the nodes in the original tree with key value less than or equal to k, where k is a parameter. The algorithm should be recursive and take time O(h), where h is the height of the tree. Note: be very careful with what is passed and returned in the recursive program.
 Answer As with most of these questions, we need only look at the root node and apply a careful recursion step. ``` smalltree(x,k) [ returns root of the subtree of x consisting of all nodes with key at most k AND... it fixes that subtree in the process.] if x=nil then return nil else if key[k] > k then return smalltree(left[x],k) else right[x]=smalltree(right[x],k) return x ``` Note that in each recursive step we go down one level. The number of levels does not exceed h, and thus, the worst-case time is O(h).
 Question 4 Tree algorithms. Given is a binary tree implemented in the standard manner with left[.], right[.] and p[.] pointers. In addition, each node has a key value key[.] that is a positive integer, 0 included. We define the priority of a node as its own key value plus the sum of all keys of its ancestors. We would like to traverse the tree in order of priorities, from smallest to largest. Ties can be broken in any way. One might call this priority traversal (as opposed to preorder or other traversals). Give an efficient worst-case algorithm.
 Answer We must keep the possible nodes to be visited in a priority queue Q, which we will implement as a binary heap. The operations are makenull, deletemin and insert. We pump Q until there is nothing left to visit. Initially, the root is placed on Q, which is organized by keeping the minimum priority p on top. Thus, we will keep pairs (x,p) together, where x is a tree node and p is the priority of that node. ``` makenull(Q) insert( (root, key[root]), Q) while not empty (Q) do: (x,p) < - deletemin (Q) visit (x) if left[x] != nil then insert((left[x], p+key[left[x]]),Q) if right[x] != nil then insert((right[x], p+key[right[x]]),Q) ``` The algorithm is efficient as it visits all nodes once, and performs n inserts and n deletemins, where n is the size of the tree. The complexity is thus O(n log n).
 Question 5 Huffman trees. Let an n-symbol Huffman tree be constructed based on observed symbol probabilities p_1,...,p_n. Let L_i be the length of the codeword for symbol i. Prove that the sum of p_i L_i is less than or equal to 1 + floor( log_2 n). Thus, the average codeword length grows at worst as O(log n).
 Answer The Huffman tree has the property that it minimizes the sum of p_i L_i (where i ranges over all leaves) over all possible trees. (Note that distance from leaf i to root is L_i.) Thus, any other tree has a worse sum. Let us take the complete binary tree on n leaves, and note that is has at most 2n nodes, and thus that its height h is at most floor( log_2 (2n)) = 1 + floor( log_2 n). But then, the sum of p_i L_i is at most h times the sum of the p_i, which is h, as the sum of the p-i's is one.
 Question 6 Graphs. The vertices of a graph have strange labels that can be submitted to comparisons only (that is, we have a black box function that can tell us if two labels are equal or if one is less than the other in some ordering scheme). No other operations are allowed or possible on the labels. Labels below are denoted by u and v. In class, we dealt mostly with graphs that have labels from 1 to n. This is no longer the case. The ADT dynamic graph requires the following operations on a pointer-based model: insertedge (u,v): inserts edge from u to v; deleteedge (u,v): deletes edge from u to v; searchedge (u,v): searches for edge (u,v); insert (u): creates vertex u; delete (u): deletes vertex u; search (u): searches for vertex u. If there are never more than n vertices, show how to implement this so that each operation takes O(log n) worst-case time. A drawing of your data structure, a careful definition of all fields of all cells, and a concise explanation (without algorithms) of all operations are required. Hint: the adjacency list data structure would be fine for this, except that the worst-case time is unsatisfactory.
 Answer One possible solution involves a red-black tree of adjacency lists. To be more precise, we place the node labels in a red-black tree, where each node has the following fields: ``` key[x] : label (these are the "u"'s in the question) left[x], right[x], p[x], color[x] (as usual) list[x] : points to the adjacency list ``` This set-up allows us to perform insert, delete and search in O(log n) time. However, as each adjacency list nmay hold up to n-1 nodes, the edge operations are Omega (n) in the worst-case. Therefore, each adjacency list too is implemented as a red-black tree. That is, list[x] will point to the root of a red-black tree. These adjacency-list red-black trees have cells/nodes with the following fields: ``` key[x] : label (these are the "v"'s in the question) (that is, if we are in the red-black tree for u, then the presence of v indicates the presence of edge (u,v)) left[x], right[x], p[x], color[x] (as usual) ``` It is clear how to perform insertedge(u,v) and searchedge(u,v): first find u in the main red-black tree, which leads to the root of the red-black tree for u. In this tree, find v for a search and insert v for insertedge. For insertedge, we also insert u in the tree for v. deleteedge(u,v) deletes v from the u-tree and u from the v-tree. All operations take O(log n) in the worst case. We now return to delete(u). It is the only remaining problem: deleting the tree for u is simple and takes O(1) time. However, from each v-tree we need to eliminate the u-entry. This is too much work. Instead, we leave the node in the tree, and check the legality of a node during a searchedge by checking whether u is indeed a valid label (in O(log n) time, as the labels are picked from a set of size at most n). As a second solution, we may replace the small adjacency list red-black trees by one red-black tree for all edges. The ordering needed to do this is as follows: first order the nodes of an edge pair, so that u < v for edge (u,v). Then (u,v) is < (w,z) if u < w or if u=w and v < z. (This is called lexicographical ordering.) This tree can have up to n^2 elements, but its height is still O(log n).
 Question 7 Algorithm design. Assume that we are given polynomials of order n by their coefficients, stored in arrays (a_0, a_1 , ... , a_n) and (b_0, b_1 , ... , b_n). The first polynomial thus is sum_{i=0}^n a_i x^i. The objective is to compute the product polynomial's coefficients (c_0, c_1, ... c_{2n}). If integer multiplication and addition are available at unit cost, show how this multiplication can be achieved in worst-case time better than O(n^2), and identify the complexity in Theta notation.
 Answer This is done as for integer multiplication. Call the polynomials A(n) and B(n). Then note that formally A(n) = A_1 (n/2) + x^(n/2) A_2 (n/2) and B(n) = B_1 (n/2) + x^(n/2) B_2 (n/2). Note next that ``` A(n) B(n) = A_1 (n/2) B_1 (n/2) + x^n A_2 (n/2) B_2 (n/2) + x^(n/2) ( A_1 (n/2) B_2 (n/2) + A_2 (n/2) B_1 (n/2)). ``` The time T_n needed to compute the product polynomial is cn + 3 T_(n/2) where the cn term takes care of the additions and the shifts, and the 3 T_(n/2) accounts for the three recursive calls to "matrix multiplication". The "3" is there because the x^(n/2) term can be computed by using one extra multiplication, not two, as shown by this trick (dropping arguments): ``` compute A_1 B_1 (coefficient of x^0) compute A_2 B_2 (coefficient of x^n) compute (A_1 +A_2)(B_1 + B_2) compute A_1 B_2 + A_2 B_1 (coefficient of x^(n/2)) as (A_1 +A_2)(B_1 + B_2) - A_1 B_1 - A_2 B_2 (no more multiplications needed!) ``` The recurrence ``` T_n = 3 T_(n/2) + cn , T_1 = 1, ``` has solution T_n = Theta (n^(log_2 3)) by the master theorem.
 Question 8 Lower bounds. Consider n coins that are all of the same unknown weight except one. The following computational model is assumed: we have a scale on which we may place any number (say, k) of coins on both sides. The scale tells us which side is heavier, if any. Each such use of the scale takes one time unit. The answers below should be exact (no big oh, for example). A. What is the decision tree lower bound for the number of time units (scale uses) needed to identify the unequal-weight coin? (No proof needed.) B. And what is the decision tree lower bound for the number of time units (scale uses) needed to identify the unequal-weight coin if it is known beforehand that this coin is heavier than the others? (No proof needed.)
 Answer In both parts, the decision tree has a fanout of 3 (scale tips left, tips right, or stays balanced). The number of answers for part A is 2n (identify the coin, and tell me whether it is heavier or lighter). So the bound is ``` ceiling ( log_3 (2n) ) ``` For part B, we have only n answers, and an accompanying lower bound of ``` ceiling ( log_3 (n) ). ```