CS251- Winter 1999- Final examination (April 27, 1999)

Question 1	A simple recursive algorithm. Given is a standard min-heap H with n elements implemented as an array. Write a very simple recursive procedure that returns all the keys in the heap with value less than a given number z. If k elements are returned, the worst-case time should be O(k+1).
Answer	Perform preorder traversal started with report(1) (as 1 is the root in the array implementation of H). We are also given the heap size, n. report(x): (reports all elements in subtree at x with key < z) if key[x] < z then output key[x] if 2x <= n then report (2x) if 2x+1 <= n then report (2x+1)
Question 2	Mathematical exactness. Define precisely what is meant when we say that an algorithm takes polynomial worst-case time on an input of size n.
Answer	Usually what is meant is that the worst-case time is polynomial in the O(.) sense: thus, there exist finite constants C and K such that the maximal execution time of an algorithm over any input of size n is bounded by C n^K for all n. A similar answer for Theta(.) was also accepted.
Question 3	Huffman trees. We wish to construct the Huffman tree for n symbols, with observed frequencies integer-valued between 1 and n (with possible duplicates). We have seen that the Hu-Tucker algorithm can construct the Huffman tree in time O(n log n). However, in this particular instance, show how you can construct the tree in time O(n).
Answer	First, sort the (symbol, frequency) pairs by frequency by placing them in a direct hash table with chaining (of size n): the hash function is h(frequency)=frequency. This takes O(n) time in the worst-case, even if all elements collide in one chain, as insertions are at the fronts of the chains. For convenience, concatenate the chains into one long queue L, sorted from small to large by frequency with cells holding pairs (i, p_i) where i is the i-th symbol/leaf, and p_i is the weight; set up an initially empty second queue Q for pairs (j, q_j), where j stands for an internal node and q_j is its weight. The algorithm: for j= n+1 to 2n+1 do: Delete the 2 smallest weight items from (L, Q) (these may both come from the same queue) Let (u,p_u), (v,p_v) be these symbol/weight pairs enqueue ( (j,p_u+p_v) , Q) left[j] <- u, right[j] <- v
Question 4	Algorithm on trees. Given is a parent pointer tree in array form: the array p[i], 1 <= i <= n, holds the pointers (integers) to the parents, and the nodes are numbered 1 through n. Give a linear time algorithm for making this into an ordered tree implemented by the oldest child-next sibling method. The ordering of the children is unimportant. Do this by filling in the arrays oldest[i], next[i], 1 <= i <= n.
Answer	oldest[.] is a pointer to one child, and next[.] is a pointer to the next sibling. The children of any node are kept in a single linked list (as in the adjacency lists of graphs). Initially, we set oldest[x]=next[x]=NIL for all x. for i = 1 to n do: y=p[i] (y is the parent of i) if oldest[y]=nil then oldest[y]=i else next[i]=oldest[y], oldest[y]=i Note that the order in which the nodes are processed does not matter.
Question 5	Algorithm on graphs. A directed graph is given in which each of the nodes has one neighbor, nb[.] (thus, (x,nb[x]) is an edge from x to nb[x], and loops (nb[x]=x) are not allowed). Part A. Why does this graph have a cycle? Part B. Write an O(n) worst-case time algorithm that outputs one (any) cycle. Give the details, but make the algorithm as "minimal" as possible (no extraneous variables, no unnecessary segments, etcetera).
Answer	Part A: Pick a node x(1). Start walking to neighbors and obtain x(1),...,x(n+1). In this sequence, there must be a repetition, say at x(i)=x(j). Thus, x(i),...,x(j)=x(i) is a cycle. Part B: We perform DFS (walking to neighbors) until a gray node is encountered. To fix the notation, let the DFS be at u and let the gray neighbor of u be v. Then, walk from v to u again, and add the edge (u,v) to obtain the cycle. The first step takes O(n) worst-case time as our graph has n vertices and n edges. The second step takes O(n) time because the size of a cycle is O(n).
Question 6	Sorting. The input is a sequence of n integers with many duplications, such that the number of distinct integers in the sequence is O(log n). Design a sorting algorithm (based on comparisons only) to sort such sequences using at most O(n log log n) comparisons in the worst case. Hint: A red-black tree may be useful here!
Answer	This question was taken from Manber's textbook, page 178. We make a red-black tree of distinct keys, keeping track of multiplicities in each node via an augmented field count[.]. As the tree has size O(log n), each dictionary operation takes time O(log log n), hence the red-black tree may be constructed in O(n log log n) time. A few details: insert (x): y=search (x) (returns position of x in tree, or NIL) if y=NIL then perform red-black-tree-insert (x) set count[.]=1; else count[y] += 1 (no insert needed) main program: makenull (tree) for all x do insert (x) inordertraversal (tree) (note: repeat element at y count[y] times in output)
Question 7	Analysis. How many nodes are there in a complete 17-ary tree with n nodes that have the property that none of their ancestors is a youngest child (note: every litter has only one youngest child)? Give the answer (N) as a function of n in Theta notation. Point out what algorithm could be used to achieve this in time O(N)? Hint: This is an exercise on the master theorem.
Answer	The following recurrence is obvious, by considering what happens at the root: T(n) = 16 T(n/17) + 1, T(1) = 1. By the master theorem, the solution is Theta(n^a), where a = log(base 17) 16. To exhibit all the required nodes, do a preorder traversal, avoiding all youngest children (and thus, their subtrees).
Question 8	Hashing. We are given an array A of size n with records of shipments. Shipment number i (the shipments are numbered from 1 to n) has the following fields: weight[i]: the weight in tons of a shipment; descr[i]: the description of the contents of a shipment; address[i]: the address for the shipment; invoice[i]: a 9-digit invoice number; sender[i]: the name of the sender. There is room for more fields. We wish to implement hashing with chaining in order to perform the operations delete (a shipment given by shipment number) in O(1) worst-case time, and search (for an invoice number) in O(1) expected time (assuming a good hash function). Describe the data structure in detail---define all arrays, tables, and fields, and give the size of each table (if any). A drawing is necessary.
Answer	One possibility is to use direct addressing, building on the given arrays. However, to locate by invoice number, we make a second table, also of size n, with a hash function based on the invoice number only (!!!), in which we employ chaining. One might use something like: h(invoicenumber) = invoicenumber mod n. Thinking about this, we see that each node requires a next[.] field, pointing to the next node in the same chain. To perform delete in O(1) time, the chains must be double-linked, and we introduce an additional pointer previous[.]. So, here is the proposal: Data: array of size n with fields: weight[i] descr[i] address[i] invoice[i] sender[i] next[i]: next in chain previous[i]: previous in chain Additional hash table H of size n, consisting of n pointers to headers of chains, so search can be done in the chain with first element H[h(invoicenumber)]. Search thus takes O(1) expected time. A note about deletions: Deletion of a non-header element clearly takes O(1) worst-case time, thanks to the double linking. If we delete a shipment j, where j is the header of a list (recognizable when previous[j]=NIL), then there is a new header (pointed to by next[j]), say i. We must update H as follows: H[h(invoice[j])]=i. This is a very tricky point indeed!