Given a set of "n" unordered numbers we want to find the "kth" smallest number. (k is an integer between 1 and n)
In the case of problems involving sorting, searching, and selection, a good indicator of time is the number of comparisons. Therefore, we will set
Algorithm Step | Description of the algorithm step | Running Time |
1 | Sort the n elements using merge-sort or heapsort | O(n log2n) |
2 | Return the kthsmallest element | 0 |
Total Running Time of the algorithm= | O(n log2n) |
Selection can be performed in O(1) time in a sorted array. However, data are not always sorted with respect to the key that is used for selection. Example: we may sort data by name, and ask for the kth smallest phone number
In this lecture, we will study two algorithms with linear worst-case and average-case complexities. The naive algorithm above is simply not comptetitive.
This algorithm will find the kthsmallest in O(n) worst case time.
It is also called the median-finding algorithm.
The method uses a divide & conquer strategy to solve the problem. Like the problem with the good and bad chips where we "threw away" 50% of the chips, if we can throw away 30% or even 10% without too much complication, we're cutting down the running time substantially. Therefore, we must efficiently eliminate as much as we can from the competition.
Algorithm Step | Description of the Algorithm step | Bound on the running time |
0 | If n is small, for example n<6, just sort and return the k thsmallest number. | 7 |
1 | If n>5, then partition the numbers into groups of 5. | n/5 |
2 | Sort the numbers within each group. Select the middle elements (the medians) | 7n/5 |
3 | Call your "Selection" routine recursively to find the median of n/5 medians and call it m. | Tn/5 |
4(a) | Compare all n-1 elements with the median of medians m and determine the sets L and R, where L contains all elements <m, and R contains all elements >m. Clearly, the rank of m is r=|L|+1 (|L| is the size or cardinality of L) | n |
4(b) | If k=r, then return m | 4(c) | If k<r, then return kth smallest of the set L | T7n/10 |
4(d) | If k>r, then return k-rth smallest of the set R |
If n is small, it obviously doesn't make any sense to "juggle" the values. Let us set the small threshold at 5 elements. To find the "kth" item from n elements, here are the least number of comparisons needed for values where n is less than 6.
Show how five elements can be sorted using at most seven comparisons.
The Oxford Dictionary defines Median: -adj situated in the middle. 2.Middle value of a series.
When n is not an exact mutiple of 5, we divide the n elements into
groups of 5 elements each, and at
most one group with the remaining n mod 5 elements. Also, when there
are an even number of medians, select the larger of the two median of
medians.
T(n) is an upper bound on the worst-case time
We will prove that
for all n and for some (yet unknown)
constant C.
Luckily for us, it turns out that the third step
, is true provided
This concludes the proof. We have actually shown that
There is a slight error in the derivation as 7n/10 and n/5 may not always be integer-valued! However, it still remains true if we take truncations into consideration, that Tn=On.
Even numbers don't work since there won't be a median. On the other hand 5, 7, 11, 13, etc. work but why won't 3, the smallest odd number? When we try to use the value of 3 in the algorithm then we obtain the inequality
Justify each term in the inequality
What is the best we can conclude from
The algorithm uses one concept from quicksort and returns the kth smallest of n.
If n=1 | then return the element |
If n>1 | then grab a random element from your set and call it x. Cut the group into two sections. The left side (L) will contain all elements smaller than x, the right side(R) will contain all elements greater than x. Let r=|L|+1 be the rank of x. |
If k=r | then return the element. |
If k<r | then return the kth smallest from the set L. |
If k>r | then return the k-rth smallest from the set R. |
The worst case of this algorithm happens when x is either the smallest or largest element in the set. There will be no splitting and the time will become
T(n) is for the same computational model but now shows the expected or average time
Case 1: | x(recall: the random element) is in the middle half. Then you recurse on the set of size at most 3n/4 |
Case 2: | x is not in the middle half. Taking a pessimistic view, you recurse on a set of size at most n. (that means, we assume that no progress was made) |
As each case occurs with probability 50%, we get:
Rewrite this inequality as:
The algorithm of the previous section had
in the worst case. Using the inequality above, show that for the average case,
.
Hint: draw the recursion tree and sum over all levels.
Many data structures and algorithms courses are ``on the web''. Some useful pointers to get you started in your search are given below.
Algorithms Information and Course Materials on the Net | Courses on algorithms. |
Princeton Data Structures | Bob Sedgewick's course on data structures and algorithms at Princeton. |
Sedgewick's transparencies | Bob Sedgewick's lecture notes (copies of transparencies) on data structures and algorithms at Princeton. For our course, of particular interest are sections 10 (tries), 11 (strings), 12 (more strings), and 13 (compression), as this material is not covered in detail by Cormen, Leiserson and Rivest. |
Sugihara's lecture notes on strings | Sugihara's lecture notes on strings (storage, coding, compression, search) are a nice supplement to Cormen as well. Sugihara teaches at the University of Hawaii. |
Index of Sugihara's lecture notes. | Index of Sugihara's lecture notes. |
Java applets for five sorting algorithms | Animation for sorting algorithms. |
Algorithms at UMass |
|
C++, Algorithms, and Data Structures (CS202) Notes |
|
Rivest's bibliography on algorithms | Ron Rivest's bibliography on algorithms. |
MergeSort demo | Mergesort demo from University of Toronto. |
Linear Time Selection | Lecture notes |
Hoare's homepage | C.A.R. Hoare's homepage |
The problem of selection (finding the kth largest of n elements) goes back to Rev. C. L. Dodgson's (aka Lewis Carroll) essay (Carroll 1883, p.5-6) on how prizes were awarded unfairly in tennis tournaments. Knuth (1973, p.209)
A linear method to solving this problem was always sought after and only after Manuel Blum discovered a method with O(nlog logn) steps in 1971 were R.Rivest and R.Tarjan (Rivest, 1973, p.448-461) able to create the alorithm which we now call "Linear Time Selection". (Knuth 216)
Hoare, Charles Antony Richard published a method called quicksort in 1962 which is used in the algorithm of finding the kth smallest of n elements.
Cormen, Thomas H, Charles E. Leirson, Ronald L. Rivest
"Introduction to Algorithms". Montreal: McGraw-Hill Book Company, 1991.
Section 10.3
St.James Gazette. 1 August, 1883
"Communications of the ACM" 1961.
Vol.4, pp.321-322
"Volume 3: Sorting and Searching" Reading, Massachusetts: Addison-Wesley Publishing Company, 1973.
Section 5.3.3. Minimum-Comparison Selection
Sams.net. (1997, p.365-469)Rivest, Ronald L., and Robert E. Tarjan
Journal of Computers & Sytem Sciences 7: 1973 Vol.7, p.448-461
Daichi Saeki worked on the HTML coding and creating the equations using LATEX2HTML, Chien Liao worked on the JAVA applet and Silva Baeva worked on research.