It is often said that sorting n numbers takes at least n*log_2(n) comparisons. Stated this way, the claim is rather vague: under other computational models, one can sort in time less than n*log_2(n). So we have to be clear about exactly which model the bound is proved in; here it is the comparison model. With this in mind, we begin our discussion of lower bounds.
An Example: Sorting 3 elements:
What we would like to do is to sort 3 elements using just a balance scale. We put two of them on the scale and get a response back: which one is heavier than the other. That is one comparison (1 comparison = 1 time unit). This model can be used to gain order information about any input sequence (x1, x2, x3, ..., xn), provided all the input elements are distinct.
Every comparison-based sorting algorithm can be viewed as a decision tree, which models the sequence of comparisons the algorithm makes. Here is a decision tree built for this particular example of sorting 3 elements:
We start by comparing a and b; the outcomes are a < b and a > b, a bit like an old flow chart. Whatever the outcome, the algorithm then has a second move. In this example, the second move is to compare a and c, so we have a < c and a > c down both subtrees. (In any comparison-based model, the whole structure is organized like this.)

If a < b and a < c, we still do not know whether b < c, so a clever algorithm must ask which of b and c is larger; that is our third comparison, with outcomes b < c and b > c. Then we can stop and report a < b < c or a < c < b, respectively. And we can keep going through the other branches. Some orderings take three comparisons (a < b < c, a < c < b, a > c > b, and a > b > c), and some take only two (c < a < b and c > a > b).

Any algorithm can be drawn in the form of a decision tree. One could have stupid algorithms that ask totally irrelevant questions, giving no information whatsoever. The point is that a good algorithm should give a relatively shallow decision tree.
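The tree just described can be written directly as code. The sketch below is illustrative (not from the notes): each `if` is one weighing, and the comparison count returned for each branch matches the tree above.

```python
def sort3(a, b, c):
    """Sort three distinct values; return (sorted_tuple, comparisons_used)."""
    if a < b:                      # comparison 1: a vs b
        if a < c:                  # comparison 2: a vs c
            if b < c:              # comparison 3: b vs c
                return (a, b, c), 3
            return (a, c, b), 3
        return (c, a, b), 2        # c < a < b settled after 2 comparisons
    else:
        if a < c:                  # comparison 2: a vs c
            return (b, a, c), 2    # b < a < c settled after 2 comparisons
        if b < c:                  # comparison 3: b vs c
            return (b, c, a), 3
        return (c, b, a), 3
```

Four of the six orderings take three comparisons and two take only two, exactly as in the tree.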
Link Between the Number of Leaves in the Tree & the Height of the Tree
When you stop in a decision tree, you are at a leaf. So, how many leaves are there in a decision tree? In our example, 6. In general, if you have n elements to sort, then the number of leaves is at least n!, the number of possible permutations of n numbers (e.g. 3! = 6). Now, what is the connection between the height of the tree and the number of leaves? We will show that a tree of height h has no more than 2^h leaves; then we can make the following chain:

    n! <= number of leaves <= 2^h

which leads to the fundamental result:

    h >= ceil(log_2(n!))
Why is this true? How do you prove that?
By induction on h. Base case, h = 0:

    number of leaves <= 2^0 = 1

(If the height is 0 then you just have a root, and a tree with just one leaf is a root also. So, certainly, this statement is true.)

Inductive step, for a tree of height h:

    number of leaves <= 2 * (number of leaves in a tree of height h-1)
                     <= 2 * 2^(h-1)        (by the inductive hypothesis)
                      = 2^h
If you are still not convinced, then another way of thinking is the following:
Let's strip a layer of leaves, like painters strip a layer of paint. Take a tree of height h and strip all the leaves away. The number of leaves stripped away is no more than 2 times the number of leaves in the tree that remains; and if the original tree had height h, the tree that remains has height at most h-1. So, number of leaves <= 2 * 2^(h-1) = 2^h, which is what we wanted to show.
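The leaf bound is easy to sanity-check mechanically. In this illustrative sketch (not from the notes), a tree is either the string "leaf" or a pair (left_subtree, right_subtree):

```python
# Check: in any binary tree, number of leaves <= 2^height.

def height(t):
    return 0 if t == "leaf" else 1 + max(height(t[0]), height(t[1]))

def leaves(t):
    return 1 if t == "leaf" else leaves(t[0]) + leaves(t[1])

# A full tree of height 2 (4 leaves) and a lopsided tree of height 2 (3 leaves):
full = (("leaf", "leaf"), ("leaf", "leaf"))
lopsided = ("leaf", ("leaf", "leaf"))
for t in (full, lopsided, "leaf"):
    assert leaves(t) <= 2 ** height(t)
```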
What we have here is a table of worst-case lower bound values ceil(log_2(n!)) for some small values of n, the number of elements to be sorted:

| n | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| ceil(log_2(n!)) | 0 | 1 | 3 | 5 | 7 | 10 |

For n = 1, 2, 3, 4, and 6, it is relatively easy to match the bound. But it is hard to show that 5 elements can actually be sorted in ceil(log_2(5!)) = 7 comparisons in the worst case, and the reason is the following:
* A TOUGH ONE, because 5! = 120 and 2^7 = 128. We have to build a tree with 120 leaves when the maximum possible number of leaves is 128, so there is very little slack. Suppose we are at some node of the decision tree and go down to both subtrees. If we have 50 leaves in one subtree and 70 in the other (as shown in the diagram below), that is an intolerable imbalance: 70 leaves cannot fit in a subtree of the remaining height (2^6 = 64 < 70). A better split would be close to 64/64, which balances left and right in terms of leaves so that everything comes out just right. That is why the case n = 5 is so hard.
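The table values above can be recomputed directly; a short sketch (not from the notes):

```python
import math

# Worst-case comparison lower bound for sorting n elements: ceil(log2(n!)).
def sorting_lower_bound(n):
    return math.ceil(math.log2(math.factorial(n)))

for n in range(1, 7):
    print(n, sorting_lower_bound(n))   # n = 5 gives 7, since 2^7 = 128 >= 120
```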
A Remark About n*log_2(n)

    log_2(n!) = log_2(n) + log_2(n-1) + ... + log_2(1) <= n*log_2(n)

so the lower bound ceil(log_2(n!)) never exceeds n*log_2(n); in fact log_2(n!) grows like n*log_2(n) (by Stirling's approximation), which is why the bound is usually quoted as n*log_2(n).
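This is easy to check numerically; the following sketch (not from the notes) compares the sum of logs against n*log_2(n):

```python
import math

# log2(n!) = log2(n) + log2(n-1) + ... + log2(1); for large n this
# approaches n*log2(n) (Stirling's approximation).
n = 1000
sum_of_logs = sum(math.log2(k) for k in range(1, n + 1))
ratio = sum_of_logs / (n * math.log2(n))
print(round(ratio, 2))   # approaches 1 as n grows; about 0.86 at n = 1000
```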
1) The result says that the number of comparisons has to be at least ceil(log_2(n!)) in the worst case. Of course, decision trees also show best-case times (the shortest path to a leaf), which can be much better than log_2(n!); but we are only concerned with worst-case times here.
2) If there are k possible outcomes of a "comparison" (instead of two, as with a primitive scale), then the decision tree is k-ary, and the lower bound is ceil(log_k(n!)). For example, say we make a super chip that allows us to compare 5 elements in parallel, so that a single "comparison" sorts 5 elements. What would the fan-out of this decision tree be? 5! = 120.
3) If we are not sorting, but have to report one of N possible outcomes, then the lower bound is ceil(log_k(N)) for a k-ary decision tree. In the case of sorting, N = n!, the number of permutations of n numbers.
Are there algorithms that come close to log_2(n!)?
| Algorithm | Number of Comparisons |
|---|---|
| HeapSort | 2n*log_2(n) |
| QuickSort | 1.38...n*log_2(n), on average |
| Tournament Sort | ~ n*log_2(n) |
| Merge Sort | ~ n*log_2(n) |
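To see one of these counts concretely, here is a comparison-counting merge sort (an illustrative sketch, not code from the notes):

```python
import random

# Merge sort that counts comparisons, for checking how close it comes to
# the ceil(log2(n!)) lower bound (296 for n = 64; worst case here is 321).
def merge_sort(xs, counter):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison per loop step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

counter = [0]
data = random.sample(range(1000), 64)
result = merge_sort(data, counter)
print(counter[0])
```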
Another Example: The Game of Mastermind

You have 6 coloured pegs (white, black, red, green, blue, yellow) and 4 holes in a board. You choose any 4 of the 6 colours and place them in any order in the 4 holes, hiding your layout from your opponent. The objective of the game is for your opponent to guess your colour layout correctly by sticking 4 coloured pegs in his/her board. After each guess, you respond with scoring pegs: a black peg for each colour in the correct hole, and a white peg for each correct colour in a wrong hole. For example, suppose you use a yellow, black, black, blue layout and your opponent guesses black, yellow, green, green. You then respond with 2 white pegs, meaning your opponent has 2 colours right, namely black and yellow, but both in the wrong holes.
The 14 possible outcomes are: -, W, B, WW, WB, BB, WWW, WWB, WBB, BBB, WWWW, WWWB, WWBB, BBBB. (Note that BBBW is impossible: if three pegs are in the right holes, the fourth cannot be a right colour in a wrong hole; that is why there are 14 outcomes rather than 15.) And a decision tree for the game of Mastermind is the following:
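The response can be computed mechanically. This sketch assumes the standard Mastermind scoring rule, which is consistent with the notes' example (the function name `respond` is our own):

```python
from collections import Counter

# One black peg per colour in the right hole; one white peg per correct
# colour in a wrong hole (multiset intersection minus exact matches).
def respond(secret, guess):
    black = sum(s == g for s, g in zip(secret, guess))
    matched = sum((Counter(secret) & Counter(guess)).values())
    return "B" * black + "W" * (matched - black)

print(respond(["yellow", "black", "black", "blue"],
              ["black", "yellow", "green", "green"]))
```

On the notes' example this prints "WW", matching the 2-white-peg response.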
| PROBLEM | LOWER BOUND | GOOD ALGORITHM |
|---|---|---|
| unsuccessful search in a sorted list x1 < x2 < ... < xn | ceil(log_2(n+1)) | binary search |
| successful search in x1 < x2 < ... < xn | ceil(log_2(n)) | binary search |
| find all elements equal to x in an unsorted set of size n | n | compare every element with x |
| merge two sorted sets of sizes m and n | ceil(log_2(C(m+n, m))) | totally open (see below) |

For the merge lower bound: the x's cannot be permuted among themselves, and neither can the y's. The final result is a sequence of length m+n, e.g. x1 x2 y1 x3 x4 x5 y2 y3 ... xn ym, so all we need to know are the positions of the y's in the sequence, e.g. (3, 7, 8). Counting how many ways one can fix a set of places for the y's is a combinatorial problem: there are (m+n) choose m of them, hence the lower bound. The choice of merging algorithm is totally open.
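For example, the standard two-pointer merge uses at most m + n - 1 comparisons, not far above the lower bound. An illustrative sketch (not from the notes):

```python
import math

# Merge two sorted lists, counting comparisons; compare the count against
# the information-theoretic lower bound ceil(log2(C(m+n, m))).
def merge(xs, ys):
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(xs) and j < len(ys):
        comparisons += 1
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    return out + xs[i:] + ys[j:], comparisons

m, n = 5, 5
lower = math.ceil(math.log2(math.comb(m + n, m)))
merged, used = merge(list(range(0, 10, 2)), list(range(1, 11, 2)))
print(used, lower)   # interleaved input hits the m+n-1 worst case: 9 vs bound 8
```

For m = n the two bounds are close; when one list is much shorter (say m = 1), the lower bound drops to about log_2(n+1), and binary-search-based merging beats the two-pointer method.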
Last updated 0:10a.m. April 23, 1997 by Yi Zhang
Copyright © 1997 by M.S.Y.Y. Assoc. - Mensah Nunya, Song Pu, Yi Zhang, Ying Li. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.