It is often said that sorting n numbers takes at least n*log_2(n) comparisons. Stated this way, the claim is rather vague: under some other computational models, one can sort in time less than n*log_2(n). So we have to be really clear about which model of computation the bound is taken in. With that in mind, we begin our discussion of lower bounds:
Computational Model
An Example: Sorting 3 elements:
What we would like to do is to sort 3 elements using just a balance scale. We can put two of the elements on the scale and get back a response telling us which one is heavier. That is one comparison (1 comparison = 1 time unit). This model can be used to gain order information about any input sequence (x1, x2, x3, ..., xn), provided all the input elements are distinct.
The Decision Tree
Every comparison-based sorting algorithm can be viewed as a decision tree, which visualizes and models the algorithm. Here is a decision tree for this particular example of sorting 3 elements:
We start by comparing a and b; the outcomes are a < b and a > b, a bit like an old flow chart. If you have an algorithm, it must have a second move in both cases. In this case, the second move is to compare a and c, so we have a < c and a > c down both subtrees. (In any comparison-based model, the whole structure is organized this way.) If a < b and a < c, we still do not know whether b < c, so a clever algorithm has to ask which of b and c is larger; that is our third comparison, with outcomes b < c and b > c. Then we can stop and report a < b < c or a < c < b respectively. And we can keep going. In some cases we need three comparisons (e.g. a < b < c, a < c < b, a > c > b, and a > b > c), and in some only two (e.g. c < a < b, c > a > b). Any algorithm can be drawn in the form of such a decision tree. One could also have stupid algorithms that ask totally irrelevant questions, which give no information whatsoever. The point is that a good algorithm should give a relatively shallow decision tree.
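The decision tree above translates directly into code. The following Python sketch (our own illustration, not from the notes) sorts three distinct numbers by walking the tree, and reports how many comparisons it used:

```python
def sort3(a, b, c):
    """Sort three distinct numbers with explicit comparisons,
    following the decision tree in the text.
    Returns (sorted_triple, number_of_comparisons_used)."""
    if a < b:                       # comparison 1
        if a < c:                   # comparison 2
            if b < c:               # comparison 3
                return (a, b, c), 3
            return (a, c, b), 3     # a < c < b
        return (c, a, b), 2         # a > c, so c < a < b: only 2 comparisons
    if a < c:                       # comparison 2
        return (b, a, c), 2         # b < a < c: only 2 comparisons
    if b < c:                       # comparison 3
        return (b, c, a), 3         # b < c < a
    return (c, b, a), 3             # c < b < a
```

Note that the two leaves reached after only two comparisons are exactly the cases c < a < b and b < a < c from the text; every other permutation costs three.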
The Link Between the Number of Leaves and the Height of the Tree
When a decision tree stops, it is at a leaf. So, how many leaves are there in a decision tree? In our example, 6. In general, if you have n elements to sort, then the number of leaves is at least n!, the number of possible permutations of n numbers (e.g. 3! = 6). Now, what is the connection between the height of the tree and the number of leaves? We will show that the number of leaves in a tree of height h is no more than 2^h, so we can make the following chain:

    n! <= (number of leaves) <= 2^h,

which leads to the fundamental result:

    h >= ceiling of log_2(n!).
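The fundamental bound is easy to evaluate numerically; here is a minimal Python sketch (the function name is our own):

```python
import math

def sorting_lower_bound(n):
    # h >= ceil(log_2(n!)): any comparison-based sort of n distinct
    # elements needs at least this many comparisons in the worst case.
    return math.ceil(math.log2(math.factorial(n)))

for n in range(1, 7):
    print(n, sorting_lower_bound(n))
```

For example, sorting_lower_bound(3) gives 3, matching our 6-leaf example tree of height 3.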
Why is this true? How do you prove that?
Induction
The base case is a tree of height 0: it is a single leaf, and 2^0 = 1. For the inductive step, take a tree of height h. The two subtrees of the root each have height at most h-1, so by induction each has at most 2^(h-1) leaves, and the whole tree has at most 2 * 2^(h-1) = 2^h leaves.
If you are still not convinced, then another way of thinking about it is the following. Let's strip a layer of leaves, like painters strip a layer of paint. Take a tree of height h and strip all the leaves away. The number of leaves you have stripped away is no more than 2 times the number of leaves of the tree that remains, since each stripped leaf hangs off a node of the remaining tree, and each such node has at most 2 children. If the original tree had height h, then after we strip away all the leaves, the remaining tree has height h-1, and so has at most 2^(h-1) leaves. Therefore

    number of leaves <= 2 * 2^(h-1) = 2^h,

which is what we wanted to show.
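The bound can be checked directly on small trees. In the Python sketch below (our own illustration; the encoding of a tree as None for a leaf or a (left, right) pair for an internal node is our convention, not from the notes), a complete tree meets the 2^h bound exactly:

```python
def height(tree):
    # Height of a binary tree: 0 for a single leaf.
    if tree is None:
        return 0
    left, right = tree
    return 1 + max(height(left), height(right))

def num_leaves(tree):
    # Count the leaves (None nodes) of a binary tree.
    if tree is None:
        return 1
    left, right = tree
    return num_leaves(left) + num_leaves(right)

def complete(h):
    # A complete binary tree of height h; it has exactly 2^h leaves,
    # showing that the bound is tight.
    return None if h == 0 else (complete(h - 1), complete(h - 1))
```

Any lopsided tree you build by hand will come in strictly under the 2^h ceiling.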
  n | ceil(log_2(n!))
  1 | 0
  2 | 1
  3 | 3
  4 | 5
  5 | 7  *
  6 | 10
What we have here is a table of worst-case lower bound values for some small values of n, the number of elements to be sorted. For n = 1, 2, 3, 4, and 6, the bounds are relatively easy to establish. But it is hard to show that the lower bound for sorting 5 elements is ceil(log_2(5!)) = 7 in the worst case, for the following reason:
* A TOUGH ONE, because 5! = 120 while 2^7 = 128. We have to build a tree with 120 leaves when the maximum possible number of leaves is 128, so there is almost no slack. Suppose we make the first comparison in the decision tree and go into the two subtrees. Only 6 comparisons remain, so each subtree can have at most 2^6 = 64 leaves. If one subtree gets 50 leaves and the other 70 (as shown in the diagram below), the imbalance is intolerable: 70 > 64, so the larger subtree cannot finish within the remaining comparisons. A better choice splits the 120 leaves at most 64 to a side, so everything comes out just right. That is why the case n = 5 is so hard.
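The arithmetic behind the n = 5 case can be spelled out in a few lines of Python (the helper name subtree_capacity is our own):

```python
import math

def subtree_capacity(remaining_comparisons):
    # Maximum number of leaves reachable with this many comparisons left.
    return 2 ** remaining_comparisons

print(math.factorial(5), subtree_capacity(7))  # 120 leaves must fit in 128 slots
print(subtree_capacity(6))                     # 64: so a 50/70 first split cannot work
```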
A Remark About n*log_2(n)
    log_2(n!) = log_2(n) + log_2(n-1) + ... + log_2(1)
              <= n*log_2(n),

and in fact log_2(n!) ~ n*log_2(n) asymptotically.
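A quick numerical illustration of this remark (the helper ratio is our own): the ratio log_2(n!) / (n*log_2(n)) stays below 1 but climbs toward it as n grows.

```python
import math

def ratio(n):
    # log_2(n!) divided by its upper bound n*log_2(n).
    return math.log2(math.factorial(n)) / (n * math.log2(n))

for n in (10, 100, 1000):
    print(n, round(ratio(n), 3))   # the ratio increases toward 1
```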
1) The result says that the number of comparisons has to be at least the ceiling of log_2(n!) in the worst case. Of course, decision trees also show best-case times (the length of the shortest path to a leaf), which can be much better than log_2(n!); but we are concerned only with worst-case times.
2) If there are k possible outcomes of a "comparison" (instead of two, as with a primitive scale), then the decision tree is k-ary, and the lower bound is the ceiling of log_k(n!). Here is a good example: say we make a super chip that allows us to compare 5 elements in parallel, so that a single "comparison" sorts 5 elements instead of ordering two. What would the fan-out of this decision tree be? 5! = 120.
3) If we are not sorting, but have to report one of N possible outcomes, then the lower bound is the ceiling of log_k(N). In the case of sorting, N = n!, the number of permutations of n numbers.
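Remarks 2) and 3) combine into one formula, ceil(log_k(N)); a small Python sketch (the function name lower_bound is our own):

```python
import math

def lower_bound(num_outcomes, k):
    # Ceiling of log_k(N): minimum worst-case number of steps when each
    # step has k possible results and N answers must be distinguished.
    return math.ceil(math.log(num_outcomes) / math.log(k))

print(lower_bound(math.factorial(5), 2))     # the scale bound for n = 5
print(lower_bound(math.factorial(5), 120))   # one super-chip step suffices for 5 elements
```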
Are there algorithms that come close to log_2(n!)?
  Algorithm       | Number of Comparisons
  HeapSort        | 2n*log_2(n)
  QuickSort       | 1.38...n*log_2(n), on average
  Tournament Sort | ~ n*log_2(n)
  Merge Sort      | ~ n*log_2(n)
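As a rough check of the Merge Sort row, here is a comparison-counting merge sort (our own sketch, not the course's implementation): on 100 random elements it uses a number of comparisons of the same order as the ceil(log_2(100!)) lower bound.

```python
import math
import random

def merge_sort(xs, counter):
    # Sort xs, incrementing counter[0] once per element comparison.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

random.seed(0)
data = random.sample(range(1000), 100)
counter = [0]
result = merge_sort(data, counter)
print(counter[0], math.ceil(math.log2(math.factorial(100))))
```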
The Game of Mastermind
You have 6 coloured pegs (white, black, red, green, blue, yellow) and 4 holes in a board. You choose any 4 of the 6 colours and place them in any order in the 4 holes, hiding your layout from your opponent. The objective of the game is for your opponent to guess your colour layout correctly by sticking 4 coloured pegs in his/her board. After each guess, you respond with scoring pegs: a black peg means a guessed colour sits in the right hole, and a white peg means a guessed colour appears in your layout but in a different hole. For example, suppose you use a yellow, black, black, blue layout and your opponent guesses black, yellow, green, green. You then respond with 2 white pegs, meaning your opponent has 2 colours right but in the wrong holes, namely black and yellow.
The 14 possible responses are: (no pegs), W, B, WW, WB, BB, WWW, WWB, WBB, BBB, WWWW, WWWB, WWBB, BBBB. (WBBB is missing because 3 black pegs plus 1 white peg is impossible: if three guesses are exactly right, the fourth cannot be a right colour in a wrong hole.) And a decision tree for the game of Mastermind is the following:
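The count of 14 responses can be reproduced in a few lines of Python (our own enumeration; each response is encoded as a string of B's followed by W's):

```python
def mastermind_outcomes():
    # Each response is a multiset of black (B) and white (W) pegs,
    # at most 4 pegs in total; "3 black + 1 white" is excluded
    # because it is impossible, as noted in the text.
    outcomes = []
    for total in range(5):              # 0 to 4 pegs in the response
        for blacks in range(total + 1):
            whites = total - blacks
            if blacks == 3 and whites == 1:
                continue                # the impossible BBBW response
            outcomes.append("B" * blacks + "W" * whites)
    return outcomes

print(len(mastermind_outcomes()))
```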
  PROBLEM                                                  | LOWER BOUND                   | GOOD ALGORITHM
  unsuccessful search in a sorted list x1 < x2 < ... < xn  | ceil(log_2(n+1))              | binary search
  successful search in x1 < x2 < ... < xn                  | ceil(log_2(n))                | binary search
  find a given element x in an unsorted set of size n      | n                             | compare every element with x
  merge two sorted sets of sizes m and n                   | ceil(log_2((m+n) choose m))   | open

For the merge problem: the x's and the y's cannot be permuted among themselves, so the final result is a sequence of length (m+n), e.g. x1 x2 y1 x3 x4 x5 y2 y3 ... xn ym. All we need to know are the positions of the y's in the sequence, e.g. (3, 7, 8). So it is a question of counting how many ways one can fix a set of places for the y's, a combinatorial problem: (m+n) choose m. By remark 3) above, the lower bound follows. The choice of a good algorithm here is totally open.
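The merge row of the table can be computed directly; a small Python sketch (the function name merge_lower_bound is our own):

```python
import math

def merge_lower_bound(m, n):
    # Merging sorted runs of sizes m and n has C(m+n, m) possible
    # interleavings, so any comparison-based merge needs at least
    # ceil(log_2(C(m+n, m))) comparisons in the worst case.
    return math.ceil(math.log2(math.comb(m + n, m)))

print(merge_lower_bound(2, 2))   # C(4, 2) = 6 interleavings
```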
Last updated 0:10 a.m. April 23, 1997 by Yi Zhang
Copyright © 1997 by M.S.Y.Y. Assoc. (Mensah Nunya, Song Pu, Yi Zhang, Ying Li). All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.