It is often said that sorting n numbers takes at least n*log_2(n) comparisons. Stated this way, the claim is rather vague: under some other computational models, one can sort in time less than n*log_2(n). So we have to be really clear about which model of computation the bound is taken in. With that in mind, we begin our discussion of lower bounds:
Computational Model
An Example: Sorting 3 elements:
What we would like to do is to sort 3 elements using just a balance scale. We can put two of the elements on the scale and get back a response telling us which one is heavier. That is one comparison (1 comparison = 1 time unit). This model can be used to gain order information about any input sequence (x1, x2, x3, ..., xn), provided all the input elements are distinct.
The Decision Tree
Every comparison-based sorting algorithm can be viewed as a decision tree, which visualizes and models the algorithm. Here is a decision tree for this particular example of sorting 3 elements:
We start by comparing a and b; the outcomes are a < b and a > b, a bit like an old flow chart. If you have an algorithm, it must have a second move in both cases. In this case, the second move is to compare a and c, so we have a < c and a > c down both subtrees. (In any comparison-based model, the whole structure is organized this way.) If a < b and a < c, we still do not know whether b < c, so a clever algorithm has to ask which of b and c is larger; that is our third comparison, with outcomes b < c and b > c. Then we can stop and report a < b < c or a < c < b respectively. And we can keep going. In some cases we need three comparisons (e.g. a < b < c, a < c < b, a > c > b, and a > b > c), and in some only two (e.g. c < a < b, c > a > b). Any algorithm can be drawn in the form of such a decision tree. One could also have stupid algorithms that ask totally irrelevant questions, which give no information whatsoever. The point is that a good algorithm should give a relatively shallow decision tree.
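The decision tree above translates directly into code. The following Python sketch (our own illustration, not from the notes) sorts three distinct numbers by walking the tree, and reports how many comparisons it used:

```python
def sort3(a, b, c):
    """Sort three distinct numbers with explicit comparisons,
    following the decision tree in the text.
    Returns (sorted_triple, number_of_comparisons_used)."""
    if a < b:                       # comparison 1
        if a < c:                   # comparison 2
            if b < c:               # comparison 3
                return (a, b, c), 3
            return (a, c, b), 3     # a < c < b
        return (c, a, b), 2         # a > c, so c < a < b: only 2 comparisons
    if a < c:                       # comparison 2
        return (b, a, c), 2         # b < a < c: only 2 comparisons
    if b < c:                       # comparison 3
        return (b, c, a), 3         # b < c < a
    return (c, b, a), 3             # c < b < a
```

Note that the two leaves reached after only two comparisons are exactly the cases c < a < b and b < a < c from the text; every other permutation costs three.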
The Link Between the Number of Leaves and the Height of the Tree
When a decision tree stops, it is at a leaf. So, how many leaves are there in a decision tree? In our example, 6. In general, if you have n elements to sort, then the number of leaves is at least n!, the number of possible permutations of n numbers (e.g. 3! = 6). Now, what is the connection between the height of the tree and the number of leaves? We will show that the number of leaves in a tree of height h is no more than 2^h, so we can make the following chain:

    n! <= (number of leaves) <= 2^h,

which leads to the fundamental result:

    h >= ceiling of log_2(n!).
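The fundamental bound is easy to evaluate numerically; here is a minimal Python sketch (the function name is our own):

```python
import math

def sorting_lower_bound(n):
    # h >= ceil(log_2(n!)): any comparison-based sort of n distinct
    # elements needs at least this many comparisons in the worst case.
    return math.ceil(math.log2(math.factorial(n)))

for n in range(1, 7):
    print(n, sorting_lower_bound(n))
```

For example, sorting_lower_bound(3) gives 3, matching our 6-leaf example tree of height 3.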
Why is this true? How do you prove that?
Induction
The base case is a tree of height 0: it is a single leaf, and 2^0 = 1. For the inductive step, take a tree of height h. The two subtrees of the root each have height at most h-1, so by induction each has at most 2^(h-1) leaves, and the whole tree has at most 2 * 2^(h-1) = 2^h leaves.
If you are still not convinced, then another way of thinking about it is the following. Let's strip a layer of leaves, like painters strip a layer of paint. Take a tree of height h and strip all the leaves away. The number of leaves you have stripped away is no more than 2 times the number of leaves of the tree that remains, since each stripped leaf hangs off a node of the remaining tree, and each such node has at most 2 children. If the original tree had height h, then after we strip away all the leaves, the remaining tree has height h-1, and so has at most 2^(h-1) leaves. Therefore

    number of leaves <= 2 * 2^(h-1) = 2^h,

which is what we wanted to show.
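The bound can be checked directly on small trees. In the Python sketch below (our own illustration; the encoding of a tree as None for a leaf or a (left, right) pair for an internal node is our convention, not from the notes), a complete tree meets the 2^h bound exactly:

```python
def height(tree):
    # Height of a binary tree: 0 for a single leaf.
    if tree is None:
        return 0
    left, right = tree
    return 1 + max(height(left), height(right))

def num_leaves(tree):
    # Count the leaves (None nodes) of a binary tree.
    if tree is None:
        return 1
    left, right = tree
    return num_leaves(left) + num_leaves(right)

def complete(h):
    # A complete binary tree of height h; it has exactly 2^h leaves,
    # showing that the bound is tight.
    return None if h == 0 else (complete(h - 1), complete(h - 1))
```

Any lopsided tree you build by hand will come in strictly under the 2^h ceiling.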
  n | ceil(log_2(n!))
  1 | 0
  2 | 1
  3 | 3
  4 | 5
  5 | 7  *
  6 | 10
What we have here is a table of worst-case lower bound values for some small values of n, the number of elements to be sorted. For n = 1, 2, 3, 4, and 6, the bounds are relatively easy to establish. But it is hard to show that the lower bound for sorting 5 elements is ceil(log_2(5!)) = 7 in the worst case, for the following reason:
* A TOUGH ONE, because 5! = 120 while 2^7 = 128. We have to build a tree with 120 leaves when the maximum possible number of leaves is 128, so there is almost no slack. Suppose we make the first comparison in the decision tree and go into the two subtrees. Only 6 comparisons remain, so each subtree can have at most 2^6 = 64 leaves. If one subtree gets 50 leaves and the other 70 (as shown in the diagram below), the imbalance is intolerable: 70 > 64, so the larger subtree cannot finish within the remaining comparisons. A better choice splits the 120 leaves at most 64 to a side, so everything comes out just right. That is why the case n = 5 is so hard.
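The arithmetic behind the n = 5 case can be spelled out in a few lines of Python (the helper name subtree_capacity is our own):

```python
import math

def subtree_capacity(remaining_comparisons):
    # Maximum number of leaves reachable with this many comparisons left.
    return 2 ** remaining_comparisons

print(math.factorial(5), subtree_capacity(7))  # 120 leaves must fit in 128 slots
print(subtree_capacity(6))                     # 64: so a 50/70 first split cannot work
```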
A Remark About n*log_2(n)
    log_2(n!) = log_2(n) + log_2(n-1) + ... + log_2(1)
              <= n*log_2(n),

and in fact log_2(n!) ~ n*log_2(n) asymptotically.
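A quick numerical illustration of this remark (the helper ratio is our own): the ratio log_2(n!) / (n*log_2(n)) stays below 1 but climbs toward it as n grows.

```python
import math

def ratio(n):
    # log_2(n!) divided by its upper bound n*log_2(n).
    return math.log2(math.factorial(n)) / (n * math.log2(n))

for n in (10, 100, 1000):
    print(n, round(ratio(n), 3))   # the ratio increases toward 1
```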
1) The result says that the number of comparisons has to be at least the ceiling of log_2(n!) in the worst case. Of course, decision trees also show best-case times (the length of the shortest path to a leaf), which can be much better than log_2(n!); but we are concerned only with worst-case times.
2) If there are k possible outcomes of a "comparison" (instead of two, as with a primitive scale), then the decision tree is k-ary, and the lower bound is the ceiling of log_k(n!). Here is a good example: say we make a super chip that allows us to compare 5 elements in parallel, so that a single "comparison" sorts 5 elements instead of ordering two. What would the fan-out of this decision tree be? 5! = 120.
3) If we are not sorting, but have to report one of N possible outcomes, then the lower bound is the ceiling of log_k(N). In the case of sorting, N = n!, the number of permutations of n numbers.
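Remarks 2) and 3) combine into one formula, ceil(log_k(N)); a small Python sketch (the function name lower_bound is our own):

```python
import math

def lower_bound(num_outcomes, k):
    # Ceiling of log_k(N): minimum worst-case number of steps when each
    # step has k possible results and N answers must be distinguished.
    return math.ceil(math.log(num_outcomes) / math.log(k))

print(lower_bound(math.factorial(5), 2))     # the scale bound for n = 5
print(lower_bound(math.factorial(5), 120))   # one super-chip step suffices for 5 elements
```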
Are there algorithms that come close to log_2(n!)?
  Algorithm       | Number of Comparisons
  HeapSort        | 2n*log_2(n)
  QuickSort       | 1.38...n*log_2(n), on average
  Tournament Sort | ~ n*log_2(n)
  Merge Sort      | ~ n*log_2(n)
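As a rough check of the Merge Sort row, here is a comparison-counting merge sort (our own sketch, not the course's implementation): on 100 random elements it uses a number of comparisons of the same order as the ceil(log_2(100!)) lower bound.

```python
import math
import random

def merge_sort(xs, counter):
    # Sort xs, incrementing counter[0] once per element comparison.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                      # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

random.seed(0)
data = random.sample(range(1000), 100)
counter = [0]
result = merge_sort(data, counter)
print(counter[0], math.ceil(math.log2(math.factorial(100))))
```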
The Game of Mastermind
You have 6 coloured pegs (white, black, red, green, blue, yellow) and 4 holes in a board. You choose any 4 of the 6 colours and place them in any order in the 4 holes, hiding your layout from your opponent. The objective of the game is for your opponent to guess your colour layout correctly by sticking 4 coloured pegs in his/her board. After each guess, you respond with scoring pegs: a black peg means a guessed colour sits in the right hole, and a white peg means a guessed colour appears in your layout but in a different hole. For example, suppose you use a yellow, black, black, blue layout and your opponent guesses black, yellow, green, green. You then respond with 2 white pegs, meaning your opponent has 2 colours right but in the wrong holes, namely black and yellow.
The 14 possible responses are: (no pegs), W, B, WW, WB, BB, WWW, WWB, WBB, BBB, WWWW, WWWB, WWBB, BBBB. (WBBB is missing because 3 black pegs plus 1 white peg is impossible: if three guesses are exactly right, the fourth cannot be a right colour in a wrong hole.) And a decision tree for the game of Mastermind is the following:
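The count of 14 responses can be reproduced in a few lines of Python (our own enumeration; each response is encoded as a string of B's followed by W's):

```python
def mastermind_outcomes():
    # Each response is a multiset of black (B) and white (W) pegs,
    # at most 4 pegs in total; "3 black + 1 white" is excluded
    # because it is impossible, as noted in the text.
    outcomes = []
    for total in range(5):              # 0 to 4 pegs in the response
        for blacks in range(total + 1):
            whites = total - blacks
            if blacks == 3 and whites == 1:
                continue                # the impossible BBBW response
            outcomes.append("B" * blacks + "W" * whites)
    return outcomes

print(len(mastermind_outcomes()))
```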
  PROBLEM                                                  | LOWER BOUND                   | GOOD ALGORITHM
  unsuccessful search in a sorted list x1 < x2 < ... < xn  | ceil(log_2(n+1))              | binary search
  successful search in x1 < x2 < ... < xn                  | ceil(log_2(n))                | binary search
  find a given element x in an unsorted set of size n      | n                             | compare every element with x
  merge two sorted sets of sizes m and n                   | ceil(log_2((m+n) choose m))   | open

For the merge problem: the x's and the y's cannot be permuted among themselves, so the final result is a sequence of length (m+n), e.g. x1 x2 y1 x3 x4 x5 y2 y3 ... xn ym. All we need to know are the positions of the y's in the sequence, e.g. (3, 7, 8). So it is a question of counting how many ways one can fix a set of places for the y's, a combinatorial problem: (m+n) choose m. By remark 3) above, the lower bound follows. The choice of a good algorithm here is totally open.
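The merge row of the table can be computed directly; a small Python sketch (the function name merge_lower_bound is our own):

```python
import math

def merge_lower_bound(m, n):
    # Merging sorted runs of sizes m and n has C(m+n, m) possible
    # interleavings, so any comparison-based merge needs at least
    # ceil(log_2(C(m+n, m))) comparisons in the worst case.
    return math.ceil(math.log2(math.comb(m + n, m)))

print(merge_lower_bound(2, 2))   # C(4, 2) = 6 interleavings
```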
Last updated 0:10 a.m. April 23, 1997 by Yi Zhang
Copyright © 1997 by M.S.Y.Y. Assoc. (Mensah Nunya, Song Pu, Yi Zhang, Ying Li). All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.