McGill University:
Department of Computer Science
Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

Topic #3: Analysis of O, Θ, Ω. Summations.


Landau Symbols

Landau symbols have mathematically precise definitions. They have many uses, such as in the analysis of algorithms. These symbols are used to evaluate and to concisely describe the performance of an algorithm, in time and in space.

Five different symbols were discussed in class: O, o, Θ, Ω and ω.

But only three of them were properly defined in class :

  • f(n) = O(g(n)) if there exist constants C > 0 and n0 such that f(n) ≤ C g(n) for all n ≥ n0 (an upper bound).
  • f(n) = Ω(g(n)) if there exist constants C > 0 and n0 such that f(n) ≥ C g(n) for all n ≥ n0 (a lower bound).
  • f(n) = Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)) (a tight bound).

This graph shows the bounds on a given function.
This function can depict the running time of an algorithm.

A few remarks :

  1. We only care what happens for large n (n ≥ n0).
  2. O, Θ and Ω "hide" the constant C from us.
  3. If a_n is the worst-case time of an algorithm, then O(·) gives worst-case guarantees (good news), while Ω(·) gives a lower bound (bad news).

Here is a simple example to show why C is not needed.

Suppose we know the running time of a given algorithm to be Θ(n^2) (as in insertion sort or bubble sort of n elements). Say we timed the algorithm at n = 1000, and we found that it takes 4 seconds. If we want to find the time for n = 1 million, we don't have to run the algorithm over again. Since the time grows quadratically in this case, the estimated time for the problem should be close to 1000^2 * 4 seconds, or 4 million seconds. Had this algorithm been linear (Θ(n)), the time would have been 1000 * 4, or 4000 seconds only! In both of these cases (quadratic and linear), the constant C was not needed.
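This extrapolation is easy to mechanize; a minimal Python sketch (the function name and interface are ours, not from the course):

```python
def extrapolate(t_measured, n_measured, n_target, exponent):
    """Scale a measured running time, assuming T(n) = C * n**exponent.
    The constant C cancels out of the ratio, which is the whole point."""
    return t_measured * (n_target / n_measured) ** exponent

# quadratic algorithm timed at 4 s for n = 1000, estimated at n = 1 million:
quadratic = extrapolate(4.0, 1000, 1_000_000, 2)   # 4 million seconds
linear = extrapolate(4.0, 1000, 1_000_000, 1)      # 4000 seconds
```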


Complexity Classes

This is a table of Complexity Classes. The row entries are in increasing order of growth.

Class | Function/Order | Examples | Remarks
1 | Constant time | A[i] | The running time of the algorithm doesn't depend on the value of n.
log* n | Log star time | ... |

log* n = min { i : log2 log2 ... log2 n ≤ 1 }, where i is the number of times log2 is applied to n.

For practical purposes, log* n is never more than 5. For example, log* (2^65536) = 5, and 2^65536 > 10^80, the number of atoms in the universe!

FIGURE 1 shows the values of log star from 0 to 5.
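The definition can be computed directly; a small Python helper of our own, matching the values in FIGURE 1:

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied before n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

values = [log_star(x) for x in (1, 2, 4, 16, 65536)]   # [0, 1, 2, 3, 4]
```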

log_a n | Logarithmic time | binary search | Binary Search:

Every time you go through the recursion (binary search uses recursion), the problem size is cut in half. So the maximum number of times you can go through the recursion is log2 n.
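The halving argument is easy to see in code; a recursive sketch in Python (our own, not from the lecture):

```python
def binary_search(a, target, lo=0, hi=None):
    """Return an index of target in the sorted list a, or -1 if absent.
    Each recursive call halves the range, so there are at most log2(n) calls."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return -1                                      # empty range: target absent
    mid = (lo + hi) // 2
    if a[mid] == target:
        return mid
    if a[mid] < target:
        return binary_search(a, target, mid + 1, hi)   # search upper half
    return binary_search(a, target, lo, mid - 1)       # search lower half
```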

n | Linear time | printing, listing | The running time of the algorithm is proportional to the value of n.
n log n | ... | heapsort, mergesort | ...
n^2 | Quadratic time | bubblesort, insertion sort |

Why is insertion sort quadratic?

Comparisons :
1 + 2 + ... + n = Σ (i = 1 to n) i = n(n + 1)/2 = n^2/2 + n/2 = O(n^2)
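Counting the comparisons directly confirms the quadratic worst case; a sketch (the comparison counter is our addition):

```python
def insertion_sort(a):
    """Sort a in place and return the number of key comparisons made."""
    comparisons = 0
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0:
            comparisons += 1           # one comparison of key against a[i]
            if a[i] <= key:
                break
            a[i + 1] = a[i]            # shift the larger element right
            i -= 1
        a[i + 1] = key
    return comparisons

a = list(range(10, 0, -1))             # reverse-sorted input: the worst case
count = insertion_sort(a)              # 45 comparisons = 10 * 9 / 2
```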

n^a | Polynomial time | ... | Hypothesis : exponential is larger than polynomial :

Compute the limit of c^n / n^k for c > 1. To do this, find a lower bound for c^n.

c^n = e^(n log c) ≥ (n log c)^(2k) / (2k)!

This follows from the Taylor series expansion of e^x: every term of the series is nonnegative, so e^x is at least its x^(2k)/(2k)! term. Clearly, for c > 1 and a proper choice of a constant, this lower bound is Ω(n^(2k)).

Substituting this lower bound for c^n in the original problem shows that the limit we desire is at least the limit of n^(2k) / n^k = n^k. This limit is clearly ∞.

As a result, exponentials clearly swamp polynomials.
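A quick numerical sanity check of the hypothesis, even for a base barely above 1 (the values of c, k and n are chosen arbitrarily):

```python
# Ratio c**n / n**k for c = 1.1 and k = 3: tiny at first, astronomical later.
ratios = {n: 1.1**n / n**3 for n in (10, 100, 1000)}
```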

2^n, 10^n, a^n | Exponential time | security codes |

Hans Bremermann was an early computer scientist at Berkeley. In 1962, based upon physical reasoning, he stated that no computer, biological or artificial, could ever process more than 2 × 10^47 bits of information per second per gram.

Let's assume that this is true. Now, assume that we protect a computer system with a code consisting of 40 decimal digits. Breaking this code by brute force would in the worst case require trying out 10^40 combinations. A computer working at Bremermann's speed could do that in less than a second, even if it weighed only 1 gram.

Now, if we protected it with 100 digits, we would be safe, even if we had a 10-ton computer (10^7 grams), for at least

10^100 * log2(10) / (2 × 10^47 × 10^7) seconds.

That is basically until eternity.
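The arithmetic is easy to redo; a short Python check (the age-of-the-universe figure, about 4 × 10^17 seconds, is our own addition for scale):

```python
import math

# 10-ton computer = 10**7 grams, running at Bremermann's limit
seconds = 10**100 * math.log2(10) / (2 * 10**47 * 10**7)
lifetimes = seconds / 4e17            # age of the universe, roughly 4e17 seconds
```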

n! | Factorial time | TSP |

TSP is the Travelling Salesman Problem. Let's say you are a salesperson and you have to visit each of n cities to sell your product. You want to visit all the cities and return to the city you came from. You want to find the shortest path possible, visiting all the cities.

Obviously, it would be naive to try all the permutations. There are (n-1)! permutations, and each permutation requires work proportional to n (adding n distances), for a total of n!. Many have tried their hand at this problem, but few have found a good solution for it.
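The naive enumeration is straightforward to write down, if hopeless to run for large n; a Python sketch of our own, usable only for tiny inputs:

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Try all (n-1)! tours that start and end at city 0.
    dist is an n x n matrix of pairwise distances: O(n!) total work."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n))  # n additions
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour
```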

e^sqrt(n) | Between poly. and exp. | ... |

Is there life between polynomial and exponential?

e^sqrt(n) is one such function.

For all integers a and all sufficiently large n, n^a < e^sqrt(n) < 2^n.

FIGURE 1
n | log* n
1 | 0
2^1 = 2 | 1
2^2 = 4 | 2
2^4 = 16 | 3
2^16 = 65536 | 4
2^65536 | 5


Summations

Summations are used extensively in computer science as a mathematical tool to help analyze the running time of algorithms. In computer science, most algorithms are completed through the repetition of simple actions, or loops. Loops are created in two different ways: first, by constructs such as repeat, while, and for, and second, by recursion. In order to analyze the times taken by looping programs, we have to deal with summations, as one must sum the times taken in various iterations. In order to analyze the times taken by recursive programs, we have to deal with recurrences, which is the subject of the next lecture.

There are five basic methods for dealing with summations.

Method | Example Easily Solved by the Given Method
Rough Bounding | Σ (i = 1 to n) i
Exact Computations | Σ (i = 1 to n) i
Induction | Σ (i = 0 to n) 3^i ≤ C · 3^n
Telescoping Series | Σ (i = 1 to n) 1 / (i · (i + 1))
Comparison With an Integral | Σ (i = 1 to n) 1/i
  1. Rough Bounding

    Given a summation, we would like to bound each term by something simple, and thus make the summation more manageable. While this method will not determine an exact measure of the running time of the algorithm which the summation represents, it is useful for obtaining the bounds needed to answer Big Oh questions. In the table's example, the summation Σ (i = 1 to n) i represents the running time of many different algorithms, among them selection sort.

  2. Exact Computations

    As the name of the method suggests, we would like to compute an exact numerical value for the summation.

  3. Induction

    Induction can be used to prove a certain statement for every n. We call it "induction with respect to a parameter n" : we first show the statement for n = 0, then assume it holds for all values up to but not including n, and finally verify it for n itself.

  4. Telescoping Series

    A telescoping series is a series where each term can be expressed as a difference in such a way that most of the terms cancel each other out. For example, 1 / (i · (i + 1)) = 1/i − 1/(i + 1), so the sum from i = 1 to n collapses to 1 − 1/(n + 1).

  5. Comparison With An Integral

    The basic principle behind this method is that Σ (i = 1 to n) f(i) can be sandwiched between integrals of f. For instance, if f is nonincreasing, then the integral of f(x) dx from 1 to n + 1 ≤ Σ (i = 1 to n) f(i) ≤ f(1) + the integral of f(x) dx from 1 to n. Applied to f(x) = 1/x, this shows that Σ (i = 1 to n) 1/i = Θ(log n).
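    All five methods can be sanity-checked numerically; a Python sketch of the table's five example summations (the choice n = 1000 is arbitrary):

```python
from fractions import Fraction
import math

n = 1000

# 1. Rough bounding: each of the n terms of sum(i) is at most n.
assert sum(range(1, n + 1)) <= n * n

# 2. Exact computation: Gauss's closed form n(n+1)/2.
assert sum(range(1, n + 1)) == n * (n + 1) // 2

# 3. Induction: the sum of 3^i for i = 0..n is at most C * 3^n with C = 3/2.
assert 2 * sum(3**i for i in range(n + 1)) <= 3 * 3**n

# 4. Telescoping: 1/(i(i+1)) = 1/i - 1/(i+1), so the sum collapses.
assert sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1)) == 1 - Fraction(1, n + 1)

# 5. Comparison with an integral: ln(n+1) <= H(n) <= 1 + ln(n).
harmonic = sum(1 / i for i in range(1, n + 1))
assert math.log(n + 1) <= harmonic <= 1 + math.log(n)
```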


    Chip Testing Problem

    This is Problem 4.7 from Cormen et al (our course textbook).

    This problem is motivated by the VLSI industry, where millions of VLSI chips must be tested for defects. The idea is to let the chips test each other using built-in circuitry. Note that VLSI stands for very large scale integration, which is the integrated-circuit chip technology used to fabricate most microprocessors today.

    There are two kinds of chips - good and bad. Good chips always tell the truth, i.e. they report accurately whether the tested chip is good or bad. Bad chips may say anything about the other chip, and thus cannot be trusted. Our goal is to identify all good chips. The problem must be solved in O(n) time.

    It is assumed that there are more good chips than bad; in other words, |G| > |B|. Our computational model states that each chip test counts as one time unit.

    It suffices to find one good chip in O(n) time. This is because one good chip can then identify all good chips in O(n) further tests.

    A naive method

    We test each chip individually by comparing it against all remaining chips: we take votes from the remaining chips and let the majority decide. In case of equality, good wins. If a chip is declared bad, it is junked and we continue.
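    The naive method can be sketched as follows (the voting oracle `says_good` is a hypothetical interface we introduce; one call corresponds to one chip test):

```python
def naive_find_good(chips, says_good):
    """Test chips one at a time: the remaining chips vote on the current chip,
    ties go to "good", and chips declared bad are junked."""
    chips = list(chips)
    while len(chips) > 1:
        candidate, rest = chips[0], chips[1:]
        votes_good = sum(says_good(c, candidate) for c in rest)
        if 2 * votes_good >= len(rest):     # majority decides, ties -> good
            return candidate
        chips = rest                        # junk the candidate and continue
    return chips[0]
```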

    Time Analysis :

    (n-1) + (n-2) + (n-3) + ... + (n/2) ≤ n + n + ... + n = n^2/2 = O(n^2)

    However, such a running time is unacceptable, for it greatly exceeds O(n).

    Divide and Conquer Paradigm

    Definitions :

    • G : set of all good chips
    • B : set of all bad chips
    • B-B : pair of chips that has been declared "bad, bad"
    • B-G : pair of chips that has been declared "bad, good"
    • G-G : pair of chips that has been declared "good, good"

    Here is one iteration of the divide and conquer paradigm:

    If n = 1 or n = 2,

    then return one chip (it is good).

    If n is odd,

    test odd chip.

    If odd chip is good,

    return the chip (it is good).

    If odd chip is bad,

    with remaining chips do :
    • pair all chips
    • junk all B-G, B-B pairs
    • junk one chip of every G-G pair
    • find (recursively) one good chip among the remaining chips
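    One way to render the iteration above as code; a Python sketch (the `says_good` oracle and the simulation of bad chips are our own assumptions, not part of the problem statement):

```python
import random

def make_oracle(good_chips):
    """Hypothetical test harness: good chips answer truthfully,
    bad chips answer arbitrarily (here: at random)."""
    def says_good(a, b):
        if a in good_chips:
            return b in good_chips
        return random.choice([True, False])
    return says_good

def find_good_chip(chips, says_good):
    """Return one good chip, assuming good chips strictly outnumber bad ones."""
    chips = list(chips)
    if len(chips) <= 2:
        return chips[0]                      # n = 1 or 2: the chip is good
    if len(chips) % 2 == 1:
        odd = chips.pop()                    # test the odd chip: the rest vote
        votes_good = sum(says_good(c, odd) for c in chips)
        if 2 * votes_good >= len(chips):     # ties go to "good"
            return odd
    survivors = []
    for a, b in zip(chips[0::2], chips[1::2]):   # pair all chips
        if says_good(a, b) and says_good(b, a):  # keep only G-G pairs,
            survivors.append(a)                  # and one chip of each
    return find_good_chip(survivors, says_good)
```

    Under the invariant of Remark 1 below, the survivor set is never empty and still has a good majority, so the recursion bottoms out at a good chip.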

    Remarks :

    1. If before the iteration |G| > |B|, then after the iteration |G| > |B|. Before the pairing step, we have |G| ≥ |B| + 2 (the number of chips is even at this point, so the difference |G| − |B| is even too). Each B-B or B-G pair contains at least one bad chip, so we junk at least as many bad chips as good ones; hence we still have |G| ≥ |B| + 2. Each G-G pair has either two good chips or two bad ones; what matters is that both chips are alike. Thus, the halving step leaves us with |G| ≥ |B| + 1.

    2. Time Complexity :
      If T(n) is the worst-case time for n chips, then T(2) = T(1) = 0 and

      T(n) ≤ n - 1 + floor(n/2) + T(floor(n/2))
      < 3n/2 + T(floor(n/2))
      < 3n/2 + 3n/4 + T(floor(n/4))
      < 3n/2 + 3n/4 + 3n/8 + T(floor(n/8)) ...

      So that clearly

      T(n) < 3n/2 (1 + 1/2 + 1/4 + 1/8 + ...) = 3n

      where we used the summation formula for the geometric series: 1 + 1/2 + 1/4 + 1/8 + ... = 2.
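      The unrolling can also be checked mechanically; a small Python sketch of the recurrence (taking the bound with equality gives the worst case):

```python
def T(n):
    """Worst-case cost from the recurrence T(n) <= n - 1 + n//2 + T(n//2),
    with T(1) = T(2) = 0, taken with equality."""
    if n <= 2:
        return 0
    return n - 1 + n // 2 + T(n // 2)

within_bound = all(T(n) < 3 * n for n in range(1, 10000))   # True
```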


    Java Applet on Chip Testing Problem

    Here is the source code of the Java applet : ShowChip.java


    Links to related material


    References


    Web page designers

    This section is provided to give readers the opportunity to contact us, to point out errors or omissions, to ask questions, and to suggest improvements. An explanation of the role of each team member in the creation of the web page is included.

    The creators are, in no particular order of importance except alphabetical :


    Last updated February 17, 1997 by Andrew Ma.

    Copyright © 1997, Anastassios Soumas, Anatoly Likhatchev, Andrew Bierbryer and Andrew Ma. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.