Last modified: April 22, 1997
We are given as a starting point a weighted connected graph(V,E), in which edges have weights or lengths.
Spanning tree: a free tree on V (thus having |V|-1 edges that are a subset of|E|).
Minimum Spanning tree: the spanning tree with minimal total weight, where the weights of the edges picked are summed to obtain a total weight..
The objective of this chapter is to study efficient algorithms for finding minimal spanning trees (or MSTs) given that the graph is stored in adjacency list format.
Particular attention will be paid to the Euclidean minimal spanning tree (EMST) for a set of nodes V that each represent a d-dimensional vector, which is the MST for the complete graph on V where edge weights are Euclidean distances between vectors.
In this graph, the red lines are the minimum spanning tree and the black lines are the edges which exist but are not part of the minimum spanning tree..
- Network design:
- Visualizing multidimensional data:
As points that are close to each other are connected, multidimensional data (vectors) may be connected by an MST to see how they relate to each other. Each point on the figure below represents a high-dimensional vector of observations or measurements (on bones and bone structure).
From such a graphs or others like it, one may deduce patterns in evolution. One may also group animals in certain ways. For example, by removing the longest edge in the MST, one obtains two groups, also called clusters. If we remove the k longest edges, we obtain k+1 clusters. This may be useful for clustering objects.
- Clustering. Classification. Taxonomy.
Biology and science in general is preoccupied with grouping objects, plants or animals. We refer to this as classification or taxonomy. One way to proceed is to make the objects into vectors as in the example of mammal's milk ( see the table ), to find the EMST, and to remove the k longest edges. This yields the grouping shown in the mammal's milk example.
ANIMAL |
WATER | PROTEIN | FAT | LACTOSE |
HORSE | 90.1 | 2.6 | 1.0 | 6.9 |
DONKEY | 90.3 | 1.7 | 1.4 | 6.2 |
MULE | 90.0 | 2.0 | 1.8 | 5.5 |
CAMEL | 87.7 | 3.5 | 3.4 | 4.8 |
LLAMA | 86.5 | 3.9 | 3.2 | 5.6 |
ZEBRA | 86.2 | 3.0 | 4.8 | 5.3 |
SHEEP | 82.0 | 5.6 | 6.4 | 4.7 |
BUFFALO | 82.1 | 5.9 | 7.9 | 4.7 |
FOX | 81.6 | 6.6 | 5.9 | 4.9 |
PIG | 82.8 | 7.1 | 5.1 | 3.7 |
RABBIT | 71.3 | 12.3 | 13.1 | 1.9 |
RAT | 72.5 | 9.2 | 12.6 | 3.3 |
DEER | 65.9 | 10.4 | 19.7 | 2.6 |
REINDEER | 64.8 | 10.7 | 20.3 | 2.5 |
WHALE | 64.8 | 11.1 | 21.2 | 1.6 |
Observe that MSTs need not be unique, so we will speak of an MST, not the MST. If S is a subset of the edges which taken together form a forest (two or more trees), then we say that S is a partial solution if by adding edges to S, we may obtain an MST.
If S is a partial solution, then the fundamental property of MSTs is the following:
Partition the vertices into two nonempty sets L and R (for left and right) such that all edges of S are between nodes of L, or between nodes of R. Let B (for bridges) be the collection of all bridges between L and R, i.e., all edges (u,v) with u in L, v in R. Define the minimal bridge as the bridge in B of shortest weight or length, and call it (u,v). Then (u,v) is a safe edge, that is, S union (u,v) is also a partial solution.
Safe edge def: An edge that can be added to the MST while respecting the properties of the MST
Proof:
Proof is by contradiction. You have a partial solution S, which consists of several connected components. Then, you put a separator (the vertical red line) so that there is at least one component on each side of this separator. Assume you have a MST (blue lines) that extends the partial MST and that does not include the shortest bridge. Add the shortest bridge so now you have exactly one cycle with at least 2 bridges. Replace another bridge i.e. remove it from the MST. So, again, you have a tree but now of shorter length. So, the original MST (the proposed one) was not really a MST which completes the proof.
In class we were asked to to show that each node is connected to its nearest neighbour, this is what we have proved in the graph found above. This example may be generalized to every node. If we repeat this proof to every node we will have provred that every node is connected to its closest neighbour.
ALGORITHM:
MAKEQUEUE(Q)
Q <= {2,3,4,....,n}
dist(2,1),...,dist(n,1)
For all n belonging to Q:
| d[u] <= dist(u,1)
| p[u] <= 1
d[u] <= infinity
p[u] <= nil
| 1 if u is connected to 1
| NIL
MST <= { }
While not empty (Q) Do
| u <= DELETEMIN(Q)
| MST <= MST + (u,p[u])
| for all neighbours v of u do:
if v is an element of Q and if dist(u,v) < d[v] then
| d[v] <= dist[u,v]
| p[v] <= u
| DECREASEKEY (v,d[v])
The main contributors to the complexity are the following operations:
The following is a table showing the amount of running time required for the algorithm, if Q were an unsorted list of size |V| (1st column), or if it were a standard heap (2nd column). Note that ( ) is implied in the table:
DELETEMIN | |V|times * |V| time each = |V|² | |V|times * log|V| time each = |V|log|V| |
DECREASEKEY | |E|times * 1 time each = |E| | |E|times * log|V| time each = |E|log|V| |
OVERHEAD | |V|+|E| | |V|+|E| |
From this, we can conclude that, overall:
Euclidean MST's (EMST's) are utilized in the event of the need to find an MST of a number of nodes in Euclidean space, where each node is actually a vector. It can be conceived as a graph with all possible edges existing, and the distance between two nodes is defined as:
dist (u,v) = || u - v ||
This is a well-known problem. The object is to find the shortest route that a travelling salesman has to take to tour a number of towns.
The salesman starts in a given town, knows all the distances between towns (This is an EMST), and is to tour all the towns, and end up in the same place in which he started.
To find the best possible route we will use the EMST, but the problem is so complex that we will have to use an approximation algorithm to find the solution. The solution to the TSP take at leastexponential time in |v|.
An approximate solution with a near-optimal solution may be obtained at small complexity. The more accurate the solution needed, the greater the complexity.
The following diagrams illustrate the algorithm for solving the TSP, where the edge lengths measure the distance in the plane, and the nodes represent points in the plane.
Initial graph with points (cities) only.
Step 1
Find MST ( time = (|E|log|V|) for an ordinary heap)
(Time = (|V|²) for an unsorted list)
Step 2
Depth-First Search (DFS) => Connections only to new cities
The cities are numbered in the order that they are to be visited.
" DFS-order tour" ( time =(|V|))
Step 3
Connect the nodes in the DFS tour, and close the tour. (Time = (|V|)
The length of this approximate tour is guaranteed to be less than, or equal to, 2* the length of the optimal TSP.
Proof:
This applet will go through the steps of building an MST step by step, following prompts by the user:
Kingston, Canada, August 1995 Proceedings
© Springer-Verlag Berlin Heidelberg, 1995
Page 440: "A Simpler MST Verification Algorithm".
Ottawa, Canada, August 1995 Proceedings
© Springer-Verlag Berlin Heidelberg, 1991
Section 12B, Page 392
Utrecht, The Netherlands, September 1994 Proceedings
© Springer-Verlag Berlin Heidelberg, 1994
Page 331: "An O(n) Work EREW Parallel Algorithm For Updating MST"
Author: Jeffery H. Kingston
© Addison-Wesley Publishers Ltd., 1990
Section 10.9, Page 254
Author: David Harel
© Addison-Wesley Publishers Ltd., 1987
Page 85: "Greedy Algorithms & Railroad Contractors"
Authors: Kenneth A. Berman & Jerome L. Paul
© PWS Publishing Company, 1997
Section 12.5, Page 420
Author: Justin R. Smith
© Oxford University Press, Inc., 1993
Section VI.2.6, Page 350
Author: F. Thomas Leighton
© Morgan Kaufman Publishers, Inc.
Section 1.5.5, Page 136
Author: Luc Devroye
© Birkhäuser Inc., 1986
Pages 83, 92
Graphs and digraphs | A page that contains information, algorithms and examples on graphs and minimal spanning trees. Choi Joong-Min . |
Kruskal's algorithm. | A nice presentation of the Kruskal's algorithm on minimal spanning trees. Richard M. Salter, Oberlin College. |
Minimal spanning tree for electrical engineers | An application of minimal spanning trees for electrical engineers. Richard M. Salter, Oberlin College. |
Surface reconstruction | A project on surface reconstruction. It is another appliccation of the minimal spanning trees. Hugues Hoppe. |
Notes on graph theory | Presentation of minimal spanning trees and their related topics. J E Beasley, Imperial College. |
Course on MST | A very good site on MST. Site of a course similar to CS251 on data structures and algorithms. John Morris, University of Western Australia. |
Minimal spanning tree analysis of fungal spore spatial patterns | Another application of the MST. Cameron L. Jones, Greg T. Lonergan and David E. Mainwaring, entre for Applied Colloid and BioColloid Science, Swinburne University of Technology, School of Chemical Sciences, GPO Box 218, Hawthorn, Victoria, 3122, Australia and Department of Applied Chemistry, Royal Melbourne Institute of Technology, Melbourne, Victoria, 3001 Australia |
This web page was created by
Sebastien typed the document. Simon Pierre wrote the Java applet. Wassim found
the book references and proofread the page. The text is based on class
notes taken by Sebastien. The figures were drawn with
Copyright © 1997, Simon Pierre Desrosiers, Sebastien Emard, Wassim
Mikati, Simon Roy. All rights reserved. Reproduction of all or part of
this work is permitted for educational or research use provided that this
copyright notice is included in any copy. Disclaimer: this collection of
notes is experimental, and does not serve as-is as a substitute for attendance
in the actual class.