## Topic #28: MINIMUM SPANNING TREES

#### Introduction

We are given as a starting point a weighted connected graph(V,E), in which edges have weights or lengths.

Spanning tree: a free tree on V (thus having |V|-1 edges that are a subset of|E|).

Minimum Spanning tree: the spanning tree with minimal total weight, where the weights of the edges picked are summed to obtain a total weight..

The objective of this chapter is to study efficient algorithms for finding minimal spanning trees (or MSTs) given that the graph is stored in adjacency list format.

Particular attention will be paid to the Euclidean minimal spanning tree (EMST) for a set of nodes V that each represent a d-dimensional vector, which is the MST for the complete graph on V where edge weights are Euclidean distances between vectors.

##### This is a minimum spanning tree

In this graph, the red lines are the minimum spanning tree and the black lines are the edges which exist but are not part of the minimum spanning tree..

#### Applications

- Network design:

Design networks that connect n sites at a minimal total cost. Clearly, edge weights now represent costs of connections.

- Visualizing multidimensional data:

As points that are close to each other are connected, multidimensional data (vectors) may be connected by an MST to see how they relate to each other. Each point on the figure below represents a high-dimensional vector of observations or measurements (on bones and bone structure).

##### This is a MST showing what mst tree can show us based upon bone measurements

From such a graphs or others like it, one may deduce patterns in evolution. One may also group animals in certain ways. For example, by removing the longest edge in the MST, one obtains two groups, also called clusters. If we remove the k longest edges, we obtain k+1 clusters. This may be useful for clustering objects.

- Clustering. Classification. Taxonomy.

Biology and science in general is preoccupied with grouping objects, plants or animals. We refer to this as classification or taxonomy. One way to proceed is to make the objects into vectors as in the example of mammal's milk ( see the table ), to find the EMST, and to remove the k longest edges. This yields the grouping shown in the mammal's milk example.

##### Mammal's Milk Table - (Taken from the Handbook Of Biological Data, 1956, Saunders)
 ANIMAL WATER PROTEIN FAT LACTOSE HORSE 90.1 2.6 1.0 6.9 DONKEY 90.3 1.7 1.4 6.2 MULE 90.0 2.0 1.8 5.5 CAMEL 87.7 3.5 3.4 4.8 LLAMA 86.5 3.9 3.2 5.6 ZEBRA 86.2 3.0 4.8 5.3 SHEEP 82.0 5.6 6.4 4.7 BUFFALO 82.1 5.9 7.9 4.7 FOX 81.6 6.6 5.9 4.9 PIG 82.8 7.1 5.1 3.7 RABBIT 71.3 12.3 13.1 1.9 RAT 72.5 9.2 12.6 3.3 DEER 65.9 10.4 19.7 2.6 REINDEER 64.8 10.7 20.3 2.5 WHALE 64.8 11.1 21.2 1.6

#### Fundamental Property

Observe that MSTs need not be unique, so we will speak of an MST, not the MST. If S is a subset of the edges which taken together form a forest (two or more trees), then we say that S is a partial solution if by adding edges to S, we may obtain an MST.

If S is a partial solution, then the fundamental property of MSTs is the following:

Partition the vertices into two nonempty sets L and R (for left and right) such that all edges of S are between nodes of L, or between nodes of R. Let B (for bridges) be the collection of all bridges between L and R, i.e., all edges (u,v) with u in L, v in R. Define the minimal bridge as the bridge in B of shortest weight or length, and call it (u,v). Then (u,v) is a safe edge, that is, S union (u,v) is also a partial solution.

Safe edge def: An edge that can be added to the MST while respecting the properties of the MST

Proof:

Proof is by contradiction. You have a partial solution S, which consists of several connected components. Then, you put a separator (the vertical red line) so that there is at least one component on each side of this separator. Assume you have a MST (blue lines) that extends the partial MST and that does not include the shortest bridge. Add the shortest bridge so now you have exactly one cycle with at least 2 bridges. Replace another bridge i.e. remove it from the MST. So, again, you have a tree but now of shorter length. So, the original MST (the proposed one) was not really a MST which completes the proof.

In class we were asked to to show that each node is connected to its nearest neighbour, this is what we have proved in the graph found above. This example may be generalized to every node. If we repeat this proof to every node we will have provred that every node is connected to its closest neighbour.

#### The Prim-Dijkstra Algorithm:

• This algorithm grows only one component: The separator is an abstract line drawn so as to separate the main tree from the islands, the nodes not connected to the tree.
• The set, at any point in the algorithm, consists of two subsets: the partial MST (Minimal Spanning Tree), and the subset Q.
• Initially, MST is empty, and Q contains the entire set.
• Q in the algorithm is defined as a priority queue for a set of nodes namely those nodes not yet in S. It is meant to shrink as nodes are added to the partial MST by adding edges, until it becomes the null set, and the MST becomes complete.
• The function dist(u,v) is defined as the distance, or the length of the edge, between vertices u and v. If no such edge exists, then dist(u,v) is defined as infinity.
• Every node in Q has two unique qualities:
• d[u]: The distance of node u in queue Q to the nearest node not in Q (The length of the smallest bridge between u and any node in the partial MST)
• p[u]: A pointer from u that corresponds to d[u], in that it points to the node in S that is closest to node u.
• There are three defined functions on the points in Q:
• DELETEMIN(Q): Returns the node u with the smallest d[u], and removes u from Q.
• DECREASEKEY(u,d[u]): Decreases the value of the key.
• MAKEQUEUE(Q): Turns the array into a priority queue. It is formed in the beginning of this algorithm.
• The main concept of this algorithm is that if, for all the elements v in Q, if d[u] is the smallest, then (u,p[u]), the smallest possible bridge between Q and MST, is classified as a safe edge. The node u is then transferred from Q to MST, and the process continues until Q is emptied out.
• ALGORITHM:

MAKEQUEUE(Q)

Q <= {2,3,4,....,n}

dist(2,1),...,dist(n,1)

For all n belonging to Q:

If (u, 1) is an edge

| d[u] <= dist(u,1)

| p[u] <= 1

else

d[u] <= infinity

p[u] <= nil

| 1 if u is connected to 1

| NIL

MST <= { }

While not empty (Q) Do

| u <= DELETEMIN(Q)

| MST <= MST + (u,p[u])

| for all neighbours v of u do:

if v is an element of Q and if dist(u,v) < d[v] then

| d[v] <= dist[u,v]

| p[v] <= u

| DECREASEKEY (v,d[v])

##### Complexity

The main contributors to the complexity are the following operations:

• DELETEMIN is executed |V| times, since there are |V| elements in Q to begin with, and with every execution of DELETEMIN, Q has one less element;
• DECREASEKEY is executed at MOST |E| times, since when it is executed, it goes along an edge to the next node, and the algorithm never traverses an edge twice;
• All other parts of the algorithm take approximately O(|V| + |E|) time, because the line "MST <= MST + (u,p[u])" is done as many times as is DELETEMIN (~|E|), and d[u] and p[u] have to be initialized for all u's (~|V|).

The following is a table showing the amount of running time required for the algorithm, if Q were an unsorted list of size |V| (1st column), or if it were a standard heap (2nd column). Note that (  ) is implied in the table:

 DELETEMIN |V|times * |V| time each = |V|² |V|times * log|V| time each = |V|log|V| DECREASEKEY |E|times * 1 time each = |E| |E|times * log|V| time each = |E|log|V| OVERHEAD |V|+|E| |V|+|E|

From this, we can conclude that, overall:

• Time for an unsorted list: (|V|²)
• Time for a standard heap: (|E|log|V|)

#### Euclidean MST

Euclidean MST's (EMST's) are utilized in the event of the need to find an MST of a number of nodes in Euclidean space, where each node is actually a vector. It can be conceived as a graph with all possible edges existing, and the distance between two nodes is defined as:

dist (u,v) = || u - v ||

#### Example of the travelling salesman problem

This is a well-known problem. The object is to find the shortest route that a travelling salesman has to take to tour a number of towns.

The salesman starts in a given town, knows all the distances between towns (This is an EMST), and is to tour all the towns, and end up in the same place in which he started.

To find the best possible route we will use the EMST, but the problem is so complex that we will have to use an approximation algorithm to find the solution. The solution to the TSP take at leastexponential time in |v|.

An approximate solution with a near-optimal solution may be obtained at small complexity. The more accurate the solution needed, the greater the complexity.

The following diagrams illustrate the algorithm for solving the TSP, where the edge lengths measure the distance in the plane, and the nodes represent points in the plane.

Initial graph with points (cities) only.

Step 1

Find MST ( time = (|E|log|V|) for an ordinary heap)

(Time = (|V|²) for an unsorted list)

Step 2

Depth-First Search (DFS) => Connections only to new cities

The cities are numbered in the order that they are to be visited.

" DFS-order tour" ( time =(|V|))

Step 3

Connect the nodes in the DFS tour, and close the tour. (Time = (|V|)

The length of this approximate tour is guaranteed to be less than, or equal to, 2* the length of the optimal TSP.

Proof:

• Imagine a path drawn all around the MST (A boat tour). This path is always greater than or equal to the length of the DFS tour (By triangle inequality)
• Length (DFS-order tour) <=; Length (Boat Tour)
• = 2 * Length(MST)
• <=; 2 * Length(Optimal Tour)
• Therefore, Length (DFS-order tour) <=; 2 * Length(optimal Tour).

### The MST Demonstrator Applet.

This applet will go through the steps of building an MST step by step, following prompts by the user:

• When the "Start" button is pressed, a randomly generated set of seperate nodes and the available links (in black), along with the lengths of these edges (numbers next to the edges), will appear on the window.
• The applet will always start the algorithm from the top-left node.
• Click on "Next Edge". The edge between the starting point and its nearest neighbour, along with the length of the edge and the two points, will be highlighted in red. All of the edges between the second point in the MST and its blue neighbours will be highlighted in blue. This is to show the multiple possible moves from here.
• Every time the user clicks on "Next Edge", another edge will be colored red, and added to the MST, until the MST is complete.
• When done, click on the "OK" button to eliminate the blue edges, so that the user can better see the completed MST.

Source code for this applet.

### References

• Algorithms And Data Structures, 4th International Workshop, WADS '95
• Kingston, Canada, August 1995 Proceedings

Page 440: "A Simpler MST Verification Algorithm".

• Algorithms And Data Structures, 2nd Workshop, WADS '91
• Ottawa, Canada, August 1995 Proceedings

Section 12B, Page 392

• Algorithms - ESA '94; Second Annual European Symposium
• Utrecht, The Netherlands, September 1994 Proceedings

Page 331: "An O(n) Work EREW Parallel Algorithm For Updating MST"

• Algorithms And Data Structures: Design, Correctness, Analysis
• Author: Jeffery H. Kingston

Section 10.9, Page 254

• Algorithmics: The Spirit Of Computing
• Author: David Harel

Page 85: "Greedy Algorithms & Railroad Contractors"

• Fundementals Of Sequential And Parallel Algorithms
• Authors: Kenneth A. Berman & Jerome L. Paul

Section 12.5, Page 420

• The Design And Analysis Of Parallel Algorithms
• Author: Justin R. Smith

© Oxford University Press, Inc., 1993

Section VI.2.6, Page 350

• Introduction To Parallel Algorithms And Architectures - Arrays, Trees, Hyperboles
• Author: F. Thomas Leighton

Section 1.5.5, Page 136

• Lecture Notes On Bucket Algorithms
• Author: Luc Devroye

Pages 83, 92

 Graphs and digraphs A page that contains information, algorithms and examples on graphs and minimal spanning trees. Choi Joong-Min . Kruskal's algorithm. A nice presentation of the Kruskal's algorithm on minimal spanning trees. Richard M. Salter, Oberlin College. Minimal spanning tree for electrical engineers An application of minimal spanning trees for electrical engineers. Richard M. Salter, Oberlin College. Surface reconstruction A project on surface reconstruction. It is another appliccation of the minimal spanning trees. Hugues Hoppe. Notes on graph theory Presentation of minimal spanning trees and their related topics. J E Beasley, Imperial College. Course on MST A very good site on MST. Site of a course similar to CS251 on data structures and algorithms. John Morris, University of Western Australia. Minimal spanning tree analysis of fungal spore spatial patterns Another application of the MST. Cameron L. Jones, Greg T. Lonergan and David E. Mainwaring, entre for Applied Colloid and BioColloid Science, Swinburne University of Technology, School of Chemical Sciences, GPO Box 218, Hawthorn, Victoria, 3122, Australia and Department of Applied Chemistry, Royal Melbourne Institute of Technology, Melbourne, Victoria, 3001 Australia

### Web page creators

This web page was created by

Sebastien typed the document. Simon Pierre wrote the Java applet. Wassim found the book references and proofread the page. The text is based on class notes taken by Sebastien. The figures were drawn with xfig by Simon.

For all comments or suggestions, please write to us at: Sebastien Emard.

Copyright © 1997, Simon Pierre Desrosiers, Sebastien Emard, Wassim Mikati, Simon Roy. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.