School of Computer Science

Winter 1997 Class Notes for 308-251B

DATA STRUCTURES AND ALGORITHMS

Undirected Graphs The DFS Algorithm Complexity Searching for Cycles Directed Graphs Properties of DFS Algorithm and Tree |
Euler Tours Java Applet Links References Credits |

We have already seen various tree traversals, for example preorder, inorder and postorder; now we look at graph traversals. We will use the notation (V,E) for a graph, where V represents the set of vertices and E the set of edges. Graph traversals are used constantly when dealing with graphs, e.g. when looking for sinks or cycles, or when colouring. There are two types of graph traversals: depth-first search and breadth-first search, often abbreviated DFS and BFS respectively. In some instances one type of search or the other may be more desirable. With the Adjacency List implementation of the graph, both types of traversals give O( |V| + |E| ) time for a complete traversal. This time is optimal, since in order to completely traverse a graph using any method, you must visit all of the vertices and edges. The key to a fast traversal is avoiding loops - you don't want to get stuck going around and around a cycle endlessly.

Phase |
Condition of Node |
Colour |

1 | undiscovered | white |

2 | discovered and being processed | grey |

3 | finished | black |

Colouring the nodes acts as a memory device - it is like dropping bread crumbs in a maze in order to know where they've already been.

Each node u has two times associated with it: the discovery time d[u] and the finish time f[u]. The discovery time corresponds to the colour change from white to grey and the finish time corresponds to the colour change from grey to black. Time starts at 1, and with each labelling of an event (discovery or finish) the time is incremented by one. One drawback of this is that with many nodes, the storage required for all of these times becomes huge; however, even if they are not stored, the times are useful as abstract concepts to aid us in understanding the algorithm.

Each connected component of the graph can be made into a tree by removing some of the edges. Each node u in the tree will have a pointer to its parent p[u], making this a parent-pointer tree.

[Back to Top]for all nodes u do { colour[u]=white; p[u]=NULL; } time = 0; for all nodes u do if colour[u] == white then DFS(u);This is the algorithm itself, which performs the depth-first search:

DFS(u): visit(u); time = time + 1; d[u] = time; colour[u] = grey; for all nodes v adjacent to u do { if colour[v] == white then { p[v] = u; DFS(u); } } time = time + 1; f[u] = time; colour[u] = black;

For a convention, the neighbours in the following example are considered in alphabetical order. The traversal starts at the I.

- I turns grey, d[I] = 1
- E turns grey, d[E] = 2
- E has two neighbours, B and F. By the convention B is visited first: B turns grey, d[B] = 3
- Similarly, A, C, & D are visited
- D has 4 neighbours: A, C, G, H. A and C are grey so G is visited first, then F.
- All of F's neighbours are grey so F is finished: f[F] = 9, F turns black.
- Similarly G is finished
- H is visited because we have not finished visiting all of D's neighbours
- The remaining nodes are finished in the reverse order, backtracking along the original path, and ending with I.

The resultant parent-pointer tree has its root at I, since this is the first node visited. Each new node visited becomes the child of the most recently-visited node. Also, note that while I is the first node to turn grey, it is the last node to turn black (within the main connected component). This is due to the recursion - each of I's neighbours must be processed and finished (turned black) before I can be finished (turned black). As well, all edges of the graph which are not included in the tree (not used in the traversal) are between a node and its ancestor (because all neighbours of a node are visited from that node); these are known as vertical edges. This property of the DFS tree contrasts it with the BFS tree.

The maximum time index in the example is 18, equal to 2 times the number of nodes. This is because time is incremented exactly when a discovery time or a finish time is assigned, and each node is assigned one of each during its lifetime. The time should not be confused with the complexity of the algorithm, which we will discuss soon.

If u is discovered before v, then there are two possibilities:

1) If v is discovered while u is being processed (is grey), then v will be finished before u. This is called the Nested Interval Property, and is a consequence of the recursive nature of the algorithm: when v is discovered it is "pushed" on to the recursion stack, and u being processed at that time means that u is lower down in the stack. Therefore v will be "popped" before u, i.e. finished before u.

2) The other possibility is that u is finished before v is discovered. In the sample graph above, F and H are an example of this: their start and finish times are 8, 9 and 11, 12 respectively.

Note that the situation depicted here is impossible, by the argument in (1) above:

DFS trees are usually elongated, but are useful because they display the entire recursion path. As well, to determine if any node is a descendant of any other node, one need only compare their times d[u] and f[u]. This is much faster and easier than performing the same task for ordinary binary trees without additional information.

Here is a snapshot of a DFS tree in mid-traversal:

The grey nodes in the tree always form a chain. This represents the current chain of recursion. Note that the black hanging pieces cannot be connected to other parts of the grey chain, since the black section is finished - anything connected to it which is not above it in the tree (below it in the recursion stack) must also be black.

An important property is that:

*The white path property states that v is a descendant of u
if and only if, at the time of discovery of u, there is a white
path from u to v.*

If v is a descendant of u, then at time d[u], there must be at least one white path from u to v for the DFS algorithm to follow. Conversely, if there is a white path from u to v at time d[u], then the last node on that path is a neighbour of v which has not yet been processed.

One useful aspect of the DFS algorithm is that it traverses connected components one at a time, and thus it can be used to identify the connected components in a given graph. As an example, if a handwriting sample is scanned into a pixelized image and DFS is performed on that image, all connected components below a certain threshold size (which represent unwanted dirt in the sample) can be removed.

[Back to Top]There are two loops in the algorithm; the first one loops through all vertices and the second loops through all neighbours of a particular vertex. All other operations performed within these loops, such as changing times and colours, are considered O(1). The time spent on assignment statements in the outer loop is O(|V|). The loop through all neighbours of all vertices takes

Therefore the total complexity is O(|V| + |E|), which is optimal, as mentioned in the introduction.

We start by looking at a cycle in an undirected graph.

Assume that d[u] < d[v]. The straight edge between u and v is looked at twice during DFS, once from u and once from v (while looping through neighbours). At time d[u], v is white and at time d[v], u is grey (keep in mind that there is no way to ever look at a black node - this follows from the comment about the snapshot of a DFS tree).

More generally, we have the following theorem:

An undirected graph has a cycle if and only if at some point during the traversal, when u is grey, one of the neighbours v of u considered in the inner loop of the DFS is coloured grey and is not p[u].

Proof: If we arrive at v and find that u is grey (and u is not p[v]) then u must be an ancestor of v, and since we travelled from u to v via a different route, there is a cycle in the graph. Conversely, suppose u and v are included in a cycle:

Since d[u] < d[v], when v is reached and becomes grey, u is already grey so v has a grey neighbour. The result can be summarized as:

If we apply this theorem to the problem posed above, the search for a cycle in an undirected graph becomes the search for grey neighbours. If there is no cycle then the maximum number of edges is |V| - 1, and if there is a cycle, it will be found before the |V|th edge is looked at, so in both cases the DFS algorithm will terminate its search for a cycle in O(|V|) time.

[Back to Top]The situation is now completely different. The same DFS algorithm can be used, however previously derived properties are no longer necessarily applicable to this new case. For example, with directed graphs DFS visits each connected component in turn; in fact, DFS can be used to find the connected components, such as tiny "islands" of dirt in a handwriting sample. With directed graphs, connected components do not even exist in the sense that they are not well-defined (explanation below).

We now define 4 types of DFS tree edges found in the DFS tree of a directed graph. Each edge is classified at the time it is first looked at, and from the point of view of the node occupied by DFS at that point.

- Tree edge - an edge to a white neighbour.
- Back edge - an edge to a grey neighbour (an ancestor).
- Forward edge - an edge to a black descendant (impossible in an undirected graph). The edge from A to C is an example.
- Cross edge - an edge to a black neighbour which is not a descendant, e.g. the edge from D to C.

Note that if the traversal began at C, C would turn black immediately, thereby forming its own component, if we assume that DFS visits components one at a time. However, starting at A makes the whole graph into one component, so the number of components depends on where the traversal starts. In spite of this it is still true for directed graphs that cycles occur exactly where there are grey non-parental neighbours.

[Back to Top]The problem is to traverse the graph while visiting each edge exactly once, and to finish at the starting point (this is known as the Euler tour). An Euler graph is a graph which contains an Euler tour; the figure on top is an Euler graph, but the figure below it is not. The reason is the difference in degree sequences. The degree of a node in an undirected graph is the number of edges emanating from it. If all the degrees are even then the graph can be traversed in the manner described. This is because each node must be arrived at and departed from the same number of times, and each time using a different edge. The converse is also true: all connected undirected graphs which are Euler graphs have even degrees at all nodes.

In a directed graph, define an "in" degree to be the number of incoming edges, and an "out" degree to be the number of outgoing edges.

A connected directed graph is an Euler graph if and only if every node's "in" degree equals its "out" degree. It is relatively easy to check if a graph is an Euler graph, but the more difficult task is to actually find the Euler Tour. This requires a modified version of the DFS (see Cormen, Leiserson and Rivest on page 485, Exercise 23.3-9), and can be done in O(|V| + |E|) time.

[Back to Top]R. Decker.

E. Horowitz, S. Sahni.

[Back to Top]

Depth First Search |
Mani Chandy's notes on Depth First Search for undirected graphs from Caltech, with postscript slides. |

Depth First Search in Directed Graphs |
Mani Chandy's notes on Depth First Search for directed graphs from Caltech, with postscript slides. |

BFS and DFS |
Lecture notes by David Eppstein from UC Irvine. |

Depth-First Search |
Two implementations of DFS by Thomas A. Anastasio at UMBC: one using a tree and the other using a stack. |

Lecture: Depth-First Search |
Simple lecture notes by Oklahoma State's Nick Street illustrating modified DFS algorithm for solving Euler Graph problem, with sample code. |

Graphs: Search |
DFS lecture notes from Tom Leblanc at the University of Rochester with sample code and applictations. |

Advanced Data Structures and Algorithms |
Course outline from Dina Kravets at NJIT including a postscript file on DFS and BFS. |

Simple Search Techniques |
DFS and BFS as applied to the traveling-salesperson problem, by Nikos Drakos at Leeds. |

Euler, Leonhard (1707-1753) |
Biographical information on the famous 18th-century mathemetician, Leonhard Euler, by Paul Golba. |

This wonderful web work was written by Rob Kravitz, who hacked Java to no end and panicked about Rich; Rob David, who composed the text and panicked about Rich; and Rich Lafferty who caused tremendous stress by disappearing for half a week, and threw together the HTML during a power failure.

*
Copyright © 1997 by the authors. All rights reserved. Reproduction of all or part of this
work is permitted for educational or research use provided that this copyright notice is
included in any copy. Disclaimer: this collection of notes is experimental, and does not
serve as-is as a substitute for attendance in the actual class. *