Only the following operations are associated with this A.D.T.: MakeSet (x), Union (x,y) and FindSet (x).
Observe that the standard dictionary operations are missing: for example, Search (x) and Delete (x) are not in the list of operations. Also, sets can never be split and never shrink in size.
To simplify things, we make two convenient assumptions:
1) EQUIVALENCE RELATIONS AND EQUIVALENCE CLASSES
Recall that a relation on a set A is called an equivalence relation if it is reflexive, symmetric, and transitive. Congruence modulo n, for example, is an equivalence relation. The statement a ~ b indicates that elements a and b are in the same set. It follows that to create the equivalence classes (disjoint sets) from a given collection of equivalence relations, we must proceed as follows:
A good example of this application would be the equivalence relation congruence modulo 5. In
this situation, elements {0,5,10,...} are in the same equivalence class. The usual representative of
the class is 0. Element 7 is not in the set whose representative is 0.
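The congruence modulo 5 example can be worked out directly. The following sketch (the function name and the choice of range are mine, purely for illustration) groups the elements 0..19 into their equivalence classes:

```python
# Hypothetical sketch: building the equivalence classes of
# congruence modulo 5 over the elements 0..19.
def classes_mod(n, elements):
    groups = {}
    for x in elements:
        # x ~ y if and only if x mod n == y mod n
        groups.setdefault(x % n, []).append(x)
    # The usual representative of each class is its smallest member.
    return {min(members): members for members in groups.values()}

classes = classes_mod(5, range(20))
print(classes[0])        # [0, 5, 10, 15]
print(7 in classes[0])   # False: 7 is not in the class represented by 0
```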
The following animated graphic is a second equivalence relation example that uses pixels as elements.
In the preceding image, A, B and C are three equivalence classes. A pixel is an element of a class
if it is within that class' boundaries. Furthermore, two pixels are equivalent if they are both in the
same set.
2) THE FORTRAN PROGRAMMING LANGUAGE
In this particular programming language, it is possible to reserve the same address for more than one variable. The variables can even be of different types. This gives the user the ability to manipulate, for instance, integer variables in character format. The Fortran compiler must therefore determine the partition of the elements (in this case the variables) from a given set of equivalence relations.
EQUIVALENCE (A, B, C) , (D, E) , (E, F)
This allows, for example, an integer A, a character B and a real C to share the same memory address. If the contents of variable B are changed at some point in the execution of the program, then so are the contents of variables A and C.
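As an illustration of what the compiler must compute, the following sketch (entirely my own; the note does not give this code) partitions the six variables of the EQUIVALENCE statement above into their storage classes using representative pointers:

```python
# Illustrative sketch: partition variables into shared-storage classes
# from EQUIVALENCE groups such as (A, B, C), (D, E), (E, F).
parent = {}

def make_set(x):
    parent[x] = x

def find_set(x):
    while parent[x] != x:      # the representative points to itself
        x = parent[x]
    return x

def union(x, y):
    parent[find_set(y)] = find_set(x)

for v in "ABCDEF":
    make_set(v)
for group in [("A", "B", "C"), ("D", "E"), ("E", "F")]:
    for other in group[1:]:
        union(group[0], other)

# A, B, C share one address; D, E, F share another.
print(find_set("B") == find_set("C"))   # True
print(find_set("D") == find_set("F"))   # True
print(find_set("A") == find_set("D"))   # False
```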
3) COLORING OLD MOVIES
Every frame of an old black and white movie is made up of pixels. In order to color such a movie, we would need to assign a color to each pixel in such a fashion that equivalent pixels get the same color. Equivalent pixels are those that logically belong together, for example the ones representing an actor's sweater. On a frame, adjacent pixels with identical or only slightly different gray levels may very well be defined as equivalent. A gray level frame of pixels therefore defines possibly thousands, if not millions, of equivalence relations between adjacent pixels. In order to separate the equivalence classes of these relations, a 4-pixel square scan technique is used to analyse each pixel x's neighbors. This scan, which enters the equivalence relations between pixels, proceeds from top left to bottom right, row by row.
The outer region of the U shape is colored in yellow. The leftmost upper edge is colored in a different color, here green. Once the rightmost edge is reached by the scan, it is filled in yet another different color, here blue. Finally, once the lower rightmost corner is reached, it is first colored blue but then changed to green as the scan only then "realises" that there is one figure and not two distinct elements. Therefore, all previously blue colored pixels are changed to green as every pixel forming the U shape is in the same distinct set. This coloring problem expands to a 3-dimensional one once more frames are considered and colored.
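The U shape story can be reproduced with a tiny union-find labeling sketch (the frame, names and neighbor choice are my own simplification of the 4-pixel scan: each foreground pixel is united with its already-scanned left and upper neighbors):

```python
# Minimal sketch of the row-by-row scan on a U shape:
# the two vertical strokes stay separate until the bottom row
# joins them, exactly the "blue changed to green" moment above.
frame = [
    "#.#",
    "#.#",
    "###",
]

parent = {}

def find(p):
    while parent[p] != p:
        p = parent[p]
    return p

def union(a, b):
    parent[find(b)] = find(a)

for r, row in enumerate(frame):
    for c, pix in enumerate(row):
        if pix != "#":
            continue
        parent[(r, c)] = (r, c)
        # unite with the neighbors visited before this pixel
        if c > 0 and frame[r][c - 1] == "#":
            union((r, c - 1), (r, c))
        if r > 0 and frame[r - 1][c] == "#":
            union((r - 1, c), (r, c))

labels = {find(p) for p in parent}
print(len(labels))  # 1: the whole U is a single equivalence class
```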
A) Array Form
We can set up a table of n rows and 2 columns. The first column will hold the n elements, whereas the second column will contain the representative of the set in which the current element is found. The following is an example of this structure.
element | representative
   1    |       3
   2    |       2
   3    |       3
  ...   |      ...
   n    |       3
Obviously, this setup is static, as it can only hold a fixed maximum number of elements.
B) Linked List
To avoid the static limitation of the array form, we can also structure the data as a linked list. In this implementation, every node points to its representative; this is true even for the representative node, which points to itself.
These two setups allow the FindSet operation to be performed in constant time. However, in the linked list structure, the Union (x,y) operation can take Omega(n) time. The operation goes as follows:
Step 1) The nil pointer of the last node of the set with the most elements is changed to point to the representative of the other, smaller set.
Step 2) All the representative pointers of the smaller set are changed to point to the representative of the larger set.
The following animation illustrates the initial and final state of the Union (2,1) operation.
The above heuristic is called the weighted union heuristic.
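The two steps of the weighted union can be sketched as follows (a minimal Python sketch; the rep and members tables are my own stand-ins for the representative pointers and the chained lists):

```python
# rep[x] is x's representative pointer; members[r] plays the role
# of the chained list of the set whose representative is r.
rep = {}
members = {}

def make_set(x):
    rep[x] = x
    members[x] = [x]

def find_set(x):           # constant time: one pointer lookup
    return rep[x]

def union(x, y):
    rx, ry = rep[x], rep[y]
    if rx == ry:
        return
    # weighted union heuristic: merge the smaller set into the larger
    if len(members[rx]) < len(members[ry]):
        rx, ry = ry, rx
    for z in members[ry]:              # Step 2: redirect the pointers
        rep[z] = rx
    members[rx].extend(members.pop(ry))  # Step 1: splice the lists

for i in range(1, 5):
    make_set(i)
union(2, 1)
union(3, 4)
union(2, 3)
print(find_set(4) == find_set(1))  # True: one set of four elements
```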
BENCHMARK TEST
Before we go on to the second possible implementation of disjoint set structures, we can now define a benchmark test for the operations on this A.D.T. Suppose we have n singleton sets. Obviously, to obtain one complete set, we must apply n-1 Union (x,y) operations.
We must consider each step of the transformation from n singleton sets to one final set to
understand the time requirements.
A single Union can thus be considered as taking [O(1) + k representative pointer changes] time units, where k is the size of the smaller set.
Take element x to be in a singleton set, of size 1. Whenever x is involved in a Union operation in which its representative pointer changes, the size of the new set containing x is at least twice as large as before. If the representative pointer of x changes k times, it follows from this observation that the size of the set containing x is at least 2^k, but never larger than n; hence k <= log2(n).
Therefore, even if the worst case time of one Union is O(n), the total time of this benchmark test is O(n log n): each of the n elements has its representative pointer changed at most log2(n) times.
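The doubling argument can be checked empirically. The following simulation (entirely my own, not part of the note) merges n singletons pairwise under the weighted union and counts how often each element's representative pointer changes:

```python
import math

def benchmark(n):
    # n singleton sets, merged round by round into one final set
    rep = {x: x for x in range(n)}        # representative pointers
    members = {x: [x] for x in range(n)}  # the chained lists
    changes = {x: 0 for x in range(n)}    # pointer changes per element
    roots = list(range(n))
    while len(roots) > 1:
        merged = []
        for i in range(0, len(roots) - 1, 2):
            rx, ry = roots[i], roots[i + 1]
            if len(members[rx]) < len(members[ry]):
                rx, ry = ry, rx           # weighted union heuristic
            for z in members[ry]:
                rep[z] = rx
                changes[z] += 1
            members[rx].extend(members.pop(ry))
            merged.append(rx)
        if len(roots) % 2:
            merged.append(roots[-1])
        roots = merged
    return max(changes.values())

print(benchmark(16))                      # 4, which is log2(16)
print(benchmark(16) <= math.log2(16))     # True
```

No element's pointer changes more than log2(n) times, so the n-1 unions cost O(n log n) in total.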
As for the second data structure implementation, it is based on the philosophy that the Union operation should be fast.
In this structure, where p[x] is the parent of x, the FindSet operation's algorithm looks like this.
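The note's original pseudocode is not reproduced here; a sketch consistent with the description, in the p[x] notation of the text, could look like this:

```python
# FindSet on the parent pointer tree: climb parent pointers
# until reaching the root, which points to itself.
def find_set(x, p):
    while p[x] != x:       # the root is its own parent
        x = p[x]
    return x

# p encodes the linear tree 3 -> 2 -> 1 (1 is the root)
p = {1: 1, 2: 1, 3: 2}
print(find_set(3, p))      # 1
```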
The time of this operation within this setup is given by the distance of the node x to the root. The
worst case time is thus given by the height of the tree.
Within the parent pointer tree implementation, we must define a new operation that will, however, remain private. This new operation, Link (x,y), assumes that x and y are distinct set representatives and makes y's representative pointer point to x.
This simple operation takes time O(1).
With the preceding two functions as defined, Union (x,y) is simply implemented as :
and also the operation MakeSet is as simple as :
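Taken together, the operations on the parent pointer tree can be sketched as follows (a Python sketch in the p[x] notation; passing p explicitly is my own choice):

```python
def make_set(x, p):
    p[x] = x               # a new singleton is its own representative

def find_set(x, p):
    while p[x] != x:
        x = p[x]
    return x

def link(x, y, p):         # O(1): hang representative y under x
    p[y] = x

def union(x, y, p):        # Union is simply Link on the two roots
    link(find_set(x, p), find_set(y, p), p)

p = {}
for i in (1, 2, 3):
    make_set(i, p)
union(1, 2, p)
union(2, 3, p)
print(find_set(3, p))      # 1
```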
The obvious and unfortunate problem that arises from this parent pointer tree structure is the formation of linear trees, which leads to terrible worst case times for the operations. To counter this, two heuristics have been developed.
1) Weighted Heuristic
Small sets are merged into larger sets. This makes the height of any tree smaller than or equal to log2(n).
Proof :
Upon creation, every element is its own representative. Let us now follow one such element x through a sequence of Union operations. Every time the distance from x to its representative grows by one, the size of the set containing x at least doubles. Since that size can never exceed n, the distance, and hence the height of the tree, is at most log2(n). This is sometimes called the doubling argument.
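A sketch of the weighted heuristic on the parent pointer tree (the auxiliary size table is my own addition; the note only describes the rule):

```python
def make_set(x, p, size):
    p[x] = x
    size[x] = 1

def find_set(x, p):
    while p[x] != x:
        x = p[x]
    return x

def weighted_link(x, y, p, size):  # x and y are representatives
    if size[x] < size[y]:
        x, y = y, x
    p[y] = x                       # smaller tree hangs under larger
    size[x] += size[y]

p, size = {}, {}
for i in range(8):
    make_set(i, p, size)
for a, b in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    weighted_link(find_set(a, p), find_set(b, p), p, size)
print(find_set(7, p))              # 0: one set, height at most log2(8)
```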
2) Path Compression
During the FindSet operation, we make each node on the find path point directly to the root.
The code for the FindSet (x) operations looks like this :
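The note's own code is not reproduced here; a sketch consistent with the two-pass description, in the p[x] notation, could look like this:

```python
# FindSet with path compression: pass 1 finds the root,
# pass 2 redirects every node on the find path to it.
def find_set(x, p):
    root = x
    while p[root] != root:     # pass 1: climb to the root
        root = p[root]
    while p[x] != root:        # pass 2: compress the path
        nxt = p[x]
        p[x] = root
        x = nxt
    return root

p = {1: 1, 2: 1, 3: 2, 4: 3}   # linear tree 4 -> 3 -> 2 -> 1
print(find_set(4, p))           # 1
print(p[4], p[3])               # 1 1: both now point directly to the root
```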
This FindSet operation is based on a two-pass method: it first makes one pass up the find path to locate the root, and then a second pass up the find path to update each node so that it points directly to the root. The clean-up pass costs exactly as much as the first pass. For instance, in the graphic above, the FindSet operation takes 6 time units, so the path compression heuristic also only costs 6 time units.
                                 |  LINK  | FINDSET  | BENCHMARK
Linked-List + heuristic          |  O(n)  |   O(1)   | O(n log n)
Parent Pointer Tree              |  O(1)  |   O(n)   |   O(n^2)
Parent Pointer Tree + heuristics |  O(1)  | O(log n) |   O(n) *
* Actually, the running time of the benchmark for a parent pointer tree with the 2 heuristics is O(n log*n). Log* is defined in Topic 3.