## Introduction

The lecture is motivated by two things:

(2) applications in which data "live in" several data structures simultaneously.

Augmentation is a process by which one adds fields to the nodes as necessary. Augmenting a data structure can be broken into four steps:

• choosing an underlying data structure
• determining additional information to be maintained in the underlying data structure
• verifying that the additional information can be efficiently maintained for the basic modifying operations on the underlying data structure
• developing new operations

Here we give several examples to show how augmenting red-black trees results in new data structures:

• Dynamic order-statistic trees
• Binary search trees for browsing
• Interval trees

## Dynamic order-statistic trees

ADT: set S; Dictionary operations (Insert, Delete, Search) + two additional operations:

• rank(S, x) : returns the rank of node x in S (the smallest element has rank 1)
• select(S, i) : returns the ith smallest element of set S

Implementation :

An order-statistic tree T is simply a red-black tree with additional information stored at each node. Besides the usual red-black tree fields key[x], left[x], right[x], p[x] and color[x] in a node x, we have another field size[x]. This field contains the number of internal nodes in the subtree rooted at x (including x itself), that is, the size of the subtree.

size[x] = size[left[x]] + size[right[x]] + 1, with the convention size[NIL] = 0

The node of the order-statistic tree will now look as follows :

figure 1
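As a concrete sketch (not from the notes; the names mirror the fields above, with None standing in for NIL), the node and the size invariant might look as follows in Python:

```python
# Sketch of an order-statistic tree node; field names mirror the notes
# (key, left, right, p, color, size).
class Node:
    def __init__(self, key, color="red"):
        self.key = key
        self.color = color
        self.left = None
        self.right = None
        self.p = None
        self.size = 1  # number of nodes in the subtree rooted here

def size(x):
    """Subtree size, with the size[NIL] = 0 convention."""
    return x.size if x is not None else 0

def update_size(x):
    """Restore the invariant size[x] = size[left[x]] + size[right[x]] + 1."""
    x.size = size(x.left) + size(x.right) + 1
```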

There are a couple of things, though, that we have to worry about:

1. Can the size fields be updated in O(log2n) time per modifying operation?
2. Can rank(S, x) and select(S, i) be done in O(log2n) time?

### Operations on order-statistic trees

Insert

As we know, insertion into a red-black tree consists of two phases. Phase 1 walks down a path from the root and inserts the new node as a child of an existing node. To maintain the subtree sizes we simply increment size[x] for each node x on the path traversed from the root down toward the leaves. The newly added node gets size 1. Figure 2 illustrates Phase 1.

figure 2

In Phase 2, the only structural changes to the underlying red-black tree are caused by rotations, of which there are at most two. A rotation is a local operation: it invalidates only the two size fields in the nodes incident on the link around which the rotation is performed. Figure 3 shows left and right rotations.

figure 3
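The O(1) size repair after a rotation can be sketched as follows (illustrative Python; parent pointers are omitted for brevity, and only the two nodes on the rotated edge are touched):

```python
# Sketch: maintaining size fields across a left rotation.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(x):
    return x.size if x else 0

def left_rotate(x):
    """Rotate x with its right child y; return the new subtree root y."""
    y = x.right
    x.right = y.left          # y's left subtree becomes x's right subtree
    y.left = x
    # Only x and y need their size fields recomputed, and x must be
    # updated first, since x is now a child of y.
    x.size = size(x.left) + size(x.right) + 1
    y.size = size(y.left) + size(y.right) + 1
    return y
```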

Delete

Phase 1 splices out the node we wish to delete. To update the subtree sizes we simply traverse a path from the spliced-out node up to the root, decrementing the size field of each node on the path. Figure 4 illustrates this process.

figure 4

The rotations in Phase 2 are handled in the same manner as for insertion.

Running time: to maintain subtree sizes in Phase 1 of insertion or deletion, we increment or decrement size[x] for each node x on the path from the root down toward the leaves. Since there are O(log2n) nodes on the traversed path, the additional cost of maintaining the size fields is O(log2n). Moreover, rotation is a local operation, so only O(1) additional time is spent updating size fields in Phase 2. Thus, both insertion and deletion take O(log2n) time.

Select

The procedure Select(x, i) returns a pointer to the node containing the ith smallest key in the subtree rooted at x.

Select(x, i)
    r = size[left[x]] + 1        (rank of x within its own subtree)
    case r = i : return x
    case r > i : return Select(left[x], i)
    case r < i : return Select(right[x], i - r)

Because each recursive call goes down one level in the order-statistic tree, the total time for Select is at worst proportional to the height of the tree. Since the tree is a red-black tree, its height is O(log2n). Thus, the running time of Select is O(log2n).
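An illustrative Python version of Select (a sketch, not from the notes; node fields mirror the pseudocode, with None for NIL). Note the `i - r` in the right-subtree branch, which skips over x and its left subtree:

```python
# Minimal node: size is computed from the children at construction time.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(x, i):
    """Return the node holding the i-th smallest key in x's subtree (i >= 1)."""
    r = (x.left.size if x.left else 0) + 1   # rank of x within its subtree
    if i == r:
        return x
    if i < r:
        return select(x.left, i)
    return select(x.right, i - r)            # skip x and its left subtree
```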

Rank

The procedure Rank(T, x) returns the position of x in the linear order determined by an inorder tree walk of T. Let x be a pointer to a node in the tree.

Rank(T, x)
    r = size[left[x]] + 1
    y = x
    while y <> root[T] do
        if right[parent[y]] = y
            then r += size[left[parent[y]]] + 1
        y = parent[y]
    return r

Since each iteration of the while loop takes O(1) time and y goes up one level in the tree with each iteration, the running time of Rank is at worst proportional to the height of the tree: O(log2n). Figure 5 illustrates the Rank procedure.

figure 5
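An illustrative Python version of Rank (a sketch, not from the notes): it climbs from x to the root, adding in the subtrees that precede x in the inorder walk.

```python
# Minimal node with parent pointer; size fields are set by the caller.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None
        self.size = 1

def attach(parent, child, side):
    """Helper: link child under parent on the given side ('left'/'right')."""
    setattr(parent, side, child)
    child.p = parent

def rank(root, x):
    """Position of x in the inorder walk of the tree rooted at root."""
    r = (x.left.size if x.left else 0) + 1
    y = x
    while y is not root:
        if y is y.p.right:
            # y's parent and the parent's left subtree all precede x
            r += (y.p.left.size if y.p.left else 0) + 1
        y = y.p
    return r
```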

## Binary search trees for browsing

ADT: set S; dictionary operations (Insert, Delete, Search) + four additional operations:

• Min
• Max
• Predecessor
• Successor

Implementation :

The underlying data structure is a red-black tree in which, besides the usual red-black tree fields, each node is augmented with Min[x], Max[x], Predecessor[x] and Successor[x]. Figure 6 shows the augmented red-black tree on the input {1 2 3 4 5 6 7 8 9 10 11}. Red lines link all nodes in order according to their key values.

figure 6

The data structure in Figure 6 can be looked at as a binary search tree or a sorted linked list as shown in Figure 7. In fact, according to Prof. Devroye, it is a smooth "marriage" of both data structures.

figure 7

When we insert or delete a node we always have to update the predecessor and successor pointers. In Figure 8 we have deleted the node with key value 8, so the successor and predecessor pointers must be updated accordingly. The time it takes to update the pointers is O(log2n), since we either follow a path up the tree or down the tree. Rotation operations take O(1) time as usual, so the running time for insertion or deletion is O(log2n). In Figure 8 the updated pointer is colored green.

figure 8
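Once the neighbors of the deleted node are known, the pointer splice itself behaves exactly like removing an element from a doubly linked list. A minimal sketch (field names pred/succ abbreviate the Predecessor/Successor fields above):

```python
# Sketch of the O(1) pointer splice when a node leaves the sorted order.
class Node:
    def __init__(self, key):
        self.key = key
        self.pred = None
        self.succ = None

def link(a, b):
    """Make b the inorder successor of a."""
    a.succ = b
    b.pred = a

def unlink(x):
    """Splice x out of the sorted doubly linked list of nodes."""
    if x.pred is not None:
        x.pred.succ = x.succ
    if x.succ is not None:
        x.succ.pred = x.pred
```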

## Interval Trees

An interval tree is a binary search tree on intervals that supports the dictionary operations efficiently, plus an overlap query. Overlap(x, i), for an interval i and a tree rooted at x, returns a pointer to an interval in the collection that overlaps the given interval i, or NIL if there is none.

The underlying data structure is a red-black tree in which each node x contains an interval int[x], and the key of x is the low endpoint of the interval, low[int[x]]; high[int[x]] is the high endpoint. In addition, each node x contains a value max[x], the maximum value of any interval endpoint stored in the subtree rooted at x:

max[x] = max(high[int[x]], max[left[x]], max[right[x]])

figure 10

Operations :

• Determine overlap with some interval in the tree : return Yes or No
• Dictionary operations (Insert, Delete, Search)

figure 11

Given an interval i and a pointer x to the root, Overlap(x, i) returns NIL if no overlap occurred, or a pointer to an overlapping node otherwise.

Overlap(x, i)
    while x <> NIL and i does not overlap int[x] do
        if left[x] <> NIL and low[i] <= max[left[x]]
            then x = left[x]
            else x = right[x]
    return x

Running time : the search for an interval that overlaps i starts with x at the root of the tree and proceeds downward. It terminates when either an overlapping interval is found or x becomes NIL. Since each iteration of the basic loop takes O(1) time, and since the height of the red-black tree is O(log2n), Overlap takes O(log2n) time.
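An illustrative Python version of the search (a sketch, not from the notes): an interval is a closed (low, high) pair, and each node stores its interval and the max endpoint of its subtree.

```python
# Minimal interval-tree node; max is computed from the children at
# construction time, mirroring max[x] = max(high, max[left], max[right]).
class INode:
    def __init__(self, lo, hi, left=None, right=None):
        self.int = (lo, hi)
        self.left, self.right = left, right
        self.max = max([hi] + [c.max for c in (left, right) if c])

def overlaps(i, j):
    """Closed intervals i and j overlap iff each starts before the other ends."""
    return i[0] <= j[1] and j[0] <= i[1]

def interval_search(x, i):
    """Return a node whose interval overlaps i, or None if there is none."""
    while x is not None and not overlaps(i, x.int):
        if x.left is not None and i[0] <= x.left.max:
            x = x.left   # an overlap, if one exists, must lie on the left
        else:
            x = x.right
    return x
```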

Insertion of a node x into a tree consists of two phases. During the first phase x is inserted as a child of an existing node. The value of max[x] can be computed in O(1) time, since it depends only on the other fields of x itself and on x's children, and x's children are both NIL. Once max[x] is computed, the change propagates up the tree toward the root, so the total time for the first phase is O(log2n). During the second phase the only structural changes are caused by rotations. Since a rotation changes only two nodes, and max depends only on a node's own interval and its children's max fields, the max fields can be restored in O(1) time per rotation. Since the number of rotations during insertion is at most two, the total time for insertion is O(log2n).

In the first phase of deletion, changes occur when the deleted node is replaced by its successor, and again when either the deleted node or its successor is spliced out. Propagating the resulting updates to max costs at most O(log2n), since the changes lie along a single path from the modified node up to the root. Fixing up the red-black tree during the second phase requires at most three rotations, and each rotation requires only O(1) time to update the max fields. Thus, like insertion, the total time for deletion is O(log2n).

### Problem : Layout of VLSI chips

figure 9

In this section we present a problem motivated by automated chip design: how can we place many printed circuits on one chip without overlap?

To make the problem more amenable to analysis, we assume that all n rectangles are aligned with the coordinate axes. We also assume that we are given a configuration of the rectangles, and our task is merely to check whether any two rectangles overlap.

The naive solution checks every pair of rectangles for overlap, which takes O(n²) time. The more efficient algorithm, which takes O(nlog2n) time, uses a technique known as sweeping.

In sweeping, an imaginary vertical sweep line passes through a given set of geometric objects (here, rectangles), usually from left to right. Sweeping provides a method for ordering geometric objects, usually by placing them into a dynamic data structure.

We sort the rectangles' endpoints by increasing x-coordinate and proceed from left to right. We insert a rectangle's y-direction interval into the interval tree when its left endpoint is encountered, and delete it from the interval tree when its right endpoint is encountered. Before inserting an interval, we check whether it overlaps some interval already in the tree.

The sweepline algorithm is given below.

Sweepline (turning 2-d into 1-d)
- Sort the x coordinates of the rectangles
- Maintain an interval tree of y-direction intervals
- for i = 1 to 2n do
      if the ith point is a left endpoint then
          if (a, ß) overlaps some interval in the interval tree then STOP: overlap found
          else Insert((a, ß), interval tree)
      else (the ith point is a right endpoint)
          Delete((a, ß), interval tree)

Here (a, ß) denotes the y-direction interval of the rectangle whose endpoint is being processed. Sorting takes O(nlog2n) time using mergesort or heapsort. Since there are 2n points, the for loop iterates at most 2n times, and each iteration takes O(log2n) time, since each red-black tree operation takes O(log2n) time and each overlap check also takes O(log2n) time. Thus the total running time is O(nlog2n).
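The event structure of the sweep can be sketched as follows (a simplified illustration, not from the notes: a plain Python list stands in for the interval tree, so each overlap check is linear and this version does not achieve O(nlog2n), but the left-to-right event processing is the same):

```python
# Simplified sweepline overlap check for axis-aligned rectangles.
def any_overlap(rects):
    """rects: list of (x1, x2, y1, y2) tuples with x1 < x2 and y1 < y2."""
    events = []
    for (x1, x2, y1, y2) in rects:
        events.append((x1, 0, (y1, y2)))   # 0: left edge -> insert interval
        events.append((x2, 1, (y1, y2)))   # 1: right edge -> delete interval
    # At equal x, left edges (0) sort before right edges (1), so rectangles
    # that merely touch are treated as overlapping (closed intervals).
    events.sort()
    active = []                            # stands in for the interval tree
    for _, kind, (lo, hi) in events:
        if kind == 0:
            # check the new y-interval against every active one
            if any(lo <= h and l <= hi for (l, h) in active):
                return True
            active.append((lo, hi))
        else:
            active.remove((lo, hi))
    return False
```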



### Web page creators

Last updated April 8, 1997 by Michael Timofeyev

Copyright © 1997, Michael Timofeyev, Michael Kangas and Matthew Steven Kitching. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in any copy. Disclaimer: this collection of notes is experimental, and does not serve as-is as a substitute for attendance in the actual class.