Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 9

Looking at these results, it’s plain to see that sorting after each insertion is certainly not a viable option. If you want the data to remain in sorted order while inserting, binary insertion wins hands down, performing at least 1,000 times fewer comparisons!

One final point to note: Because you would preferably use an array list (or some other data structure that supports fast, index-based lookup) to achieve the desired performance, inserting new items can be relatively slow. (Recall that an array list must copy all data items one position to the right after the insertion point to make room.) With small data sets, the overhead is negligible. In larger data sets, however, this can have a noticeable impact on performance.

Summar y

In this chapter, you have learned the following key points:

Binary searching uses a divide-and-conquer approach to locating a search key and in doing so achieves an average number of comparisons that approaches O(log N).

Binary searching can be achieved efficiently using either recursion or iteration.

Binary searching works best when used with a data structure that supports fast, index-based lookup.

Binary insertion builds on binary searching to add items to a list while maintaining the sort order using O(log N!) comparisons.

Binary insertion proves to be very efficient, performing as well as — and arguably better than — some of the sorting algorithms.

224

10

Binar y Search Trees

Chapter 9 introduced a binary search algorithm that enables you to efficiently search a sorted array list. Unfortunately, a binary search algorithm suffers when it comes to insertions and deletions as elements are copied around. Binary search trees, on the other hand, can achieve the O(log N)average search/insert/delete time of the binary search algorithm without the associated overhead. By storing values in a tree structure — where values are linked together — it’s easy to insert new values and remove deleted ones.

Unlike most other chapters, this chapter is largely theoretical. That is, you don’t work through any practical examples because binary search trees actually form the basis for many other data structures. This chapter is confined to a discussion about how binary search trees work, rather than how you use them in practice. In later chapters on sets (Chapter 12), maps (Chapter 13), and B-Trees (Chapter 15), you’ll see how these other data structures are built using the code in this chapter as a template.

This chapter discusses the following topics:

Characteristics that make binary search trees so interesting

Various binary search tree operations and how each works

How the ordering of data can affect performance

A simple technique for restoring the balance after insertion and deletion

How to test and implement a nonbalancing binary search tree

Chapter 10

Understanding Binar y Search Trees

Chapter 2 mentioned how a file system is often referred to as a directory tree. More formally, though, a tree is a set of linked nodes whereby each node has zero or more children and at most one parent. One of the nodes is singled out as the root, and is the only node without a parent. Nodes without children are referred to as leaf nodes.

Just like a directory tree with any number of directories, subdirectories, and files, trees can have any number of children. However, a very common type of tree is a binary tree, so called because each node may have at most two children. (These children are often referred to as the left and right.)

A binary search tree is a binary tree with one more restriction: All children to the left of a node have smaller values, whereas all children to the right will have larger values.

Figure 10-1 shows an example of a binary search tree. Here, the letter I is the root node and the letters A, H, K, and P are all leaf nodes. The value of every child to the left of a node is smaller, and the value of every child to the right is larger, than the value of the node itself.

root

left (smaller)

I right

(larger)

D L

A F K M

H P

leaf nodes

Figure 10-1: A simple binary search tree.

These properties allow very efficient searching, insertion, and deletion. In fact, the average search time for a binary search tree is directly proportional to its height: O(h). In the case of the example shown in Figure 10-1, the height of the tree — the longest path from the root node to a leaf — is three, so you can expect the average search time to be in the order of three comparisons.

For a balanced tree — such as the one in Figure 10-1 — the height of the tree is O(log N). However, under certain circumstances, the height of the tree can degenerate, leading to a worst-case search time of O(N).

226

Binary Search Trees

Minimum

The minimum of a binary search tree is the node with the smallest value, and, thanks to the properties of a binary search tree, finding the minimum couldn’t be simpler. Simply follow the left links starting from the root of the tree to find the one with the smallest value. In other words, the minimum is the leftmost node in the tree.

In the example shown in Figure 10-1, if you follow the left links starting from the root node, I, all the way down to a leaf node, you end up with the letter A — the smallest value in the tree.

Maximum

Whereas the minimum of a binary search tree is the node with the smallest value, the maximum is the node with the largest value. Finding the maximum is very similar to finding the minimum except that you follow the right links instead of the left. In other words, the maximum is the rightmost node in the tree.

To prove it, try following the right links starting from the root node in Figure 10-1 all the way down to a leaf node. You should end up with the letter P — the largest value in the tree.

Successor

A node’s successor is the one with the next largest value in the tree. For example, given the tree shown in Figure 10-1, the successor of A is D, the successor of H is I, and the successor of I is K. Finding the successor is not that difficult, but it involves two distinct cases.

In the first case, if a node has a right child, then the successor is the minimum of that. For example, to find the successor of I, we see that it has a right child, L, so we take its minimum, K. The same holds for the letter L: It has a right child, M, so we find its minimum, which in this case happens to be M itself.

Conversely, if the node has no right child — as is the case with H — you need to search back up the tree until you find the first “right-hand turn.” By this we mean that you keep looking up the tree until you find a node that is the left child, and then use its parent. In this example, you move up the tree moving left (that is, following right-hand links) from H to F and then left again to D, and finally you make a right-hand turn to I.

Predecessor

The predecessor of a node is the one with the next smallest value. For example, the predecessor of P is M, the predecessor of F is D, and the predecessor of I is H.

The algorithm for finding the predecessor is essentially the inverse of what you’d use for the successor, and it involves two similar cases. In the first case, if a node has a left child, then you take its maximum. In the second case — whereby the node has no left child — you work up the tree until you find a “lefthand” turn.

227

Chapter 10

Search

When searching for a value in a binary search tree, you start at the root node and follow links left or right as appropriate until either you find the value you are looking for or there are no more links to follow. This can be summarized in the following steps:

1.Start at the root node.

2.If there is no current node, the search value was not found and you are done. Otherwise, proceed to step 3.

3.Compare the search value with the key for the current node.

4.If the keys are equal, then you have found the search key and are done. Otherwise, proceed to step 5.

5.If the search key sorts lower than the key for the current node, then follow the left link and go to step 2.

6.Otherwise, the search key must sort higher than the key for the current node, so follow the right link and go to step 2.

The following example shows how you would search for the letter K in the binary search tree shown in Figure 10-1.

Starting with the root node (step 1), you compare your search value, K, with the letter I, as shown in Figure 10-2.

I

D L

A F K M

H P

Figure 10-2: A search always starts with the root node.

Because K comes before I, you follow the right link (step 6), which leads you to the node containing the letter L (see Figure 10-3).

228

Binary Search Trees

 

 

I

 

 

D

 

L

A

F

K

M

 

H

 

P

Figure 10-3: Follow the right link when the search key sorts after the key at the current node.

You still don’t have a match, but because K is smaller than L, you follow the left link (step 5), as shown in Figure 10-4.

 

 

I

 

 

D

 

L

A

F

K

M

 

H

 

P

Figure 10-4: Follow the left link when the search key sorts before the key at the current node.

Finally, you have a match (see Figure 10-4): The search value is the same as the value at the current node (step 4), and your search completes. You searched a tree containing nine values and found the one you were looking for in only three comparisons. In addition, note that you found the match three levels down in the tree — O(h).

Each time you move down the tree, you effectively discard half the values — just as you do when performing a binary search of a sorted list. In fact, given a sorted list, you can easily construct the equivalent binary search tree, as shown in Figure 10-5.

229

Chapter 10

0

1

2

3

4

5

6

7

8

A

D

F

H

I

K

L

M

P

4

I

 

1

 

6

 

D

 

L

0

2

5

7

A

F

K

M

 

3

 

8

 

H

 

P

Figure 10-5: A sorted list depicted as a balanced binary search tree.

If you compare the search you just made with the examples in Chapter 9, you will find the order of comparisons to be identical. That’s because a balanced binary search tree has the same performance characteristics as a sorted list.

Insertion

Insertion is nearly identical to searching except that when the value doesn’t exist, it is added to the tree as a leaf node. In the previous search example, if you had wanted to insert the value J, you would have followed the left link from K and discovered that there were no more nodes. Therefore, you could safely add the J as the left child of K, as shown in Figure 10-6.

 

I

 

 

 

D

 

L

A

F

K

M

 

H J

 

P

Figure 10-6: Insertion always involves creating a new leaf node.

230

Binary Search Trees

The newly inserted value was added as a leaf, which in this case hasn’t affected the height of the tree.

Inserting relatively random data usually enables the tree to maintain its O(log N) height, but what happens when you insert nonrandom data such as a word list from a dictionary or names from a telephone directory? Can you imagine what would happen if you started with an empty tree and inserted the following values in alphabetical order: A, D, F, H, I, K, L, M, and P?

Considering that new values are always inserted as leaf nodes, and remembering that all larger values become right children of their parent, inserting the values in ascending order leads to a severely unbalanced tree, as shown in Figure 10-7.

A

D

F

H

I

K

L

M

P

Figure 10-7: An unbalanced tree resulting from the insertion of ordered data.

In fact, whenever data is inserted in order, a binary search tree will degenerate into a linked list and the height of the tree — and with it the average search time — becomes O(N).

Even if you do have ordered data, though, all is not lost. There are several variations on binary search trees that perform balancing, including Red-Black trees, AVL trees, Splay trees, and so on, all of which involve fairly complex restructuring to restore some balance to the tree. For this reason, they are not covered as part of this book. However, one novel yet fairly simple variation known as a B-Tree is introduced in Chapter 15.

231

Chapter 10

Deletion

Deletion is a little more involved than searching or insertion. There are essentially three cases to consider. The node to be deleted will reflect one of the following conditions:

No children (a leaf), in which case you can simply remove it.

One child (either left or right), in which case you can replace the deleted node with its child.

Two children, in which case you swap the node with its successor and try again with either case 1 or case 2 as appropriate.

We’ll now discuss each of the cases in more detail, starting with the simplest case: deleting a leaf node. In all cases, assume you are starting with a tree that looks like the one shown in Figure 10-1.

The simplest case involves deleting a leaf node. Because a leaf node has no children, all you need to do is break the link with its parent. Figure 10-8 shows how to delete the value H from the tree.

 

 

I

 

 

D

 

L

A

F

K

M

 

H

 

P

Figure 10-8: Leaf nodes are deleted by breaking the link with their parent.

The next simplest case involves deleting a node with only one child. When deleting a node with only one child, splice it out by making its parent point to its child. Figure 10-9 shows the tree after deleting the letter M.

 

 

I

 

 

D

 

L

A

F

K

M

 

H

 

P

Figure 10-9: Nodes with only one child are spliced out.

232

Binary Search Trees

Note how the links between M and its parent, L, and its child, P, have been replaced with a direct link between L and P.

Deleting a node with two children is somewhat trickier. For example, imagine you had to delete the root node I from the tree in Figure 10-1. Which node would you use as its replacement? Technically, you could use either of the two child nodes — D or L — and still maintain the properties of a binary search tree. However, both of these nodes already have two children each, making it difficult to simply splice out the deleted node.

Therefore, to delete a node with two children, the first thing you need to do is find the successor (you could just as easily choose the predecessor and it would work just as well) and switch the values. Figure 10-10 shows the result of switching the I with the K. Notice that this temporarily violates the properties of a binary search tree.

 

 

K

 

 

D

 

L

A

F

I

M

 

H

 

P

Figure 10-10: The value of the node to be deleted is swapped with its successor.

Now that the value has moved, rather than delete the original node, which now has the value K, you delete the node with which you switch values. This process is guaranteed to fall into either case 1 or case 2. How do you know this? Well, for a start, the node you originally wanted to delete had two children, meaning that it must have a right child. Furthermore, the successor of a node with a right child is the minimum (or the leftmost node) of the right child. Therefore, the successor can either be a leaf node (one with no children) or have at most a right child. If it had a left child, by definition it wouldn’t be the minimum.

In the example, having swapped the values (see Figure 10-10), you can safely delete the leaf node containing the value I, as shown in Figure 10-11.

233