CS 240: Lecture 18 - Balanced Tree
Dear students:
Last time we introduced a special case of binary trees in which the left descendents are all less than their root and all the right descendents are greater. Such a tree holds the binary search in its bones.
The binary search is fast, right? We like that. To get that speed, we need to sort our collection and keep it sorted as we add new things to it. We already know how to perform sorting, searching, and inserting on array lists, so why bother with trees as a a new structure for performing a binary search? You've heard the answer before: for faster performance. Consider this table comparing the worst case performance of the find, insert, and remove operations for sorted arrays and binary search trees:
data structure | find | insert | remove |
---|---|---|---|
sorted array | \(\Theta(\log_2 n)\) | \(\Theta(n)\) | \(\Theta(n)\) |
unbalanced tree | \(\Theta(n)\) | \(\Theta(n)\) | \(\Theta(n)\) |
balanced tree | \(\Theta(\log_2 n)\) | \(\Theta(\log_2 n)\) | \(\Theta(\log_2 n)\) |
Binary search trees beat out arrays when it comes to inserting and removing because they are linked. Arrays need to shuffle their elements around. However, binary search trees are faster only when they are close to perfect, when they have as much mass on the left as on the right. When they aren't balanced, the performance degenerates to linear time for all three operations.
AVL Trees
Care must be taken to keep a tree balanced, and there are several data structures that do so. The data structure we examine today is an AVL tree. This was the first so-called self-balancing tree that was invented all the way back in the 1960s by a couple of Soviet researchers. There's another balancing structure called a red-black tree that we will not discuss. One balancing tree is sufficient for you to understand how we achieve logarithmic worst-case performance.
In an AVL tree, each node is assigned a balance factor. A node's balance factor is the difference between its children's heights. If the left side is taller than the right, then the balance factor is positive. If the right side is taller, then the balance factor is negative. Some definitions flip the sign. If the two sides are equal, the balance factor is 0. The balance factor is calculated as left.height - right.height
. If a child doesn't exist, its height is considered to be -1.
We'll practice computing balance factors on several random trees.
An AVL tree is a binary search tree, and inserting and removing start just as they do in a non-balanced tree. You start at the root and traverse either left or right until you find the spot in the tree where the key belongs. After the node is added or removed, you might have an imbalanced tree. You restore balance by rotating the imbalanced node.
Rotating
After a new node is inserted in an AVL tree or an old one is removed, one must walk up from the new node, possibly all the way to the root. If any node on that path has a balance factor greater than 1 or less than -1, then it must be rebalanced by rotating the node. Rotating a node left means that its right child takes its place as the parent, and the node becomes its left child. Rotating a node right means that its left child takes its place as the parent, and the node becomes its right child.
The new parent might already have a child in the place where the rotated node is supposed to be installed. The child will have to be moved, but where? The rotated node has lost one of its children, so it's a good candidate. In a left rotation, the new parent's left child moves to become the right child of the rotated node. In a right rotation, the new parent's right child moves to become the left child of the rotated node.
A single rotation will not always balance out the tree. This situation can be detected by comparing the signs of the balance factors of the node you want to rotate and the node that will take its place. If their signs are different, then the new balance factors won't improve. In such a situation, we first rotate the node's child.
Our strategy for deciding how to rotate a node condenses down to this pseudocode:
if node's balance factor is > 1 (left is heavy)
if left child's balance factor is opposite (< 0)
rotate left child left
rotate node right
else if node's balance factor is < -1 (right is heavy)
if right child's balance factor is opposite (> 0)
rotate right child right
rotate node left
We'll practice inserting and removing nodes with our magnetic letters.
TODO
You've got a few things to do before we meet again:
See you next time!
Sincerely,
P.S. It's time for a haiku: