teaching machines

CS 240: Lecture 29 – Red-black Trees

November 1, 2021 by . Filed under cs3, fall-2021, lectures.

Dear students:

AVL trees are one solution to the problem of balancing binary search trees. They do an excellent job of minimizing the height. Humans have continued to invent other self-balancing trees that occasionally beat out AVL trees. The red-black tree, for instance, tends to have cheaper insertions because it is not as sensitive to imbalances and therefore doesn’t perform nearly as many rotations.

A red-black tree maintains a boolean property for each node. By convention, this property labels each node as either red or black. Unlike the balance factor, the semantic meaning of node’s color is not intuitive. When a node is inserted in the tree, it starts off red. Its color may be changed in order to preserve these properties that are true of all red-black trees:

  1. All nodes are either red or black.
  2. The root node is black.
  3. Each null child is implicitly black. Some implementations terminate each branch using an explicit black sentinel node.
  4. If a node is red, then both its children are black.
  5. Every path through the tree, from root to leaf, contains the same number of black nodes. That number is called the black height.

These properties seem pretty arbitrary at first glance. And maybe second glance too. And maybe third glance. The big story is that if you have a tree that maintains these properties and has $n$ nodes, the height will never exceed $2\log_2 (n + 1)$. This isn’t as strong a guarantee as an AVL tree, which bounds the height around $1.5\log_2 (n + 1)$, but that means it has less work to do.

I will ask you to draw a red-black tree with at least 5 nodes.

That leads us to a peer-instruction question. Is the following tree red-black? If it is, great. If it isn’t, how many rules does it violate?

You insert a new node in a red-black tree like any binary search tree. It starts off red. If its parent is also red, then you have a violation of one of the properties. To restore the tree to its red-black state, you must examine the node’s uncle. If the uncle is also red, there’s only a little work to do: you flip the colors of the red parent, the red uncle, and the black grandparent and then you recurse up the tree. You skip up to the grandparent when doing the recursion, which is now red and leads to a new potential violation.

If the uncle of a red child with a red parent is instead black, then you perform a rotation just as you do in an AVL tree. And maybe some recoloring.

We will not get into the specifics of the balancing logic, because you won’t remember the details later, just as I do not. The important takeaway is that there is more than one way to balance a tree. Because of their complexity, I would prefer not to mention red-black trees at all, but many of our data structures use them, including Java’s TreeMap and TreeSet and C++’s std::map. This puts them into the discourse of computer scientists. If you show up to this discourse without having heard of them, I will get in trouble. I didn’t teach them at my last university, and look where that got me.

Exercises

For the remainder of our time together, you’ll work through some exercises on AVL trees with your neighbors. We ran out of time for this last week. It doesn’t match our discussion of red-black trees perfectly, but both are self-balancing trees that use the same rotation operation to restore a balance.

TODO

You have some work to do before the next class:

See you next time.

Sincerely,

P.S. It’s time for a haiku!

Why are they red-black?
Since the pens were those colors
They were so ink-lined