teaching machines

CS 240: Lecture 30 – Heaps

November 3, 2021 by . Filed under cs3, fall-2021, lectures.

Dear students:

One of the most challenging things about this course is that you are asked to learn about things before you feel that you need them. I wish I knew how to fix that. We have so very little time in which to make learning happen. That said, we have a new abstract data type to discuss. To motivate why we need this new ADT, consider the following scenarios:

In all these cases you have a collection, and you need to repeatedly draw out the least or the greatest element from the collection. The collection gains new entries periodically. The abstract data type that describes such a collection is the priority queue, which has this interface:

interface PriorityQueue<T extends Comparable<T>> {
  void add(T item);
  T removeHighestPriorityItem();
  int size();
  boolean isEmpty();
}

A regular queue has an easy job. Items are dished out in chronological order. The priority queue instead has to measure each item against the others to see which one has the highest priority.

One way to implement this would to create an array and keep it sorted by the items’ priorities. Adding and removing items would both be linear time operations. Can we find a faster implementation? Perhaps one that runs in logarithmic time? We can if we use a tree. But we don’t need a total ordering of all elements. We just need to be able to find the highest priority item at any given time. A binary search tree is not what we need. Instead, we’ll use a heap.

A heap is a binary tree in which the highest priority item is always at the root. The children of a node always have a lower priority. Unlike a BST, there’s not necessarily any difference between the left and right subtrees. Both sides are lower than their parent, but nothing can be said about their relationship to each other.

Heaps can be implemented slickly if you add another requirement: the tree behind the heap is always complete. That means the only holes in the tree are in the bottom level and at thvel and at the right. The benefit of this requirement is that you can very compactly store the heap not as a linked tree, but as an array. The savings can be significant if the heap is large and you are storing three links per node (two for the children and one for the parent).

Child Indices

An array-based heap is stored in level order. The root is at index 0. Its left child is at index 1 and its right child at index 2. The grandchildren are at indices 3 through 7.

So, if you don’t have a linked tree, how do you get around from root to leaves? Well, consider these parent-left relationships:

Node $i$ finds its left child at $2i+1$. Similarly, consider these parent-right relationships:

Node $i$ finds its right child at $2i+2$.

Parent Index

Suppose you are looking at node $i$ and you want to find its parent at index $p$. If the node is a left child, then you know this:

$$\begin{align}2p + 1 &= i \\2p &= i – 1 \\p &= \frac{i – 1}{2} \\\end{align}$$

If the node is a right child, then you know this:

$$\begin{align}2p + 2 &= i \\2p &= i – 2 \\p &= \frac{i – 2}{2} \\\end{align}$$

It looks like there are two different formulae depending on the leftness or rightness of the node. However, consider that a right child is always going to have an even index. Subtracting by two and then performing an integer-division by 2 is always going to yield the same result as subtracting by 1 and dividing. Don’t believe me? Look at some examples:

$$\begin{align}\frac{2 – 2}{2} &= 0 = \frac{2 – 1}{2} \\\frac{4 – 2}{2} &= 1 = \frac{4 – 1}{2} \\\frac{6 – 2}{2} &= 2 = \frac{6 – 1}{2} \\\end{align}$$

So, that means we can use this one formula for any node, whatever child it may be of its parent:

$$\begin{align}\mathrm{parentIndex} &= \frac{\mathrm{childIndex} – 1}{2} \\\end{align}$$

Add

To add a new item to a queue, you don’t walk the tree from the root to the leaf as you do in a binary search tree. Rather, you throw the item in the next open leaf cell in the array. Then you swap it upward with its ancestors until the priorities are ordered properly. This swapping is called sifting or bubbling or reheaping up.

Remove

The only item you remove from the queue is the item with the highest priority. As with a regular queue, you don’t have access to any other item. The highest priority item is at the root. When you remove it, you have a vacancy in the tree that must be filled in order to keep the tree complete. The item that fills that vacancy is the item with the next highest priority. It will be one of the vacated node’s children. In particular, it must be the larger of the two to maintain the heap property. This swapping is called sifting or bubbling or reheaping down.

Exercises

For the remainder of our time together, you’ll work through some exercises on AVL trees with your neighbors. We ran out of time for this last week. It doesn’t match our discussion of red-black trees perfectly, but both are self-balancing trees that use the same rotation operation to restore a balance.

TODO

You have some work to do before the next class:

See you next time.

Sincerely,

P.S. It’s time for a haiku!

They don’t vote for kings
Not in the land of the blind
They just count the eyes