CS 240: Lecture 19 - Priority Queue

Dear students:

One of the most challenging things about this course is that you are asked to learn about things before you feel that you need them. I wish I knew how to fix that. We have so very little time in which to make learning happen. That said, we have a new abstract data type to discuss. To motivate why we need this new ADT, consider the following scenarios:

You are a white mage and can cast a cure spell every turn. You always want to cure whoever has the fewest hit points.
You are a streaming music app, a browser, or a mobile operating system. When disk in the device gets full of downloaded files, you want to purge old ones to make room for more.
You are building an advertising system that amplifies the voices of Twitter users who don't have many followers. You want a fast and easy way to pick which user to highlight next.

In all these cases you have a collection, and you need to repeatedly draw out the least or the greatest element from the collection. The collection gains new entries periodically. The abstract data type that describes such a collection is the priority queue, which has this interface:

interface PriorityQueue<T extends Comparable<T>> {
  void add(T item);
  T remove();
  int size();
  boolean isEmpty();
}

A regular queue has an easy job. Items are dished out in chronological order. The priority queue instead has to measure each item against the others to see which one has the highest priority.

One way to implement this would to create an array and keep it sorted by the items' priorities. Adding and removing items would both be linear time operations. Can we find a faster implementation? Perhaps one that runs in logarithmic time? We can if we use a tree. But we don't need a total ordering of all elements. We just need to be able to find the highest priority item at any given time. A binary search tree is not what we need because it orders items by unique identifiers. Priorities are not unique identifiers. Instead, we'll use a heap.

Heap

A heap is a binary tree in which the highest priority item is always at the root. The children of a node always have a lower priority. Unlike a BST, there's not necessarily any difference between the left and right subtrees. Both sides are lower than their parent, but nothing can be said about their relationship to each other.

Heaps can be implemented slickly if you add another requirement: the tree that stores the heap is always a complete tree. That means the only holes in the tree are in the bottom level and at the right of the bottom level. The benefit of this requirement is that you can very compactly store the heap not as a linked tree, but as an array. The savings can be significant if the heap is large and you are storing three links per node (two for the children and one for the parent).

Add

To add a new item to a heap, you don't walk the tree from the root to the leaf as you do in a binary search tree. Rather, you throw the item in the next open leaf cell in the array. Then you swap it upward with its ancestors until the priorities are ordered properly. This swapping is called sifting or bubbling or reheaping up or percolating up.

Remove

The only item you remove from a heap is the item with the highest priority. As with a regular queue, you don't have access to any other item. It is not a random access data structure. The highest priority item is at the root. When you remove it, you have a vacancy in the tree that must be filled in order to keep the tree complete. The item that fills that vacancy is the item with the next highest priority. It will be one of the vacated node's children. In particular, it must be the larger of the two to maintain the heap property. This swapping is called sifting or bubbling or reheaping down or percolating down.

Child Indices

Percolating down requires comparing the priorities of a parent node with its children. If the tree is stored in an array, we need a way to find the indices of a node's children. The tree is stored in level order. The root is at index 0. Its left child is at index 1 and its right child at index 2. The grandchildren are at indices 3 through 7.

So, if you don't have a linked tree, how do you get around from root to leaves? Well, consider these parent-left relationships:

0's left child is 1
1's left child is 3
2's left child is 5
3's left child is 7
4's left child is 9

Node $i$ finds its left child at $2i + 1$. Similarly, consider these parent-right relationships:

0's left child is 2
1's left child is 4
2's left child is 6
3's left child is 8
4's left child is 10

Node $i$ finds its right child at $2i + 2$.

Parent Index

Percolating up requires comparing the priorities of a child node with its parent. Suppose you are looking at node $i$ and you want to find its parent at index $p$. If the node is a left child, then you know this:

$$\begin{align} 2p + 1 &= i \\ 2p &= i - 1 \\ p &= \frac{i - 1}{2} \\ \end{align}$$

If the node is a right child, then you know this:

$$\begin{align} 2p + 2 &= i \\ 2p &= i - 2 \\ p &= \frac{i - 2}{2} \\ \end{align}$$

It looks like there are two different formulae depending on the leftness or rightness of the node. However, consider that a right child is always going to have an even index. Subtracting by two and then performing an integer-division by 2 is always going to yield the same result as subtracting by 1 and dividing. Don't believe me? Look at some examples:

$$\begin{align} \frac{2 - 2}{2} &= 0 = \frac{2 - 1}{2} \\ \frac{4 - 2}{2} &= 1 = \frac{4 - 1}{2} \\ \frac{6 - 2}{2} &= 2 = \frac{6 - 1}{2} \\ \end{align}$$

So, that means we can use this one formula for any node, whatever child it may be of its parent:

$$\begin{align} \mathrm{parentIndex} &= \frac{\mathrm{childIndex} - 1}{2} \\ \end{align}$$

Complexity

Suppose we want to implement the priority queue ADT. We have many data structures we could use. Let's consider our choices and their computational complexities by filling out this table:

data structure	insert	removeMin	peekMin
sorted array	?	?	?
unbalanced BST	?	?	?
AVL tree	?	?	?
heap	?	?	?

The insert operation takes in a priority-item pair. The removeMin operation always removes and returns the item with the lowest priority value. The peekMin operation returns it but does not remove it.

Heaps have an advantage over balanced BSTs that is not reflected in this table. If you have an array filled with values in an arbitrary order, you can turn it into a heap in linear time. Turning it into a binary tree requires linearithmic time.

Let's consider also how much space these data structures consume.

data structure	incidental space
sorted array	?
unbalanced BST	?
AVL tree	?
heap	?

TODO

You've got a few things to do before we meet again:

Read chapter 15 in the textbook.

The last PA will likely be released early next week. It'll require implementing your own heap-based priority queue.

See you next time!

Sincerely,

P.S. It's time for a haiku:

I am not the best But I'm not far from the top Like heaps of people