CS 240: Lecture 9 - Lists

Dear students:

Today we step foot in the gallery of abstract data types in which we'll spend the rest of the semester. We start with one of the gentler specimens: the humble list. But even it may be implemented in several different ways.

List

List is an abstract data type. Recall that an ADT describes a set of values and the operations that those values support. It is the outside view of the data, not the inside view. From the outside, a list is an indexed sequence of data that supports these operations and maybe others:

size
get an element at some index
remove an element at some index
appending an element
check whether an element is in the list

ArrayList and LinkedList are two kinds of lists. The former uses a backing array as its data structure, while the latter uses a chain of objects as its data structure. One could dream up other kinds of lists, like one that used a database or the file system, but these are the two most common. All of these different lists are implementations of the same abstract data type and are in theory interchangeable. But they have different costs.

ArrayList

What is the output of this program?

List<Integer> numbers = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
  numbers.add(i);
}
System.out.println(numbers.size());

This code prints 10000, as you'd expect. Java's ArrayList does not have a fixed size, even though arrays do. How does it manage this? It starts off with a small array of a certain length. OpenJDK starts off with 10. There aren't any items in this array initially, so the list itself has size 0, even though the array has length 10.

To track the size of the list, which is independent of the array capacity, the ArrayList holds an extra int. This int makes the size method easier to implement.

The get method of ArrayList can exploit the random access subscript operation on arrays:

get(i)
  assert valid index
  return element i from array

This method will be fast. Since looking up an element doesn't depend on the number of items in the list, it's $\Theta(1)$.

The contains method of ArrayList must traverse the list to find the given element:

contains(target)
  for each item in array
    if item is target
      return true
  return false

This method is a linear search, so it's $\Theta(n)$. If the list were sorted, we could search more quickly. However, sorting is not a feature of the list ADT.

As the client calls add, the array fills up. Eventually the size of the list matches its capacity. If the client calls add on a full list, a new and bigger array is created, and all of the elements from the previous array are copied over to it. Here's how I might implement the add method:

add(item)
  if array is full
    copy it to a bigger array
  append item

What can we say about the complexity of add? The cost depends on how many items have already been added to the list, so we have to look at more than just the parameter item. When the list is full, we will have to copy over all elements of the list to a new array. The worst case is therefore $\Theta(n)$. However, if the list is not full, the algorithm is constant time. The best case is therefore $\Theta(1)$. Since the worst case happens infrequently and on a predictable schedule, we usually say that add is $\Theta(1)$ but qualify that this is its amortized complexity. If you pay $2400 in taxes annually, it would be silly to budget $2400 per month. Instead, you'd spread out or amortize the cost across all 12 months, budgeting only $200 per month. We view add in the same way. It's worst case cost is spread out across all the many calls that hit the best case.

The remove operation follows this pseudocode:

remove(index)
  assert valid index
  for each item after the removed item
    shift it back

In the worst case, we'll remove the head of the list, and we'll have to shift all items. That makes remove a $\Theta(n)$ operation in the worst case. In the best case, we'll remove the final item, which has a cost that is $\Theta(1)$.

LinkedList

A linked list is a chain of node objects. Each node is a pair of an item and a reference to the next item:

class Node<T> {
  public T item;
  public Node<T> next;
}

Usually this class is made an inner class within some larger linked list abstraction.

Computing the size of a linked list could be a linear time operation. You would be wise to store the size in an instance variable on the list instead of traversing the chain.

Getting an item at a particular index requires a traversal. So does checking if the list contains an item. These are both $\Theta(n)$.

Appending an element does not require a traversal if you track the tail node. You just a make new node and wire it up to the tail. This is a $\Theta(1)$ operation.

Removing a given index likely requires a traversal, so this is a $\Theta(n)$ operation in the worst case. However, if you have an iterator that is currently pointing a node, you can remove it without any traversal. Removing through an iterator is $\Theta(1)$.

TODO

You've got a few things to do before we meet again:

If you need more time on the Multiboard assignment, you may apply up to two of your three extension days to keep working on it.

See you next time!

Sincerely,

P.S. It's time for a haiku!

That meal filled me up Time to find a new body I know, it sounds growth