teaching machines

CS 330: Lecture 21 – Parametric Polymorphism

March 28, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students,

Don’t get me started on arrays in C. What sort of world is this where an array is fixed in size, but yet it doesn’t know what that size is? We must fix this. Let’s create a growable array:

class Flexray {
  public:
    Flexray() :
      capacity(10),
      nitems(0),
      elements(new int[capacity]) {
    }

    ~Flexray() {
      delete[] elements;
    }

    void operator<<(int item) {
      if (nitems == capacity) {
        grow();
      }
      elements[nitems] = item;
      ++nitems;
    }

    int size() const {
      return nitems;
    }

    int &operator[](int i) {
      // TODO bounds check
      return elements[i];
    }

    const int &operator[](int i) const {
      // TODO bounds check
      return elements[i];
    }

  private:
    void grow() {
      int new_capacity = 2 * capacity;
      int *new_elements = new int[new_capacity];
      memcpy(new_elements, elements, sizeof(int) * capacity); 
      delete[] elements;
      capacity = new_capacity;
      elements = new_elements;
    }

    int capacity;
    int nitems;
    int *elements;
}

Okay, great. Now the boss tells us we need a growable array of strings. Then a growable array of stringstreams. Then a growable array of std::out_of_range. What are we going to do? Copy and paste and tweak types? No.

There is actually more than one answer to this question. We’ve already seen one vehicle for reusing code: subtype polymorphism. We could just have our collection appeal to the supertype. This is how Java’s collections worked when I first learned it. Collections were written in terms of Object, like this:

public class Flexray {
  private int capacity;
  private int nitems;
  private Object[] elements;

  public Flexray() {
    capacity = 10;
    nitems = 0;
    elements = new Object[capacity];
  }

  public void add(Integer item) {
    if (nitems == capacity) {
      grow();
    }
    elements[nitems] = item;
    ++nitems;
  }

  public int size() {
    return nitems;
  }

  public Object get(int i) {
    return elements[i];
  }

  private void grow() {
    int newCapacity = 2 * capacity;
    Object[] newElements = new Object[newCapacity];
    System.arraycopy(elements, 0, newElements, 0, nitems);
    capacity = newCapacity;
    elements = newElements;
  }
}

Now we can make a list that works for any type, right? Because everything’s an Object. Sadly, a generalizing polymorphic solution suffers from a couple of drawbacks:

A better fix is to define Flexray once but let Flexray‘s type come in as a parameter. This rounds out our four forms of polymorphism:

In C++, we decorate our class with a template and replace all references to int elements to T elements:

template<typename T>
class Flexray {
  public:
    Flexray() :
      capacity(10),
      nitems(0),
      elements(new T[capacity]) {
    }

    ~Flexray() {
      delete[] elements;
    }

    void operator<<(T item) {
      if (nitems == capacity) {
        grow();
      }
      elements[nitems] = item;
      ++nitems;
    }

    int size() const {
      return nitems;
    }

    T &operator[](int i) {
      // TODO bounds check
      return elements[i];
    }

    const T &operator[](int i) const {
      // TODO bounds check
      return elements[i];
    }

  private:
    void grow() {
      int new_capacity = 2 * capacity;
      T *new_elements = new T[new_capacity];
      memcpy(new_elements, elements, sizeof(T) * capacity); 
      delete[] elements;
      capacity = new_capacity;
      elements = new_elements;
    }

    int capacity;
    int nitems;
    T *elements;
}

A templated class or function isn’t actually compilable code in C++. It’s just a pattern—a template. Only when the compiler sees a reference to the templated construct with the type value provided does it go and create a specialization of that pattern. If it sees another reference with a different type, it generates a completely different specialization for just that type. So, we might have 20 different versions of Flexray, all catering to elements of different types. The source code will have just one definition, but the machine code will have 20 different representations.

Java handles templates differently. The source code looks mostly the same:

public class Flexray<T> {
  private int capacity;
  private int nitems;
  private T[] elements;

  public Flexray() {
    capacity = 10;
    nitems = 0;
    elements = (T[]) new Object[capacity];
  }

  public void add(T item) {
    if (nitems == capacity) {
      grow();
    }
    elements[nitems] = item;
    ++nitems;
  }

  public int size() {
    return nitems;
  }

  public T get(int i) {
    return elements[i];
  }

  private void grow() {
    int newCapacity = 2 * capacity;
    T[] newElements = (T[]) new Object[newCapacity];
    System.arraycopy(elements, 0, newElements, 0, nitems);
    capacity = newCapacity;
    elements = newElements;
  }
}

The designers decided that they didn’t want 20 different machine code representations. They decided that they would use the generic parameters only for type checking and not for code generation. The generic type is erased completely from the single machine code representation and is replaced with a general supertype—either Object or the generic type’s upper bound. Their design suffers from a couple of issues:

Hey, at least we fixed the type checking issues!

The most common application of templates is to create reusable collections. But in C++ we can also pass the compiler a value to help it specialize the templated pattern. For example, suppose we want lightning fast multiplication. Let’s take a first stab, which is anything but lightning fast:

int multiply(int a, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a;
  }
  return sum;
}

Compare this to a templated version of multiply. Here one of the coefficients comes in as a template parameter:

template<int n>
int multiply(int a) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a;
  }
  return sum;
}

Now consider this function that invokes the two multiplys:

void foo() {
  multiply(2, 7);
  multiply<2>(7);
}

What will the compiler do in the first case? It will generate very general machine code that invokes the multiplication hardware through a multiplication instruction, which tends to consume more cycles than addition—though they are close on modern architectures.

In the second case, the compiler is equipped with more information. It knows one of the operands. That means it can tailor the machine code it generates to the situation at hand. It may be able to avoid the multiplication unit! Let’s verify this using Matt Godbolt’s Compiler Explorer.

Let’s annotate the assembly for just the non-template version with absolutely no optimizations:

multiply(int, int):
  pushq %rbp            # store the caller's base pointer; we might change it
  movq %rsp, %rbp       # set our base pointer to the current stack pointer
  movl %edi, -4(%rbp)   # copy parameter 1 from register edi to stack variable A
  movl %esi, -8(%rbp)   # copy parameter 2 from register esi to stack variable B
  movl $0, -12(%rbp)    # zero out stack variable C, the accumulator
  movl $0, -16(%rbp)    # zero out stack variable D, the iterator
.LBB0_1:
  movl -16(%rbp), %eax  # copy D into register eax
  cmpl -8(%rbp), %eax   # compare eax and B
  jge .LBB0_4           # if eax >= B, break
  movl -4(%rbp), %eax   # copy A to eax
  addl -12(%rbp), %eax  # add C onto eax
  movl %eax, -12(%rbp)  # copy eax into C
  movl -16(%rbp), %eax  # copy D into eax
  addl $1, %eax         # increment eax
  movl %eax, -16(%rbp)  # copy eax into D
  jmp .LBB0_1           # go back to start of loop
.LBB0_4: 
  movl -12(%rbp), %eax  # copy C into eax
  popq %rbp             # restore caller's base pointer
  retq

foo():
  pushq %rbp
  movq %rsp, %rbp
  movl $3, %edi
  movl $2, %esi
  callq multiply(int, int)
  movl %eax, p
  popq %rbp
  retq

If we compile this with a little bit of optimizations, things get better:

multiply(int, int):
  imull %esi, %edi    # edi *= esi
  xorl %eax, %eax     # eax = 0
  testl %esi, %esi    # compare esi to 0
  cmovgl %edi, %eax   # eax = edi if esi > 0
  retq

foo():
  pushq %rax
  movl $3, %edi
  movl $2, %esi
  callq multiply(int, int)
  movl %eax, p(%rip)
  popq %rax
  retq

I don’t quite understand the need for the xorl through cmovgl bit. Still, the compiler was smarter enough to recognize that we were trying to do multiplication. That’s impressive, I think.

Let’s compare this the template version, the version where the compiler knows one of the operands:

foo():
  movl $7, %edi 
  jmp int multiply4(int)

int multiply4(int):
  leal (,%rdi,4), %eax     # eax = 0 + rdi * 4
  # leal (%rdi,%rdi,2), %eax     # when 3 * 7, eax = rdi + rdi * 2
  # leal (%rdi,%rdi,4), %eax     # when 5 * 7, eax = rdi + rdi * 4
  retq

Instead of invoking the general multiplication hardware with an imull instruction, we get the load-effective-address (leal) instruction. This instruction has special support in hardware and is going to be much faster than general multiplication. We’ll try several different values for the template parameter and see what magic the compiler can work.

What else can we do with templates? We can do automatic garbage collection. But that’s a story for another day.

Sincerely,

P.S. It’s time for a haiku!

First it was New York
Now you can ♥ anything
‘Cuz they are T-shirts

P.P.S. Here’s the code we wrote together:

flex.cpp

#include <iostream>

template<typename T> class Flexray {
  public:
    Flexray() :
      capacity(10),
      nitems(0),
      items(new T[capacity]) {
    }

    ~Flexray() {
      delete[] items;
    }

    void operator<<(T item) {
      items[nitems] = item;
      ++nitems;
    }

    int size() const {
      return nitems;
    }

    T operator[](int i) const {
      return items[i];
    }

    T &operator[](int i) {
      return items[i];
    }

  private:
    int capacity;
    int nitems;
    T *items;
};

int main(int argc, char **argv) {
  Flexray<int> fr;
  fr << 2;
  std::cout << "fr.size(): " << fr.size() << std::endl;
  std::cout << "fr[0]: " << fr[0] << std::endl;

  /* Flexray<Flexray<char> > ff; */
  /* ff << Flexray<char>(); */
  Flexray<char> fc;
  fc << 'E';
  fc << 'g';
  std::cout << "fc.size(): " << fc.size() << std::endl;

  return 0;
}

Flexray.java

public class Flexray<T> {
  private int nitems;
  private T[] items; 

  public Flexray() {
    nitems = 0;
    items = (T[]) new Object[10];
  }

  public T get(int i) {
    return items[i];
  }

  public void add(T item) {
    items[nitems] = item;
    ++nitems;
  }

  public static void main(String[] args) {
    Flexray<String> fr = new Flexray<String>();
    fr.add("Elizabeth");
    fr.add("Freddy Mercury");
    /* fr.add(new NegativeArraySizeException()); */

    /* char c = ((String) fr.get(0)).charAt(0); */
    char c = fr.get(0).charAt(0);
    System.out.println("c: " + c);
  }
}