# teaching machines

## CS 330: Lecture 21 – Parametric Polymorphism

March 28, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students,

Don’t get me started on arrays in C. What sort of world is this where an array is fixed in size, but yet it doesn’t know what that size is? We must fix this. Let’s create a growable array:

class Flexray {
public:
Flexray() :
capacity(10),
nitems(0),
elements(new int[capacity]) {
}

~Flexray() {
delete[] elements;
}

void operator<<(int item) {
if (nitems == capacity) {
grow();
}
elements[nitems] = item;
++nitems;
}

int size() const {
return nitems;
}

int &operator[](int i) {
// TODO bounds check
return elements[i];
}

const int &operator[](int i) const {
// TODO bounds check
return elements[i];
}

private:
void grow() {
int new_capacity = 2 * capacity;
int *new_elements = new int[new_capacity];
memcpy(new_elements, elements, sizeof(int) * capacity);
delete[] elements;
capacity = new_capacity;
elements = new_elements;
}

int capacity;
int nitems;
int *elements;
}


Okay, great. Now the boss tells us we need a growable array of strings. Then a growable array of stringstreams. Then a growable array of std::out_of_range. What are we going to do? Copy and paste and tweak types? No.

There is actually more than one answer to this question. We’ve already seen one vehicle for reusing code: subtype polymorphism. We could just have our collection appeal to the supertype. This is how Java’s collections worked when I first learned it. Collections were written in terms of Object, like this:

public class Flexray {
private int capacity;
private int nitems;
private Object[] elements;

public Flexray() {
capacity = 10;
nitems = 0;
elements = new Object[capacity];
}

if (nitems == capacity) {
grow();
}
elements[nitems] = item;
++nitems;
}

public int size() {
return nitems;
}

public Object get(int i) {
return elements[i];
}

private void grow() {
int newCapacity = 2 * capacity;
Object[] newElements = new Object[newCapacity];
System.arraycopy(elements, 0, newElements, 0, nitems);
capacity = newCapacity;
elements = newElements;
}
}


Now we can make a list that works for any type, right? Because everything’s an Object. Sadly, a generalizing polymorphic solution suffers from a couple of drawbacks:

• When we retrieve an element using get, we must downcast the result to the actual type.
• When we add an element using add, we can throw anything in. It doesn’t have to be of the type we want our list to hold.
• The collection can’t hold primitives, as they are not in the Object hierarchy.

A better fix is to define Flexray once but let Flexray‘s type come in as a parameter. This rounds out our four forms of polymorphism:

• Parametric polymorphism is when we pass a type as a parameter to a function, class, or some other construct. The compiler uses that type to produce seemingly many different versions of the construct.

In C++, we decorate our class with a template and replace all references to int elements to T elements:

template<typename T>
class Flexray {
public:
Flexray() :
capacity(10),
nitems(0),
elements(new T[capacity]) {
}

~Flexray() {
delete[] elements;
}

void operator<<(T item) {
if (nitems == capacity) {
grow();
}
elements[nitems] = item;
++nitems;
}

int size() const {
return nitems;
}

T &operator[](int i) {
// TODO bounds check
return elements[i];
}

const T &operator[](int i) const {
// TODO bounds check
return elements[i];
}

private:
void grow() {
int new_capacity = 2 * capacity;
T *new_elements = new T[new_capacity];
memcpy(new_elements, elements, sizeof(T) * capacity);
delete[] elements;
capacity = new_capacity;
elements = new_elements;
}

int capacity;
int nitems;
T *elements;
}


A templated class or function isn’t actually compilable code in C++. It’s just a pattern—a template. Only when the compiler sees a reference to the templated construct with the type value provided does it go and create a specialization of that pattern. If it sees another reference with a different type, it generates a completely different specialization for just that type. So, we might have 20 different versions of Flexray, all catering to elements of different types. The source code will have just one definition, but the machine code will have 20 different representations.

Java handles templates differently. The source code looks mostly the same:

public class Flexray<T> {
private int capacity;
private int nitems;
private T[] elements;

public Flexray() {
capacity = 10;
nitems = 0;
elements = (T[]) new Object[capacity];
}

if (nitems == capacity) {
grow();
}
elements[nitems] = item;
++nitems;
}

public int size() {
return nitems;
}

public T get(int i) {
return elements[i];
}

private void grow() {
int newCapacity = 2 * capacity;
T[] newElements = (T[]) new Object[newCapacity];
System.arraycopy(elements, 0, newElements, 0, nitems);
capacity = newCapacity;
elements = newElements;
}
}


The designers decided that they didn’t want 20 different machine code representations. They decided that they would use the generic parameters only for type checking and not for code generation. The generic type is erased completely from the single machine code representation and is replaced with a general supertype—either Object or the generic type’s upper bound. Their design suffers from a couple of issues:

• The generic type is simply not known at runtime. This means we can’t say things like new T.
• The collection can’t hold primitives, as they are not in the Object hierarchy.

Hey, at least we fixed the type checking issues!

The most common application of templates is to create reusable collections. But in C++ we can also pass the compiler a value to help it specialize the templated pattern. For example, suppose we want lightning fast multiplication. Let’s take a first stab, which is anything but lightning fast:

int multiply(int a, int n) {
int sum = 0;
for (int i = 0; i < n; ++i) {
sum += a;
}
return sum;
}


Compare this to a templated version of multiply. Here one of the coefficients comes in as a template parameter:

template<int n>
int multiply(int a) {
int sum = 0;
for (int i = 0; i < n; ++i) {
sum += a;
}
return sum;
}


Now consider this function that invokes the two multiplys:

void foo() {
multiply(2, 7);
multiply<2>(7);
}


What will the compiler do in the first case? It will generate very general machine code that invokes the multiplication hardware through a multiplication instruction, which tends to consume more cycles than addition—though they are close on modern architectures.

In the second case, the compiler is equipped with more information. It knows one of the operands. That means it can tailor the machine code it generates to the situation at hand. It may be able to avoid the multiplication unit! Let’s verify this using Matt Godbolt’s Compiler Explorer.

Let’s annotate the assembly for just the non-template version with absolutely no optimizations:

multiply(int, int):
pushq %rbp            # store the caller's base pointer; we might change it
movq %rsp, %rbp       # set our base pointer to the current stack pointer
movl %edi, -4(%rbp)   # copy parameter 1 from register edi to stack variable A
movl %esi, -8(%rbp)   # copy parameter 2 from register esi to stack variable B
movl $0, -12(%rbp) # zero out stack variable C, the accumulator movl$0, -16(%rbp)    # zero out stack variable D, the iterator
.LBB0_1:
movl -16(%rbp), %eax  # copy D into register eax
cmpl -8(%rbp), %eax   # compare eax and B
jge .LBB0_4           # if eax >= B, break
movl -4(%rbp), %eax   # copy A to eax
movl %eax, -12(%rbp)  # copy eax into C
movl -16(%rbp), %eax  # copy D into eax
addl $1, %eax # increment eax movl %eax, -16(%rbp) # copy eax into D jmp .LBB0_1 # go back to start of loop .LBB0_4: movl -12(%rbp), %eax # copy C into eax popq %rbp # restore caller's base pointer retq foo(): pushq %rbp movq %rsp, %rbp movl$3, %edi
movl $2, %esi callq multiply(int, int) movl %eax, p popq %rbp retq  If we compile this with a little bit of optimizations, things get better: multiply(int, int): imull %esi, %edi # edi *= esi xorl %eax, %eax # eax = 0 testl %esi, %esi # compare esi to 0 cmovgl %edi, %eax # eax = edi if esi > 0 retq foo(): pushq %rax movl$3, %edi
movl $2, %esi callq multiply(int, int) movl %eax, p(%rip) popq %rax retq  I don’t quite understand the need for the xorl through cmovgl bit. Still, the compiler was smarter enough to recognize that we were trying to do multiplication. That’s impressive, I think. Let’s compare this the template version, the version where the compiler knows one of the operands: foo(): movl$7, %edi
jmp int multiply4(int)

int multiply4(int):
leal (,%rdi,4), %eax     # eax = 0 + rdi * 4
# leal (%rdi,%rdi,2), %eax     # when 3 * 7, eax = rdi + rdi * 2
# leal (%rdi,%rdi,4), %eax     # when 5 * 7, eax = rdi + rdi * 4
retq


Instead of invoking the general multiplication hardware with an imull instruction, we get the load-effective-address (leal) instruction. This instruction has special support in hardware and is going to be much faster than general multiplication. We’ll try several different values for the template parameter and see what magic the compiler can work.

What else can we do with templates? We can do automatic garbage collection. But that’s a story for another day.

Sincerely, P.S. It’s time for a haiku!

First it was New York
Now you can ♥ anything
‘Cuz they are T-shirts

P.P.S. Here’s the code we wrote together:

#### flex.cpp

#include <iostream>

template<typename T> class Flexray {
public:
Flexray() :
capacity(10),
nitems(0),
items(new T[capacity]) {
}

~Flexray() {
delete[] items;
}

void operator<<(T item) {
items[nitems] = item;
++nitems;
}

int size() const {
return nitems;
}

T operator[](int i) const {
return items[i];
}

T &operator[](int i) {
return items[i];
}

private:
int capacity;
int nitems;
T *items;
};

int main(int argc, char **argv) {
Flexray<int> fr;
fr << 2;
std::cout << "fr.size(): " << fr.size() << std::endl;
std::cout << "fr: " << fr << std::endl;

/* Flexray<Flexray<char> > ff; */
/* ff << Flexray<char>(); */
Flexray<char> fc;
fc << 'E';
fc << 'g';
std::cout << "fc.size(): " << fc.size() << std::endl;

return 0;
}


#### Flexray.java

public class Flexray<T> {
private int nitems;
private T[] items;

public Flexray() {
nitems = 0;
items = (T[]) new Object;
}

public T get(int i) {
return items[i];
}

items[nitems] = item;
++nitems;
}

public static void main(String[] args) {
Flexray<String> fr = new Flexray<String>();