teaching machines

CS 330: Lecture 22 – Templates

March 30, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students,

The most common application of templates is to create reusable collections. But in C++ we can also pass the compiler a value to help it specialize the templated pattern. For example, suppose we want lightning fast multiplication. Let’s take a first stab, which is anything but lightning fast:

int multiply(int a, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a;
  }
  return sum;
}

Compare this to a templated version of multiply. Here one of the coefficients comes in as a template parameter:

template<int n>
int multiply(int a) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a;
  }
  return sum;
}

Now consider this function that invokes the two multiplys:

void foo() {
  multiply(2, 7);
  multiply<2>(7);
}

What will the compiler do in the first case? It will generate very general machine code that invokes the multiplication hardware through a multiplication instruction, which tends to consume more cycles than addition—though they are close on modern architectures.

In the second case, the compiler is equipped with more information. It knows one of the operands. That means it can tailor the machine code it generates to the situation at hand. It may be able to avoid the multiplication unit! Let’s verify this using Matt Godbolt’s Compiler Explorer.

Let’s annotate the assembly for just the non-template version with absolutely no optimizations:

multiply(int, int):
  pushq %rbp            # store the caller's base pointer; we might change it
  movq %rsp, %rbp       # set our base pointer to the current stack pointer
  movl %edi, -4(%rbp)   # copy parameter 1 from register edi to stack variable A
  movl %esi, -8(%rbp)   # copy parameter 2 from register esi to stack variable B
  movl $0, -12(%rbp)    # zero out stack variable C, the accumulator
  movl $0, -16(%rbp)    # zero out stack variable D, the iterator
.LBB0_1:
  movl -16(%rbp), %eax  # copy D into register eax
  cmpl -8(%rbp), %eax   # compare eax and B
  jge .LBB0_4           # if eax >= B, break
  movl -4(%rbp), %eax   # copy A to eax
  addl -12(%rbp), %eax  # add C onto eax
  movl %eax, -12(%rbp)  # copy eax into C
  movl -16(%rbp), %eax  # copy D into eax
  addl $1, %eax         # increment eax
  movl %eax, -16(%rbp)  # copy eax into D
  jmp .LBB0_1           # go back to start of loop
.LBB0_4: 
  movl -12(%rbp), %eax  # copy C into eax
  popq %rbp             # restore caller's base pointer
  retq

foo():
  pushq %rbp
  movq %rsp, %rbp
  movl $3, %edi
  movl $2, %esi
  callq multiply(int, int)
  movl %eax, p
  popq %rbp
  retq

If we compile this with a little bit of optimizations, things get better:

multiply(int, int):
  imull %esi, %edi    # edi *= esi
  xorl %eax, %eax     # eax = 0
  testl %esi, %esi    # compare esi to 0
  cmovgl %edi, %eax   # eax = edi if esi > 0
  retq

foo():
  pushq %rax
  movl $3, %edi
  movl $2, %esi
  callq multiply(int, int)
  movl %eax, p(%rip)
  popq %rax
  retq

I don’t quite understand the need for the xorl through cmovgl bit. Still, the compiler was smart enough to recognize that we were trying to do multiplication. That’s impressive, I think.

Let’s compare this the template version, the version where the compiler knows one of the operands:

foo():
  movl $7, %edi 
  jmp int multiply4(int)

int multiply4(int):
  leal (,%rdi,4), %eax     # eax = 0 + rdi * 4
  # leal (%rdi,%rdi,2), %eax     # when 3 * 7, eax = rdi + rdi * 2
  # leal (%rdi,%rdi,4), %eax     # when 5 * 7, eax = rdi + rdi * 4
  retq

Instead of invoking the general multiplication hardware with an imull instruction, we get the load-effective-address (leal) instruction. This instruction has special support in hardware and is going to be much faster than general multiplication. We’ll try several different values for the template parameter and see what magic the compiler can work.

Okay, on to something else.

One of the takeaways of this class is that good ideas are present in all languages. We should recognize a good idea when we see one and steal it. We might not choose to work with that language, but let’s take the idea back to the languages we do use. Templates are a pretty good idea, so let’s take them to C.

C doesn’t make this very easy. But let’s try writing a stack data structure. Here’s the header:

#ifndef STACK_T_H
#define STACK_T_H

typedef struct {
  T elements[100];
  int top;
} stack_T_t;

stack_T_t *stack_T_create();
void stack_T_free(stack_T_t *stack);
void stack_T_push(stack_T_t *stack, T item);
T stack_T_pop(stack_T_t *stack);
T stack_T_peek(stack_T_t *stack);

#endif

And the implementation:

#include <assert.h>
#include <stdlib.h>

#include "stack_T.h"

stack_T_t *stack_T_create() {
  stack_T_t *stack = (stack_T_t *) malloc(sizeof(stack_T_t));
  stack->top = -1;
  return stack;
}

void stack_T_free(stack_T_t *stack) {
  free(stack);
}

T stack_T_peek(stack_T_t *stack) {
  assert(stack->top >= 0);
  return stack->elements[stack->top];
}

T stack_T_pop(stack_T_t *stack) {
  assert(stack->top >= 0);
  T element = stack->elements[stack->top];
  --stack->top;
  return element;
}

void stack_T_push(stack_T_t *stack, T item) {
  assert(stack->top < 100 - 1);
  ++stack->top;
  stack->elements[stack->top] = item;
}

Now, let’s write a main that makes two stacks of different types:

#include <stdio.h>
#include "stack_int.h"
#include "stack_float.h"

int main(int argc, char **argv) {
  stack_int_t *ints = stack_int_create(); 
  stack_int_push(ints, 5);
  stack_int_push(ints, 6);
  stack_int_push(ints, 7);
  printf("%d\n", stack_int_peek(ints));

  stack_float_t *floats = stack_float_create(); 
  stack_float_push(floats, 5.5f);
  stack_float_push(floats, 6.6f);
  stack_float_push(floats, 7.7f);
  printf("%f\n", stack_float_peek(floats));

  return 0;
}

But stack_int_t and stack_float_t don’t really exist. All we have is stack_T_t. What can do we do? We can go through the template and substitute all the occurrences of T for int and float—and any other types we need. This sounds pretty annoying, but we can write some rules in our makefile to hide the uglyness of this task. Here’s our rule for stack_int:

stack_int.c: stack_template.c makefile
	sed -e 's/TYPENAME/int/g' $< > $@

stack_int.h: stack_template.h makefile
	sed -e 's/TYPENAME/int/g' $< > $@

stack_int.o: stack_int.c stack_int.h makefile
	$(CC) $(CFLAGS) -o $@ -c $<

The rule for our main sets the substitution process in motion:

main: stack_int.o stack_float.o main.c makefile
	$(CC) $(CFLAGS) -o $@ stack_int.o stack_float.o main.c

This closes out our discussion of the four kinds of polymorphism. Whew. Next time we began our examination of functional programming through the lens of Haskell. Here’s your TODO list:

See you next time!

Sincerely,

P.S. It’s time for a haiku!

None of it’s source code
I am the source, not the code
The bugs, however

P.P.S. Here’s the code we wrote together:

stack_template.h

#ifndef STACK_TYPENAME_H
#define STACK_TYPENAME_H

typedef struct {
  TYPENAME elements[100];
  int top;
} stack_TYPENAME_t;

stack_TYPENAME_t *stack_TYPENAME_create();
void stack_TYPENAME_free(stack_TYPENAME_t *stack);
void stack_TYPENAME_push(stack_TYPENAME_t *stack, TYPENAME item);
TYPENAME stack_TYPENAME_pop(stack_TYPENAME_t *stack);
TYPENAME stack_TYPENAME_peek(stack_TYPENAME_t *stack);

#endif

stack_template.c

#include <assert.h>
#include <stdlib.h>

#include "stack_TYPENAME.h"

stack_TYPENAME_t *stack_TYPENAME_create() {
  stack_TYPENAME_t *stack = (stack_TYPENAME_t *) malloc(sizeof(stack_TYPENAME_t));
  stack->top = -1;
  return stack;
}

void stack_TYPENAME_free(stack_TYPENAME_t *stack) {
  free(stack);
}

TYPENAME stack_TYPENAME_peek(stack_TYPENAME_t *stack) {
  assert(stack->top >= 0);
  return stack->elements[stack->top];
}

TYPENAME stack_TYPENAME_pop(stack_TYPENAME_t *stack) {
  assert(stack->top >= 0);
  TYPENAME element = stack->elements[stack->top];
  --stack->top;
  return element;
}

void stack_TYPENAME_push(stack_TYPENAME_t *stack, TYPENAME item) {
  assert(stack->top < 100 - 1);
  ++stack->top;
  stack->elements[stack->top] = item;
}

makefile

CC = gcc
CFLAGS = -g -Wall

all: main

stack_int.c: stack_template.c makefile
  sed -e 's/TYPENAME/int/g' $< > $@

stack_int.h: stack_template.h makefile
  sed -e 's/TYPENAME/int/g' $< > $@

stack_int.o: stack_int.c stack_int.h makefile
  $(CC) $(CFLAGS) -o $@ -c $<

stack_float.c: stack_template.c makefile
  sed -e 's/TYPENAME/float/g' $< > $@

stack_float.h: stack_template.h makefile
  sed -e 's/TYPENAME/float/g' $< > $@

stack_float.o: stack_float.c stack_float.h makefile
  $(CC) $(CFLAGS) -o $@ -c $<

main: stack_int.o stack_float.o main.c makefile
  $(CC) $(CFLAGS) -o $@ stack_int.o stack_float.o main.c

clean:
  rm -rf *.o stack_int.* stack_float.* *.dSYM main