CS 330: Lecture 22 – Templates
Dear students,
The most common application of templates is to create reusable collections. But in C++ we can also pass the compiler a value to help it specialize the templated pattern. For example, suppose we want lightning fast multiplication. Let’s take a first stab, which is anything but lightning fast:
int multiply(int a, int n) {
int sum = 0;
for (int i = 0; i < n; ++i) {
sum += a;
}
return sum;
}
Compare this to a templated version of multiply
. Here one of the coefficients comes in as a template parameter:
template<int n>
int multiply(int a) {
int sum = 0;
for (int i = 0; i < n; ++i) {
sum += a;
}
return sum;
}
Now consider this function that invokes the two multiply
s:
void foo() {
multiply(2, 7);
multiply<2>(7);
}
What will the compiler do in the first case? It will generate very general machine code that invokes the multiplication hardware through a multiplication instruction, which tends to consume more cycles than addition—though they are close on modern architectures.
In the second case, the compiler is equipped with more information. It knows one of the operands. That means it can tailor the machine code it generates to the situation at hand. It may be able to avoid the multiplication unit! Let’s verify this using Matt Godbolt’s Compiler Explorer.
Let’s annotate the assembly for just the non-template version with absolutely no optimizations:
multiply(int, int):
pushq %rbp # store the caller's base pointer; we might change it
movq %rsp, %rbp # set our base pointer to the current stack pointer
movl %edi, -4(%rbp) # copy parameter 1 from register edi to stack variable A
movl %esi, -8(%rbp) # copy parameter 2 from register esi to stack variable B
movl $0, -12(%rbp) # zero out stack variable C, the accumulator
movl $0, -16(%rbp) # zero out stack variable D, the iterator
.LBB0_1:
movl -16(%rbp), %eax # copy D into register eax
cmpl -8(%rbp), %eax # compare eax and B
jge .LBB0_4 # if eax >= B, break
movl -4(%rbp), %eax # copy A to eax
addl -12(%rbp), %eax # add C onto eax
movl %eax, -12(%rbp) # copy eax into C
movl -16(%rbp), %eax # copy D into eax
addl $1, %eax # increment eax
movl %eax, -16(%rbp) # copy eax into D
jmp .LBB0_1 # go back to start of loop
.LBB0_4:
movl -12(%rbp), %eax # copy C into eax
popq %rbp # restore caller's base pointer
retq
foo():
pushq %rbp
movq %rsp, %rbp
movl $3, %edi
movl $2, %esi
callq multiply(int, int)
movl %eax, p
popq %rbp
retq
If we compile this with a little bit of optimizations, things get better:
multiply(int, int):
imull %esi, %edi # edi *= esi
xorl %eax, %eax # eax = 0
testl %esi, %esi # compare esi to 0
cmovgl %edi, %eax # eax = edi if esi > 0
retq
foo():
pushq %rax
movl $3, %edi
movl $2, %esi
callq multiply(int, int)
movl %eax, p(%rip)
popq %rax
retq
I don’t quite understand the need for the xorl
through cmovgl
bit. Still, the compiler was smart enough to recognize that we were trying to do multiplication. That’s impressive, I think.
Let’s compare this the template version, the version where the compiler knows one of the operands:
foo():
movl $7, %edi
jmp int multiply4(int)
int multiply4(int):
leal (,%rdi,4), %eax # eax = 0 + rdi * 4
# leal (%rdi,%rdi,2), %eax # when 3 * 7, eax = rdi + rdi * 2
# leal (%rdi,%rdi,4), %eax # when 5 * 7, eax = rdi + rdi * 4
retq
Instead of invoking the general multiplication hardware with an imull
instruction, we get the load-effective-address (leal
) instruction. This instruction has special support in hardware and is going to be much faster than general multiplication. We’ll try several different values for the template parameter and see what magic the compiler can work.
Okay, on to something else.
One of the takeaways of this class is that good ideas are present in all languages. We should recognize a good idea when we see one and steal it. We might not choose to work with that language, but let’s take the idea back to the languages we do use. Templates are a pretty good idea, so let’s take them to C.
C doesn’t make this very easy. But let’s try writing a stack data structure. Here’s the header:
#ifndef STACK_T_H
#define STACK_T_H
typedef struct {
T elements[100];
int top;
} stack_T_t;
stack_T_t *stack_T_create();
void stack_T_free(stack_T_t *stack);
void stack_T_push(stack_T_t *stack, T item);
T stack_T_pop(stack_T_t *stack);
T stack_T_peek(stack_T_t *stack);
#endif
And the implementation:
#include <assert.h>
#include <stdlib.h>
#include "stack_T.h"
stack_T_t *stack_T_create() {
stack_T_t *stack = (stack_T_t *) malloc(sizeof(stack_T_t));
stack->top = -1;
return stack;
}
void stack_T_free(stack_T_t *stack) {
free(stack);
}
T stack_T_peek(stack_T_t *stack) {
assert(stack->top >= 0);
return stack->elements[stack->top];
}
T stack_T_pop(stack_T_t *stack) {
assert(stack->top >= 0);
T element = stack->elements[stack->top];
--stack->top;
return element;
}
void stack_T_push(stack_T_t *stack, T item) {
assert(stack->top < 100 - 1);
++stack->top;
stack->elements[stack->top] = item;
}
Now, let’s write a main
that makes two stacks of different types:
#include <stdio.h>
#include "stack_int.h"
#include "stack_float.h"
int main(int argc, char **argv) {
stack_int_t *ints = stack_int_create();
stack_int_push(ints, 5);
stack_int_push(ints, 6);
stack_int_push(ints, 7);
printf("%d\n", stack_int_peek(ints));
stack_float_t *floats = stack_float_create();
stack_float_push(floats, 5.5f);
stack_float_push(floats, 6.6f);
stack_float_push(floats, 7.7f);
printf("%f\n", stack_float_peek(floats));
return 0;
}
But stack_int_t
and stack_float_t
don’t really exist. All we have is stack_T_t
. What can do we do? We can go through the template and substitute all the occurrences of T
for int
and float
—and any other types we need. This sounds pretty annoying, but we can write some rules in our makefile
to hide the uglyness of this task. Here’s our rule for stack_int
:
stack_int.c: stack_template.c makefile
sed -e 's/TYPENAME/int/g' $< > $@
stack_int.h: stack_template.h makefile
sed -e 's/TYPENAME/int/g' $< > $@
stack_int.o: stack_int.c stack_int.h makefile
$(CC) $(CFLAGS) -o $@ -c $<
The rule for our main
sets the substitution process in motion:
main: stack_int.o stack_float.o main.c makefile
$(CC) $(CFLAGS) -o $@ stack_int.o stack_float.o main.c
This closes out our discussion of the four kinds of polymorphism. Whew. Next time we began our examination of functional programming through the lens of Haskell. Here’s your TODO list:
- Read chapter 1 of Real World Haskell. On a quarter sheet, write a simple Haskell program to calculate some value of interest to you. Write down any questions or observations you have.
See you next time!
P.S. It’s time for a haiku!
None of it’s source code
I am the source, not the code
The bugs, however
P.P.S. Here’s the code we wrote together:
stack_template.h
#ifndef STACK_TYPENAME_H
#define STACK_TYPENAME_H
typedef struct {
TYPENAME elements[100];
int top;
} stack_TYPENAME_t;
stack_TYPENAME_t *stack_TYPENAME_create();
void stack_TYPENAME_free(stack_TYPENAME_t *stack);
void stack_TYPENAME_push(stack_TYPENAME_t *stack, TYPENAME item);
TYPENAME stack_TYPENAME_pop(stack_TYPENAME_t *stack);
TYPENAME stack_TYPENAME_peek(stack_TYPENAME_t *stack);
#endif
stack_template.c
#include <assert.h>
#include <stdlib.h>
#include "stack_TYPENAME.h"
stack_TYPENAME_t *stack_TYPENAME_create() {
stack_TYPENAME_t *stack = (stack_TYPENAME_t *) malloc(sizeof(stack_TYPENAME_t));
stack->top = -1;
return stack;
}
void stack_TYPENAME_free(stack_TYPENAME_t *stack) {
free(stack);
}
TYPENAME stack_TYPENAME_peek(stack_TYPENAME_t *stack) {
assert(stack->top >= 0);
return stack->elements[stack->top];
}
TYPENAME stack_TYPENAME_pop(stack_TYPENAME_t *stack) {
assert(stack->top >= 0);
TYPENAME element = stack->elements[stack->top];
--stack->top;
return element;
}
void stack_TYPENAME_push(stack_TYPENAME_t *stack, TYPENAME item) {
assert(stack->top < 100 - 1);
++stack->top;
stack->elements[stack->top] = item;
}
makefile
CC = gcc
CFLAGS = -g -Wall
all: main
stack_int.c: stack_template.c makefile
sed -e 's/TYPENAME/int/g' $< > $@
stack_int.h: stack_template.h makefile
sed -e 's/TYPENAME/int/g' $< > $@
stack_int.o: stack_int.c stack_int.h makefile
$(CC) $(CFLAGS) -o $@ -c $<
stack_float.c: stack_template.c makefile
sed -e 's/TYPENAME/float/g' $< > $@
stack_float.h: stack_template.h makefile
sed -e 's/TYPENAME/float/g' $< > $@
stack_float.o: stack_float.c stack_float.h makefile
$(CC) $(CFLAGS) -o $@ -c $<
main: stack_int.o stack_float.o main.c makefile
$(CC) $(CFLAGS) -o $@ stack_int.o stack_float.o main.c
clean:
rm -rf *.o stack_int.* stack_float.* *.dSYM main