teaching machines

CS 430: Lecture 8 – Activation

April 5, 2022 by . Filed under lectures, programming languages, spring-2022.

Dear students,

Today we take a foray into a technical topic: the mechanics of calling functions. We might think this is information that is pretty well hidden from us when we’re writing in a high-level language. But abstractions leak. The way that functions are called bubbles up into the design of our languages. Having some awareness of what’s happening under the hood will help you write safe and faster code and will help you empathize with the designers of a language.

Where We’ve Been

But first, let’s just situate ourselves in the flow of this course. At the very beginning of the semester we discussed syntax, learning how a grammar influences what our programs look like. Then we moved on to discussing the meaning of things we utter in our programs, things like variables, types, operations, control structures, and parameters. These topics form the semantics of a programming language.

Now we move on to implementation details. For the next several weeks, we’ll be looking at how certain features are made to work. In particular, we’ll be looking at how functions are called, how data and code is combined into classes, and how programs are made to run concurrently.

Managing Function State

Your mind is consumed by the latest programming assignment. Then someone rings your doorbell. You panic. Dirty laundry carpets the floor. Dirty dishes fill the sink and counter. Your roommate’s pizza crusts have been squirreled away in random places. Immediately you begin stashing things away and switching over your mind to accommodate this new activity of having company over. The exact same thing happens when you call a function. Let’s talk about the mad scramble that accompanies a function call using the ubiquitous C calling conventions.

The calling function will be paused while the called function executes, but the calling function’s state must be preserved so it can be restored when the called function finishes. Only one function is active at a time, so the memory can be represented as a stack structure. The current function’s local state is at the top of the stack. In many operating systems, a process is laid out like this in memory:

high stack
heap
globals
low code

Technically, the stack grows downward, with its top at the bottom. A called function will get its own state below the caller’s on the stack. That state is called a stack frame or activation record.

In x86-64, two registers are used to indicate the extent of a function’s stack frame. Register %rbp is the base pointer or frame pointer. It points at the start of the function’s memory. Your book calls it the environment pointer (EP). This pointer is fixed in place during the function’s execution. The register %rsp points at the last allocated memory cell of the stack. It will grow and shrink as the function allocates or deallocates stack memory. We change %rsp either directly with a subq $NBYTES, %rsp instruction or indirectly via pushq RVALUE or popq LVALUE instructions. Calling Without Parameters Let’s see what happens when a call to a 0-arity function is encountered. Consider this C code: int rand(); int f() { return 3 + rand(); }  We compile this down to assembly, not machine code, to see what the machine does to implement the function call. This invocation of gcc will produce the assembly: gcc -fno-asynchronous-unwind-tables -o assembly.s -S -O0 file.c  And here’s a portion of the assembly that results: callq _rand addl$3, %eax


What can you tell about how function calls are implemented by looking at this assembly?

The return value from rand is available in %eax. This is generally how simple data is returned from functions. Sometimes the return value might be sent along as an out parameter. Other times the return value might be sent back using other registers.

One thing that you can’t tell from this code is that the callq instruction implicitly does something like this:

pushq ADDRESS-OF-NEXT-INSTRUCTION


The return address is pushed onto the stack so the CPU can navigate back to the caller when rand finishes. Sometimes malicious programs will take advantage of this and try to write a different address into the stack, one that leads to code the malicious program has inserted somewhere else in memory.

Register %rip is the instruction pointer or program counter. We can’t write to it it directly in x86-64. Instead we use call or jmp.

Here’s the code in its fuller context:

# Prologue
pushq %rbp
movq  %rsp, %rbp

callq _rand
addl  $3, %eax # Epilogue popq %rbp retq  The first few lines constitute the prologue, the setup code that all functions include. We push the caller’s base pointer onto the stack. Your book calls this the dynamic link. With that safely archived, we update the base pointer for f so that it points at the location of the caller’s base pointer. We call the rand function and get its result in %eax, as we have seen. Then we reach the epilogue, the sequence of instructions that all functions follow to close out the function call. We grab the caller’s base pointer off the stack and restore it to the register. The retq instruction grabs the caller’s return address off the stack and jumps to it. Effectively it is this instruction: popq %rip  Calling With Parameters We’ve seen how control is passed to a parameterless function and how its return value is managed. Let’s now consider this call to a function with parameters: void g(int a, int b, int c); void f() { g(5 + 10, 20, 30); }  This code compiles down to this assembly: # Prologue... movl$15, %edi
movl  $20, %esi movl$30, %edx
callq _g

# Epilogue...


There are several things to observe in this code:

• Even with no optimizations, gcc will add the 5 and 10 together. A more conservative compiler would insert an add instruction.
• The parameter expressions are evaluated eagerly in left-to-right order before the function is called.
• The parameters are passed in registers.

What happens if we add more parameters? Let’s have 8 of them total:

void g(int a, int b, int c, int d, int e, int f, int g, int h);
void f() {
g(5 + 10, 20, 30, 4, 5, 6, 7, 8);
}


This code compiles down to this assembly:

# Prologue ...

subq  $16, %rsp movl$15, %edi
movl  $20, %esi movl$30, %edx
movl  $4, %ecx movl$5, %r8d
movl  $6, %r9d movl$7, (%rsp)
movl  $8, 8(%rsp) callq _g addq$16, %rsp

# Epilogue ...


We can make some new observations:

• Only the first six parameters are passed in registers.
• The stack pointer is adjusted by 16 bytes.
• The remaining actual parameters are placed on the stack in reverse order. The first parameter is nearest to the dividing line between stack frames.

Accessing Parameters and Local Variables

We’ve seen the mechanics of setting up and tearing down a function. Let’s look now at how parameters and local variables are accessed. Consider this C function which accepts eight parameters and allocates two local variables:

int sum(int a, int b, int c, int d, int e, int f, int g, int h) {
int x = a + h;
int y = x + 13;
return y;
}


The compiler might translate this function into this assembly:

# Prologue...

movl  24(%rbp), %eax
movl  16(%rbp), %r10d
movl  %edi, -4(%rbp)
movl  %esi, -8(%rbp)
movl  %edx, -12(%rbp)
movl  %ecx, -16(%rbp)
movl  %r8d, -20(%rbp)
movl  %r9d, -24(%rbp)

movl  -4(%rbp), %ecx

movl  %ecx, %eax

# Epilogue...


We can make the following observations about how local data is accessed:

• The base pointer acts as a foothold whereby all parameters and stack variables are accessed. Recall that %rbp itself points to the location of the caller’s archived base pointer. 8 bytes above this is the caller’s return address.
• 16 bytes above the archived base pointer is the first parameter. 24 bytes above is the second parameter. And so on.
• 4 bytes below the archived base pointer is the first local variable. 8 bytes below is the second local variable. And so on.
• The stack pointer is never adjusted, even local variables are being added to the stack. Since this function doesn’t call any other functions, it doesn’t need to worry about making sure %rsp is accurate.

Interestingly, there are a bunch of unnecessary movl instructions. Parameters b through f are never used, but yet they are spilled into local memory. This is probably done so that the registers can be used by other instructions.

Only so many registers available, and functions have to coordinate how they change them so they don’t clobber each other’s state. In x86-64, the convention is that a function is free to modify %rax, %rcx, %rdx, %r8, %r9, %r10, and %r11 because the caller has preserved its copies on the stack before the call. These are the caller-saved registers. If the function needs to use other registers, it must push the caller’s values of these registers on the stack and then restore them as the function’s finish. These other registers are callee-saved registers.

Calling Conventions in Brief

In summary, the following steps happen on a function call:

1. In the caller, evaluate the actual parameters in left-to-right order.
2. In the caller, push the actual parameters onto the stack in reverse order or drop them into designated parameter registers.
3. In the caller, push the return address on the stack.
4. In the callee, push the caller’s base pointer on the stack.
5. In the callee, allocate space for local variables by adjusting %rsp or just writing into memory below %rbp.
6. Do the work of the callee.
7. In the callee, deallocate the local variables.
8. Pop the caller’s base pointer.
10. Release the memory consumed by the parameters.

Somewhere near the beginning of this process, the caller will need to spill its caller-saved registers to the stack. Somewhere near the end, the callee will need to unspill any callee-saved registers.

Static Scoping

All of the discussion above is focused on C and x86-64. C takes a very simple view of functions: they are all defined at the top-level. Many languages allow functions to nest. If an inner function references variables from an outer function, we must have a scheme for accessing another function’s stack frame.

When the compiler encounters a free variable, it works outward through the chain of surrounding scopes to look for the variable’s declaration. The number of scopes it takes to find the declaration is recorded as the chain offset. The variable’s offset within that function’s stack frame is recorded as the local offset. This all happens at compile time.

At runtime, every time a function is called, a static link is inserted in the called function’s stack frame. The static link points to the stack frame of the called function’s outer function—which is not generally the same as the caller. When a free variable is referenced, the chain offset is used to traverse up these static links (the static chain). Then the local offset is used to find the outer function’s variable. The code to do this traversing is all generated at compile time, so the lookups aren’t as slow as they might be.

Dynamic Scoping

Recall that if we look for free variables in the caller rather than in the surrounding scope, we have dynamic scoping. One might implement this flavor of scoping by walking up the dynamic links (the caller’s base pointer) until one encounters a stack frame that has a declaration for the referenced variable. This scheme is called deep access. Since a function’s caller is only known at runtime, the compiler can’t automatically generate code to walk up the dynamic chain. Deep access is slow.

An alternative that allows faster lookup is to maintain a stack for each active variable name in the program. When a function accesses foo, the value is pulled installed off the top of the stack for foo. No traversal of the dynamic chain is necessary. However, when the function returns, the stacks for all of its local variables must be popped.

To be perfectly honest, I don’t know why we still talk about dynamic scoping. One of its benefits is that we can effectively pass data to functions without using parameters, but even this is not a clear benefit.

Exercises

To illustrate how these scope lookups might be implemented, let’s work through a couple of exercises and draw the changing stack. We’ll assume that each activation record has the following structure:

A

01  program P() {
02
03    var x = 'p'
04
05    function A() {
06      println(x)
07    }
08
09    function B() {
10      var x = 'b'
11      function C() {
12        var x = 'c'
13        println(x)
14        D()
15      }
16      function D() {
17        println(x)
18        A()
19      }
20      C()
21    }
22    B()
23  }

B

01   function main() {
02     var x
03     function bigsub() {
04       var a, b, c
05       function sub1() {
06         var a, d
07         a = b + c             // 1
08       }
09       function sub2(x) {
10         var b, e
11         function sub3() {
12           var c, e
13           sub1()
14           e = b + a           // 2
15         }
16         sub3()
17         a = d + e             // 3
18       }
19       sub2(7)
20     }
21     bigsub()
22   }

See you next time!

Sincerely,

P.S. It’s time for a haiku.

My function just broke
I took it back to the store
No returns, they said