» CS 330: Lecture 11 – Type Systems

CS 330: Lecture 11 – Type Systems

February 26, 2018 by Chris Johnson. Filed under cs330, lectures, spring 2018.

Dear students,

Today we begin our focus of programming language concepts, rather than the tools (regex, lexers, and parsers) that we use for interpreting programming languages. In particular, we start off with types. To get us started, here’s a question to discuss with your neighbor.

What are types? Why do have them?

In the past, I used to start this class with assembly programming, because no programming language should be evaluated outside the context of history. All that humans make is shaped by what exists at the time of a thing’s making. When you write in assembly, there are no types, right?

Yes and no. Each instruction expects the data on which it operates to have a particular format. The circuit invoked by instruction x86’s addl adds twos-complement 4-byte integers. The circuit invoked by ARM’s vmul multiplies two floating-point numbers. But nothing will stop you from adding two four-letter words together.

Types in assembly are declared in the programmer’s brain. It is up to the programmer to ensure congruence between data and instruction. This just isn’t reliable. We can’t keep our fingers out of sockets or park between the lines. How are we supposed to match up data and instructions?

Enter high-level languages, in which each chunk of memory is tagged in some way by the type of data it holds. Fortran, ALGOL, and Cobol were the earliest such languages, but we’re going to skip ahead to C. The world of type systems is a high-dimensional space. Let’s look at a few of these dimensions in the context of C.

First, what sort of types does C provide? These go in the first draft of my list:

integer
real
character
array
struct
pointer
boolean (as of C99)

What types does C not provide that we’ve seen in other languages?

Strings. Null-terminated strings are a convention of the standard library that lives outside the type system. Case in point, the compiler doesn’t check that a string is still a string after I operate on it.
Enums. C’s enum is really just a shortcut for defining a bunch of integer constants.
Data structures. There’s no linked list or hashtable or stack built in to the language proper.
Maybe types, which allow a value to be represented as undefined, though pointers can use NULL for this.
Tuples, which are a composite type much like structs. They collect a fixed number of fields together under one umbrella, and the fields can be of different types. Unlike a struct, however, the fields are not named. We refer to them by their position.

Second, does C allow the user to define new types? Yes, through struct. One can also use typedef to create a shorter name for an existing type (or a longer name, if that’s your thing), but typedef doesn’t really introduce a new type.

Third, let’s consider the dimension of how types are used to ensure congruence between data and instructions. We start with developing an intuition:

Consider this C expression:
4 + 3 * 5.1
Hypothesize with your neighbor about what happens here to ensure that the data and instructions jive.

Here’s my rendition of what happens:

The * is identified as the highest-precedent operation.
The operands of * have different types. The operand with the narrower type is promoted to the more general type. So we really have
```
4 + 3.0 * 5.1
```
The instruction for multiplying floating-point numbers is invoked. on 3.0 and 5.1. The result is stored in a temporary floating-point register. We now have
```
4 + 15.3
```
We proceed as above, except now + is our operation. We promote and solve to get 19.3.

So, how does C ensure type congruency? It considers a value’s type when deciding which machine instruction to invoke. If we see a + ..., the compiler considers a‘s type to determine what flavor of add instruction should be used. The operations, in a sense, hang off the type.

If the types are inspected at compile time to figure out what instructions to invoke, we have a static type system. If the types are inspected on the fly at runtime, we have a dynamic type system.

This sort of lookup/association scheme based on type alone doesn’t have a special name, as far as I am aware. But it is not the only way of doing things.

Consider this little Ruby program:

s = 'tree'

def s.updowncase
  self[0].upcase + self[1..-1].downcase
end

puts s.updowncase

When we invoke s.updowncase, what happens? Ruby knows that s is a String. Inspecting s.class confirms this. But puts "tire".updowncase fails. That’s because updowncase is not associated with the type, but with s. This is very different from C. We call the idea of associating legal operations with individual values—but not their type—duck typing.

Duck typing, dynamic typing, and static typing—why should we know about these three systems? Surely they are just an implementation detail? Nope. You will be affected by your language’s position on this dimension. Generally, dynamic and duck typed languages allow you to write functions that are very polymorphic. As long as the values you send a function support the operations within it, you can feed it anything. Static typing is much more rigid, but because it knows the types at compile time, it can make decisions and check for errors very early in the development process.

For another demonstration of hanging legal operations off of types, let’s write a program to evaluate a hand of Blackjack. We’ll see that C doesn’t really have enums, just ints.

Here’s your TODO list for next time:

Read What to know before debating type systems. On a quarter sheet, write down 2-3 questions or observations inspired by your reading. Consider sharing your thoughts on type systems in the languages you’ve used.

See you then!

Sincerely,

P.S. It’s time for a haiku!

Privileged people
They think their type is enough
I don’t like their kind

P.P.S. Here’s the code we wrote together:

newtype.c

#include <stdio.h>
#include <stdlib.h>

struct being {
  char *name;
  char *phone_number;
};

/* typedef struct being int; */
typedef int foo;
typedef struct being grass;

int main(int argc, char **argv) {
  
  grass orjand = {"Jordan", "111-222-3333"};

  /* int failplease = {"Jordan", "111-222-3333"}; */

  foo f = 17;

  return 0;
}

case.rb

#!/usr/bin/env ruby

s = 'tree'

def s.updowncase
  self[0].upcase + self[1..-1].downcase
end

puts s.updowncase

s = 'petra'
puts s.updowncase

puts s.class
puts t.class