teaching machines

CS 330: Lecture 14 – Polymorphism and More Type Safety

March 5, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students,

Like we’ve been saying, type systems keep you from invoking illegal operations on data. A good type system will alert us when we write code like this:

char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
char c4 = 4[alphabet];
printf("c4: %c\n", c4);

Actually, this code compiles—without warnings—and runs just fine. Because array subscripting is commutative: a[2] is the same as 2[a]. This is an interesting curiosity only. Never lead with the subscript. I mention it because it reveals what happens to arrays under the hood. Suppose a is an array of ints:

int a[] = ...;

When we say a[2], this gets turned into pointer arithmetic and dereferenced: *(a + 2). These have different types. Adding an integer and a pointer implies a type conversion to get them into the same units: *(a + 2 * sizeof(int)).

When we say 2[a], this gets turned into *(2 + a), which gets turned into *(2 * sizeof(int) + a).

Both forms resolve to the same value. This is not technically a type safety issue. It’s just that C has some funny tolerance about ordering. We still can’t say things like 2[5].

Let’s look at a occasion where type safety is actually violated but is also actually helpful. In CS 352 you may have talked about big endian vs. little endian architectures. Suppose we declare an int, and it lives at address 1000 in memory. On a big endian machine, it will be laid out like this:

1000 1001 1002 1003
big digits small digits

On a little endian machine, it will be laid out like this:

1000 1001 1002 1003
small digits big digits

You need to know the endianness of a machine if you are writing out binary values for a standard file format. Usually these formats specify that you must write a four-byte integer in big endian ordering, or an eight-byte floating point number in little endian ordering. If you’re computer is not already operating in the specified endianness, you’ve got to swap the natural byte order.

We can write a test for endianness in C thanks to its unsafe type system. Let’s assign 1 to an int and see which end that 1 lands on. We can unsafely turn the address of our int into a char pointer and dereference it:

bool islittle() {
  // Does the address point to the little end?
  int x = 1;
  char *c = (char *) &x;
  return *x == 1;
}

When would you want unsafe typing? This function hints at the primary occasion: when you are dealing with raw data and you don’t want the compiler’s help, either because what it wants to do is too slow or because you know what’s in memory better than it does. Some of our data is so massive that it can never be brought into memory all at once. This data may not be representable in a programming language and therefore evades being shoehorned into a type.

The primary vehicle for disabling the type system in C is casting. Last time we saw how enums in C also are not type-safe. The type-checking switch is effectively dialed down from enumerated set to int, making it possible for us to use values that are not enumerated in the set. A really good question to ask yourself is why enums in C aren’t typesafe.

When the folks behind Java took up the task of adding enumerated types to version 5—which came out in 2004—they had the good fortune to see how not to do it. Their ultimate solution is worth investigating for its elegance and conformity to the Java way of doing things.

An enumerated set in Java is achieved through a single class. This class contains constants for each of its members:

public class Card {
  public static final Card ACE = new Card();
  public static final Card TWO = new Card();
  public static final Card THREE = new Card();
  public static final Card FOUR = new Card();
  public static final Card FIVE = new Card();
  public static final Card SIX = new Card();
  public static final Card SEVEN = new Card();
  public static final Card EIGHT = new Card();
  public static final Card NINE = new Card();
  public static final Card TEN = new Card();
  public static final Card JACK = new Card();
  public static final Card QUEEN = new Card();
  public static final Card KING = new Card();
}

Now, we can write code that accepts only values of the enumerated type, right?

public int score(Card... cards) {
  int total = 0;
  int nAces = 0;

  for (Card card : cards) {
    total += card;
    if (card == ACE) {
      ++nAces;
    }
  }

  if (nAces > 0 && total + 10 <= 21) {
    total += 10;
  }

  return total;
}

We can call the code like so:

score(Card.JACK, Card.ACE, Card.ACE, Card.ACE, Card.FIVE);

Two things are wrong here. First, total += card doesn’t compile. That’s int += Card, which violates the type system. In C, this wasn’t an issue because enums were ints. In Java, because these are classes, we can give each member an instance variable to hold its value:

public class Card {
  private int value;

  public Card(int value) {
    this.value = value;
  }

  public int getValue() {
    return value;
  }

  public static final Card ACE = new Card(1);
  public static final Card TWO = new Card(2);
  public static final Card THREE = new Card(3);
  public static final Card FOUR = new Card(4);
  public static final Card FIVE = new Card(5);
  public static final Card SIX = new Card(6);
  public static final Card SEVEN = new Card(7);
  public static final Card EIGHT = new Card(8);
  public static final Card NINE = new Card(9);
  public static final Card TEN = new Card(10);
  public static final Card JACK = new Card(11);
  public static final Card QUEEN = new Card(12);
  public static final Card KING = new Card(13);
}

That fixes the value problem. But we’ve still got a type-safety issue:

score(new Card(21));

We just snuck in a magic card. Boo. That was the same issue we saw in C. But wait! Java gives us an additional vehicle for addressing this: access modifiers. Let’s make the constructor private. The only thing that will be able to make new instances of Card is the Card class itself.

public class Card {
  private int value;

  private Card(int value) {
    this.value = value;
  }

  // ...
}

And this is the story of how Java got type-safe enums. The real implementation is effectively what we have just described, but the Java grammar allows a slightly simpler syntax:

public enum Card {
  ACE(1),
  TWO(2),
  THREE(3),
  FOUR(4),
  FIVE(5),
  SIX(6),
  SEVEN(7),
  EIGHT(8),
  NINE(9),
  TEN(10),
  JACK(11),
  QUEEN(12),
  KING(13);

  private int value;

  public Card(int value) {
    this.value = value;
  }

  public int getValue() {
    return value;
  }
}

Okay, enough about type safety. Let’s talk about polymorphism.

All my favorite talks try to classify some phenomenon in a 2×2 matrix, so let’s do the same. Let’s categorize human endeavors into four buckets based on how much effort is involved and what effect they have on the broader world:

Little Effort Lots of Effort
Small Effect sleeping voting
teaching
Large Effect washing hands John Henry
learning Vim

As soon as we start giving our data types, we introduce a problem. If the functions we write only operate on a single type, we’ll have to write a separate function for each type we will need to support. Ugh. That’s a lot of effort to achieve a large effect.

Is there a way we can get a single piece of code to work with multiple types? Yes. Polymorphism will help us get a big return out of just a little investment. Just like washing our hands.

We usually think of polymorphism as a feature of object-oriented programming, but polymorphism comes in many forms:

Let’s discuss the first two of these today.

We rely on coercion a fair bit. Consider this definition of the hypot function:

double hypot(double x, double y) {
  return sqrt(x * x + y * y);
}

int main(int argc, char **argv) {
  std::cout << hypot((char) 3, (char) 4) << std::endl;
  std::cout << hypot(3, 4) << std::endl;
  std::cout << hypot(3.0f, 4.0f) << std::endl;
  std::cout << hypot(3.0, 4.0) << std::endl;
}

Function hypot accepts doubles for sure, but also ints, float, chars, and any type that the compiler can coerce into a double. That’s great for builtin types, but what about custom types? Suppose we’re building our own string class named Zing, one that isn’t O(n) for every operation like C strings are. We’d love to be able to say something like this:

Zing z = "foo";

What’s the type on the left-hand side? The right-hand side? Will this work? If you provide a pathway, it will:

class Zing {
  public:
    Zing(const char *s) {
      n = strlen(s);
      text = new char[n];
      memcpy(text, s, n);
    }

    ~Zing() {
      delete[] text;
    }

  private:
    char *text;
    int n; 
};

The compiler will sense that it can convert the right-hand side’s const char * type into the Zing type through this converting constructor. Sometimes such implicit conversions can cover up mistakes, and C++ provides an explicit modifier to prohibit such automatic conversion:

explicit Zing(const char *s) {
  n = strlen(s);
  text = new char[n];
}

Coercion can be seen as a polymorphic funnel. Many types converge to an encompassing type, and a single function is written to the encompassing type.

In ad hoc polymorphism, we have several definitions of the function available for various types. What are some famous ad hoc polymorphic methods you know from Java? The first one that comes to my mind is PrintStream.println. How many different versions of this method exist? The answer can be reasoned out.

For Zing, it’d be really nice to be able to access individual characters of the string. How shall we provide that functionality? It’d be really nice to overload the [] operator. Let’s do it:

char operator[](int i) {
  return text[i];
}

Now our string feels a lot like an array. Or does it? What happens here?

Zing z = "goliath";
z[0] = 'G';

To make the character writable, we can return a reference instead of a temporary:

char &operator[](int i) {
  return text[i];
}

Now suppose we want to print the whole string just like we can print other things in C++:

std::cout << z << std::endl;

What do we want to overload here? A method of cout? The rules of C++ state that the compiler will look first for something like this:

ostream &ostream::operator<<(const Zing &z);

But there’s no way we can sneak into the ostream class and an overloaded definition for our new class. However, if this method can’t be found, the compiler then looks for a method like this:

ostream &operator<<(ostream &stream, const Zing &z);

The difference here is that operator<< is not owned by any class. It’s a top-level definition. We can write it like so:

std::ostream &operator<<(std::ostream &stream, const Zing &z) {
  for (int i = 0; i < z.Length(); ++i) {
    stream << z[i];
  }
  return stream;
}

For this to work, we need another version of operator[] that works with const strings:

const char &operator[](int i) const {
  return text[i];
}

There we have it. Two different means of making our handle several different types. And neither has to do with inheritance!

Here’s your TODO list for next time:

See you then!

Sincerely,

P.S. It’s time for a haiku!

The first was for men
The second was for women
There was no restroom

P.P.S. Here’s the code we wrote together:

commute.c

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  char abc[] = "abcdefghijklmnopqrstuvwxyz";
  char c4 = 4[abc];
  printf("c4: %c\n", c4);
  return 0;
}

endian.c

#include <stdio.h>
#include <stdlib.h>

int islittle() {
  int i = 1;
  char *p = (char *) &i;
  char addressed_byte = *p;
  return addressed_byte;
}

int main(int argc, char **argv) {
  printf("islittle(): %d\n", islittle());
  return 0;
}

Blackjack.java

public class Blackjack {
  public static int score(Card... cards) {
    int total = 0;
    int nAces = 0;

    for (Card card : cards) {
      total += card.value();
      if (card == Card.ACE) {
        ++nAces;
      }
    }

    if (nAces > 0 && total + 10 <= 21) {
      total += 10;
    }

    return total;
  } 

  public static void main(String[] args) {
    int score = score(Card.ACE, Card.THREE, Card.KING);
    /* int score = score(new Card(21)); */
    System.out.println("score: " + score);
  }
}

class Card {
  private int value;

  private Card(int value) {
    this.value = value;
  }

  public int value() {
    return value;
  }

  public static Card ACE = new Card(1);
  public static Card TWO = new Card(2);
  public static Card THREE = new Card(3);
  public static Card KING = new Card(10);
}

Card2.java

public enum Card2 {
  ACE(1),
  TWO(2),
  THREE(3),
  KING(10);

  private int value;

  private Card2(int value) {
    this.value = value;
  }

  public int value() {
    return value;
  }
}

math.cpp

#include <iostream>
#include <cmath>

double hypot(double a, double b) {
  return sqrt(a * a + b * b);
}

int main(int argc, char **argv) {
  double c = hypot(3, 4); 
  double c2 = hypot(3.0f, 4.0f); 
  double c3 = hypot((char) 3, (char) 4); 
  /* double d = hypot(5, 12);  */
  std::cout << "c: " << c << std::endl;
  /* std::cout << "d: " << d << std::endl; */
  return 0;
}