teaching machines

CS1: Lecture 5 – Primitive Types

September 13, 2019 by . Filed under cs1, fall 2019, lectures.

Dear students,

We’re going to start with a little game. You will pick a number, I will present to you a series of sheets with numbers written on them, you will tell me if your number is present, and I will tell you what your number is. It’ll be a little bit of magic! Or maybe a few bits of magic.

Bit Groupings

Computers seem to be pretty good with numbers, but they don’t actually know anything about them. The computer itself is just relying on physical processes like electrical flow, magnetism, and the reflection of laser light, all of which can be made to have two distinguishable physical states that nicely correspond to the 0s and 1s of the binary number system.

If we take a single wire, we can sense either high or low voltage. We call a single wire a binary digit or bit. Java has a data type that corresponds to data that can only be in two possible states: boolean. This type is named after George Boole, who died from pneumonia he contracted after walking home in the rain after teaching.

If we have two wires, we have two states for the first wire and two for the second. That’s four possible states. What kind of data could we store with just two wires? Maybe the current season or a quadrant. Java doesn’t have a data type for data that has four possible states.

If we have three wires, we have $2 \cdot 2 \cdot 2 = 2^3 = 8$ possible states. We could store the day of the week in three bits, with one state leftover. What else?

If we have four wires, we have $2 \cdot 2 \cdot 2 \cdot 2 = 2^4 = 16$ possible states. What could we store in four bits? 4-bit chunks are sometimes called nibbles.

If we have five wires, we have $2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 2^5 = 32$ possible states. We could store the day of the month in five bits. What else?

If we have seven wires, we can represent the on-ness or off-ness of all seven chords of a major scale. That means with just seven wires, we can create a little musical instrument. We’ll take a brief interlude to do just that.

Jumping ahead, if we have eight wires, we have $2^8 = 256$ possible states. What could we store in eight bits? Java has a data type for 8-bits: byte. However, the 256 possible values that can be stored in a byte are numbers in the range [-128, 127].

If we have 16 wires, we have $2^{16} = 65536$ possible states. A lot of our digital music is stored as a sequence of 16-bit numbers. Java has a data type for 16-bits: short. These numbers span the range [-32768, 32767].

If we have 32 wires, we have $2^{32} = 4,294,967,296$ possible states. Java has a data type for 32-bits: int. These numbers span from -2 billion to 2 billion.

If we have 64 wires, we have $2^{64} = 18,446,744,073,709,551,616$ possible states. Java has a data type for 64-bits: long. These numbers span from -9 quintillion to 9 quintillion.

We’ve got four different ways to store integers. This catalog leads us to a couple of questions:

Floating Point

Not all of our data is integers. When we need to store a number with a fraction, we can use a 4-byte float or an 8-byte double. Both hold floating point numbers, so-called because they store a number like 92314.78 in scientific notation—as $9.231478 \cdot 10^4$. The decimal point “floats.” Of course, the number is in binary rather than decimal. We’d instead have a number like $1.1010111 \cdot 2^{1110}$. The floating point standard splits this number into three sets of bits:

sign coefficient exponent
0 1010111 1110

Note that we don’t store the leading 1 on the coefficient. Because the binary point is floated just to the right of the leftmost 1, there will always be a leading 1 and there’s no need explicitly store it.

The hardware that works with floating point numbers is considerably more involved than the hardware that works with integer numbers.

In general programming in Java, we default to using double. But in the world of computer graphics, where less precision is needed, float is more common. Is there a difference in performance between the two types? Let’s check. Here’s a program that sums up a bunch of random numbers using double:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;

public class DoubleSum {
  public static void main(String[] args) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    long start = bean.getCurrentThreadCpuTime();

    Random generator = new Random();
    double sum = 0;
    for (int i = 0; i < 400000000; ++i) {
      sum = sum + generator.nextDouble();
    }
    
    long end = bean.getCurrentThreadCpuTime();
    double elapsed = (end - start) / 1e9;
    System.out.println(elapsed);
  }
}

And here’s version that uses float instead:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;

public class FloatSum {
  public static void main(String[] args) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    long start = bean.getCurrentThreadCpuTime();

    Random generator = new Random();
    float sum = 0;
    for (int i = 0; i < 400000000; ++i) {
      sum = sum + generator.nextFloat();
    }
    
    long end = bean.getCurrentThreadCpuTime();
    double elapsed = (end - start) / 1e9;
    System.out.println(elapsed);
  }
}

On my trial runs on my MacBook Pro, double takes 8 seconds and float takes 4. It would seem that float is a better choice. However, be careful of making blanket statements about types. Let’s write a very similar program in C:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  TYPE sum = 0;
  for (long long i = 0; i < 400000000; ++i) {
    sum += ((TYPE) rand()) / RAND_MAX;
  }
  printf("%f\n", sum);
  return 0;
}

We only need one version of this program, but we can specify float or double to replace all the occurences of the TYPE placeholder. We compile the code with these two lines at the terminal:

gcc -o float -DTYPE=float timetypes.c
gcc -o double -DTYPE=double timetypes.c

We then run each under a timer:

> time ./float
16777216.000000
./float  2.21s user 0.01s system 99% cpu 2.220 total
> time ./double
200002758.261023
./double  2.53s user 0.01s system 99% cpu 2.601 total

The C program runs more quickly than the Java program, and there is less disparity between the two types. The takeaway here is not that float is better than double, or that C is better Java. Rather, it’s that we should test our intuition about performance with real timings. You might get vastly different results on a different computer. You might find that all your concerns about speed are not worth the effort.

TODO

There’s just one last primitive type: char. We leave that for next time.

Here’s your TODO list of things to complete before next class:

See you next class!

Sincerely,

P.S. It’s time for a haiku!

Bits for now, soon trits
Once we get quaternary…
Then we’ll call it quits