CS1: Lecture 5 – Primitive Types
Dear students,
We’re going to start with a little game. You will pick a number, I will present to you a series of sheets with numbers written on them, you will tell me if your number is present, and I will tell you what your number is. It’ll be a little bit of magic! Or maybe a few bits of magic.
Bit Groupings
Computers seem to be pretty good with numbers, but they don’t actually know anything about them. The computer itself is just relying on physical processes like electrical flow, magnetism, and the reflection of laser light, all of which can be made to have two distinguishable physical states that nicely correspond to the 0s and 1s of the binary number system.
If we take a single wire, we can sense either high or low voltage. We call a single wire a binary digit or bit. Java has a data type for data that can only be in one of two states: `boolean`. This type is named after George Boole, who died from pneumonia he contracted after walking home in the rain after teaching.
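Here’s a minimal sketch of `boolean` in action; the variable names are made up just for illustration:

```java
public class BooleanDemo {
    public static void main(String[] args) {
        // A boolean can hold only one of two values: true or false.
        boolean isRaining = true;
        boolean isWeekend = false;
        System.out.println(isRaining);  // prints: true
        System.out.println(isWeekend);  // prints: false
    }
}
```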
If we have two wires, we have two states for the first wire and two for the second. That’s four possible states. What kind of data could we store with just two wires? Maybe the current season or a quadrant. Java doesn’t have a data type for data that has four possible states.
If we have three wires, we have $2 \cdot 2 \cdot 2 = 2^3 = 8$ possible states. We could store the day of the week in three bits, with one state leftover. What else?
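For example, here’s one possible 3-bit encoding of the days of the week, sketched in Java; the particular numbering is an arbitrary choice of mine, not any standard:

```java
public class DaysOfWeek {
    public static void main(String[] args) {
        // One arbitrary assignment of the eight 3-bit states to days.
        // The state 0b111 is the leftover with no day attached to it.
        int sunday = 0b000;
        int monday = 0b001;
        int tuesday = 0b010;
        int wednesday = 0b011;
        int thursday = 0b100;
        int friday = 0b101;
        int saturday = 0b110;
        System.out.println(saturday);  // prints: 6
    }
}
```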
If we have four wires, we have $2 \cdot 2 \cdot 2 \cdot 2 = 2^4 = 16$ possible states. What could we store in four bits? 4-bit chunks are sometimes called nibbles.
If we have five wires, we have $2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 2^5 = 32$ possible states. We could store the day of the month in five bits. What else?
If we have seven wires, we can represent the on-ness or off-ness of all seven chords of a major scale. That means with just seven wires, we can create a little musical instrument. We’ll take a brief interlude to do just that.
Jumping ahead, if we have eight wires, we have $2^8 = 256$ possible states. What could we store in eight bits? Java has a data type for 8 bits: `byte`. However, the 256 possible values that can be stored in a `byte` are numbers in the range [-128, 127].

If we have 16 wires, we have $2^{16} = 65536$ possible states. A lot of our digital music is stored as a sequence of 16-bit numbers. Java has a data type for 16 bits: `short`. These numbers span the range [-32768, 32767].

If we have 32 wires, we have $2^{32} = 4,294,967,296$ possible states. Java has a data type for 32 bits: `int`. These numbers span from roughly -2 billion to 2 billion.

If we have 64 wires, we have $2^{64} = 18,446,744,073,709,551,616$ possible states. Java has a data type for 64 bits: `long`. These numbers span from roughly -9 quintillion to 9 quintillion.
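We don’t need to memorize these ranges. Each primitive type has a wrapper class that exposes its extremes as constants, as in this quick sketch:

```java
public class IntegerRanges {
    public static void main(String[] args) {
        // The wrapper classes record the smallest and largest value of each type.
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);        // -128 to 127
        System.out.println(Short.MIN_VALUE + " to " + Short.MAX_VALUE);      // -32768 to 32767
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);  // -2147483648 to 2147483647
        System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);        // -9223372036854775808 to 9223372036854775807
    }
}
```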
We’ve got four different ways to store integers. This catalog leads us to a couple of questions:
- Which type should we use and when? In general, we default to using `int`. If the numbers get too big, as they easily do when measuring time, we switch to `long` (see the sketch after this list). If we’re writing files that must conform to some file format that conserves space, we might use `byte` or `short`.
- Why are there only four? Other languages have had a different number. A few even let you set how many digits you need. These languages use fixed-point arithmetic. But this question can be answered with another question: “Why are there only four sizes of measuring cups?” Certainly there is no culinary law at work. It’s just that 1 cup, 1/2 cup, 1/3 cup, and 1/4 cup are good enough to cover most of our needs, and this fixed set keeps our drawers from getting cluttered with 1/7, 5/9, and so on. In the case of the computer, the hardware can be optimized to operate on well-known quantities of bits.
Floating Point
Not all of our data is integers. When we need to store a number with a fraction, we can use a 4-byte `float` or an 8-byte `double`. Both hold floating point numbers, so-called because they store a number like 92314.78 in scientific notation, as $9.231478 \cdot 10^4$. The decimal point “floats.” Of course, the number is in binary rather than decimal. We’d instead have a number like $1.1010111 \cdot 2^{1110}$. The floating point standard splits this number into three sets of bits:
| sign | coefficient | exponent |
| --- | --- | --- |
| 0 | 1010111 | 1110 |
Note that we don’t store the leading 1 on the coefficient. Because the binary point is floated just to the right of the leftmost 1, there will always be a leading 1, and there’s no need to explicitly store it.
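We can peek at these bits from Java with `Float.floatToIntBits`, which hands back the raw 32 bits of a `float` packed into an `int`. One caveat: the actual IEEE 754 layout stores the 8 exponent bits (with an offset of 127 added) before the 23 coefficient bits, so it is arranged a little differently than the simplified table above. Here’s a sketch:

```java
public class FloatBits {
    public static void main(String[] args) {
        // 2.5 is 10.1 in binary, or 1.01 x 2^1 in binary scientific notation.
        float value = 2.5f;

        // Grab the raw 32 bits of the float as an int.
        int bits = Float.floatToIntBits(value);

        // Pad with leading zeros so all 32 bits show:
        // 1 sign bit, 8 exponent bits (1 + 127 = 10000000), 23 coefficient bits.
        String s = String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
        System.out.println(s);  // prints: 01000000001000000000000000000000
    }
}
```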
The hardware that works with floating point numbers is considerably more involved than the hardware that works with integer numbers.
In general programming in Java, we default to using `double`. But in the world of computer graphics, where less precision is needed, `float` is more common. Is there a difference in performance between the two types? Let’s check. Here’s a program that sums up a bunch of random numbers using `double`:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;

public class DoubleSum {
    public static void main(String[] args) {
        // Measure the CPU time consumed by this thread rather than wall-clock time.
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long start = bean.getCurrentThreadCpuTime();

        // Add up 400 million random doubles.
        Random generator = new Random();
        double sum = 0;
        for (int i = 0; i < 400000000; ++i) {
            sum = sum + generator.nextDouble();
        }

        long end = bean.getCurrentThreadCpuTime();

        // CPU time is reported in nanoseconds; convert to seconds.
        double elapsed = (end - start) / 1e9;
        System.out.println(elapsed);
    }
}
And here’s a version that uses `float` instead:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;

public class FloatSum {
    public static void main(String[] args) {
        // Identical to DoubleSum, except sum is a float and we call nextFloat.
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long start = bean.getCurrentThreadCpuTime();

        Random generator = new Random();
        float sum = 0;
        for (int i = 0; i < 400000000; ++i) {
            sum = sum + generator.nextFloat();
        }

        long end = bean.getCurrentThreadCpuTime();
        double elapsed = (end - start) / 1e9;
        System.out.println(elapsed);
    }
}
On my trial runs on my MacBook Pro, `double` takes 8 seconds and `float` takes 4. It would seem that `float` is a better choice. However, be careful of making blanket statements about types. Let’s write a very similar program in C:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  // TYPE is defined on the compiler command line as either float or double.
  TYPE sum = 0;

  // Add up 400 million random numbers in [0, 1].
  for (long long i = 0; i < 400000000; ++i) {
    sum += ((TYPE) rand()) / RAND_MAX;
  }

  printf("%f\n", sum);
  return 0;
}
We only need one version of this program, but we can specify `float` or `double` to replace all the occurrences of the `TYPE` placeholder. We compile the code with these two lines at the terminal:
gcc -o float -DTYPE=float timetypes.c
gcc -o double -DTYPE=double timetypes.c
We then run each under a timer:
> time ./float
16777216.000000
./float 2.21s user 0.01s system 99% cpu 2.220 total
> time ./double
200002758.261023
./double 2.53s user 0.01s system 99% cpu 2.601 total
The C program runs more quickly than the Java program, and there is less disparity between the two types. The takeaway here is not that `float` is better than `double`, or that C is better than Java. Rather, it’s that we should test our intuition about performance with real timings. You might get vastly different results on a different computer. You might find that all your concerns about speed are not worth the effort.
TODO
There’s just one last primitive type: `char`. We leave that for next time.
Here’s your TODO list of things to complete before next class:
- On a quarter sheet of paper, write down two things:
- Tell me how things are going in the class for you thus far.
- Identify some non-numeric phenomenon that has multiple states and whose current state you could represent with one or more bits, like the day of the week, which can be represented with 3 bits. What is that phenomenon, how many bits would you need, and what is the mapping between the bit configurations and the states?
See you next class!
P.S. It’s time for a haiku!
Bits for now, soon trits
Once we get quaternary…
Then we’ll call it quits