teaching machines

CS1: Lecture 28 – Indexed Data

November 6, 2019 by . Filed under cs1, fall 2019, lectures.

Dear students,

We officially close out our discussion of the Computer as a Pilot, during which we made the computer navigate this way and that way and back again through our code using conditional statements and loops. But, like always, these ideas of branching and repetition will never leave us.

Computers make a lot of things possible. In particular, they can be used to…

The last two of these are what we focus on next. We enter our treatment of the Computer as a Factory Worker. Up till now, our data has been fairly small and singular. We declared just a few variables in the programs we’ve written. Now we’ll begin to use data that can be numbered in the hundreds or thousands.

Dealing with Lots of Data

Let’s do a little data analysis. Let’s plot an ASCII histogram of how many siblings we have. We’ll first collect our numbers in a file, and then we’ll process it with something like this:

File file = new File("siblings.txt");
Scanner in = new Scanner(file);

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  ...
}

What exactly do we do with nsiblings? Well, we want to know how many 0-sibling situations there are, how many 1-sibling situations, 2-sibling, and so on. We might make this first attempt:

int count0 = 0;
int count1 = 0;
int count2 = 0;
int count3 = 0;
int count4 = 0;

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  if (nsiblings == 0) {
    ++count0; 
  } else if (nsiblings == 1) {
    ++count1; 
  } else if (nsiblings == 2) {
    ++count2; 
  } else if (nsiblings == 3) {
    ++count3; 
  } else if (nsiblings == 4) {
    ++count4; 
  }
}

Are there any drawbacks to this approach? I can think of a couple. First, that’s a lot of labor. Second, we’re not done. My mother-in-law had nine kids in her family. My father-in-law had eight. (My wife had three Uncle Daves and two Uncle Matts.) We’re going to need a few more variables. Ugh.

Array as an Indexed Collection

But wait, there’s a better way. Most decent programming languages provide a data type called an array. It lets you collection up a bunch of data under one name. We could create an array that held maybe 20 ints. We do so like this:

int[] counts = new int[20];

In memory, we’ll get 20 ints laid out in sequence. They will be automatically wiped to 0 by the Java virtual machine. The individual boxes within the array are just ints. We can access one through an index:

System.out.println(counts[i]); // reading
counts[i] = 45;                // writing
counts[i] += 1;                // both

Where I have i, you can put any integer expression: a literal, a variable, a method call, anything that yields an int. What indices do you suppose are legal? Just as with String, indices start at 0 and go up to 1 less than the length.

The flexibility of that name is what makes arrays so powerful. We can reduce our earlier code to this:

int[] counts = new int[20];

while (in.hasNextInt()) {
  int nsiblings = in.nextInt();
  ++counts[nsiblings];
}

After we get all our tallies, it’s really easy to walk through the array and print out the results:

for (int i = 0; i < counts.length; ++i) {
  System.out.println("%d -> %d", i, counts[i]);
}

Let’s look at another example. This time we’ll test the crazy Benford’s Law, which states leading digits of numbers are not uniformly distributed. We’ll examine the populations of cities across the US, and plot a histogram of how their leading digits are distributed.

Initializing an Array

In these examples, we built up the array as we processed the input. Sometimes we know what data should go in the array ahead of time. In such cases, there are two ways to initialize arrays:

Usually the second of these is preferred.

Arrays can often replace decision structures that we write to simply choose between data. For instance, we might write a program that draws stripes across an image. Without arrays, we’d use an if statement to pick our color:

create image
for each row r
  for each column c
    if c % nstripes == 0
      pick first color
    elsif c % nstripes == 1
      pick second color
    elsif c % nstripes == 2
      pick third color
    elsif ...
      ...
    else
      pick last color
    set pixel (c, r) to color

But with arrays, we can do a much simpler lookup into a collection of colors:

colors = [color0, color1, color2, ...]
create image
for each row r
  for each column c
    i = c % number of colors
    set pixel (c, r) to color

TODO

Here’s your TODO list to complete before we meet again:

See you next class!

Sincerely,

P.S. It’s time for a haiku!

They say 1’s lonely
But they forget his neighbor
At index 0

P.P.S. Here’s the code we wrote together in class…

Benford.java

package lecture1106.cs145;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Benford {
  public static void main(String[] args) throws FileNotFoundException {
    File populationFile = new File("/Users/johnch/Desktop/populations_from_wikipedia.txt");
    Scanner in = new Scanner(populationFile);

    int[] leadingDigitCounts = new int[10];

    while (in.hasNext()) {
      String population = in.next();
      int leadingDigit = population.charAt(0) - '0';
      leadingDigitCounts[leadingDigit]++;
    }

    for (int i = 0; i < leadingDigitCounts.length; i++) {
      System.out.printf("%d: %s%n", i, "*".repeat(leadingDigitCounts[i]));
    }

    in.close();
  }
}

Sistagram.java

package lecture1106.cs145;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Sistagram {
  public static void main(String[] args) throws FileNotFoundException {
    File siblingsFile = new File("/Users/johnch/Desktop/siblings.txt");
    Scanner in = new Scanner(siblingsFile);

    int[] counts = new int[200];
//    int count0 = 0;
//    int count1 = 0;
//    int count2 = 0;
//    int count3 = 0;
//    int count4 = 0;
//    int count5 = 0;
//    int count6 = 0;
//    int count7 = 0;

    while (in.hasNextInt()) {
      int nsiblings = in.nextInt();
      counts[nsiblings]++;
    }
    in.close();

    for (int i = 0; i < counts.length; i++) {
      if (counts[i] > 0) {
        System.out.printf("%d -> %s%n", i, "*".repeat(counts[i]));
      }
    }
  }
}

Benford.java

package lecture1106.cs148;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Benford {
  public static void main(String[] args) throws FileNotFoundException {
    File popFile = new File("/Users/johnch/Desktop/populations_from_wikipedia.txt");
    Scanner in = new Scanner(popFile);

    int[] counts = new int[10];

    while (in.hasNext()) {
      String population = in.next();
      int digit = population.charAt(0) - '0';
      counts[digit]++;
    }

    in.close();

    for (int i = 0; i < counts.length; i++) {
      System.out.printf("%2d -> %s%n", i, "*".repeat(counts[i]));
    }
  }
}

Siblings.java

package lecture1106.cs148;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Siblings {
  public static void main(String[] args) throws FileNotFoundException {
    File siblingsFile = new File("/Users/johnch/Desktop/siblings.txt");
    Scanner in = new Scanner(siblingsFile);

//    int count0 = 0;
//    int count1 = 0;
//    int count2 = 0;
//    int count3 = 0;
//    int count4 = 0;
//    int count5 = 0;
//    int count6 = 0;
//    int count7 = 0;
//    int count8 = 0;
//    int count9 = 0;
//    int count10 = 0;
//    int count11 = 0;
//    int count12 = 0;
    int[] counts = new int[100];

    while (in.hasNextInt()) {
      int nsiblings = in.nextInt();
      counts[nsiblings]++;
    }

    in.close();

    for (int i = 0; i < counts.length; i++) {
      System.out.printf("%2d -> %s%n", i, "*".repeat(counts[i]));
    }
  }
}