CS1: Lecture 28 – Indexed Data
Dear students,
We officially close out our discussion of the Computer as a Pilot, during which we made the computer navigate this way and that way and back again through our code using conditional statements and loops. But, like always, these ideas of branching and repetition will never leave us.
Computers make a lot of things possible. In particular, they can be used to…
- facilitate communication
- simulate
- deliver entertainment
- measure, record, and analyze data
- automate and speed up menial tasks
The last two of these are what we focus on next. We enter our treatment of the Computer as a Factory Worker. Up till now, our data has been fairly small and singular. We declared just a few variables in the programs we’ve written. Now we’ll begin to use data that can be numbered in the hundreds or thousands.
Dealing with Lots of Data
Let’s do a little data analysis. Let’s plot an ASCII histogram of how many siblings we have. We’ll first collect our numbers in a file, and then we’ll process it with something like this:
File file = new File("siblings.txt");
Scanner in = new Scanner(file);
while (in.hasNextInt()) {
int nsiblings = in.nextInt();
...
}
What exactly do we do with nsiblings
? Well, we want to know how many 0-sibling situations there are, how many 1-sibling situations, 2-sibling, and so on. We might make this first attempt:
int count0 = 0;
int count1 = 0;
int count2 = 0;
int count3 = 0;
int count4 = 0;
while (in.hasNextInt()) {
int nsiblings = in.nextInt();
if (nsiblings == 0) {
++count0;
} else if (nsiblings == 1) {
++count1;
} else if (nsiblings == 2) {
++count2;
} else if (nsiblings == 3) {
++count3;
} else if (nsiblings == 4) {
++count4;
}
}
Are there any drawbacks to this approach? I can think of a couple. First, that’s a lot of labor. Second, we’re not done. My mother-in-law had nine kids in her family. My father-in-law had eight. (My wife had three Uncle Daves and two Uncle Matts.) We’re going to need a few more variables. Ugh.
Array as an Indexed Collection
But wait, there’s a better way. Most decent programming languages provide a data type called an array. It lets you collection up a bunch of data under one name. We could create an array that held maybe 20 int
s. We do so like this:
int[] counts = new int[20];
In memory, we’ll get 20 int
s laid out in sequence. They will be automatically wiped to 0 by the Java virtual machine. The individual boxes within the array are just int
s. We can access one through an index:
System.out.println(counts[i]); // reading
counts[i] = 45; // writing
counts[i] += 1; // both
Where I have i
, you can put any integer expression: a literal, a variable, a method call, anything that yields an int
. What indices do you suppose are legal? Just as with String
, indices start at 0 and go up to 1 less than the length.
The flexibility of that name is what makes arrays so powerful. We can reduce our earlier code to this:
int[] counts = new int[20];
while (in.hasNextInt()) {
int nsiblings = in.nextInt();
++counts[nsiblings];
}
After we get all our tallies, it’s really easy to walk through the array and print out the results:
for (int i = 0; i < counts.length; ++i) {
System.out.println("%d -> %d", i, counts[i]);
}
Let’s look at another example. This time we’ll test the crazy Benford’s Law, which states leading digits of numbers are not uniformly distributed. We’ll examine the populations of cities across the US, and plot a histogram of how their leading digits are distributed.
Initializing an Array
In these examples, we built up the array as we processed the input. Sometimes we know what data should go in the array ahead of time. In such cases, there are two ways to initialize arrays:
- Allocate and assign individual elements:
String[] names = new String[7]; names[0] = "Sunday"; names[1] = "Monday"; names[2] = "Tuesday"; names[3] = "Wednesday"; names[4] = "Thursday"; names[5] = "Friday"; names[6] = "Saturday";
- Use an initializer list:
String[] names = {"Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"};
Usually the second of these is preferred.
Arrays can often replace decision structures that we write to simply choose between data. For instance, we might write a program that draws stripes across an image. Without arrays, we’d use an if
statement to pick our color:
create image for each row r for each column c if c % nstripes == 0 pick first color elsif c % nstripes == 1 pick second color elsif c % nstripes == 2 pick third color elsif ... ... else pick last color set pixel (c, r) to color
But with arrays, we can do a much simpler lookup into a collection of colors:
colors = [color0, color1, color2, ...] create image for each row r for each column c i = c % number of colors set pixel (c, r) to color
TODO
Here’s your TODO list to complete before we meet again:
- Read chapter 7 through section 7.2 in your book. Solve two problems from Array-1 and two problems from Array-2 on Coding Bat. On a quarter sheet, write down the names of the problems you solved and at least one of the solutions.
- CS 148, your lab is posted.
See you next class!
P.S. It’s time for a haiku!
They say 1’s lonely
But they forget his neighbor
At index 0
P.P.S. Here’s the code we wrote together in class…
Benford.java
package lecture1106.cs145;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Benford {
public static void main(String[] args) throws FileNotFoundException {
File populationFile = new File("/Users/johnch/Desktop/populations_from_wikipedia.txt");
Scanner in = new Scanner(populationFile);
int[] leadingDigitCounts = new int[10];
while (in.hasNext()) {
String population = in.next();
int leadingDigit = population.charAt(0) - '0';
leadingDigitCounts[leadingDigit]++;
}
for (int i = 0; i < leadingDigitCounts.length; i++) {
System.out.printf("%d: %s%n", i, "*".repeat(leadingDigitCounts[i]));
}
in.close();
}
}
Sistagram.java
package lecture1106.cs145;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Sistagram {
public static void main(String[] args) throws FileNotFoundException {
File siblingsFile = new File("/Users/johnch/Desktop/siblings.txt");
Scanner in = new Scanner(siblingsFile);
int[] counts = new int[200];
// int count0 = 0;
// int count1 = 0;
// int count2 = 0;
// int count3 = 0;
// int count4 = 0;
// int count5 = 0;
// int count6 = 0;
// int count7 = 0;
while (in.hasNextInt()) {
int nsiblings = in.nextInt();
counts[nsiblings]++;
}
in.close();
for (int i = 0; i < counts.length; i++) {
if (counts[i] > 0) {
System.out.printf("%d -> %s%n", i, "*".repeat(counts[i]));
}
}
}
}
Benford.java
package lecture1106.cs148;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Benford {
public static void main(String[] args) throws FileNotFoundException {
File popFile = new File("/Users/johnch/Desktop/populations_from_wikipedia.txt");
Scanner in = new Scanner(popFile);
int[] counts = new int[10];
while (in.hasNext()) {
String population = in.next();
int digit = population.charAt(0) - '0';
counts[digit]++;
}
in.close();
for (int i = 0; i < counts.length; i++) {
System.out.printf("%2d -> %s%n", i, "*".repeat(counts[i]));
}
}
}
Siblings.java
package lecture1106.cs148;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Siblings {
public static void main(String[] args) throws FileNotFoundException {
File siblingsFile = new File("/Users/johnch/Desktop/siblings.txt");
Scanner in = new Scanner(siblingsFile);
// int count0 = 0;
// int count1 = 0;
// int count2 = 0;
// int count3 = 0;
// int count4 = 0;
// int count5 = 0;
// int count6 = 0;
// int count7 = 0;
// int count8 = 0;
// int count9 = 0;
// int count10 = 0;
// int count11 = 0;
// int count12 = 0;
int[] counts = new int[100];
while (in.hasNextInt()) {
int nsiblings = in.nextInt();
counts[nsiblings]++;
}
in.close();
for (int i = 0; i < counts.length; i++) {
System.out.printf("%2d -> %s%n", i, "*".repeat(counts[i]));
}
}
}