teaching machines

SpecCheck talk at ITiCSE 2012

June 27, 2012. Filed under public, specchecker.

This is the draft of a presentation I’m preparing to give on some of my work. This text is meant to accompany my slides.

Premise

This work is rooted in a single premise: grading code for two hundred first-year computer science students is painful. It’s painful for two reasons. First, two hundred of anything is a force to be reckoned with. Two hundred freshmen especially so. Spending even just five minutes per submission adds up to 16 and 2/3 person-hours of grading per assignment. And to give meaningful feedback on students’ deviations and mistakes, five minutes is surely not enough. The second reason for pain is that it’s code we are grading. In theory, grading it should be pretty objective. But students don’t follow the directions. They solve different problems than the ones I specified. If I graded students’ code strictly on their ability to follow directions, I’d lose half my class.

Solutions

While I was teaching at Iowa State University, I thought a little bit about how to fix this problem of grading code in my large lecture classes. Here’s the solution spectrum I saw:

  1. Assign few homeworks.
  2. Distribute the grading burden.
  3. Autograde.

Really, my dream was for students to turn in code that I could grade quickly. Option one is bad for students; it reduces their “doing.” Option two is bad for budgets (TAs cost money, and money is expensive). And option three can be helpful, but when furnished with a black-box checker, students sometimes stop thinking and code to the grader instead. I wanted to find a different happy medium, one that would allow me to assign frequent homeworks but would make grading easier without shutting students’ brains off.

Reflection

It was at that time that I discovered Java’s ability to programmatically inspect itself. In particular, I saw that one could query all the methods of a class and find information about modifiers like public and static, about parameters, about return types, and about exceptions. Reflection seemed like it might be the tool for the job of making grading easier. For example, I could check for a String-int constructor in class Hero with this test:

try {
  Class<?>[] params = {
    String.class,
    int.class
  };
  Class<?> clazz = Class.forName("Hero");
  clazz.getDeclaredConstructor(params);
} catch (ClassNotFoundException e) {
  fail("You need a class named Hero.");
} catch (NoSuchMethodException e) {
  fail("You need a constructor in class " +
       "Hero taking 2 arguments, having " +
       "types String and int.");
}

And I could assert a superclass with this test (in JUnit, the test method can simply declare throws ClassNotFoundException, so Class.forName needs no try-catch here):

Class<?> clazz = Class.forName("Hero");
assertEquals("The superclass of class Hero is not correct.",
             Character.class,
             clazz.getSuperclass());

After writing tests like this for all the specified constructs of my assignment, I could then distribute the tests to the students. They’d run them and see where they were deviating from the specification: changing names, adding parameters, failing to return a value, and so on.

Annotations

The only obstacle I saw was that I didn’t want to write these reflection tests myself. The code is ugly, the potential exceptions are numerous, and writing the tests by hand is mind-numbing. But reading documentation paid off: I discovered in the reflection API that I could query whether a method had a certain Java annotation.

Annotations in Java are little information-bearing tags that one can use to communicate signals to systems interpreting the code. For my purposes, I initially just needed to mark methods and fields as specified, so I created a very simple annotation, @Specified, and used it like so:

@Specified
public int getPi() {
  return 3;
}
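Defining such an annotation takes only a few lines. Here is a minimal sketch of how @Specified might be declared (my reconstruction, not necessarily SpecCheck’s actual source); the key detail is runtime retention, without which reflection cannot see the tag:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Retained at runtime so reflection-based test generation can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.FIELD,
         ElementType.CONSTRUCTOR, ElementType.TYPE})
public @interface Specified {}
```

The @Target list restricts which kinds of program elements may carry the tag.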

With reflection and annotations in hand, I devised SpecCheck, a system for automatically generating tests for specification conformance. The generation reduces to a pretty simple algorithm:

for each specified class
  for each element
    if element is @Specified
      generate test for existence
      generate test for modifiers
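As a sketch of that algorithm in Java itself, the following hypothetical generator walks a class’s declared members and records the checks it would emit for each @Specified element (the names GeneratorSketch and describeTests are mine, and real SpecCheck emits JUnit test code rather than strings):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class GeneratorSketch {
  // Hypothetical stand-in for SpecCheck's @Specified annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.METHOD, ElementType.FIELD})
  @interface Specified {}

  // A toy specified class.
  static class Hero {
    @Specified public static final double DEFAULT_GOLD = 100.0;
    @Specified public int getHitPoints() { return 0; }
    private void helper() {}  // unspecified: no tests generated
  }

  // Describe the tests we would generate for each @Specified
  // element: one for existence, one for modifiers.
  public static List<String> describeTests(Class<?> clazz) {
    List<String> tests = new ArrayList<>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (m.isAnnotationPresent(Specified.class)) {
        tests.add("exists: method " + m.getName());
        tests.add("modifiers: " + Modifier.toString(m.getModifiers())
                  + " on method " + m.getName());
      }
    }
    for (Field f : clazz.getDeclaredFields()) {
      if (f.isAnnotationPresent(Specified.class)) {
        tests.add("exists: field " + f.getName());
        tests.add("modifiers: " + Modifier.toString(f.getModifiers())
                  + " on field " + f.getName());
      }
    }
    return tests;
  }

  public static void main(String[] args) {
    for (String t : describeTests(Hero.class)) {
      System.out.println(t);
    }
  }
}
```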

Example

Let’s work through an example to see how SpecCheck fits into the homework pipeline. Suppose I provide the following specification to my students:

Write a class Character for a game, having

  1. a constant double named DEFAULT_GOLD,
  2. a constructor taking a String name as its sole parameter,
  3. and a method getHitPoints returning the character’s hit points as an int.

The first thing to do as an instructor is to solve the homework yourself. You really should be doing this whether you adopt SpecChecker or not. Here’s my solution:

class Character {
  public static final double DEFAULT_GOLD = 100.0;

  private String name;
  private int hitPoints;

  public Character(String name) {
    this.name = name;
  }

  public int getHitPoints() {
    return hitPoints;
  }
}

Now, let’s annotate the specified parts.

class Character {
  @Specified
  public static final double DEFAULT_GOLD = 100.0;

  private String name;
  private int hitPoints;

  @Specified
  public Character(String name) {
    this.name = name;
  }

  @Specified
  public int getHitPoints() {
    return hitPoints;
  }
}

So far, the extra work of using SpecChecker amounts to 30 seconds of annotating. Now let’s generate the tests by calling on the generator:

SpecCheckGenerator.testClasses(Character.class);

The generated code is our suite of tests. We insert it into our tester class, which may contain hand-written functional tests, and distribute it to students.

Currently, I use JUnit for writing tests. This means the students must add JUnit to their classpath. If this is an issue, it’s certainly possible to eliminate this dependency by writing your own testing package with the exact same interface. Also, generated code tends to be overwhelming, even to me. I ship my SpecCheckers to students bundled in JAR files, which are easy to drag, drop, and run in their IDE.
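Such a stand-in testing package needs very little, since the tests above only call fail and assertEquals. A hypothetical drop-in replacement (MiniAssert is my name, not a SpecCheck artifact) might be as small as this:

```java
// A tiny JUnit-free substitute offering the same two calls the
// generated tests use. Hypothetical; not part of SpecCheck itself.
public class MiniAssert {
  public static void fail(String message) {
    throw new AssertionError(message);
  }

  public static void assertEquals(String message,
                                  Object expected, Object actual) {
    if (expected == null ? actual != null : !expected.equals(actual)) {
      throw new AssertionError(message + " Expected " + expected
                               + " but was " + actual + ".");
    }
  }
}
```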

Let’s now pretend we are a struggling but persistent student by running the SpecChecker tests under various conditions:

  1. without having added JUnit
  2. without having written any code
  3. with having written only an empty class
  4. with DEFAULT_GOLD private
  5. with a constructor having an extra parameter
  6. with a wrong modifier
  7. with a wrong return type

[test code with SpecCheck-generated tests]

I hope you can agree that there was little extra work for the instructor here, and that the student can root out deviations without too much hassle.

Design philosophy

I built SpecCheck with a narrow focus and strove to uphold these three points in its development:

  1. SpecCheckers reveal nothing not already published in the assignment PDF. Students don’t read, but they are frightened by the red in the Eclipse console.
  2. Limited hand-holding. No messages like, “Oh, you’ve got an extra parameter on your method A.”
  3. Condition student code. Turn grade-time errors into develop-time errors.

Extras in @Specified

I found it possible to check a few other useful things in students’ code. Annotations can be parameterized, and I added these parameters to @Specified to examine aspects beyond methods and fields:

  1. int maxVariableCount() default -1;

    Limits the number of instance variables. Tells students, “favor locals.” This violates the separation between interface and implementation.

  2. boolean allowUnspecifiedPublicStuff() default false;

    Allow a student-defined unspecified public interface?

  3. boolean allowUnspecifiedPublicConstants() default false;

    Maybe allow just student-defined constants?

  4. boolean checkSuper() default false;
    Class<?>[] mustImplement() default {};

    Does the student’s class have the right supertype(s)?

  5. Class<?>[] mustThrow() default {};
    Class<?>[] mustNotThrow() default {};

    Does the student’s method handle or not handle certain exceptions?
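Gathered into one declaration, those parameters might give @Specified a shape like the following sketch (assembled from the defaults listed above; the real SpecCheck source may organize them differently):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of @Specified with its parameters, built from the
// defaults listed above.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD,
         ElementType.FIELD, ElementType.CONSTRUCTOR})
public @interface Specified {
  int maxVariableCount() default -1;
  boolean allowUnspecifiedPublicStuff() default false;
  boolean allowUnspecifiedPublicConstants() default false;
  boolean checkSuper() default false;
  Class<?>[] mustImplement() default {};
  Class<?>[] mustThrow() default {};
  Class<?>[] mustNotThrow() default {};
}
```

A class could then be tagged with, say, @Specified(maxVariableCount = 2, checkSuper = true).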

What students thought

[show some graphs]

Conclusion

  1. Success. I felt less guilty giving a bad grade.
  2. Success. Graders can write functional tests that link successfully.
  3. Success. Did I save any time grading? Not mine. Yours.
  4. Failure. Why do students still submit when no tests pass?
    Perhaps more effective in a signatures-first homework setup?
  5. Failure. Impact unmeasured.
    Does anyone want to set up a controlled experiment with me?