teaching machines

Keystrokes Patcher

June 3, 2013 by . Filed under keystrokes, public.

For our Keystrokes project, we grab “frames” for our text movie as snapshots of the file being edited. We don’t really want to pass the complete file around for every snapshot, so instead we store diffs between frames that allow us to reconstruct any particular frame beyond the first.

To produce frame 2, we apply diff 2 to frame 1. To produce frame 3, we apply diff 3 to (diff 2 to frame 1). To produce frame N, we apply diff N to (diff N – 1 to (… to frame 1)).

Since we take snapshots at every cursor change, the diffs are likely to be quite small, often a line or two. Grabbing a text movie from the web should be much faster compared to grabbing the complete set of snapshots.

The reconstruction process involves parsing the diff file and applying it just like the Unix patch utility does. What follows is my attempt at understanding the diff protocol and reconstructing patch. The Patcher class takes a complete frame and a patch (both stored as String lists to make indexing into the files easier) and applies the patch to produce the next complete frame.

package org.twodee.undiff;

import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.twodee.utilities.Utilities;

/**
 Patches a source text modified by changes, deletes, and appends.
 */
public class Patcher {
  private static final Pattern TRANSACTION_PATTERN = Pattern.compile("(\\d+)(?:,(\\d+))?([acd])(\\d+)(?:,(\\d+))?");

  private ArrayList<String> pre;
  private ArrayList<String> patch;
  private int iPatchLine;
  private int iPreLine;
  private ArrayList<String> post;

  /**
   Applies a patch to a source text.
   @param pre The source text, given as a list of lines.
   @param patch The patch text as produced by diff, given as a list of lines.
   @return The destination text, given as a list of lines.
   */
  public static ArrayList<String> patch(ArrayList<String> pre, ArrayList<String> patch) {
    Patcher patcher = new Patcher(pre, patch);
    patcher.apply();
    return patcher.post;
  }

  /*
   Not meant for public use.
   */
  private Patcher(ArrayList<String> pre, ArrayList<String> patch) {
    this.pre = pre;
    this.patch = patch;
  }

  private void apply() {
    // We start at the beginning of the patch and the pre. Post gets built up
    // as we advance through the patch, but it starts off empty.
    iPatchLine = 0;
    iPreLine = 0;
    post = new ArrayList<String>();

    // Read and apply the patch, one transaction at a time. Since not every
    // line may be altered by the transactions, before we apply a transaction,
    // we make sure all the lines previous to the lines the transaction affects
    // get copied over from pre to post.
    while (hasTransaction()) {
      Transaction transaction = readTransaction();
      appendPreLinesToPost(transaction.getPreStart());
      transaction.execute();
      iPreLine = transaction.getPreEnd() + 1;
    }

    // We also want to copy any lines after the last transaction left off up
    // to the end of the file.
    appendPreLinesToPost(pre.size());
  }

  /*
   Determines whether the patch contains another transaction to process. If
   there are any unvisited lines in the patch, we have more transactions.
   */
  private boolean hasTransaction() {
    return iPatchLine < patch.size();
  }

  /*
   Appends lines from pre to post.
   */
  private void appendPreLinesToPost(int to) {
    for (; iPreLine < to; ++iPreLine) {
      post.add(pre.get(iPreLine));
    }
  }

  /*
   Gets the next transaction that is to be executed. It does not execute the
   transaction. It does not advance iPatchLine.
   */
  private Transaction readTransaction() {
    Matcher matcher = TRANSACTION_PATTERN.matcher(patch.get(iPatchLine));

    if (!matcher.find()) {
      throw new RuntimeException("Bad transaction type");
    }

    Transaction transaction;
    String command = matcher.group(3);
    if (command.equals("a")) {
      transaction = new AppendTransaction();
    } else if (command.equals("c")) {
      transaction = new ChangeTransaction();
    } else {
      transaction = new DeleteTransaction();
    }

    int preStart = Integer.parseInt(matcher.group(1)) - 1;
    int preEnd = matcher.group(2) == null ? preStart : Integer.parseInt(matcher.group(2)) - 1;
    int postStart = Integer.parseInt(matcher.group(4)) - 1;
    int postEnd = matcher.group(5) == null ? postStart : Integer.parseInt(matcher.group(5)) - 1;
    transaction.setBounds(preStart, preEnd, postStart, postEnd);

    return transaction;
  }

  /*
   Represents a transaction, which turns ranges of lines in pre to ranges of
   lines in post.
   */
  private abstract class Transaction {
    public int preStart;
    public int preEnd;
    public int postStart;
    public int postEnd;

    public void setBounds(int preStart, int preEnd, int postStart, int postEnd) {
      this.preStart = preStart;
      this.preEnd = preEnd;
      this.postStart = postStart;
      this.postEnd = postEnd;
    }

    public int getPreStart() {
      return preStart;
    }

    public int getPreEnd() {
      return preEnd;
    }

    /*
     Executes this transaction and puts the patch line cursor at the start of
     the next transaction. After this method, we are ready to read the next
     transaction from the patch.
     */
    public abstract void execute();
  }

  /*
   Changes the specified pre lines into the specified post lines. A change
   transaction has the following representation in the patch:

     9c7,8
     < first line replace
     ---
     > first line to insert
     > second line to insert
   */
  private class ChangeTransaction extends Transaction {
    public void execute() {
      // Move past the transaction header, the pre lines on the chopping block,
      // and the --- separator.
      iPatchLine += (preEnd - preStart + 1) + 2;

      // Read each line to append from the patch, removing the leading "> ",
      // and insert the line into post.
      for (int i = postStart; i <= postEnd; ++i) {
        post.add(patch.get(iPatchLine).substring(2));
        ++iPatchLine;
      }
    }
  }

  /*
   Appends the enumerated lines into post. An append transaction has the
   following representation in the patch:

     9a10,11                                                                                                                            
     > first line to append
     > second line to append
   */
  private class AppendTransaction extends Transaction {
    public void execute() {
      // Move past the transaction header line.
      ++iPatchLine;

      // The transaction is responsible for dealing with pre lines in
      // [preStart, preEnd]. On append, only one pre line is in this range,
      // which we simply copy over.
      appendPreLinesToPost(preStart + 1);

      // Read each line to append from the patch, removing the leading "> ",
      // and insert the line into post.
      for (int i = postStart; i <= postEnd; ++i) {
        post.add(patch.get(iPatchLine).substring(2));
        ++iPatchLine;
      }
    }
  }

  /*
   Advances past all the deleted pre lines in this transaction in the patch.
   We don't really need to do anything more, since the lines in this
   transaction are not to appear in post.

   A delete transaction that removes two lines has this representation in the
   patch:

     1,2d0
     < first line to be removed
     < second line to be removed

   We move past the transaction line and the two pre lines, so that the next
   thing we read will be the transaction line.
   */
  private class DeleteTransaction extends Transaction {
    public void execute() {
      iPatchLine += (preEnd - preStart + 1) + 1;
    }
  }
}

My primary goal was to keep the methods short. It didn’t really make sense to create a stateful object, since Patcher has no meaningful lifespan. Thus, I wanted a static method. The code was too long for a single static method. Breaking it up meant passing a bunch of parameters around to the helper methods, and that didn’t sound fun. The solution was to make all instance-related tasks private. That way I could safely store the “global” data in instance variables that I didn’t have to pass around. (This is safe because I’m the only that can alter the Patcher instance I create.) I’m not sure I like the design, but it does have minimal conditional statements and the methods are pretty short. They’re not really individually testable.