teaching machines

CS 245 Lab 5 – Binary Search

February 22, 2014 by . Filed under cs245, labs, spring 2014.

First, if you have checkpoints left over from last lab, get them inspected during the first 15 minutes of this lab.

Work in pairs, where possible. Prefer working with someone that you did not work with last week. The exchange of new ideas and perspectives is not an opportunity you want to rob yourself of.

Synopsis

In this lab, you will gaze intently at the beauty and power of binary search—by writing a spellchecker. Your Dictionary class will offer both linear and binary search to check if a word appears in a 600,000 element word list, and you will compare the two algorithms with an industrial-grade Stopwatch class.

Checkpoint 1

Person A types.

Write a class Stopwatch. Give it methods start, stop, and getElapsedSeconds. Users wanting to time an operation will start the watch, perform the operation, stop it, and then query for the time that elapsed between start and stop. Likely, these methods are all one-liners.

Do not use System.currentTimeMillis to get timing information, as this approach will also include time that your application was paused while some other application used the processor.

Instead, use ManagementFactory.getThreadMXBean. The bean that it returns has a getCurrentThreadCpuTime method, which returns the actual time your program has had on the CPU since it was started. Read the documentation for more details.

Also write a Dictionary class. In its constructor, load into an ArrayList the words in file W:/c s/johnch/words, which has the words sorted, lowercase, and one per line.

Checkpoint 2

Person B types.

Add two methods to your Dictionary class: isWordLinear and isWordBinary. (Note that these methods start with is-, suggesting they return booleans.) They return true if the word sent in as a parameter appears in the dictionary. One uses linear search, the other uses binary search.

Create a main class with a main method. Make instances of your Dictionary and Stopwatch classes. Grab a book off of Project Gutenberg and iterate through all of its words. Report which ones do not appear in the dictionary.

Scanner makes it pretty easy to strip out all punctuation. Instead of having Scanner break words up on whitespace, we’ll break words up on all non-alphabetic sequences. All we need is to set our  Scanner‘s delimiter to a special pattern—called a regular expression:

in.useDelimiter("[^A-Za-z]+");

[] tells the Scanner to match any character in the brackets, [^] means match any character not in the brackets, and + means match sequences of 1 or more of the preceding pattern. Effectively, we are stating our words are delimited by non-alphabetic sequences.

Time your iteration through the book. Try both the linear and binary checks. How do they compare?