CS 245 Lab 5 – Binary Search
First, if you have checkpoints left over from last lab, get them inspected during the first 15 minutes of this lab.
Work in pairs, where possible. Prefer working with someone that you did not work with last week. The exchange of new ideas and perspectives is not an opportunity you want to rob yourself of.
Synopsis
In this lab, you will gaze intently at the beauty and power of binary search—by writing a spellchecker. Your Dictionary
class will offer both linear and binary search to check if a word appears in a 600,000 element word list, and you will compare the two algorithms with an industrial-grade Stopwatch
class.
Checkpoint 1
Person A types.
Write a class Stopwatch
. Give it methods start
, stop
, and getElapsedSeconds
. Users wanting to time an operation will start the watch, perform the operation, stop it, and then query for the time that elapsed between start and stop. Likely, these methods are all one-liners.
Do not use System.currentTimeMillis
to get timing information, as this approach will also include time that your application was paused while some other application used the processor.
Instead, use ManagementFactory.getThreadMXBean
. The bean that it returns has a getCurrentThreadCpuTime
method, which returns the actual time your program has had on the CPU since it was started. Read the documentation for more details.
Also write a Dictionary
class. In its constructor, load into an ArrayList
the words in file W:/c s/johnch/words
, which has the words sorted, lowercase, and one per line.
Checkpoint 2
Person B types.
Add two methods to your Dictionary
class: isWordLinear
and isWordBinary
. (Note that these methods start with is-
, suggesting they return booleans.) They return true if the word sent in as a parameter appears in the dictionary. One uses linear search, the other uses binary search.
Create a main class with a main
method. Make instances of your Dictionary
and Stopwatch
classes. Grab a book off of Project Gutenberg and iterate through all of its words. Report which ones do not appear in the dictionary.
Scanner
makes it pretty easy to strip out all punctuation. Instead of having Scanner
break words up on whitespace, we’ll break words up on all non-alphabetic sequences. All we need is to set our Scanner
‘s delimiter to a special pattern—called a regular expression:
in.useDelimiter("[^A-Za-z]+");
[]
tells the Scanner
to match any character in the brackets, [^]
means match any character not in the brackets, and +
means match sequences of 1 or more of the preceding pattern. Effectively, we are stating our words are delimited by non-alphabetic sequences.
Time your iteration through the book. Try both the linear and binary checks. How do they compare?