We will discuss three common operations we want to perform on text that are only fun when regular expressions are involved:
Asserting that text matches a pattern.
Finding all matches of a pattern in a document.
Replacing all matches of a pattern with some other text.
Today we focus on the first two of these. We will meet the operators =~ and !~ and the method String#scan. We will also capture portions of the matching text using the regex’s capture groups.
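Here is a minimal sketch of these tools in action, assuming we are working in Ruby (which the operators and scan suggest). The method names and the sample line are made up for illustration.

```ruby
# Index of the first digit in text, or nil when there is none.
# =~ returns the position where the match starts.
def first_digit_index(text)
  text =~ /\d/
end

# True when text contains no whitespace. !~ is the negation of =~.
def no_whitespace?(text)
  text !~ /\s/
end

# Every run of digits in text. scan collects all non-overlapping matches.
def all_numbers(text)
  text.scan(/\d+/)
end

# The index inside the first subscript, or nil when there is none.
# After a successful =~, $1 holds the text of the first capture group.
def first_subscript(text)
  $1 if text =~ /\[(\d+)\]/
end

line = "counts[5] = counts[12] + 1" # hypothetical sample input
p first_digit_index(line)
p no_whitespace?(line)
p all_numbers(line)
p first_subscript(line)
```

Note that =~ yields an index rather than true or false. That works fine in a conditional, because only nil and false are falsy in Ruby, so a match at index 0 still counts as a match.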
Let’s write regex to do the following:
Assert that the user input is all letters.
Assert that the user input contains at least two numbers.
Assert that the user input has no whitespace.
Assert that the user input is an HTML element.
Assert that the user input is a binary string.
Extract the index from an array subscript (e.g., 5 in counts[5]).
Tease apart the username and domain from an email address. Regex is probably overkill for this problem.
Extract the URL from an img element.
Print the name of the month given a date like MM/DD/YYYY.
Extract the name of the primary class of a Java source file fed in as standard input.
Identify all the fields of study listed in a dictionary—the -ology, -nomy, and -nomics words.
List all the identifiers in a file.
List all the string literals in a file.
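As a starting point, here is one possible sketch of a few of the exercises above, again assuming Ruby. The method names, the MONTHS constant, and the exact patterns are illustrative choices rather than the only right answers. In particular, the email pattern is deliberately loose and "letters" is read as ASCII letters.

```ruby
MONTHS = %w[January February March April May June July
            August September October November December].freeze

# True when text is one or more ASCII letters and nothing else.
# \A and \z anchor the pattern to the start and end of the whole string.
def all_letters?(text)
  !!(text =~ /\A[A-Za-z]+\z/)
end

# True when text consists only of 0s and 1s.
def binary_string?(text)
  !!(text =~ /\A[01]+\z/)
end

# Returns [username, domain], or nil when text has no single @ separator.
def split_email(text)
  [$1, $2] if text =~ /\A([^@\s]+)@([^@\s]+)\z/
end

# Returns the month name for a MM/DD/YYYY date, or nil for anything else.
# The group accepts only 01 through 12, so the array lookup is always valid.
def month_name(date)
  MONTHS[$1.to_i - 1] if date =~ %r{\A(0[1-9]|1[0-2])/\d\d/\d{4}\z}
end

# All words ending in -ology, -nomy, or -nomics.
# The group is non-capturing, so scan returns the whole words.
def fields_of_study(text)
  text.scan(/\b\w+(?:ology|nomy|nomics)\b/)
end

p all_letters?("Ruby")
p binary_string?("010011")
p split_email("alice@example.com")
p month_name("03/14/2015")
p fields_of_study("biology astronomy economics cat")
```

The anchors matter for the assertion-style problems: without \A and \z, a pattern like /[01]+/ would accept any string that merely contains a binary digit somewhere.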
Here’s your TODO list for next time:
Solve this puzzle on LinkedIn. You may wish to practice at regexcrossword.com. Write your solution on a quarter sheet to be turned in at the beginning of next class.
For an extra credit participation point, create your own non-trivial 5×5 regex crossword on regexcrossword.com and share a link on Piazza.