CS 330 Lecture 6 – Asserting and Find-All
Dear students,
We will discuss three common operations we want to perform on text that are only fun when regular expressions are involved:
- Asserting that text matches a pattern.
- Finding all matches of a pattern in a document.
- Replacing all matches of a pattern with some other text.
Today we focus on the first two of these. We will meet the operators =~ and !~ and the method String.scan. We will also capture portions of the matching text using the regex’s capturing facilities.
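As a quick preview, here is a minimal sketch of the two operators, capturing, and scan working together. The sample string and the variable names are invented for illustration.

```ruby
# Sample text; this string is made up for illustration.
line = 'counts[5] = counts[12] + 1'

# =~ returns the index of the first match, or nil if there is none.
first_digit_at = line =~ /\d/

# !~ is the negation: true when the pattern matches nowhere in the text.
has_no_tabs = line !~ /\t/

# Parentheses capture. After a successful =~, $1 holds the first group.
line =~ /\[(\d+)\]/
first_index = $1

# scan returns every match. With one group in the pattern, each result is
# a one-element array holding that group's text, so we flatten.
all_indices = line.scan(/\[(\d+)\]/).flatten

puts first_digit_at
puts has_no_tabs
puts first_index
puts all_indices.inspect
```

Note that =~ and !~ only tell us about the first match. When we want every match, scan is the tool to reach for.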
Let’s write regexes to do the following:
- Assert that the user input is all letters.
- Assert that the user input contains at least two numbers.
- Assert that the user input has no whitespace.
- Assert that the user input is an HTML element.
- Assert that the user input is a binary string.
- Extract the index out of an array subscript (e.g., 5 in counts[5]).
- Tease apart the username and domain from an email address. Regex is probably overkill for this problem.
- Extract the URL from an img element.
- Print the name of the month given a date like MM/DD/YYYY.
- Extract the name of the primary class of a Java source file fed in as standard input.
- Identify all the fields of study listed in a dictionary—the -ology, -nomy, and -nomics words.
- List all the identifiers in a file.
- List all the string literals in a file.
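Several of the assert-and-extract tasks above follow the same shape: match with =~, then read the capture out of $1. Here is one possible sketch for a few of them. The method names and the exact patterns are assumptions for illustration, and the patterns are deliberately simple (for instance, the img one assumes a double-quoted src attribute).

```ruby
# Hypothetical helpers, one per task. The method names and the exact
# patterns are assumptions for illustration.

# True when the text is one or more 0s and 1s and nothing else.
def binary_string?(text)
  !(text =~ /\A[01]+\z/).nil?
end

# The digits inside the first subscript, or nil if there is none.
def subscript_index(text)
  (text =~ /\[(\d+)\]/) ? $1 : nil
end

# The src value of the first img element, assuming a double-quoted attribute.
def img_src(text)
  (text =~ /<img\b[^>]*\bsrc="([^"]*)"/) ? $1 : nil
end

# The name of the first public class, allowing abstract/final modifiers.
def primary_class(source)
  (source =~ /\bpublic\s+(?:(?:abstract|final)\s+)*class\s+(\w+)/) ? $1 : nil
end

puts binary_string?('010011')
puts subscript_index('counts[5]')
puts img_src('<img alt="logo" src="images/logo.png">')
puts primary_class("import java.util.Scanner;\n\npublic final class Main {\n}\n")
```

The (?: ... ) groups in the last pattern are non-capturing, so the class name is still the first capture.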
Here’s your TODO list for next time:
- Solve this puzzle on LinkedIn. You may wish to practice at regexcrossword.com. Write your solution on a quarter sheet to be turned in at the beginning of next class.
- For an extra credit participation point, create your own non-trivial 5×5 regex crossword on regexcrossword.com and share a link on Piazza.
Sincerely,
all_letters.rb
#!/usr/bin/env ruby

input = gets.chomp
if input =~ /^[a-zA-Z]+$/
  puts 'all letters'
else
  puts 'No good, foobag.'
end
two_numbers.rb
#!/usr/bin/env ruby

input = gets.chomp
if input =~ /\d.*\d/
  puts 'two numbers'
else
  puts 'No good, foobag.'
end
no_whitespace.rb
#!/usr/bin/env ruby

input = gets.chomp
if input !~ /\s/
  puts 'no whitespace'
else
  puts 'No good, foobag.'
end
html_element.rb
#!/usr/bin/env ruby

input = gets.chomp
if input =~ /^<[^<>]+>$/
  puts 'html element'
else
  puts 'No good, foobag.'
end
email.rb
#!/usr/bin/env ruby

input = gets.chomp
input =~ /(.*)@(.*)/
puts $1
puts $2
# input =~ /(.*.*)/
month.rb
#!/usr/bin/env ruby

input = gets.chomp
months = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
]
input =~ %r{^(\d\d)/\d\d/\d\d\d\d$}
imonth = $1.to_i
puts "imonth: #{imonth}"
puts months[imonth - 1]
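The find-all tasks from the list (fields of study, identifiers, string literals) are a natural fit for scan. What follows is a rough sketch, not a definitive solution: the method names are hypothetical, the identifier pattern assumes C-style identifiers, and the string-literal pattern assumes double-quoted literals with backslash escapes.

```ruby
# Hypothetical helpers for the find-all tasks; names and patterns are
# assumptions for illustration.

# Words ending in -ology, -nomy, or -nomics. The group is non-capturing
# so that scan returns whole words rather than just the suffix.
def fields_of_study(text)
  text.scan(/\b\w+(?:ology|nomy|nomics)\b/)
end

# C-style identifiers: a letter or underscore, then letters, digits,
# or underscores.
def identifiers(text)
  text.scan(/\b[A-Za-z_]\w*\b/)
end

# Double-quoted string literals. Inside the quotes we accept either an
# ordinary character or a backslash followed by any character.
def string_literals(text)
  text.scan(/"(?:[^"\\]|\\.)*"/)
end

puts fields_of_study("apple\nastronomy\nbiology\neconomics\nzebra").inspect
puts identifiers('total = count_1 + 42').inspect
puts string_literals('puts "hi" + "a \"quoted\" word"').inspect
```

The identifier pattern will also pick up keywords and words inside string literals, so a real tool would need to filter those out.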