CS 330 Lecture 5 – Regex
Agenda
- what questions do you have?
- regex anatomy
- regex operations
- assert match (=~)
- find match (capturing groups)
- find all matches (scan, grep)
- replace matches (sub, gsub)
TODO
- Walk through RegexOne.
- On a 1/4 sheet, draft some regular expressions (and just the expressions) that match the following:
- Lines ending with a hyphenated word.
- Words with an internal uppercase letter.
- Lines lacking a semi-colon at their close. Don’t match lines that have a semi-colon followed by whitespace.
- Instances of the identifier i. Don’t match other occurrences of i.
- Start the regexercise homework. Pull from both repos! Due before February 18.
Note
With a little Ruby under our belt, we can now dive into regular expressions, or regex. What is regex? A language for describing patterns in strings. A simple find/replace of literal text is not sufficient for problems where we don’t know exactly what we’re looking for. For instance, on my phone, I am always hitting the period key instead of the space key, yielding text like “The.quick.brown.fox.” It’d be nice to be able to locate these faux periods and replace them with spaces. If all I can do is search for literal text, I have my work cut out for me: a search for a literal period also finds the periods that legitimately end sentences.
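To make the faux-period problem concrete, here is a minimal Ruby sketch. The method name fix_faux_periods is made up for illustration, and the pattern rests on an assumption: a period counts as faux only when it sits directly between two word characters. That assumption would also mangle decimals like 3.14 and domain names, so treat it as a starting point rather than a finished fix.

```ruby
# Hypothetical helper: replace a period squeezed between two word
# characters with a space. Real sentence-ending periods are followed by
# whitespace or the end of the string, so they are left alone.
def fix_faux_periods(text)
  # (?<=\w) looks behind for a word character and (?=\w) looks ahead for
  # one. Neither lookaround consumes text, so only the period is replaced.
  text.gsub(/(?<=\w)\.(?=\w)/, ' ')
end

puts fix_faux_periods('The.quick.brown.fox.')  # The quick brown fox.
```

The lookarounds are what keep the trailing period intact: it has a word character behind it but nothing word-like ahead of it.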
Why study regex in a Programming Languages class? I can think of a few reasons:
- Regexes are frequently used by tools like compilers and syntax highlighters to break up source code into its parts and pieces.
- The general goal of most programs is to set up a software machine that translates input into some sort of output. The information age that we live in means that a lot of this input is big and clunky, and regex is a tool for parsing that input into workable units.
- Personally, I use regex quite a bit in editing my own source code.
Regexes can be broken down according to the following anatomy:
- what to match (literals, character classes, wildcards)
- how many to match (quantifiers)
- where to match (anchors)
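The three parts of the anatomy can be sketched in Ruby with one small pattern each. The constant names and sample strings below are invented for illustration, and matches? is just a thin hypothetical wrapper around Ruby's Regexp#match?.

```ruby
# What to match
LITERAL    = /fox/       # the exact characters f, o, x
CHAR_CLASS = /[aeiou]/   # any one lowercase vowel
WILDCARD   = /f.x/       # f, then any single character except a newline, then x

# How many to match
QUANTIFIER = /\d{3}/     # three digits in a row

# Where to match
ANCHORED   = /\Afox\z/   # \A and \z pin the match to the whole string

# Hypothetical wrapper: true if the pattern matches anywhere in the text.
def matches?(pattern, text)
  pattern.match?(text)
end

puts matches?(LITERAL, 'firefox')      # true
puts matches?(CHAR_CLASS, 'rhythm')    # false
puts matches?(WILDCARD, 'fix')         # true
puts matches?(QUANTIFIER, 'room 12')   # false
puts matches?(ANCHORED, 'firefox')     # false
puts matches?(ANCHORED, 'fox')         # true
```

Without anchors, a pattern is free to match anywhere inside the string, which is why /fox/ accepts firefox while the anchored version does not.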
We’ll solve a few problems that benefit from regex! Today we’ll stick with just asserting that text matches a pattern and optionally capturing portions of the matching text.
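As a rough sketch of those two operations in Ruby: =~ asserts a match, and parentheses capture pieces of it. The identifier rule here (a letter or underscore followed by letters, digits, or underscores) is an assumption, since the exact rule varies by language, and the date format and method names are made up for illustration.

```ruby
# Assumed rule: a letter or underscore, then any number of letters,
# digits, or underscores, spanning the whole string.
IDENTIFIER = /\A[A-Za-z_][A-Za-z0-9_]*\z/

def identifier?(text)
  # =~ returns the index where the match starts, or nil if there is none.
  # Index 0 is truthy in Ruby, but comparing against nil makes the
  # intent explicit and yields a real boolean.
  !(text =~ IDENTIFIER).nil?
end

# Parentheses create capturing groups, numbered from 1 in the order
# their opening parentheses appear.
DATE = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Hypothetical helper: returns [year, month, day] or nil if no match.
def parse_date(text)
  match = DATE.match(text)
  return nil if match.nil?
  [match[1].to_i, match[2].to_i, match[3].to_i]
end

puts identifier?('count_2')   # true
puts identifier?('2count')    # false
p parse_date('2020-02-18')    # [2020, 2, 18]
p parse_date('Feb 18')        # nil
```

After a successful =~, Ruby also exposes the captured groups through the globals $1, $2, and so on; the MatchData object returned by match carries the same information without relying on global state.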
Code
isidentifier
govts
emails