teaching machines

CS 330: Lecture 4 – Find and Replace

February 5, 2018. Filed under cs330, lectures, spring 2018.

Dear students,

Last week we started examining regular expressions (regexes), a language for recognizing languages. We looked at their syntax and theoretical background, and I want to spend two more days on them. Today we look at several applications inspired by real-life needs that I’ve encountered:

In solving these challenges, we’re going to see a bit more of the Ruby language. We’ll see a few different methods for processing files and arrays.

Loading a file as an array of lines can be done using File.readlines:

lines = File.readlines('file.txt')

lines.each do |line|
  # process line
end

Or you can load the file into one string and call String#lines on it to break it up:

all = File.read('file.txt')

all.lines.each do |line|
  # process line
end
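To compare the two loading approaches side by side, here is a minimal sketch. The temporary file and its contents are made up purely so the snippet can run on its own; the chomp: true option assumes Ruby 2.4 or later.

```ruby
require 'tempfile'

# Hypothetical sample file, created only so this sketch is self-contained.
sample = Tempfile.new('sample')
sample.write("alpha\nbeta\ngamma\n")
sample.close

# Approach 1: read straight into an array of lines.
lines = File.readlines(sample.path)

# Approach 2: read one big string, then split it into lines.
all = File.read(sample.path)
same_lines = all.lines

puts lines == same_lines   # both approaches yield the same array
p lines.first              # each line keeps its trailing "\n"

# chomp: true strips the line terminators while loading.
chomped = File.readlines(sample.path, chomp: true)
p chomped

sample.unlink
```

Whether you chomp usually depends on what you do next: patterns anchored with $ tolerate the trailing newline, but string comparisons do not.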

If you need line numbers, you can call each_with_index and add a parameter to your block:

lines.each_with_index do |line, i|
  # process line at index i
end
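As a small illustration of why the index is handy, the following sketch reports the line numbers of lines that match a pattern, much like grep -n. The sample text and the pattern are invented for the example.

```ruby
# Hypothetical in-memory text standing in for a file's contents.
text = "apple pie\nbanana split\ncherry tart\napple crisp\n"

matches = []
text.lines.each_with_index do |line, i|
  # i is zero-based, so add 1 to report human-friendly line numbers.
  matches << "#{i + 1}: #{line.chomp}" if line =~ /apple/
end

puts matches
# prints:
# 1: apple pie
# 4: apple crisp
```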

Suppose that instead of just finding text, we want to replace the text that we match. For that we can use String#gsub (for a global substitution) or String#sub (for a single substitution). gsub and sub return new strings, while gsub! and sub! modify the invoking strings. The substitution can be expressed several ways:

text.gsub!(/pattern/, 'replacement text')
text.gsub!(/pattern/, 'replacement \1 with captures \2')
text.gsub!(/pattern/, "replacement \\1 with captures \\2 and \n double quotes")
text.gsub!(/pattern/) do
  # compute the replacement text, using $1, $2, ...
end
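Here is a runnable sketch of those forms on concrete strings. The sample strings and variable names are made up for illustration.

```ruby
text = 'cat hat bat'

# sub replaces only the first match; gsub replaces every match.
first_only = text.sub(/at/, 'ot')    # "cot hat bat"
every      = text.gsub(/at/, 'ot')   # "cot hot bot"

# In single quotes, \1 and \2 refer to the capture groups.
swapped = 'Doe, Jane'.gsub(/(\w+), (\w+)/, '\2 \1')       # "Jane Doe"

# In double quotes, the backslash itself must be escaped.
swapped_dq = 'Doe, Jane'.gsub(/(\w+), (\w+)/, "\\2 \\1")  # "Jane Doe"

# The block form computes each replacement; $1 holds the first capture.
doubled = 'a1 b22'.gsub(/(\d+)/) { ($1.to_i * 2).to_s }   # "a2 b44"

# The non-bang calls above left text untouched; gsub! changes it in place.
text.gsub!(/at/, 'ot')
puts text                            # cot hot bot

# Caution: the bang versions return nil when nothing matched.
no_match = 'abc'.gsub!(/z/, '')
p no_match                           # nil
```

The nil return makes chaining calls after gsub! risky, which is one reason to prefer the non-bang versions unless you really need the in-place change.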

Let’s examine gsub by solving these challenges:

Here’s your TODO list for next time:

Sincerely,

P.S. It’s time for a haiku!

President Y’s plan
gsub X’s policies
With this: 'not \1'

P.P.S. Here’s the code we wrote together:

imgripper.rb

#!/usr/bin/env ruby

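# Download every image referenced by an <img> tag in the given HTML file.
html = File.read(ARGV[0])

# [^>]*? keeps each match inside a single tag, even with several tags per line.
html.scan(/<img\s+[^>]*?src\s*=\s*"([^"]*)"/) do
  url = $1
  # Protocol-relative URLs (//host/path) need an explicit scheme for curl.
  url = "https:#{url}" if url.start_with?('//')
  # Passing arguments separately keeps the shell from interpreting the URL.
  system('curl', '-O', url)
end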
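One thing to watch for when ripping attributes out of HTML is greediness. A pattern that puts a greedy .* between <img and src will skip to the last src on a line when several tags share that line. The sketch below contrasts that with a version that cannot cross a tag boundary; the HTML string is invented for the example, and nothing is downloaded.

```ruby
# Hypothetical one-line HTML with two image tags.
html = '<img src="a.png"><img src="//example.com/b.png">'

# Greedy .* runs to the end of the line and backtracks to the LAST src.
greedy = html.scan(/<img\s+.*src\s*=\s*"([^"]*)"/).flatten

# [^>]*? cannot cross a tag boundary, so each tag is matched on its own.
strict = html.scan(/<img\s+[^>]*?src\s*=\s*"([^"]*)"/).flatten

# Give protocol-relative URLs an explicit scheme.
urls = strict.map { |url| url.start_with?('//') ? "https:#{url}" : url }

p greedy   # ["//example.com/b.png"]
p strict   # ["a.png", "//example.com/b.png"]
p urls     # ["a.png", "https://example.com/b.png"]
```

Regexes are a pragmatic choice for a quick script like this one. For messy real-world pages, a proper HTML parser is the sturdier tool.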

methods.rb

#!/usr/bin/env ruby

path = '/Users/johnch/checkouts/speccheck/src/org/twodee/speccheck/SpecChecker.java'
java = File.read(path)

# java.scan(/public\s+.*?(\w+)\(/) do
  # puts $1
# end

# scan returns each match as an array of captures, so destructure the lone capture.
java.scan(/public\s+.*?(\w+)\(/).each_with_index do |(name), i|
  puts "#{i}. #{name}"
end
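A detail of String#scan that is easy to trip over: when the pattern contains capture groups, the method returns an array of arrays, one inner array of captures per match. The sketch below shows this on a made-up Java snippet that stands in for the real source file.

```ruby
# Hypothetical Java snippet standing in for a real source file.
java = <<~JAVA
  public static void main(String[] args) {
  public int size() {
  private void helper() {
JAVA

# With one capture group, scan returns an array of one-element arrays.
found = java.scan(/public\s+.*?(\w+)\(/)
p found   # [["main"], ["size"]]

# flatten turns the nested arrays into a plain list of names.
names = found.flatten
names.each_with_index do |name, i|
  puts "#{i}. #{name}"
end
# prints:
# 0. main
# 1. size
```

Interpolating one of the inner arrays directly into a string would print its brackets and quotes, so flatten the result or destructure the block parameter before printing.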

unity.rb

#!/usr/bin/env ruby

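# Sample identifiers we tried; only the last assignment takes effect.
id = 'isUnderSiege'
id = 'isFalse'
id = 'isOneWord'

# id.gsub(pattern, replacement)

# Capitalize the first letter, then put a space before every later capital.
first = id[0].upcase
rest = id[1..-1]

new_id = first + rest.gsub(/([A-Z])/, ' \1')
puts new_id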
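For comparison, here is one possible variation that wraps the conversion in a method and uses a lookbehind, so that a space is inserted only where a capital follows a lowercase letter. The method name humanize is hypothetical and is not part of the script above; this is a sketch of an alternative, not the approach we took in class.

```ruby
# Hypothetical helper; the name humanize is invented for this sketch.
def humanize(id)
  # (?<=[a-z]) requires a lowercase letter just before the capital,
  # without including that letter in the match.
  spaced = id.gsub(/(?<=[a-z])([A-Z])/, ' \1')
  # Upcase the leading lowercase letter, if there is one.
  spaced.sub(/\A[a-z]/) { |c| c.upcase }
end

%w[isUnderSiege isFalse isOneWord].each { |id| puts humanize(id) }
# prints:
# Is Under Siege
# Is False
# Is One Word
```

The two approaches differ on runs of capitals. For an identifier like parseHTML, this version yields "Parse HTML", while spacing out every capital yields "Parse H T M L". Which one is right depends on your naming conventions.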