teaching machines

CS 330: Lecture 6 – Lexing

February 9, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students,

Okay, with regexes and the notion of state machines tucked away in our brains, we’re now on a quest to design our own programming language. Usually you’re supposed to start this kind of learning by writing a calculator. I couldn’t bring us to do that. Instead we’re going to design a language for drawing art based on primitive shapes.

One shouldn’t design a language until one has already programmed in it. We’ll look at a few examples and try to agree upon a syntax. Then we’ll start to tackle interpreting programs written in this language. We break interpreting up into two steps:

  1. Lexing, in which we decompose the source code into its “atoms.” The input is the source code. The output is a flat sequence of tokens.
  2. Parsing, in which we construct a model of our program. The input is the stream of tokens. The output is a meaningful representation of the program. If we store this representation of our program to disk, we have a compiler. If we execute it immediately, we have an interpreter.

But today, we don’t deal with meaning. We just want to rip our program into a thousand tiny pieces. Meaningless pieces. But not individual characters. That’s too fine-grained. We want the tokens. What are tokens? Literals, identifiers, keywords, operators, and separators.

Consider this chunk of code:

if i % 2 == 0
  print 'He loves me.'
else
  print 'I love him.'
end

The code looks like this from a lexer’s point of view:

if i % 2 == 0\n\tprint 'He loves me.'\nelse\n\tprint 'I love him.'\nend

The lexer’s job is to run through and produce a collection like this:

IF if
IDENTIFIER i
OPERATOR %
INTEGER 2
OPERATOR ==
IDENTIFIER print
STRING 'He loves me.'
ELSE else
IDENTIFIER print
STRING 'I love him.'
END end

Here’s your TODO for next time:

Sincerely,

P.S. Here’s the code we wrote together:

dt.rb

#!/usr/bin/env ruby

def jjeach list
  i = 0
  while i < list.size
    yield list[i]
    i += 1
  end
end

list = ['andy', 'crutches', 'why']
jjeach(list) do |item|
  puts item
end

# s = 'foo54-65capitalizm'

# s.gsub!(/(\d+)-(\d+)/) do |first, second|
  # puts "first: #{first}"
  # puts "second: #{second}"
# end

numbers

255
2556
100 + 89 * -32
1 2 3 4 5 6 7
20020 dog blue42

binbin.rb

#!/usr/bin/env ruby

src = File.read('numbers')

src.gsub!(/(?<!-)\b(\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5])\b/) do
  $1.to_i.to_s(2) + 'b'
end

puts src