» CS 330: Lecture 6

CS 330: Lecture 6 – Lexing

February 9, 2018 by Chris Johnson. Filed under cs330, lectures, spring 2018.

Dear students,

Okay, with regexes and the notion of state machines tucked away in our brains, we’re now on a quest to design our own programming language. Usually you’re supposed to start this kind of learning by writing a calculator. I couldn’t bring us to do that. Instead we’re going to design a language for drawing art based on primitive shapes.

One shouldn’t design a language until one has already programmed in it. We’ll look at a few examples and try to agree upon a syntax. Then we’ll start to tackle interpreting programs written in this language. We break interpreting up into two steps:

Lexing, in which we decompose the source code into its “atoms.” The input is the source code. The output is a flat sequence of tokens.
Parsing, in which we construct a model of our program. The input is the stream of tokens. The output is a meaningful representation of the program. If we store this representation of our program to disk, we have a compiler. If we execute it immediately, we have an interpreter.

But today, we don’t deal with meaning. We just want to rip our program into a thousand tiny pieces. Meaningless pieces. But not individual characters. That’s too fine-grained. We want the tokens. What are tokens? Literals, identifiers, keywords, operators, and separators.

Consider this chunk of code:

if i % 2 == 0
  print 'He loves me.'
else
  print 'I love him.'
end

The code looks like this from a lexer’s point of view:

if i % 2 == 0\n\tprint 'He loves me.'\nelse\n\tprint 'I love him.'\nend

The lexer’s job is to run through and produce a collection like this:

IF if
IDENTIFIER i
OPERATOR %
INTEGER 2
OPERATOR ==
IDENTIFIER print
STRING 'He loves me.'
ELSE else
IDENTIFIER print
STRING 'I love him.'
END end

Here’s your TODO for next time:

Watch Stanford professor Alex Aiken’s The Economy of Programming Languages (20 minutes).
On a quarter sheet, make a prediction about you think the language landscape will look like in ten years. Use reason instead of arbitrary bias, and incorporate the ideas that Aiken discusses.

Sincerely,

P.S. Here’s the code we wrote together:

dt.rb

#!/usr/bin/env ruby

def jjeach list
  i = 0
  while i < list.size
    yield list[i]
    i += 1
  end
end

list = ['andy', 'crutches', 'why']
jjeach(list) do |item|
  puts item
end

# s = 'foo54-65capitalizm'

# s.gsub!(/(\d+)-(\d+)/) do |first, second|
  # puts "first: #{first}"
  # puts "second: #{second}"
# end

numbers

255
2556
100 + 89 * -32
1 2 3 4 5 6 7
20020 dog blue42

binbin.rb

#!/usr/bin/env ruby

src = File.read('numbers')

src.gsub!(/(?<!-)\b(\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5])\b/) do
  $1.to_i.to_s(2) + 'b'
end

puts src