CS 430: Lecture 3 - Radish Interpreter

Dear students:

Given how lab went last week, I think we need more practice with lexing, parsing, and grammars. So today we're going to write out our own little calculator language that can be used to add numbers together and shift them left or right—but there's a twist. The numbers can be in different bases.

Radish

We'll call the language Radish since it derives from the Latin word radix. The word itself means root. Here are some sample utterances in Radish:

56
56_16
1001001_2 > 10
1001_2 + 1 > 2
32 << 1
(100 >> 2) + 3

The language supports three operators: <<, >>, and +. It supports integer literals. By default these are in base 10, but an explicit alternative base can be specified with a _ suffix. The final result can be displayed in a particular base using the > output designator.

Lexing

We have these token types in Radish:

Only one of these is very interesting. We'll draw a state machine for it, and then write a lexer.

Grammar

The grammar is a useful tool that will help us think about our language. It determines the precedence and associativity of our operators. It does this by pushing higher-precedence operations farther down the parser. To make that happen we'll need the operators to be broken up into levels. The lowest precedence level will be represented by the non-terminal expr0.

A “program'' in Radish will consist of a single statement that has one of two forms:

statement -> expr0 > integer_literal
           | expr0

The left- and right-shift operators are the lowest-precedence operators, so they are produced by expr0. We have to be careful with these productions, because it's easy to make a flawed grammar. We have the following constrains:

All told, we form these productions:

expr0 -> expr0 << expr1
       | expr0 >> expr1
       | expr1

The next precedence level follows a similar structure:

expr1 -> expr1 + operand
       | operand

We've only got these two precedence levels. Between the operators we have the operands:

operand -> INTEGER
         | ( expr0 )

Writing the parser is a lot easier if we have first sketched a grammar.

Abstract Syntax Tree

But before we write a parser, we'll write these classes to represent our program structures:

Every program structure will have an evaluate method that we can use to execute the program. Using these structures, we could create a program without even writing Radish code:

e = ExpressionAdd.new(ExpressionInteger.new(10), ExpressionInteger.new(5))

But we don't want to assemble these structures manually. Assembling them is the job of the parser.

Parsing

The parser takes the tokens from the lexer and assembles an abstract syntax tree. A useful strategy is to turn the non-terminals of the grammar into the parser's methods:

class Parser
  def initialize
  end

  def statement
  end

  def expr0
  end

  def expr1
  end

  def operand
  end
end

We'll implement these methods from the bottom up. The productions will guide us. Whenever we have multiple productions, we'll need some sort of conditional statement that arbitrates between the choices by peeking at the next tokens.

The trickiest productions to implement will be the ones that include self-reference, like this one:

expr1 -> expr1 + operand
       | operand

This would be a bad way to implement the expr1 method:

def expr1
  left = expr1
  if has(:plus)
    consume
    right = operand
  end
  ...
end

The first line leads to infinite recursion. A better way is to turn the recursion into iteration. Ultimately, the recursion has to bottom out at an operand, so we call operand first and then collect up any + operations into a new structure:

def expr1
  left = operand
  while has(:plus)
    consume
    right = operand
    left = ExpressionAdd.new(left, right)
  end
  left
end

Something similar happens with the shift operators, but with an extra conditional statement to decide which AST node to instantiate.

TODO

Here's your list of things to do before we meet next:

See you next time.

Sincerely,

P.S. It's time for a haiku!

I made a language We didn't need a new one Nor the ones we had