CS 430: Lecture 3 - Radish Interpreter
Given how lab went last week, I think we need more practice with lexing, parsing, and grammars. So today we're going to write out our own little calculator language that can be used to add numbers together and shift them left or right—but there's a twist. The numbers can be in different bases.
We'll call the language Radish since it derives from the Latin word radix. The word itself means root. Here are some sample utterances in Radish:
56 56_16 1001001_2 > 10 1001_2 + 1 > 2 32 << 1 (100 >> 2) + 3
The language supports three operators:
+. It supports integer literals. By default these are in base 10, but an explicit alternative base can be specified with a
_ suffix. The final result can be displayed in a particular base using the
> output designator.
We have these token types in Radish:
- integer literals with optional base
Only one of these is very interesting. We'll draw a state machine for it, and then write a lexer.
The grammar is a useful tool that will help us think about our language. It determines the precedence and associativity of our operators. It does this by pushing higher-precedence operations farther down the parser. To make that happen we'll need the operators to be broken up into levels. The lowest precedence level will be represented by the non-terminal
A “program” in Radish will consist of a single statement that has one of two forms:
statement -> expr0 > integer_literal | expr0
The left- and right-shift operators are the lowest-precedence operators, so they are produced by
expr0. We have to be careful with these productions, because it's easy to make a flawed grammar. We have the following constrains:
The grammar should support recursive formations like
512 >> 1 >> 1. That means the productions should include a self-reference somewhere.
If we have a production like
expr0 >> expr0, we allow recursion, but we also introduce ambiguity. We could parse
512 >> 1 >> 1in two different ways.
We want to limit the recursion to one operand. Because we want the leftmost occurrence of
>>to have higher precedence than any
>>to its right, we make the left operand recursive. That leads to the production
expr0 >> expr1.
All told, we form these productions:
expr0 -> expr0 << expr1 | expr0 >> expr1 | expr1
The next precedence level follows a similar structure:
expr1 -> expr1 + operand | operand
We've only got these two precedence levels. Between the operators we have the operands:
operand -> INTEGER | ( expr0 )
Writing the parser is a lot easier if we have first sketched a grammar.
Abstract Syntax Tree
But before we write a parser, we'll write these classes to represent our program structures:
Every program structure will have an
evaluate method that we can use to execute the program. Using these structures, we could create a program without even writing Radish code:
e = ExpressionAdd.new(ExpressionInteger.new(10), ExpressionInteger.new(5))
But we don't want to assemble these structures manually. Assembling them is the job of the parser.
The parser takes the tokens from the lexer and assembles an abstract syntax tree. A useful strategy is to turn the non-terminals of the grammar into the parser's methods:
class Parser def initialize end def statement end def expr0 end def expr1 end def operand end end
We'll implement these methods from the bottom up. The productions will guide us. Whenever we have multiple productions, we'll need some sort of conditional statement that arbitrates between the choices by peeking at the next tokens.
The trickiest productions to implement will be the ones that include self-reference, like this one:
expr1 -> expr1 + operand | operand
This would be a bad way to implement the
def expr1 left = expr1 if has(:plus) consume right = operand end ... end
The first line leads to infinite recursion. A better way is to turn the recursion into iteration. Ultimately, the recursion has to bottom out at an operand, so we call
operand first and then collect up any
+ operations into a new structure:
def expr1 left = operand while has(:plus) consume right = operand left = ExpressionAdd.new(left, right) end left end
Something similar happens with the shift operators, but with an extra conditional statement to decide which AST node to instantiate.
Here's your list of things to do before we meet next:
See you next time.
P.S. It's time for a haiku!