CS 430: Lecture 4 - Types

Dear students:

Your reading was on types, and today we'll work through some type-related exercises together in Ruby. We'll implement a lightweight object-relational mapping that lets us migrate types from one system to another. Then we'll examine a type that is found in Ruby, but not in many other languages: regular expressions. But first, let's engage in a little discussion.


Suppose you check in on PL Twitter and find that someone has posted the following:

Static typing will deliver us...into the 1980s.

In what ways do you agree with this post? In what ways do you disagree?

You scroll a bit farther and see someone post this claim:

Hashes > classes.

In what ways do you agree with this post? In what ways do you disagree?

Object-relational Mapping

We will examine metaprogramming in Ruby. In particular, we will construct a lightweight system for object-relational mapping (ORM). In an ORM, you've got a representation of some data type in one system, and you'd like to use that same type in another system. Certainly you could specify the type twice, but that's prone to error. The two specifications will inevitably become unsynchronized.

ORM is often used with databases and the languages that access them. You might write a schema to describe a table in SQL, and you want the code to automatically understand the schema without you having to do any extra work. A full ORM will support actions like serializing and deserializing across systems and automatically querying to fetch related records. Our mapping will be less powerful, only wrapping around a JSON file:

{
  "headline": "Americans Scroll 2.5 Miles Per Day",
  "nwords": 849,
  "author": "Petey F.",
  "copy": "...",
  "tags": ["internet", "millenials"]
}

Out of the box, we can get reasonably close to turning this text into a Ruby object simply by parsing it using the JSON API:

require 'json'

# Read the file named on the command line and parse it into a hash.
json = File.read(ARGV[0])
article = JSON.parse(json, symbolize_names: true)
puts article[:author]

But this isn't really an object. It's a dictionary/hash/key-value pair manager. It'd be sweet if we could make Ruby behave more like JavaScript, where there's an equivalence between dictionary lookup and field access:

var foo = {};
foo['name'] = 'Scout';
console.log(foo.name);  // field access reads the same entry

We'd like to make our Ruby ORM build structures that feel more like an object. We don't want to say this:

article[:author] = value

Instead, we want to say this:

article.author = value

Ruby's duck typing system makes this possible. We'll need a new class to be the foundation of our ORM:

class ExoObject
  # ...
end

What happens currently when we try to read a property of ExoObject?

article = ExoObject.new
puts article.author

When we run this code, Ruby raises a NoMethodError because the method author cannot be found. Can we write a class that has methods for every field or property that our clients may want to assign? No way. We can't see that far into the future.
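To see the failure concretely, here is a minimal sketch that defines the empty class and rescues the error so the script keeps running. The rescue clause and the error_class variable are additions for illustration only, and the exact wording of the message depends on your Ruby version.

```ruby
# An empty class: it has no author method and no fallback for one.
class ExoObject
end

article = ExoObject.new

error_class = nil
begin
  article.author
rescue NoMethodError => e
  # Ruby reports the unknown method by raising NoMethodError.
  error_class = e.class
  puts "#{e.class}: #{e.message}"
end
```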

To support properties that we can't predict, we need some metaprogramming. We need to generate these methods on the fly based on our schema. But first, let's write a minimal constructor for testing out some ideas:

def initialize(value = {})
  @properties = value
end

If the client provides no value, we'll just wrap around an empty dictionary, a blank data store.

Now, how do we handle this endless supply of methods that are impossible to write ahead of time? Easy. We write a catch-all method that Ruby calls on our objects whenever they receive a message they don't support. It's called method_missing:

def method_missing(symbol, *args)
  # ...
end

When this method is called, the call must be one of the following:

  1. A read operation for a key already in the dictionary.
  2. A read operation for a key not already in the dictionary.
  3. A write operation.

Let's handle the first case:

if @properties.has_key?(symbol)
  @properties[symbol]

If the key doesn't exist, let's raise an exception:

else
  raise "No such property: #{symbol}"

The last case is a bit more involved. We have to check if the method name suggests an assignment, but this needs to be done on the string version of the symbol. This branch goes between the two we've already written:

elsif symbol.to_s.end_with?('=')
  # Drop the trailing '=' to recover the property name.
  @properties[symbol.to_s.chop.to_sym] = args[0]

Okay, let's test this out:

f = ExoObject.new(first: 'Roy', last: 'Biv')
f.middle = 'G'
puts "#{f.first} #{f.middle} #{f.last}"
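Assembled in one place, the fragments above might look like the following sketch. The respond_to_missing? method is not part of the walkthrough; it's an addition so that respond_to? agrees with method_missing. The full_name variable is only there for illustration.

```ruby
class ExoObject
  def initialize(value = {})
    @properties = value
  end

  # Ruby calls this hook when an ExoObject receives a message
  # that it has no method for.
  def method_missing(symbol, *args)
    name = symbol.to_s
    if @properties.has_key?(symbol)
      # Case 1: read a key that exists.
      @properties[symbol]
    elsif name.end_with?('=')
      # Case 3: write. Drop the trailing '=' to recover the key.
      @properties[name.chop.to_sym] = args[0]
    else
      # Case 2: read a key that doesn't exist.
      raise "No such property: #{name}"
    end
  end

  # Addition, not from the walkthrough: report the same names
  # that method_missing is willing to handle.
  def respond_to_missing?(symbol, include_private = false)
    @properties.has_key?(symbol) || symbol.to_s.end_with?('=') || super
  end
end

f = ExoObject.new(first: 'Roy', last: 'Biv')
f.middle = 'G'
full_name = "#{f.first} #{f.middle} #{f.last}"
puts full_name
```

Placing the write check before the else means an assignment to a brand-new key succeeds, while a read of an unknown key still fails loudly.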

Now let's get this to work with JSON data. Let's add a static method for loading an ExoObject from some other source:

def self.load(source)

If we have a URI or File, we'll open it and slurp up the JSON contents. Otherwise, we'll assume we have a JSON string. Once we know we have JSON, we can parse it:

if source.is_a?(URI)
  json = source.read
elsif source.is_a?(File)
  json = File.read(source)
elsif source.is_a?(String)
  json = source
else
  raise "Unknown source type."
end

ExoObject.new(JSON.parse(json, symbolize_names: true))

Now, let's try reading some literal JSON:

g = ExoObject.load '{"first": "Roy", "last": "Biv"}'
puts g.inspect

And some JSON from a web service:

require 'open-uri'  # lets URI objects respond to read

todo = ExoObject.load(URI("https://jsonplaceholder.typicode.com/todos/1"))
puts todo.userId
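Here is one way the loader might look when put together, as a sketch rather than a definitive implementation. It calls read on a File directly instead of reopening it by path, it trims method_missing down to reads only to stay short, and it exercises the File branch with a temporary file. The Tempfile usage and the from_file variable are assumptions added for the demonstration. No network request is made.

```ruby
require 'json'
require 'open-uri' # gives HTTP(S) URI objects a read method
require 'tempfile'

class ExoObject
  def initialize(value = {})
    @properties = value
  end

  # Build an ExoObject from a URI, an open File, or a JSON string.
  def self.load(source)
    json =
      if source.is_a?(URI) || source.is_a?(File)
        source.read
      elsif source.is_a?(String)
        source
      else
        raise "Unknown source type."
      end
    new(JSON.parse(json, symbolize_names: true))
  end

  # Read-only lookup, trimmed down to keep this sketch short.
  def method_missing(symbol, *args)
    return @properties[symbol] if @properties.has_key?(symbol)
    super
  end

  def respond_to_missing?(symbol, include_private = false)
    @properties.has_key?(symbol) || super
  end
end

# Load from a literal JSON string.
g = ExoObject.load('{"first": "Roy", "last": "Biv"}')
puts g.first

# Load from an open File. Tempfile.create yields a real File object.
from_file = Tempfile.create(['person', '.json']) do |file|
  file.write('{"first": "Roy", "last": "Biv"}')
  file.rewind
  ExoObject.load(file)
end
puts from_file.last
```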

Regular Expressions

Some languages contain a mini-language for building state machines like those used to form a lexer. They are called regular expressions. The name was chosen by the language theorists who categorized languages according to their expressive power. Regular languages are the boringest of the lot, but they are still useful.

These state machines have to be representable in serial text. That's a tall order, and the resulting syntax, which was popularized by Perl, is not intuitive. But it can be learned. People on the internet like to joke around that they are too difficult to learn, as if we don't do amazingly difficult things every day. Like walking.

The syntax can be broken down into three different types of symbols:

atoms           quantifiers         anchors
what to match   how many to match   where to match

These are the most common atoms that appear in regular expressions, which I will demonstrate in Vim, my preferred text editor:

symbol     what to match
abc        literal text abc
.          any single character
\w         any single alphanumeric character or underscore
\d         any single digit
\s         any single whitespace character
[abc]      any single character that is a, b, or c
[^abc]     any single character that is not a, b, or c
[A-Z]      any uppercase letter
[a-z]      any lowercase letter
[A-Za-z]   any letter
a|b        a or b
\W         any single character that is not alphanumeric or an underscore
\D         any single non-digit character
\S         any single non-whitespace character
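The atoms in the table can be tried out in Ruby with String#scan, which returns every match as an array. This is a small sketch; the sample text and variable names are made up for illustration.

```ruby
text = 'Order 66 shipped to Bay_7 at 9am'

digits   = text.scan(/\d/)         # => ["6", "6", "7", "9"]
capitals = text.scan(/[A-Z]/)      # => ["O", "B"]
pairs    = text.scan(/\d\d/)       # => ["66"]
wildcard = text.scan(/B.y/)        # => ["Bay"]
either   = text.scan(/to|at/)      # => ["to", "at"]
others   = text.scan(/[^a-z\s]/)   # => ["O", "6", "6", "B", "_", "7", "9"]

# \w treats the underscore as a word character, so \W rejects it.
underscore_is_word    = '_'.match?(/\w/)   # => true
underscore_is_nonword = '_'.match?(/\W/)   # => false

puts digits.inspect, capitals.inspect, either.inspect
```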

We quantify how many times a preceding atom repeats with these quantifiers:

symbol   how many to match
?        0 or 1
*        0 or more, as many as possible
+        1 or more, as many as possible
*?       0 or more, as few as possible
+?       1 or more, as few as possible
{m}      exactly m instances
{m,}     at least m instances
{m,n}    between m and n instances
{,n}     no more than n instances
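The difference between "as many as possible" and "as few as possible" is easiest to see side by side. The following sketch uses String#[] with a regex, which returns the first match or nil; the sample strings are invented for the demonstration.

```ruby
html = '<b>bold</b> and <i>italic</i>'

greedy = html[/<.+>/]    # => "<b>bold</b> and <i>italic</i>"
lazy   = html[/<.+?>/]   # => "<b>"

# ? makes the preceding atom optional.
spellings = 'color colour colouur'.scan(/colou?r/)   # => ["color", "colour"]

# * is satisfied by zero instances, but + needs at least one.
star = 'abc'[/\d*/]   # => ""
plus = 'abc'[/\d+/]   # => nil

# Counted repetition.
exactly  = 'aaaaa'.scan(/a{2}/)     # => ["aa", "aa"]
range    = 'aaaaa'[/a{2,3}/]        # => "aaa"
at_least = '1234567'[/\d{3,}/]      # => "1234567"
at_most  = '1234567'[/\d{,3}/]      # => "123"

puts greedy, lazy
```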

We prescribe where matches should occur within a string with these anchors:

symbol     where to match
^          at start of string or line
$          at end of string or line
\b         at word boundary
(?=abc)    before abc
(?<=abc)   after abc
(?!abc)    not before abc
(?<!abc)   not after abc
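Anchors consume no characters; they only constrain where a match may sit. Here is a brief sketch of each one in Ruby, with made-up sample strings. Note that in Ruby, ^ and $ match at the start and end of every line; the \A anchor in the last example is not in the table above and is included only to show the contrast.

```ruby
words = 'cat concat catalog'
whole_word = words.scan(/\bcat\b/)   # => ["cat"]
word_start = words.scan(/\bcat/)     # => ["cat", "cat"]

# Lookahead and lookbehind check the neighbors without matching them.
pixels = '10px 2em 33px'.scan(/\d+(?=px)/)         # => ["10", "33"]
prices = '$5 and $12 and 7'.scan(/(?<=\$)\d+/)     # => ["5", "12"]
not_bar = 'foobar foobaz'.scan(/foo(?!bar)/)       # => ["foo"]
not_un  = 'unhappy happy'.scan(/(?<!un)happy/)     # => ["happy"]

# ^ and $ work per line in Ruby; \A matches only at the very start.
lines = "first\nsecond"
line_starts  = lines.scan(/^\w+/)    # => ["first", "second"]
line_ends    = lines.scan(/\w+$/)    # => ["first", "second"]
string_start = lines.scan(/\A\w+/)   # => ["first"]

puts whole_word.inspect, pixels.inspect, line_starts.inspect
```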

We use regular expressions for several tasks: validating input, finding matches in a body of text, and substituting text in place of other text. We'll see how each of these tasks is completed in Ruby.
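As a rough sketch of the three tasks, Ruby offers match? for validating, scan and match for finding, and sub and gsub for substituting. The valid_zip? helper, the log string, and the variable names below are hypothetical examples, not part of the lecture's code.

```ruby
# Validating: does the whole input fit the pattern?
# \A and \z pin the match to the very start and end of the string.
def valid_zip?(text)
  text.match?(/\A\d{5}\z/)
end

# Finding: scan collects every match.
log = 'GET /index.html 200, GET /missing 404, POST /form 500'
codes = log.scan(/\b\d{3}\b/)   # => ["200", "404", "500"]

# Finding: match exposes captured groups by number.
m = 'Petey F.'.match(/(\w+) (\w)\./)
first_name = m[1]     # => "Petey"
last_initial = m[2]   # => "F"

# Finding: String#[] returns just the first match.
miles = 'Americans Scroll 2.5 Miles Per Day'[/\d+\.\d+/]   # => "2.5"

# Substituting: sub replaces the first match and can refer to groups.
american = '2024-01-15'.sub(/(\d{4})-(\d{2})-(\d{2})/, '\2/\3/\1')
# => "01/15/2024"

# Substituting: gsub replaces every match, here using a block.
title = 'americans scroll'.gsub(/\b\w/) { |c| c.upcase }
# => "Americans Scroll"

puts valid_zip?('53706'), codes.inspect, american, title
```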


Here's your list of things to do before we meet next:

See you next time.


P.S. It's time for a haiku!

Black, woman, and young
voter_t's no static type
It got amended