teaching machines

CS 330: Lecture 28 – Filter, Map, and Fold

April 16, 2018 by . Filed under cs330, lectures, spring 2018.

Dear students:

Last time we talked about lambdas and higher-order functions. We wrote this count method:

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count predicate (first : rest)
  | predicate first = 1 + count predicate rest
  | otherwise = count predicate rest

Let’s start with a Program This that uses this function:

Partially apply count to define function nUpper. It accepts a list of Cartesian coordinate pairs. It returns the number of pairs whose x-coordinate is less than its y-coordinate. If you cut space along the x == y diagonal, such points are in the upper triangle.

If you finish that one, write another to count the number of short Strings in a list of Strings.

Here’s my solution.
nUpper :: count (\(a, b) -> a < b)
shorties :: count (\s -> length s < 5)

nUpper [(1, 2), (3, 3), (4, 5), (11, 10)]
shorties ["I", "think", "it's", "impossible", "to", "really", "understand", "somebody", "what", "they", "want", "what", "they", "believe", "and", "not", "love", "them", "the", "way", "they", "love", "themselves"]

Lambdas are pretty great, but be careful. What’s wrong with these?

count (\x -> x > 0) [-3..3]
count (\c -> isUpper c) "You Get What You Need"
count (\x -> x % 2 == 0) [-4..4]

All three of these are cases of UFW—unnecessary function wrapping. We have existing functions that do the same thing, and there’s no need to define a new one:

count (> 0) [-3..3]
count isUpper "You Get What You Need"
count even [-4..4]

Now, we’ve seen how easy it is to define functions on the fly and hand them off to other functions. Think about what this means for those tired old patterns we have been writing in our code. How many loops have you written that have the exact same structure, with the only difference being how you act on an element. Or how you combine it with the others. We can pull out those repeated parts into a function but push the parts that change into a parameter. That parameter has to be an action—a function.

We’ll look at two of those patterns today and implement their abstractions in Haskell. Eventually we will look at them in other languages.

Let’s start with filter. In SQL, this is the WHERE clause. We want to reduce a collection to some subset matching some criteria. That criteria is the hole that we must leave blank, but the looping structure is the same everytime:

keepers = []
for each item in list
  if keeper meets criteria
    append item to keepers

Does this algorithm look familiar to you? It should. It’s a lot like count. In Haskell, we can write it like so:

filter' :: (a -> Bool) -> [a] -> [a]
filter' _ [] = []
filter' predicate (first : rest)
  | predicate first = first : filter' predicate rest
  | otherwise = filter' predicate rest

Now let’s create a collection on which to test this. First we’ll make a new Planet datatype:

-- A planet is a 3-tuple of name, distance from sun in AU, and planet-Earth size ratio
data Planet = Planet String Double Double deriving Show

Then we can make a list:

planets = [
    Planet "Mercury" 0.4 0.055,
    Planet "Venus" 0.7 0.815,
    Planet "Earth" 1.0 1.0,
    Planet "Mars" 1.5 0.107,
    Planet "Jupiter" 5.2 318.0,
    Planet "Saturn" 9.5 95.0,
    Planet "Uranus" 19.6 14.0,
    Planet "Neptune" 30.0 17.0
  ]

We can grab a list of all the planets with more mass than Earth:

massierThanEarth (Planet name, au, mass) = mass > 1.0
massiers = filter' biggerThanEarth planets
-- or using a lambda
massiers' = filter' (\(Planet _ _ mass) -> mass > 1.0) planets

We can grab a list of all the planets inside the asteroid belt, whose inner radius is about 2.2 AUs:

prebelters = filter' (\(Planet _ au _) -> au < 2.2) planets

We can grab a list of all the planets that have a U in their name:

us = filter' (\(Planet name _ _) -> elem 'u' name) planets

The key idea here is that we wrote filter' once. We wrote the iteration once. We never have to write it again to filter a list. We need only jam a lambda into the hole it leaves open in the algorithm: the predicate.

The next pattern we’ll visit is map. It abstracts this general algorithm:

news = array same size as olds
for each item in olds
  news[i] = transform olds[i]
return news

What does this look like in Haskell?

map' :: (a -> b) -> [a] -> [b]
map' _ [] = []
map' xform (first : rest) = xform first : map' xform rest

We can turn a list of planets into a list of planet names:

map' (\(Planet name _ _) -> name) planets

We can append , Lord of Catan to a bunch of names:

map' (++ ", Lord of Catan") ["Alice", "Bob", "Carol", "Diphthong"]

We can generate a never-ending sequence of even numbers:

map' (*2) [1..]

It’s time for the third big pattern. However, before we examine an abstraction, we should see concrete realizations of the thing we’re abstracting—some say three. Let’s write the following functions, all of which accept a list. They can all be written in the two-line recursive breakdown we’ve seen for processing lists.

  1. sum, which yields the sum of all the elements in the list
  2. product, which yields the product of all the elements in the list
  3. join, which concatenates together all the elements of the list
  4. any, which yields true if any of the boolean elements of the list is true, and false otherwise
  5. all, which yields true if all of the boolean elements of the list are true, and false otherwise

What do these solutions all have in common? Well, they all turn a list into some summary value. How are they different?

  1. In their “default” or “zero” value that is invoked in the base case.
  2. In their mixing or combining function, which stirs one of the elements into the summary value.

(In these cases, that summary value is a scalar, but it could have been another collection. Consider copy (first : rest) = first : copy rest.)

The abstraction of this algorithm is called fold or reduce or inject, and the zero value and the mixing function will be the holes that we leave open. Its interface will be something like this:

fold mix zero list

Let’s define it, postponing the type signature for a moment:

fold mix zero [] = zero
fold mix zero (first : rest) = mix first (fold mix zero rest)

We can apply a few syntactic niceties:

fold _ zero [] = zero
fold mix zero (first : rest) = mix first $ fold mix zero rest

Now, let’s rewrite all of our functions from above using fold:

sum = fold (+) 0
product = fold (*) 1
join = fold (++) ""
any = fold (||) False
all = fold (&&) True

With these three functional patterns abstracted away, we will almost never have to write code again, I’m pretty sure. They give us the power to be brief.

Here’s your TODO list:

See you next time!

Sincerely,

P.S. It’s time for a haiku!

Filter, map, and fold
They solve most of our problems
‘Cept for Syria

P.P.S. Here’s the code we wrote together:

lecture.hs

filter' :: (a -> Bool) -> [a] -> [a]
filter' pred [] = []
filter' pred (first : rest)
  | pred first = first : filter' pred rest
  | otherwise = filter' pred rest

data Planet = Planet String Double Double deriving Show

planets = [
    Planet "Mercury" 0.4 0.055,
    Planet "Venus" 0.7 0.815,
    Planet "Earth" 1.0 1.0,
    Planet "Mars" 1.5 0.107,
    Planet "Jupiter" 5.2 318.0,
    Planet "Saturn" 9.5 95.0,
    Planet "Uranus" 19.6 14.0,
    Planet "Neptune" 30.0 17.0
  ]

massierThanEarth (Planet _ _ mass) = mass > 1
massiers = filter' massierThanEarth planets
prebelters = filter' (\(Planet _ au _) -> au < 2.2) planets
us = filter' (\(Planet name _ _) -> elem 'u' name || elem 'U' name) planets

map' :: (a -> b) -> [a] -> [b]
map' xform [] = []
map' xform (first : rest) = xform first : map' xform rest

fold mix zero [] = zero
fold mix zero (first : rest) = mix first (fold mix zero rest)

sum' = fold (+) 0
product' = fold (*) 1
all' = fold (&&) True