CS 430: Lecture 6 – Expressions and Control Structures

Dear students,

The programs we write do many things: they facilitate communication, they walk a user through an interaction, they automate drudgery, they control physical artifacts, they store and retrieve data, and so on. Whatever a program’s final goal, many of its intermediate steps are concerned with producing and consuming information. This information is combined with other information using mathematics and logic. Code that yields information that can be consumed by other code is called an expression. We evaluate an expression to produce a value.

As the program goes about the task of producing and consuming values, it may reach a point where it should only run certain code under a certain condition. Or it needs to go back and re-execute code it just executed. This dynamic navigation of the code is called control flow, and the syntactic features of the language to guide control flow are called control structures.

Today we’ll have a closer look at the design choices and mechanics of expressions and control structures.

Anatomy of Expressions

Syntactically, an expression is made up of operators, operands, and parentheses.

Operators

An operator can be classified along several dimensions. It may be a unary operator taking one operand, a binary operator taking two operands, a ternary operator taking three, and so on. Interestingly, most languages have only a single ternary operator: the conditional operator. We call it “the ternary operator,” even though this communicates nothing about its semantic purpose.

When an operator precedes its operands, we say it is a prefix operator. What prefix operators have you used? Perhaps -, ~, *, &, ++, --, sizeof, and new. The last two of these are operators, not function calls. Languages with prefix operators may allow traditional binary operators to accept more operands, as in Racket’s (+ 1 2 3 4). A postfix operator trails its operands. What postfix operators have you used? Perhaps ++ and --. An infix operator appears between its operands.

The grammar of the language decides the precedence and associativity of the operators.

Most languages use very few symbols for operators, relying on convention to communicate an operator’s semantic meaning. A binary * means multiplication, and + means addition. Many of these operators are overloaded, with separate versions of multiplication for integers and floating point numbers. In Python 3, the language designers decided to remove overloading for /. Single-slash division yields a floating point result. For example, 5 / 6 is approximately 0.8333. Double-slash division yields a whole number result. For example, 5 // 6 yields 0.

Overloading is nice when the different underlying computations have roughly the same semantic meaning. But what does << mean? That may be Ruby’s push method for arrays or C++’s insertion operator for output to an ofstream or a left shift.

Some languages allow the builtin operators to be overloaded in user-defined ways. We write functions named after the overloaded operation for the types we want to have support the operation. One of Bjarne Stroustrup’s goals in C++ was to make user-defined types indistinguishable from builtin types. He called this type parity, and operator overloading was crucial to achieving his goal. In Java, we don’t have type parity. Strings get +, but our classes don’t. Instead, we give our types an add method.

Ruby also has operator overloading. We can give our arrays and strings a / operator:

class Array
  def /(n)
    short = length / n     # size of the shorter groups
    long = short + 1       # size of the longer groups
    nlongs = length % n    # how many longer groups we need
    div = long * nlongs    # index where the longer groups end
    self[0...div].each_slice(long).to_a + self[div..-1].each_slice(short).to_a
  end
end

class String
  def /(n)
    (chars / n).map(&:join)
  end
end

alphabet = "abcdefghijklmnopqrstuvwxyz"
puts alphabet / 5

Time for a riddle. What’s the output of this Java program?

public class Riddle {
  public static void main(String[] args) {
    getTrue() || getFalse();
  }

  public static boolean getTrue() {
    System.out.print("T");
    return true;
  }

  public static boolean getFalse() {
    System.out.print("F");
    return false;
  }
}

The || is satisfied by its left operand alone. The side effect of the right operand is never seen, which means the right operand is never even considered. We say that && and || are short-circuit operators. Do you know any others? The ternary conditional operator ?: is also a short-circuit operator. It only evaluates one of its second and third operands. Our modern languages are also gaining a couple of other handy short-circuit operators. One is the optional chaining operator. In languages with pointers or nullable references, we have to be careful not to invoke a method or access a property on a null object. We write an if statement to guard against the null pointer exception, like this:

if (pet !== null) {
  console.log(pet.name);
}

The optional chaining operator lets us write this shorter form:

console.log(pet?.name);

Ruby, C#, and JavaScript are some of the languages that support this operator.

If pet is null, the expression evaluates to undefined instead of raising an exception. If we want to fall back on a default value, we can use the null coalescing operator:

console.log(pet?.name ?? 'NPE: no pet exception');

This operator checks the left operand. If it’s not null, it evaluates to that and is done. If it is null, it evaluates to the right operand. The expression x ?? y is roughly shorthand for x != null ? x : y, except that x is evaluated only once. Some languages use ?: instead of ?? and call this operator the Elvis operator.
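
Ruby spells the optional chaining operator &. and calls it the safe navigation operator. Since nil is falsy in Ruby, plain || serves as an approximate null coalescing operator. Here’s a sketch, with a hypothetical Pet type of my own invention:

```ruby
Pet = Struct.new(:name)  # a hypothetical pet type for illustration

def describe(pet)
  # &. yields nil instead of raising NoMethodError when pet is nil;
  # || then falls back to the default, since nil is falsy.
  pet&.name || "NPE: no pet exception"
end

puts describe(Pet.new("Rex"))  # Rex
puts describe(nil)             # NPE: no pet exception
```

One caveat: Ruby’s || tests truthiness rather than nullness, so a false left operand also falls through to the default, which a true ?? would not do.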

Operands

Speaking of operands, let’s talk more about them. When you see an expression of the form f() + g(), how do you expect it will be evaluated? Probably f will be called first, right? Maybe. The C and C++ standards do not actually specify the order in which operands are evaluated. In a left-to-right scheme, we’d evaluate f and then g. In a right-to-left scheme, we’d evaluate g and then f. There may be still other schemes.
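
Ruby, in contrast, evaluates operands left to right, and we can observe the order through side effects. A small sketch, using lambdas to stand in for f and g:

```ruby
order = []
f = -> { order << :f; 1 }  # lambda standing in for f()
g = -> { order << :g; 2 }  # lambda standing in for g()

sum = f.call + g.call      # the left operand is evaluated first
p order                    # [:f, :g]
```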

Sometimes operands do not have the same type, as in double x = 1. There’s a double on the left and an int on the right. What should happen here? In some languages, the compiler introduces an implicit cast to turn the two’s-complement 1 into the equivalent floating-point 1. What if our assignment was int x = 1.0? In some languages, this requires an explicit cast. Java is one of the languages that behaves like this. It considers numeric types in the following ladder:

double
float
long
int
short
byte

When you move from a lower type to a higher type, this is called a widening conversion. Java does not require casts for widening conversions. When you move from a higher type to a lower type, you have a narrowing conversion. Java does require casts for narrowing conversions. Other languages are more picky. In Kotlin, this code does not compile:

val f : Float = 6

One must use an explicit conversion, such as 6.toFloat(), when converting between numeric types. Why does Kotlin not adopt the same practice as Java? Aren’t widening conversions safe? No. There’s still a potential for information loss even when going to a “bigger” type. Consider this Java code:

float f = 16777217;
System.out.printf("%8f%n", f);

What will the output be? It’s 16777216. In single-precision floating point numbers, we have 1 bit for the sign, 23 for the fraction after the floating point, and 8 for the exponent, with the value computed as (-1)^sign * 2^exponent * (1 + fraction). The numbers 0 through 16777215 can be faithfully represented in the 23 bits of the fraction. The number 16777216 = 2^24 can be represented by setting the exponent to 24. Past that, not every whole number can be represented. Once you have to start involving the exponent, the numbers are only going to be even.
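
We can reproduce this rounding in Ruby by packing an integer into a single-precision float and unpacking it again:

```ruby
# Round-trip 16777217 through a 32-bit float. The 24 bits of precision
# (23 fraction bits plus the implicit leading 1) cannot hold it exactly.
rounded = [16_777_217].pack('f').unpack1('f')
puts rounded.to_i  # 16777216

# Its neighbor one below fits exactly.
puts [16_777_216].pack('f').unpack1('f').to_i  # 16777216
```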

Parentheses

The last anatomical structure of expressions is parentheses. We use these to change or emphasize the precedence and associativity of operations. But they are overloaded. We also use them to group parameter lists, to create tuples in Haskell and Python, and as subscript operators in MATLAB. Lisp uses parentheses to surround entire expressions.

Expressions vs. Statements

Not every structure in a program yields a value. We also have statements, which produce side effects. Common side effects include output and changes to variables. Calls to void functions are statements. Our forebears sometimes preferred to call void functions procedures rather than functions. The mathematical notion of a function implies a mapping from a domain to a range, and void functions do no such mapping.

Some languages do not have a notion of statements. Haskell, for example, disallows side effects. The code is made up entirely of definitions and expressions. In general, removing side effects makes our programs safer, faster, and easier to reason about. When a function has no dependencies on external mutable state, then its operations will be predictable. Such a function has what we call referential transparency. Generally, a function is referentially transparent if multiple calls to it can be replaced with a single call whose value is stored in a variable, as in this code:

// Multiple calls
print f(3) + f(3)

// Cached call
value = f(3)
print value + value

If a function is referentially transparent, we can optimize how it compiles, cache its results, and not be concerned with multiple cores needing to synchronize their mutable data.
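
Here’s the contrast in Ruby, with two illustrative functions of my own devising:

```ruby
# Referentially transparent: the result depends only on the argument.
def square(x)
  x * x
end

# Not referentially transparent: depends on and mutates external state,
# so two calls with the same argument yield different values.
$calls = 0
def bump(x)
  $calls += 1
  x + $calls
end

value = square(3)
puts square(3) + square(3) == value + value  # true: calls are cacheable
puts bump(3) + bump(3)                       # 4 + 5 = 9, not 2 * bump(3)
```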

Many languages support a hybrid of statements and expressions. The assignment statement, for example, has a side effect of updating memory, and it evaluates to the assigned value. A function may both return a value and produce output. A call to generate a pseudorandom number using a linear congruential algorithm will evaluate to a random value but also store state to help generate the next number.
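
A linear congruential generator makes this hybrid nature concrete. Here’s a sketch in Ruby; the multiplier and increment are common published constants, chosen by me for illustration:

```ruby
class LCG
  def initialize(seed)
    @state = seed
  end

  # The assignment both updates @state (a side effect) and evaluates to
  # the new value, which doubles as the method's return value.
  def next
    @state = (1_664_525 * @state + 1_013_904_223) % 2**32
  end
end

rng = LCG.new(42)
puts rng.next  # each call yields a value and advances the hidden state
puts rng.next
```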

Assignment Statements

Programming language designers generally don’t seem to agree on assignment statements. The crux of the dispute is that = of mathematics is not assignment. Rather, it’s an observation that two values are the same. The idea of using = to store values in memory and update them later is heresy to some. In ALGOL and Pascal, they chose to use :=. In R, they use id <- expr or expr -> id or id = expr.

When we update a variable by applying an operation to it, we can use the compound assignment operators: +=, -=, *=, and so on. Closely related are the increment and decrement operators ++ and --. Several languages do not support the increment and decrement operators. I don’t blame them. These operators generate confusion. What’s the output of this code?

int a = 3;
int b = 3;
System.out.println(a++);
System.out.println(++b);
System.out.println(a);
System.out.println(b);

If you write your code so that side-effects and expressions are cleanly separated, then there’s no practical difference between pre- and post-increment. Yet because the designers of B, the predecessor to C, wanted to write really short code, we are stuck with this confusion until enough languages abandon them.

Some languages allow several variables to be assigned at once. For example, we can swap two variables with this multiple assignment in Ruby:

a, b = b, a

Evaluation Algorithm

The parser turns the flat source code of our program into a tree whose structure reflects the operator precedence and associativity. This tree can be evaluated directly through a traversal algorithm, or it can be turned into equivalent machine code. Let’s work through an example of constructing a tree and evaluating it using the mechanics we described above.

Suppose k is 5, n is 2, and a is {1,3,6,8,11,14,16}. What is a[k--] + 8 * n - k?

To build up the tree, we can work from the lowest-precedence operator at the top. In this case, the operators follow this precedence order, from weakest to strongest:

+ -
*
--
[]

So, we start with the arithmetic operators. Since these are left-associative, we find the rightmost instance and put it at the top of our tree. That’s the -. Its left child will be the next weakest operation, +. That node’s left child will be the [] operation, and its right child will be the * operation.
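
We can check our hand evaluation with a Ruby sketch. Ruby has no -- operator, so I simulate the post-decrement explicitly, and I assume left-to-right operand evaluation:

```ruby
k = 5
n = 2
a = [1, 3, 6, 8, 11, 14, 16]

# a[k--]: the subscript uses the old value of k; k is then decremented.
old_k = k
k -= 1
subscript = a[old_k]            # a[5] is 14

result = subscript + 8 * n - k  # 14 + 16 - 4
puts result  # 26
```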

Control Structures

We pivot now to a discussion of control structures. In the early days of computing, our programs were a flat sequence of machine instructions. To execute out of sequence, we wrote the address of the out-of-sequence instruction into the program counter. Usually we did this with a jump/branch instruction. When high-level languages started to appear, language designers wondered if they should expose the low-level jump instructions in their “evolved” syntaxes. Some said yes, some no. In C, we still have the goto instruction. We can add labels to our code and execute instructions out of order, like this:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  int value = atoi(argv[1]);
  goto sub;

post:
  printf("value: %d\n", value);
  return 0;

sub:
  value += 1;
  goto post;
}

C’s goto only jumps to labels within the same function. But C also supports non-local jumps using setjmp and longjmp. In the 1960s, before C was developed, computer scientists advocated for structured programming. The execution flow should be predictable, they said. Edsger Dijkstra was an advocate for structured programming, and he wrote down his position in his letter Go To Statement Considered Harmful, published in the Communications of the ACM. The structured programming side ultimately won out, but I doubt this made Dijkstra happy. High-level language designers invented syntaxes that papered over the underlying jumping mechanics with structures that matched the semantic intent: conditional statements, loops, and subroutines (which we’ll talk about another day).

These days most of us know little about low-level branching. We use conditional statements and loops exclusively to navigate the CPU through our source code. However, each language has a slightly different take on what these structures look like. For conditional structures, we have these choices:

  • How many choices do we permit? Just 2, as in an if-else statement? Or do we allow more choices?
  • Is the structure an expression or a statement?
  • Is the condition a boolean expression? Or is it a value whose possible values we want to partition into cases?
  • If we find a matching case, do we bail from the decision structure? Or do we fall through to the next case?
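
Ruby’s case structure illustrates several of these choices at once: it partitions a value into cases, the first matching case wins with no fall-through, and the whole structure is an expression:

```ruby
score = 87
grade = case score
        when 90..100 then "A"
        when 80...90 then "B"  # first match wins; no fall-through
        when 70...80 then "C"
        else "F"
        end
puts grade  # B
```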

For iteration structures, we have these choices:

  • Is the termination logic implicit or explicit?
  • Is there an iterator? If so, what is its scope? Can it be modified?
  • Is the termination logic tested before or after the loop body? Or somewhere in between?
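
Ruby happens to offer a loop for each placement of the termination test. A sketch:

```ruby
# Test before the body: the body may run zero times.
pre = 0
while pre >= 5
  pre += 1
end

# Test after the body: Ruby's begin/end-while runs the body at least once.
post = 10
begin
  post += 1
end while post < 5

# Test in the middle: an infinite loop with an explicit break.
mid = 0
loop do
  mid += 1
  break if mid == 3
end

p [pre, post, mid]  # [0, 11, 3]
```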

A language doesn’t need loops. Pure functional languages don’t have loops, as loops imply the notion of mutable state. These languages use recursion for iteration.
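
Even in Ruby we can iterate in this functional style, replacing the loop and its mutable accumulator with recursion:

```ruby
# Sum a list recursively: no loop, no mutable state.
def sum(list)
  return 0 if list.empty?
  list.first + sum(list.drop(1))
end

puts sum([1, 2, 3, 4])  # 10
```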

Conclusion

Our programming experience is shaped by the choices that designers have made around expressions and control structures. When these tools fit how we think, we feel productive and smart. On other days, we curse the tools or our brains.

See you next time!

Sincerely,

P.S. It’s time for a haiku!

Considered harmful
goto soon radicalized
Jumped to its label
