
CS 240: Lecture 4 – Asymptotic Complexity

September 1, 2021. Filed under cs3, fall-2021, lectures.

Dear students:

If I asked you how fast you can run, you probably wouldn’t give me a number. Rather, you’d ask me how far. To get a complete picture of your speed, I might ask how fast you run a mile, a 5K, a 10K, and a marathon. Similarly, when I ask an algorithm how many steps it takes to complete, I don’t get a single number. Instead, I get a function. I tell that function how big the input is, and the function reports how many steps the algorithm takes. We need the whole function in order to make a sound judgement about the algorithm. Some algorithms are sprinters, and some are long-distance runners.

Today we discuss how these functions are dropped into a handful of categories.

Simplifying

When we examine an algorithm’s cost function, we ignore all terms but the one that grows fastest. For example:
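
$n^2 + 3n + 10 \rightarrow n^2$

$2^n + n^3 \rightarrow 2^n$

The particular functions here are just illustrations; whatever the cost function, only its fastest-growing term survives.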

Additionally, we ignore all constant terms and coefficients. For example:
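
$5n \rightarrow n$

$100n^2 + 7 \rightarrow n^2$

The 5, the 100, and the 7 are all discarded; the particular values are again just illustrations.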

We do not automatically throw out constants when they are the base of exponent $n$ or the exponent of base $n$.
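
For example, $2^n$ and $3^n$ do not land in the same class, since their ratio $(3/2)^n$ grows without bound, and $n^2$ and $n^3$ likewise stay separate.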

This discarding of information needs some justification. How do we know that throwing out these lower-order terms and coefficients won’t throw off our ranking of algorithms?

Let’s start with a visceral argument. Suppose we have the simplified functions $n^2$ and $n$. When we plot them, we clearly see that an algorithm with cost function $n$ is faster.

However, what if we hadn’t simplified? What if the quadratic function was actually $\frac{n^2}{100}$? With that scale factor, the quadratic function dips below the linear function and is therefore a better algorithm. But if you look at larger values of $n$, you see that the win is only temporary. In the long run, the linear algorithm wins. Whatever constant coefficient you pick, at some point the linear algorithm will drop below the quadratic. Constant coefficients do not change which function ultimately wins in this case.
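
In fact, we can pin down where the temporary win ends: for positive $n$, $\frac{n^2}{100} \le n$ exactly when $n \le 100$, so the scaled quadratic stays below the linear function only up to $n = 100$, and beyond that point the linear function is smaller forever.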

If the functions are of the same order, surely the coefficients are important. For example, when choosing between $50n$ and $5n$, I will always pick the latter. But if I simplify them both down to $n$, then I won’t know which is better. This is true. The problem is that getting accurate coefficients is really hard. They will be impacted by the programming language, compiler, and hardware.

We must also justify discarding the lower-order terms. Suppose you are comparing the cost functions $n^2 + 3n$ and $n^2$. Both of these simplify to $n^2$, which means that we’re saying these two algorithms are equivalent. But that’s not what the plot shows.

Clearly the $n^2$ algorithm is a better one. But when you zoom out, the gap between them becomes insignificant. The gap, which grows like $n$, will never entirely disappear, but it will be dwarfed by the $n^2$ term.
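
One way to see this is to compare the two costs as a ratio rather than a difference:

$\frac{n^2 + 3n}{n^2} = 1 + \frac{3}{n}$

As $n$ grows, this ratio approaches 1. The absolute gap keeps widening, but relative to the total cost it fades to nothing.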

Complexity Classes

Simplifying lands us on just a small set of different functions. Here are some of the most common ones:
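
A typical lineup, ordered from slowest-growing to fastest, looks something like this:

$1 \quad \log n \quad n \quad n \log n \quad n^2 \quad n^3 \quad 2^n \quad n!$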

Each of these functions forms a set of all the cost functions that simplify to it. We express that a cost function belongs to one of these sets with Big Theta notation:
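
For instance, the cost function $3n^2 + 5n$ simplifies to $n^2$, so we write $3n^2 + 5n \in \Theta(n^2)$.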

We might say that an algorithm is Big Theta of $n^2$ or it’s on the order of $n^2$.

Each of these functions also forms a set containing all the cost functions that simplify to it or to something that grows more slowly, which corresponds to a faster algorithm. This membership is expressed using Big O notation:
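
For instance, $n^2 \in O(n^2)$, but also $n \in O(n^2)$ and $\log n \in O(n^2)$, since $n$ and $\log n$ grow more slowly than $n^2$.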

These different sets are called complexity classes.

Big O

Theoretical computer scientists have defined more formal criteria for deciding how to lump these cost functions together into complexity classes. Here’s the definition given by your textbook for saying that a function is Big O of another:

For $T(n)$, a non-negative function, $T(n) \in O(f(n))$ if and only if there exist positive constants $c$ and $n_0$ such that
$T(n) \le cf(n)$ for all $n > n_0$

What this means is that function T can be made to be below function f, though f might need to get scaled up by a constant, and we might have to look at large input sizes.

Based on this we can formally show how cost functions fit into certain complexity classes. For example, $50n \in O(n)$ because $50n \le cn$, where $c = 50$ and $n_0 = 1$.

This example shows that throwing away the coefficients is an acceptable practice. We can also show that throwing away lower-order terms fits the definition of Big O, but the reasoning requires a little creativity. We may feel, for example, that $n^2 + 40n \in O(n^2)$. We need to show this:

$n^2 + 40n \le cn^2$

That seems hard. However, we do know this:

$n^2 + 40n \le n^2 + 40n^2$, since $n \le n^2$ whenever $n \ge 1$

If we combine terms, then we end up finding a value for $c$ that makes the first inequality true:

$n^2 + 40n \le n^2 + 40n^2 = 41n^2$

This works for $c = 41$ and $n_0 = 1$. That means we have $n^2 + 40n \in O(n^2)$.

You can think of Big O as imposing a less-than-or-equal-to relationship on functions. Function f is an upper bound to all the functions in its set.

This reveals a weakness in Big O. We can say that $n \in O(n^2)$ because $n \le cn^2$, for $c = 1$ and $n > 1$. But this isn’t saying much. A very fast algorithm will be Big O of many functions.
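
For instance, the cost function $n$ is a member of $O(n)$, $O(n \log n)$, $O(n^2)$, $O(2^n)$, and even $O(n!)$, but only the first of those memberships is a tight description.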

Big Omega

Because Big O only tells us vaguely that an algorithm is no worse than another, theorists have also defined Big Omega, which states that an algorithm is no better than another. It has this definition:

For $T(n)$, a non-negative function, $T(n) \in \Omega(f(n))$ if and only if there exist positive constants $c$ and $n_0$ such that
$T(n) \ge cf(n)$ for all $n > n_0$

This is very much like Big O, but it states that function f can be made to dip below and stay below function T. The following memberships can be shown:
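
For instance, $n^2 \in \Omega(n)$, using $c = 1$ and $n_0 = 1$, and $n^2 + 40n \in \Omega(n^2)$, also with $c = 1$, since $n^2 + 40n \ge n^2$ for all positive $n$.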

You can think of Big Omega as imposing a greater-than-or-equal-to relation on functions. But this too is not precise. Saying that an algorithm with a factorial cost function is $\Omega(n)$ is true but unhelpful.

Peer Instruction Exercise

Suppose you have developed an algorithm for routing $n$ pizza deliveries in the most fuel efficient way. The algorithm has cost function $2^{n+1}$. You are considering selling your algorithm and want to advertise it in the most favorable but truthful way. In which of the following sets do you advertise its membership?

Big Theta

To have meaningful measures of an algorithm’s time efficiency, we want to sandwich its cost function both above and below. Sandwiching is provided by the Big Theta classification, which has this definition:

For $T(n)$, a non-negative function, $T(n) \in \Theta(f(n))$ if and only if $T(n) \in O(f(n))$ and $T(n) \in \Omega(f(n))$.
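
For instance, we showed earlier that $n^2 + 40n \in O(n^2)$, and since $n^2 + 40n \ge n^2$ for all positive $n$, it is also in $\Omega(n^2)$. Therefore $n^2 + 40n \in \Theta(n^2)$.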

The definition means that one scaling of f can be made to stay on or above function T, and another scaling can be made to stay on or below. You can think of Big Theta as imposing an equality relationship on functions. Given this intuition, which of the following statements is true?

TODO

See you next time.

Sincerely,

P.S. It’s time for a haiku!

My car is a boat
But no slower than walking
It’s a Big O car