» CS 352 Lecture 34 – Memory Hierarchy Cont’d

CS 352 Lecture 34 – Memory Hierarchy Cont’d

December 5, 2016 by Chris Johnson. Filed under cs352, fall 2016, lectures.

Dear students,

Last time we introduced the memory hierarchy. This hierarchy is not unlike a league of superheroes, each with its strengths and weaknesses. But the weaknesses are minimized, as long as you rotate in the proper superhero for each villain you encounter. For the memory hierarchy, the goal of stacking together these technologies is to provide a storage system that is fast, cheap, and capacious. It’ll only be fast if most of our time is spent with registers and cache. It’ll only be cheap if we don’t have a lot of the fast/expensive technology. It’ll only be capacious if the hierarchy widens at the bottom.

Remember the quicksort algorithm? It’s held up as one of the better sorting algorithms. But its worst case performance is O(n^2). How can we call that good? Well, worst case is not important if we rarely visit the worst case. The average case is a better measure of quicksort, and so it is with the memory hierarchy. It is designed to make our average access times be closer to the ideal times.

Let’s examine some canonical architecture-ish problems that help us see if the memory hierarchy accomplishes its goal of making average access times closer to the best case than the worst case:

Problem #1

Problem #2

Problem #3

In general, we can say that the access time for a given level of the memory is this:

average access time = hit percentage * hit time + (1 - hit percentage) * miss time

When you implement quicksort, you don’t just cross your fingers and hope you pick an average pivot point. One strategy is to randomly pick three pivot candidates, and choose the middle one, expecting that it will bisect the list in two similarly-sized sublists. Similarly, we can make sure that the average metrics are better in the memory hierachy by doing a couple of things:

Issue all data requests through the top of the hierarchy.
If the top level cannot service the request, it delegates to the level below.
Once a lower level satisfies the request, the upper level holds on to that data for a while.
The lower level returns not only the specific data that was requested, but also the nearby data.

Items 3 and 4 are necessary because many of our data accesses follow two trends:

temporal locality: once we access a value, we are likely to access it again in the near future
spatial locality: once we access a value, we are likely to access neighboring data in the near future

Consider this code:

int sum = 0;
for (int i = 0; i < 100; ++i) {
  sum += numbers[i];
}

Where do we see temporal locality in this code? Spatial locality?

The key to achieving our goal is to ensure that most accesses hit at the top level of the hierarchy. Let’s discuss some strategies for maximizing locality:

favor stack over heap
cluster scalar values together
do as much to an element in a “hot” (innermost) loop as possible (in particular, avoid spreading processing across loops)
traverse multidimensional arrays in their storage order (eg)
serialize dynamically-allocated multidimensional arrays (eg)
choose between collections of structs and parallel arrays based on access patterns (eg)
favor arrays over linked lists (eg)

Here’s your TODO list for next time:

Listen to Radiolab’s Speed episode, especially the segment on algorithm trading. On a 1/4 sheet, identify a hierarchy from your own life whose topmost levels are quick but small and whose bottommost levels are slow but capacious.
If you envision big data or high-performance computing in your future, check out What Every Programmer Should Know About Memory.

See you next class!

Sincerely,

P.S. It’s Haiku Monday!

Fridge: “Sorry, I’m late”
“Store had no cheese that you like”
“So Store ran to Farm”