teaching machines

CS 330 Lecture 15 – Choices in C++

March 1, 2017 by . Filed under cs330, lectures, spring 2017.

Dear students,

C++ is a language with choices. Some call that a curse. I’d argue that asking that all doors be closed but one gives too much authority to an outside entity that can’t possibly understand our situation and can’t be trusted to know which door is best.

Let’s discuss a few of these choices today:

First, let’s discuss our choice between inheritance and composition. We’ve been looking at how subtype polymorphism has some pretty stellar qualities:

  1. Our conductor code can process a whole hierarchy of types.
  2. Adding new types to the hierarchy requires no changes to the conductor code.
  3. Virtual table lookups make the dynamic dispatch happen very quickly, with no conditional statements inhibiting performance.
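To make those three qualities concrete, here is a minimal sketch. The Instrument hierarchy and the Conduct function are made-up names for illustration, not code from our earlier lectures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical hierarchy: every instrument can play a note.
class Instrument {
  public:
    virtual ~Instrument() {}
    virtual std::string Play() const = 0;
};

class Flute : public Instrument {
  public:
    std::string Play() const override { return "toot"; }
};

class Drum : public Instrument {
  public:
    std::string Play() const override { return "boom"; }
};

// The conductor code. It knows only about Instrument. Each Play call is
// dispatched through the object's virtual table, with no if or switch on type.
std::string Conduct(const std::vector<std::unique_ptr<Instrument>> &band) {
  std::string score;
  for (const auto &instrument : band) {
    assert(instrument != nullptr);
    score += instrument->Play() + " ";
  }
  return score;
}

// Builds a two-piece band and conducts it.
std::string DemoBand() {
  std::vector<std::unique_ptr<Instrument>> band;
  band.push_back(std::unique_ptr<Instrument>(new Flute()));
  band.push_back(std::unique_ptr<Instrument>(new Drum()));
  return Conduct(band);
}
```

Adding a third subclass, say a Tuba, would mean writing one new class and leaving Conduct untouched.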

Let’s now muddy the water. The issue is that inheritance is the vehicle for subtype polymorphism. And inheritance is dangerous. Suppose we’re writing a raffle class. Something like this:

class Raffle {
  public:
    Raffle();
    void Add(const std::string &name);
    string Draw();
};

Then we realize that a raffle is just a bunch of names, a bunch of strings, and we realize that we can reuse code! This is just the job for vector. Let’s inherit from it:

class Raffle : public vector<string> {
  public:
    Raffle() :
      vector<string>() {
    }

    void Add(const std::string &name) {
      if (name != "Jill") {
        push_back(name);
      }
    }

    string Draw() {
      // Divide by RAND_MAX + 1.0 so that i is always less than size().
      int i = (int) (size() * (rand() / (RAND_MAX + 1.0)));
      string name = at(i);
      erase(begin() + i);
      return name; 
    }
};

Note that in the Add method, I’ve outlawed Jill from buying raffle tickets. Jill, of course, was the first girl that broke up with me. I haven’t forgotten. Watch her try to buy raffle tickets:

Raffle raffle;
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Jill");
raffle.Add("Bud");
std::cout << raffle.Draw() << std::endl;

Unfortunately, because I have inherited from vector, I have opened myself up to vulnerabilities. Jill can just do this:

raffle.push_back("Jill");

Darn. I know, I’ll just override push_back.

void push_back(const string &name) {
  Add(name);
}

Be careful here. We’ve just introduced mutual recursion, one of the hazards of redefining inherited methods. We’ll need to qualify the push_back call in Add as vector<string>::push_back. Otherwise name lookup finds Raffle’s own push_back first, which calls Add, which calls Raffle’s push_back again, and so on. (Strictly speaking, vector’s push_back isn’t virtual, so our version hides it rather than overriding it, and no dynamic dispatch is involved.)
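Here is a sketch of the qualified call, along with one more wrinkle worth knowing about: std::vector’s push_back is not virtual, so a version written in Raffle is chosen by the static type of the expression. The DemoHiding function is a hypothetical helper I’ve added to show the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class Raffle : public std::vector<std::string> {
  public:
    void Add(const std::string &name) {
      if (name != "Jill") {
        // Qualified call: goes straight to vector's version, so no recursion.
        std::vector<std::string>::push_back(name);
      }
    }

    // Hides vector's push_back within Raffle.
    void push_back(const std::string &name) {
      Add(name);
    }
};

// Returns how many names are in the raffle after two attempts by Jill.
std::size_t DemoHiding() {
  Raffle raffle;
  raffle.push_back("Jill");  // Raffle::push_back, blocked by Add
  assert(raffle.empty());

  // Through a reference to the base class, the compiler picks
  // vector's push_back, and Jill gets in anyway.
  std::vector<std::string> &base = raffle;
  base.push_back("Jill");
  return raffle.size();
}
```

So even a blocked push_back only holds as long as Jill talks to the object as a Raffle. That’s one more reason the inheritance route is a losing battle.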

We’ve thwarted Jill’s exploit. Jill is smart though. Last I knew, she was on her way to a private Catholic school to learn how to be a lawyer. She finds another route:

raffle.insert(raffle.begin(), "Jill");

I might block insert, but then she does this:

raffle[0] = "Jill";

Argh! I can’t win! And all I smell is her vanilla scent!

Finally, I wise up, and realize that my problem here is inheritance. A Raffle really isn’t a vector. It’s a much smaller deal. I should have favored composition over inheritance:

class Raffle {
  public:
    Raffle();
    void Add(const string &name);
    string Draw();

  private:
    vector<string> names;
};

Not Raffle is-a vector. Raffle has-a vector. The people that build big software tell us to favor has-a relationships. Compose an object out of other classes; try not to inherit from them.
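A filled-in version of the composed class might look like the following sketch. The method bodies mirror the inheritance version from earlier. The non-empty assertion in Draw and the DemoRaffle helper are my own additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

class Raffle {
  public:
    void Add(const std::string &name) {
      if (name != "Jill") {
        names.push_back(name);
      }
    }

    // Removes and returns a random name. Assumes the raffle is not empty.
    std::string Draw() {
      assert(!names.empty());
      // Dividing by RAND_MAX + 1.0 keeps i strictly below names.size().
      std::size_t i = (std::size_t) (names.size() * (rand() / (RAND_MAX + 1.0)));
      std::string name = names.at(i);
      names.erase(names.begin() + i);
      return name;
    }

  private:
    std::vector<std::string> names;
};

// Jill tries to enter, Bud enters, and we draw the only ticket in the hat.
std::string DemoRaffle() {
  Raffle raffle;
  raffle.Add("Jill");
  raffle.Add("Bud");
  // raffle.push_back("Jill");  // would not compile: Raffle has no push_back
  return raffle.Draw();
}
```

The only ways into names are the ones we wrote ourselves, so push_back, insert, and the subscript operator are all closed doors.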

Sadly, this might mean a lot of grunt work. Suppose the superclass has ten methods I actually do want to inherit (and 99 that I don’t). With composition, I’ve got to write ten little methods to defer the work that I would have preferred to automatically inherit. Like this one:

size_t Raffle::size() const {
  return names.size();
}

Good news. C++ gives me another option somewhere between composition and inheritance. We can use private inheritance:

class Raffle : private vector<string> {
  public:
    Raffle();
    void Add(const string &name);
    string Draw();
};

Inside the class, I can call supertype methods. But no one outside the class can. That’s slick. I get the code reuse of inheritance without opening myself up to all the dangerous entry points I have inherited. Sometimes private inheritance is viewed as implementation inheritance, but not interface inheritance.
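If there are a few inherited methods we do want callers to see, a using declaration can re-expose them one at a time. This is standard C++, though it’s a step beyond what the class above does, so treat it as a sketch. DemoPrivate is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class Raffle : private std::vector<std::string> {
  public:
    // Re-expose exactly the inherited members we want callers to see.
    using std::vector<std::string>::size;
    using std::vector<std::string>::empty;

    void Add(const std::string &name) {
      if (name != "Jill") {
        push_back(name);  // fine inside the class
      }
    }
};

// Returns the number of tickets after Jill and Bud each try to enter.
std::size_t DemoPrivate() {
  Raffle raffle;
  raffle.Add("Jill");
  raffle.Add("Bud");
  // raffle.push_back("Jill");                 // error: push_back is inaccessible
  // std::vector<std::string> &base = raffle;  // error: vector is an inaccessible base
  assert(!raffle.empty());
  return raffle.size();
}
```

Each using line stands in for one of those little forwarding methods that composition would have made us write by hand.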

How about C I/O vs. C++ I/O? Which of these do you like better?

printf("%02d. %s\n", line_number, label);
std::cout << std::setfill('0') << std::setw(2) << line_number << ". " << label << std::endl;

Okay, I concede that the C++ iostream API is a little much. But there are two issues here that cause me to respect the C++ way of doing things:

  1. Consider sqrt(-10). Should this work? What do you expect would happen to this code? Should it compile? Normally we don’t expect the compiler to look at the values. It inspects types. -10 is an int that converts implicitly to a double, and sqrt wants a double. What about divide(8, 0)? Should this code compile? Probably, because we don’t expect the compiler to know whether two parameter values are combined in appropriate ways. Then we come to printf. Should the compiler let anything through as long as the first thing is a C string? If so, we don’t know until runtime that we tried to print a float as a string. Or should the compiler ensure the parameters match up to the percent-codes in the format string? If so, we set a dangerous precedent of asking the compiler to be aware of implementation details. With C++’s iostream, we get strong type checking because the operator<< overload that gets called is chosen at compile time based on the data’s type.
  2. The intent behind C++’s iostream is important. Stroustrup believed strongly in type parity: that user-defined types should be indistinguishable from standard library types. With iostream, all objects are printed the same way, through an operator<< overload. In C, printf cannot be made to work with custom types. We must reduce them to the primitives that printf knows about. In Java, we can’t make the subscript operator work with our own types like it does with arrays. We can’t get + to work with our types like it does with String. Object-oriented programming is entirely focused on creating new types and relating them to existing types, and type parity seems like a reasonable thing to expect. I encourage you to demand languages that allow you to overload the builtin operators and have a common way of performing universal operations like printing and producing a string representation of your objects.
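As a small illustration of type parity, here is how a user-defined type can join the iostream party. Point and DemoPrint are hypothetical names. I write into a string stream rather than std::cout so that the result is easy to inspect.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
  int x;
  int y;
};

// Teach iostream how to print a Point. The compiler selects this overload
// whenever the right-hand operand of << is a Point.
std::ostream &operator<<(std::ostream &out, const Point &p) {
  out << "(" << p.x << ", " << p.y << ")";
  return out;
}

// Prints a Point and a double through the very same << syntax.
std::string DemoPrint() {
  Point p = {3, 4};
  std::ostringstream out;
  out << "p = " << p << " and 1.5 = " << 1.5;
  assert(out.good());
  return out.str();
}
```

There is no percent-code to get wrong here. If a type has no operator<< overload, the code fails to compile instead of misbehaving at runtime.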

Let’s now discuss our third choice: should we allocate memory on the stack or on the heap?

When should we use stack-allocated memory? Most of the time. It is cheaper to allocate, is automatically reclaimed at the end of the function, and promotes good cache hygiene.

When should we use heap-allocated memory? When a chunk of non-primitive memory has to outlive the function that creates it. (But C++’s move semantics put this criterion in question.) For example, if we create a ReallyBigImage inside a function, what will happen in the following code?

class ReallyBigImage {
  public:
    ReallyBigImage() {
      std::cout << "ctor" << std::endl; 
    }

    ~ReallyBigImage() {
      std::cout << "dtor" << std::endl; 
    }

  private:
    int pixels[100 * 200 * 3]; 
};

ReallyBigImage generate() {
  ReallyBigImage img;
  return img;
}

int main(int argc, char **argv) {
  ReallyBigImage img = generate();
}

Under the hood, C++ may invoke a special constructor called the copy constructor that creates a second instance of ReallyBigImage and shallowly copies over all the instance variables. Then it calls the destructor on generate’s image. That’s unnecessarily expensive. We just want generate’s image to live beyond the end of generate. That’s where the heap comes in. The heap is not bound up in the lifecycle of functions like the stack is.

ReallyBigImage *generate() {
  ReallyBigImage *img = new ReallyBigImage();
  return img;
}

int main(int argc, char **argv) {
  ReallyBigImage *img = generate();
  delete img;
}

The copy that happens in this code is a mere 8 bytes on a 64-bit machine: the address.

In reality, a good compiler will recognize that the copy is unnecessary and will just have generate construct the ReallyBigImage in the memory of the caller. But you shouldn’t rely on compiler optimizations. Also, C++11 introduced move semantics: when the copy isn’t elided, a returned local object is moved rather than copied, which is cheap for classes that hold their data through a handle, as vector does. The advances of C++ have greatly diminished the need for actively thinking about the heap.
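Here is a sketch of what a move buys us. VectorImage is a hypothetical variant of ReallyBigImage that keeps its pixels in a vector instead of an in-object array. That swap is my assumption, and it matters: moving an object whose data lives in a plain array member still copies every element.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// The object itself is just a small handle to storage that the vector manages.
class VectorImage {
  public:
    VectorImage() : pixels(100 * 200 * 3) {}
    const int *Data() const { return pixels.data(); }
    std::size_t Size() const { return pixels.size(); }

  private:
    std::vector<int> pixels;
};

// Returned by value. The local is either constructed directly in the
// caller's storage or moved. The pixels are not copied either way.
VectorImage generate() {
  VectorImage img;
  return img;
}

// Forces a move to show that the pixel buffer changes owners instead of
// being duplicated.
bool DemoMove() {
  VectorImage a = generate();
  const int *before = a.Data();
  assert(before != nullptr);
  VectorImage b = std::move(a);  // implicitly generated move constructor
  return b.Data() == before && b.Size() == 60000u;
}
```

After the move, b owns the very same buffer that a allocated, so returning the image by value costs about as much as returning a pointer, and nobody has to remember to call delete.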

Needing the address of our data is not a good enough reason to head to the heap. Just because we are working with memory, it doesn’t mean that the memory needs to be on the heap. Stack is memory too. Suppose for some reason you need to get both the quotient and remainder of a division operation. We might use pointers to get around the fact that C and C++ only allow a single return value:

void divmod(int numerator, int denominator, int *quotient, int *remainder) {
  *quotient = numerator / denominator;
  *remainder = numerator % denominator;
}

The caller of this code can just as easily use stack variables to hold the out parameters:

int main(int argc, char **argv) {
  int row, col;
  int width = 3;
  divmod(7, width, &row, &col);
  return 0;
}

However, there’s a better way to write this code: references. These give you the cheap sharing that pointers afford you, without the explicit address-of and dereference syntax:

void divmod(int numerator, int denominator, int &quotient, int &remainder) {
  quotient = numerator / denominator;
  remainder = numerator % denominator;
}

int main(int argc, char **argv) {
  int row, col;
  int width = 3;
  divmod(7, width, row, col);
  return 0;
}

I guess that amounts to a fourth choice: should we use references or pointers? Favor references if you are not dealing with dynamically allocated memory.
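To see the two styles side by side, here is a sketch that runs both on the same inputs. I’ve renamed the functions divmod_ptr and divmod_ref so they can coexist, and DemoDivmod is a hypothetical helper. The null check in the pointer version is my addition.

```cpp
#include <cassert>

// Pointer version: the caller passes addresses and the callee dereferences.
// A pointer can be null, so a careful callee has to check.
void divmod_ptr(int numerator, int denominator, int *quotient, int *remainder) {
  assert(quotient != nullptr && remainder != nullptr);
  *quotient = numerator / denominator;
  *remainder = numerator % denominator;
}

// Reference version: the same sharing, but there is no null reference
// to guard against and no * or & at the call site.
void divmod_ref(int numerator, int denominator, int &quotient, int &remainder) {
  quotient = numerator / denominator;
  remainder = numerator % denominator;
}

// Both callers use plain stack variables. No heap is involved.
bool DemoDivmod() {
  int row, col;
  divmod_ptr(7, 3, &row, &col);
  bool ptr_ok = (row == 2 && col == 1);

  int row2, col2;
  divmod_ref(7, 3, row2, col2);
  return ptr_ok && row2 == 2 && col2 == 1;
}
```

Dividing 7 by 3 gives a quotient of 2 and a remainder of 1 in both versions. The only differences are in the syntax and in whether null is a possibility.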

Here’s your TODO list for next time:

See you then!

Sincerely,