CS 240: Lecture 22 - Graphs

Dear students:

We've collected up many abstract data types this semester: lists, sets, stacks, queues, priority queues, trees, and dictionaries. These will be your tools on every software development project you ever undertake. But there's one more to add to your repertoire: the graph. Like a tree, a graph is used to model relationships. A tree's relationships are hierarchical. Some entity is the parent of others. The only connections are between parents and children. With a graph, there is no hierarchy. Entities, or vertices, all live in the same plane. They may be connected to any other entity. A vertex may have many edges.

Handshake Problem

But first, let's start with a warmup exercise. It's one you may have seen before if you hang out with math problems. Here it is:

There are \(n\) people at a party. All party-goers shake hands with the other party-goers. How many handshakes occur? Even more important than the count, what is your approach to thinking about this problem?

Graphs

Graphs are often expressed as two sets: \(\mathbf{G} = (\mathbf{V}, \mathbf{E})\). \(\mathbf{V}\) is the set of vertices and \(\mathbf{E}\) is the set of edges.

The kinds of problems that you solve with graphs are the ones that must examine relationships. Here are several applications I can think of:

Of all the abstract data types that we've discussed, graphs may be the most abstract. Since each problem you encounter will likely model relationships differently, there is no canonical graph interface. However, some features are recurring:

Suppose you have an unweighted directed graph with 7 vertices. What is the maximum number of edges that the graph may contain? We'll answer this as a peer instruction exercise.

Representation

There are two common means of representing a graph in memory. The first is to create a table of cells. Each row represents a starting vertex. Each column represents an ending vertex. If there's an edge between the start and end vertex, then the cell holds a flag or a weight. This table is called an adjacency matrix. It marks which vertices are adjacent to another. The matrix can be stored as a 2D array, which makes for fast lookups. But space might be wasted.

An alternative is an adjacency list. Each vertex has an entry in a one-dimensional array. Stored in a vertex's cell is a linked list of its neighbors.

Which representation you use depends on the situation. Iterating through the neighbors is faster in an adjacency list since there aren't any empty cells to skip over. Checking whether two vertices are connected is slower in an adjacency list because you must iterate. If there aren't that many edges in the graph, the matrix will waste a lot of space.

Traversing

Just as we traversed trees, we also traverse graphs. However, trees have it easy. There's a clear direction of flow: you start at the root and work your way down to the leaves. With graphs, there's no obvious traversal order. One could just iterate through the set of vertices. However, we tend to want to process vertices in a couple of different manners. Either we follow paths through the graph, or we visit neighborhoods. To ensure that we visit every single vertex, we must track whether or not a vertex has been visited and start an exploration at every single vertex. Such a traversal algorithm might look like this in pseudocode:

function traverseGraph
  for each vertex v
    v.isVisited = false
  for each vertex v
    if v not visited
      visit(v)

To trace out entire paths at a time, we perform a depth-first traversal by recursively visiting the starting vertex's unvisited neighbors:

function depthVisit(seed)
  seed.isVisited = true
  visit seed, whatever that means
  
  for each of seed's neighbors
    if neighbor not visited
      depthVisit(neighbor)

To explore the local neighborhood first, we perform a bread-first traversal. Instead of immediately visiting the starting vertex's neighbors, we queue them up. Each iteration of the loop visits the vertex that's been in the queue the longest:

function breadthVisit(seed)
  queue = new Queue
  queue.enqueue(seed)
  seed.isVisited = true
  
  while queue is not empty
    v = queue.dequeue
    visit v, whatever that means
    for each of v's neighbors
      if v not visited
        v.isVisited = true
        v.enqueue(neighbor)

Exercise

Together we'll make the notion of a graph a bit more concrete by implementing a Graph class that uses an adjacency matrix. We'll decompose the problem into these four tasks:

TODO

You've got a few things to do before we meet again:

Read chapter 17 in the textbook, which I'll post later today and will be due after break.
Keep working on PA 4, which is due December 6. Early starts mean less panic.

See you next time!

Sincerely,

P.S. It's time for a haiku:

Socialist network Don't amass too many friends We'll redistribute