CS 240: Lecture 36 – Graphs
Dear students:
We’ve collected up many abstract data types this semester: lists, sets, stacks, queues, priority queues, trees, and dictionaries. These will be your tools on every software development project you ever undertake. But there’s one more to add to your repertoire: the graph. Like a tree, a graph is used to model relationships. A tree’s relationships are hierarchical. Some entity is the parent of others. The only connections are between parents and children. With a graph, there is no hierarchy. Entities, or vertices, all live in the same plane. They may be connected to any other entity. A vertex may have many edges.
Graphs are often expressed as two sets: $\mathbf{G} = (\mathbf{V}, \mathbf{E})$. $\mathbf{V}$ is the set of vertices and $\mathbf{E}$ is the set of edges.
The kinds of problems that you solve with graphs are the ones that must examine relationships. Here are several applications I can think of:
- Social networks. You are a vertex. Your friendships are edges. You receive friend suggestions for friends of your friends.
- Path planning. Each intersection is a vertex. Each road between intersections is an edge. You want to find a route through this graph that minimizes time and cost. Similar ideas are used for both driving directions and the routing of enemies in a game.
- Epidemic modeling. Communities are vertices. They are connected to other communities through work, commerce, and recreation. You examine the ways disease spreads from one community to the next.
Of all the abstract data types that we’ve discussed, graphs may be the most abstract. Since each problem you encounter will likely model relationships differently, there is no canonical graph interface. However, some features are recurring:
- Edges may be undirected or directed depending on the mutuality of relationships. On Facebook, if X has a friend Y, then Y has a friend X. On Twitter, the relationships are asymmetric. Following someone doesn’t mean they follow you. To represent a mutual relationship in a directed graph, you need two edges.
- Edges may be weighted. In a graph of a road network, the weights might indicate the distance between two intersections. In a weighted graph, a vertex might have multiple edges to another vertex. For example, there might be two ways to get from your location to your destination. There might be a gravel road with a weight of 97 and a paved road with a weight of 3.
- Some graphs allow cycles, while others do not. Care must be taken to avoid infinite loops when traversing a graph with cycles.
Suppose you have an unweighted directed graph with 7 vertices. What is the maximum number of edges that the graph may contain? We’ll answer this as a peer instruction exercise.
Representation
There are two common means of representing a graph in memory. The first is to create a table of cells. Each row represents a starting vertex. Each column represents an ending vertex. If there’s an edge between the start and end vertex, then the cell holds a flag or a weight. This table is called an adjacency matrix. It marks which vertices are adjacent to another. The matrix can be stored as a 2D array, which makes for fast lookups. But space might be wasted.
An alternative is an adjacency list. Each vertex has an entry in a one-dimensional array. Stored in a vertex’s cell is a linked list of its neighbors.
Which representation you use depends on the situation. Iterating through the neighbors is faster in an adjacency list since there aren’t any empty cells to skip over. Checking whether two vertices are connected is slower in an adjacency list because you must iterate. If there aren’t that many edges in the graph, the matrix will waste a lot of space.
Traversing
Just as we traversed trees, we also traverse graphs. However, trees have it easy. There’s a clear direction of flow: you start at the root and work your way down to the leaves. With graphs, there’s no obvious traversal order. One could just iterate through the set of vertices. However, we tend to want to process vertices in a couple of different manners. Either we follow paths through the graph, or we visit neighborhoods. To ensure that we visit every single vertex, we must track whether or not a vertex has been visited and start an exploration at every single vertex. Such a traversal algorithm might look like this in pseudocode:
function traverseGraph for each vertex v v.isVisited = false for each vertex v if v not visited visit(v)
To trace out entire paths at a time, we perform a depth-first traversal by recursively visiting the starting vertex’s unvisited neighbors:
function depthVisit(seed) seed.isVisited = true // process seed for each of seed's neighbors if neighbor not visited depthVisit(neighbor)
To explore the local neighborhood first, we perform a bread-first traversal. Instead of immediately visiting the starting vertex’s neighbors, we queue them up. Each iteration of the loop visits the vertex that’s been in the queue the longest:
function breadthVisit(seed) queue = new Queue queue.enqueue(seed) seed.isVisited = true while queue is not empty v = queue.dequeue // process v for each of v's neighbors if v not visited v.isVisited = true v.enqueue(neighbor)
Exercises
For the remainder of our time, we will work through some exercises on graphs.
TODO
You have some work to do before the next class:
- Submit a muddiest point today.
- Keep working on PA4.
See you next time.
P.S. It’s time for a haiku!
Socialist network
Don’t amass too many friends
We’ll redistribute