teaching machines

CS 488: Lecture 4 – Spaces and Orthographic Projections

February 1, 2021. Filed under graphics-3d, lectures, spring-2021.

Dear students:

Let’s step back and look at where we are. We can make shapes out of triangles. We can color them. We can move them around and resize them using a compact and fast system of matrices. We can tie these transformations to mouse events. That’s pretty nice. Where do we go next? It’d be nice if we used the third dimension a bit more. Before that, though, let’s address the issue of our shapes appearing distorted as the window changes size.

Our solution to this distortion addresses a larger organizational problem that undergirds all of computer graphics: we operate in many different spaces.

Model Space

The 3D models that we render are usually designed in a 3D modeling program like Blender, Maya, Cinema4D, ZBrush, and so on. The modeler uses a coordinate system that is convenient for the modeling process. Perhaps the origin is between a character’s feet or within an orb’s center. The coordinate system chosen by the modeler is called model space (or object space).

World Space

Problems arise when we load models defined in model space into our renderer. Suppose we get two characters from a 3D modeler and both have their feet planted at the origin in model space. If we render them both as is, they will appear on top of each other. To fix this, we introduce the notion of a world space that lays out a coordinate system for the world these characters occupy.

We translate, rotate, and scale the two characters to make sure they occupy different locations. Let’s call the matrix that takes our models from model space to their position in world space the model-to-world transformation.
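To make this concrete, here's a minimal sketch of composing a model-to-world matrix for one of the characters. The Matrix4.translate, Matrix4.rotateY, and Matrix4.scale factories and the multiplyMatrix method are hypothetical stand-ins for whatever your matrix class provides; the point is only that scale, rotation, and translation compose into a single matrix.

// Hypothetical sketch: place one character in the world.
// Reading right to left: scale it, spin it about the y-axis, then move it.
const modelToWorld = Matrix4.translate(2, 0, 0)
  .multiplyMatrix(Matrix4.rotateY(45))
  .multiplyMatrix(Matrix4.scale(1.2, 1.2, 1.2));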

NDC Space

Meanwhile, the GPU has its own notion of space. It looks at vertices whose x-coordinates are -1 and plops them down on the left side of the viewport. Vertices whose x-coordinates are 1 land on the right side of the viewport. The GPU refuses to acknowledge any other view of space. All that’s important to your scene must be in this box from -1 to 1. Coordinates in this space are called normalized device coordinates (NDC).

If we want a particular chunk of the world to be on screen, we have to squeeze it into this box. How do we take a region of the world and shimmy it into this box? By scaling and translating!

I’ve been calling this normalized coordinate system clip space. The two spaces are the same when there’s no perspective. But let’s talk about that another day.

Orthographic Projection

To convert our world into NDC, we start by deciding what chunk of the world we want rendered in the viewport. Let’s call the x-coordinate of the left face of this box $l$ and the x-coordinate of the right face $r$. We want $l$ to transform into -1 and $r$ to 1. Let’s dream up a function $\hat{x}$ that can do this conversion.

We know these two things about function $\hat{x}$:

$$\begin{array}{rcl}\hat{x}(l) &=& -1 \\\hat{x}(r) &=& 1\end{array}$$

In between $l$ and $r$, we want the NDC coordinates to increase predictably. The center of the world box should map to 0. The most predictable scheme would be to make $\hat{x}$ a linear function, which means that $\hat{x}(x) = mx + b$. Let’s use our knowledge of lines to figure out what $m$ and $b$ should be:

$$\begin{array}{rcl}m &=& \frac{\mathrm{rise}}{\mathrm{run}} \\\mathrm{run} &=& r-l \\\mathrm{rise} &=& 1-(-1) \\ &=& 2 \\m &=& \frac{2}{r-l}\end{array}$$

Now let’s drop $m$ into our equation for $\hat{x}(r)$ and solve the linear system:

$$\begin{array}{rcl}\hat{x}(r) &=& m \cdot r + b \\ 1 &=& \frac{2}{r-l} \cdot r + b \\ b &=& 1-\frac{2}{r-l} \cdot r \\ &=& \frac{r-l}{r-l}-\frac{2r}{r-l} \\ &=& \frac{r-l-2r}{r-l} \\ &=& \frac{-r-l}{r-l} \\ &=& \frac{-(r+l)}{r-l} \\ &=& -\frac{r+l}{r-l} \\\end{array}$$

Now we’ve pinned down our function:

$$\begin{array}{rcl}\hat{x}(x) &=& \frac{2}{r-l} \cdot x-\frac{r+l}{r-l}\end{array}$$
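As a quick sanity check, here's $\hat{x}$ in JavaScript with some made-up bounds ($l = -3$ and $r = 5$ are purely illustrative):

const l = -3;
const r = 5;

// xhat maps world-space x in [l, r] to NDC x in [-1, 1].
const xhat = x => 2 / (r - l) * x - (r + l) / (r - l);

console.log(xhat(l));           // -1, the left face
console.log(xhat(r));           //  1, the right face
console.log(xhat((l + r) / 2)); //  0, the center of the box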

What do you notice about this function? It’s a scale and translate! That means we can use a matrix to transform world space into NDC space. It’ll start off like this:

$$\begin{array}{rcl}\begin{bmatrix}\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\? & ? & ? & ? \\? & ? & ? & ? \\? & ? & ? & ?\end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} &=& \begin{bmatrix}\frac{2}{r-l} \cdot x - \frac{r+l}{r-l} \\? \\? \\?\end{bmatrix}\end{array}$$

The function for the y-coordinate should behave similarly, mapping bottom $b$ to -1 and top $t$ to 1.

$$\begin{array}{rcl}\begin{bmatrix}\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\? & ? & ? & ? \\? & ? & ? & ?\end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} &=& \begin{bmatrix}\frac{2}{r-l} \cdot x - \frac{r+l}{r-l} \\\frac{2}{t-b} \cdot y - \frac{t+b}{t-b} \\? \\?\end{bmatrix}\end{array}$$

The z-coordinate is treated a bit differently in graphics systems. I am tempted to treat it just like the x- and y-coordinates for simplicity, but I feel like that would be doing you a disservice. Just as we had to define the left, right, bottom, and top bounds of the box, so must we define the near and far bounds. But for a reason that I am not able to explain, the convention is to flip their sign. Let's call the z-coordinate of the nearest face of the box $-n$ and the z-coordinate of the farthest face $-f$. We want $-n$ to transform into -1 and $-f$ to 1. Let's dream up a function $\hat{z}$ that can do this conversion.

We know these two things about function $\hat{z}$:

$$\begin{array}{rcl}\hat{z}(-n) &=& -1 \\\hat{z}(-f) &=& 1\end{array}$$

We take the same approach of treating $\hat{z}$ as a linear function and solving for its slope and intercept:

$$\begin{array}{rcl}m &=& \frac{\mathrm{rise}}{\mathrm{run}} \\\mathrm{run} &=& -f-(-n) \\ &=& n-f \\\mathrm{rise} &=& 1-(-1) \\ &=& 2 \\m &=& \frac{2}{n-f}\end{array}$$

Now let’s drop $m$ into our equation for $\hat{z}(-f)$ and solve the linear system:

$$\begin{array}{rcl}\hat{z}(-f) &=& m \cdot (-f) + b \\ 1 &=& \frac{2}{n-f} \cdot (-f) + b \\ b &=& 1-\frac{2}{n-f} \cdot (-f) \\ &=& 1+\frac{2f}{n-f} \\ &=& \frac{n-f}{n-f}+\frac{2f}{n-f} \\ &=& \frac{n-f+2f}{n-f} \\ &=& \frac{n+f}{n-f}\end{array}$$

Now we’ve pinned down our function:

$$\begin{array}{rcl}\hat{z}(z) &=& \frac{2}{n-f} \cdot z+\frac{n+f}{n-f}\end{array}$$
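The sign flip makes this one easy to get backwards, so here's a quick numeric check with illustrative planes $n = 1$ and $f = 11$, which bound a box running from $z = -1$ to $z = -11$:

const n = 1;
const f = 11;

// zhat maps world-space z in [-f, -n] to NDC z in [-1, 1].
const zhat = z => 2 / (n - f) * z + (n + f) / (n - f);

console.log(zhat(-n)); // -1, the near face at z = -1
console.log(zhat(-f)); //  1, the far face at z = -11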

That lets us complete our third row of the transformation matrix:

$$\begin{array}{rcl}\begin{bmatrix}\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\0 & 0 & \frac{2}{n-f} & \frac{n+f}{n-f} \\? & ? & ? & ?\end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} &=& \begin{bmatrix}\frac{2}{r-l} \cdot x-\frac{r+l}{r-l} \\\frac{2}{t-b} \cdot y-\frac{t+b}{t-b} \\\frac{2}{n-f} \cdot z+\frac{n+f}{n-f} \\?\end{bmatrix}\end{array}$$

For the fourth row, we want to maintain the homogeneous coordinate of our vector, which is 1. We achieve this by zeroing out the x-, y-, and z-components and passing the 1 through. Our final orthographic projection matrix looks like this:

$$\begin{array}{rcl}\begin{bmatrix}\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\0 & 0 & \frac{2}{n-f} & \frac{n+f}{n-f} \\0 & 0 & 0 & 1\end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} &=& \begin{bmatrix}\frac{2}{r-l} \cdot x-\frac{r+l}{r-l} \\\frac{2}{t-b} \cdot y-\frac{t+b}{t-b} \\\frac{2}{n-f} \cdot z+\frac{n+f}{n-f} \\1\end{bmatrix}\end{array}$$
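The Matrix4.ortho call used in the code below presumably assembles exactly this matrix from the six bounds. If you were to write it yourself, a minimal sketch might look like this (returning a plain row-major nested array for readability, not necessarily the layout your Matrix4 class uses):

// Sketch: build the orthographic projection from the box bounds.
// Rows appear in the same order as the derivation above.
function orthographicMatrix(l, r, b, t, n, f) {
  return [
    [2 / (r - l), 0, 0, -(r + l) / (r - l)],
    [0, 2 / (t - b), 0, -(t + b) / (t - b)],
    [0, 0, 2 / (n - f), (n + f) / (n - f)],
    [0, 0, 0, 1],
  ];
}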

Aspect Ratio

What happens if the viewport is 200×100 but the chunk of the world is 3×3? The shapes inside will be stretched wide. We have to fix this so that the world chunk has the same width:height ratio, or aspect ratio, as the viewport.

Let’s assume that the world chunk is centered on the origin, that $l = -r$ and that $b = -t$. We express the equivalence of the boxes’ ratios:

$$\begin{array}{rcl}\dfrac{\mathrm{viewport\ width}}{\mathrm{viewport\ height}} &=& \dfrac{r}{t} \\ \mathrm{aspect\ ratio} &=& \dfrac{r}{t} \\\end{array}$$

The aspect ratio is effectively a constant set by the viewport. To prevent distortion, we adjust either $r$ or $t$.

$$\begin{array}{rcl}r &=& t \cdot \mathrm{aspect\ ratio} \\t &=& \dfrac{r}{\mathrm{aspect\ ratio}} \\\end{array}$$

We can pick either one. If the aspect ratio is at least 1, then we’ll grow the horizontal span if we use the first equation. If it goes below one, the first equation will shrink the horizontal span, which might cut off the chunk of the world we hoped to render. To prevent this, we can switch to the other equation once the aspect ratio goes below 1, which will grow the vertical span.

Code

Given its dependence on the aspect ratio, the projection matrix must be recalculated whenever the canvas changes size. We add this code to our resize event listener:

let worldToClip;

function onSizeChanged() {
  // ...
  const aspectRatio = canvas.width / canvas.height;

  // Keep the world chunk's aspect ratio in sync with the viewport's by
  // fixing one span at 3 units and growing the other.
  let right;
  let top;

  if (aspectRatio < 1) {
    // Portrait viewport: fix the horizontal span and grow the vertical one.
    right = 3;
    top = right / aspectRatio;
  } else {
    // Landscape viewport: fix the vertical span and grow the horizontal one.
    top = 3;
    right = top * aspectRatio;
  }

  worldToClip = Matrix4.ortho(-right, right, -top, top, 0.01, 10);

  // ...
}

function render() {
  // ...
  shaderProgram.bind();
  shaderProgram.setUniformMatrix4('worldToClip', worldToClip);
  // ...
}

In the vertex shader, the projection matrix is always leftmost in the matrix-vector multiplication that calculates the clip space position of the vertex. Since the vector sits on the right, the leftmost matrix is applied last; the preceding transforms are applied first, as shown in this vertex shader:

uniform mat4 worldToClip;
uniform mat4 modelToWorld;

in vec3 position;

void main() {
  gl_Position = worldToClip * modelToWorld * vec4(position, 1.0);
}
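The render function above only sets worldToClip. Each model also needs its modelToWorld uniform set before it is drawn. Here's a sketch of what that loop might look like, assuming a hypothetical models array whose entries carry their own transform and a draw method:

function render() {
  // ...
  shaderProgram.bind();
  shaderProgram.setUniformMatrix4('worldToClip', worldToClip);
  for (const model of models) {
    // Each model supplies its own placement in the world.
    shaderProgram.setUniformMatrix4('modelToWorld', model.modelToWorld);
    model.attributes.draw(); // hypothetical draw call
  }
  // ...
}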

Indexed Triangles

Let’s talk about something completely different for a moment.

In most of our computer games, triangles appear not in isolation but in groups. They are stitched together to form the surfaces of our characters and objects. Neighboring triangles share vertices. With the drawing routine we've been using, we'd need duplicate entries for these shared vertices. But that's wasteful and slow. A faster alternative is indexed geometry.

To share vertices, we abandon the implicit grouping of vertices by their sequence. In its place, we provide an explicit grouping via a list of vertex indices. To render a quadrilateral with two shared vertices, we’d build our vertex attributes up like this:

const positions = [
  // One xyz triple per corner of the quadrilateral.
  -0.5, -0.5, 0,
   0.5, -0.5, 0,
  -0.5,  0.5, 0,
   0.5,  0.5, 0,
];

const colors = [
  // One rgb triple per vertex.
  1, 0, 0,
  0, 1, 0,
  0, 0, 1,
  0, 0, 0,
];

const faces = [
  // Each triple of indices picks out one triangle's vertices.
  0, 1, 2,
  1, 3, 2
];

const attributes = new VertexAttributes();
attributes.addAttribute('position', 4, 3, positions);
attributes.addAttribute('color', 4, 3, colors);
attributes.addIndices(faces);

Under the sequential system, we'd have needed 6 vertices to express the 2 triangles; with indices we store only 4. The savings grow as our models become more complex. Additionally, with indexed geometry, the GPU can use the index as a key into a cache of vertex shader results. After we process the first triangle, the GPU only needs to run the vertex shader for the solitary uncached vertex of the second triangle.
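Under the hood, addIndices presumably uploads the indices into an element array buffer, and the draw call switches from gl.drawArrays to gl.drawElements. In raw WebGL, that plumbing looks roughly like this (a sketch of the idea, not the course library's actual implementation):

// Upload the face indices to the GPU.
const indexBuffer = gl.createBuffer();
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuffer);
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, new Uint16Array(faces), gl.STATIC_DRAW);

// Draw 6 indices' worth of triangles using only the 4 stored vertices.
gl.drawElements(gl.TRIANGLES, faces.length, gl.UNSIGNED_SHORT, 0);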

Horizon

Now that we are able to mark off a 3D box of the world that we want to render, we are ready to start filling that box with shapes that span all three dimensions. We'll stitch triangles together to make a complete 3D shape and get it rotating so we can see all sides.

TODO

Here’s your very first TODO list:

See you next time.

Sincerely,

P.S. It’s time for a haiku!

Which is the center?
Some say the sun, others Earth
They’re probably right