teaching machines

CS 488: Lecture 12 – Untransforming the Mouse

March 8, 2021 by . Filed under graphics-3d, lectures, spring-2021.

Dear students:

Interactive graphics is funny because we work so hard to give the illusion of a full 3D world, but then we run this 2D cursor over it. As a human, you’ve probably experienced how hard it can be to figure out what someone is pointing at. Figuring out what the mouse is pointing at is equally challenging, but we have tools to help us: matrix inverses. Today we examine these inverses and use them to manipulate eye and world space.

Traveling Back in Space

Sometimes we want to figure out where in the world the user has clicked, perhaps because the user is trying to click where a new object will appear. Sometimes we want to figure out which vertex in a model the user has clicked on, perhaps because the user is trying to drag the vertex around. Our vehicle for all things related to mouse clicks is the event listener. The event object reports the mouse’s location, but only in pixel space.

The mouse coordinates are in pixel space, and the model’s coordinates are in model space. Before we can compare the mouse coordinates to any vertex coordinates, we need the coordinates to be in the same space. We have two options for aligning our spaces.

First, we could transform the model coordinates from model space into pixel space. This would require multiplying all the vertex positions by our gauntlet of matrices. This would likely be done on the CPU, as the GPU pipeline we’ve set up is built to rasterize triangles, not to hand the results of arbitrary computations back to us.

Second, we could “untransform” the pixel coordinates back into model space. How do we do that? Well, the same way we transformed them: with matrices. But the matrices we transform with are inverse matrices. An inverse matrix works in reverse. The inverse of a translation matrix untranslates. The inverse of a rotation matrix unrotates. The inverse of a scale matrix unscales.

A useful property of an inverse matrix is that it multiplies with the original matrix to give the identity matrix, so the two cancel each other out:

$$\mathbf{M}^{-1} \times \mathbf{M} = \mathbf{I}$$

If we have a vector $\mathbf{w}$ that is really a transformed $\mathbf{v}$, we can get back the original $\mathbf{v}$ by applying the inverse matrix just as we would in non-matrix algebra:

$$\begin{aligned}\mathbf{M} \times \mathbf{v} &= \mathbf{w} \\\mathbf{M}^{-1} \times \mathbf{M} \times \mathbf{v} &= \mathbf{M}^{-1} \times \mathbf{w} \\\mathbf{I} \times \mathbf{v} &= \mathbf{M}^{-1} \times \mathbf{w} \\\mathbf{v} &= \mathbf{M}^{-1} \times \mathbf{w} \\\end{aligned}$$

If $\mathbf{M}$ is a translation matrix, its inverse is:

$$\begin{bmatrix}1 & 0 & 0 & -\Delta_x \\0 & 1 & 0 & -\Delta_y \\0 & 0 & 1 & -\Delta_z \\0 & 0 & 0 & 1\end{bmatrix}$$
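We can try this round trip in a few lines of JavaScript. This is only a sketch: `translate` and `transform` are hypothetical helpers written for this example, not part of any library, and matrices are stored as arrays of four rows.

```javascript
// Hypothetical helper: builds a 4x4 translation matrix as an array of rows.
function translate(dx, dy, dz) {
  return [
    [1, 0, 0, dx],
    [0, 1, 0, dy],
    [0, 0, 1, dz],
    [0, 0, 0, 1],
  ];
}

// Hypothetical helper: multiplies a 4x4 matrix by a 4-vector.
function transform(m, v) {
  return m.map(row => row.reduce((sum, element, i) => sum + element * v[i], 0));
}

const v = [2, 3, 4, 1];
const w = transform(translate(5, -1, 2), v);          // [7, 2, 6, 1]
const recovered = transform(translate(-5, 1, -2), w); // [2, 3, 4, 1]
console.log(w, recovered);
```

Negating the offsets is all it takes to build the inverse, so we never need a general-purpose matrix inversion routine for translations.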

If $\mathbf{M}$ is a scale matrix, its inverse is:

$$\begin{bmatrix}\frac{1}{s_x} & 0 & 0 & 0 \\0 & \frac{1}{s_y} & 0 & 0 \\0 & 0 & \frac{1}{s_z} & 0 \\0 & 0 & 0 & 1\end{bmatrix}$$

If $\mathbf{M}$ is a rotation matrix, its inverse is the transpose of the matrix, in which the elements are flipped across the diagonal. The rows become the columns, like this:

$$\begin{bmatrix}a & b & c & d \\e & f & g & h \\i & j & k & l \\m & n & o & p \\\end{bmatrix}^\mathrm{T} = \begin{bmatrix}a & e & i & m \\b & f & j & n \\c & g & k & o \\d & h & l & p \\\end{bmatrix}$$
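Here is a quick numerical check that the transpose really does undo a rotation. It is a sketch under assumptions: `rotateZ`, `transpose`, and `multiply` are hypothetical helpers written for this example, and the rotation is about the z-axis only.

```javascript
// Hypothetical helper: rotation about the z-axis, stored as an array of rows.
function rotateZ(degrees) {
  const radians = degrees * Math.PI / 180;
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return [
    [c, -s, 0, 0],
    [s,  c, 0, 0],
    [0,  0, 1, 0],
    [0,  0, 0, 1],
  ];
}

// Rows become columns.
function transpose(m) {
  return m[0].map((_, c) => m.map(row => row[c]));
}

// Standard 4x4 matrix product.
function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, element, k) => sum + element * b[k][c], 0)));
}

const rotation = rotateZ(30);
const product = multiply(transpose(rotation), rotation);

// Compare against the identity with a small tolerance for floating-point error.
const isIdentity = product.every((row, r) =>
  row.every((element, c) => Math.abs(element - (r === c ? 1 : 0)) < 1e-12));
console.log(isIdentity); // true
```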

If $\mathbf{M}$ is a product of two transformations, we’ll have to find the inverse of the product:

$$\begin{aligned}\mathbf{M} &= \mathbf{A} \times \mathbf{B} \\\mathbf{M}^{-1} &= (\mathbf{A} \times \mathbf{B})^{-1} \\\end{aligned}$$

Using the definition of the inverse, we can work out how to compute the inverse of the product:

$$\begin{aligned}\mathbf{M} \times \mathbf{M}^{-1} &= \mathbf{I} \\\mathbf{A} \times \mathbf{B} \times (\mathbf{A} \times \mathbf{B})^{-1} &= \mathbf{I} \\\mathbf{A}^{-1} \times \mathbf{A} \times \mathbf{B} \times (\mathbf{A} \times \mathbf{B})^{-1} &= \mathbf{A}^{-1} \times \mathbf{I} \\\mathbf{B} \times (\mathbf{A} \times \mathbf{B})^{-1} &= \mathbf{A}^{-1} \\\mathbf{B}^{-1} \times \mathbf{B} \times (\mathbf{A} \times \mathbf{B})^{-1} &= \mathbf{B}^{-1} \times \mathbf{A}^{-1} \\(\mathbf{A} \times \mathbf{B})^{-1} &= \mathbf{B}^{-1} \times \mathbf{A}^{-1} \\\end{aligned}$$
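The order matters, and a small experiment shows why. In this sketch, $\mathbf{A}$ is a translation and $\mathbf{B}$ is a scale. The helpers `translate`, `scale`, `multiply`, and `isIdentity` are hypothetical and written just for this example.

```javascript
// Hypothetical helpers; matrices are arrays of four rows.
function translate(dx, dy, dz) {
  return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]];
}

function scale(sx, sy, sz) {
  return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]];
}

function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, element, k) => sum + element * b[k][c], 0)));
}

function isIdentity(m) {
  return m.every((row, r) =>
    row.every((element, c) => Math.abs(element - (r === c ? 1 : 0)) < 1e-12));
}

const a = translate(5, 0, 0);
const b = scale(2, 2, 2);
const m = multiply(a, b);

const aInverse = translate(-5, 0, 0);
const bInverse = scale(0.5, 0.5, 0.5);

// Inverses multiplied in reverse order cancel the product.
const rightOrder = isIdentity(multiply(multiply(bInverse, aInverse), m));
// Inverses multiplied in the original order do not.
const wrongOrder = isIdentity(multiply(multiply(aInverse, bInverse), m));
console.log(rightOrder, wrongOrder); // true false
```

It's like taking off shoes and socks: the last thing put on is the first thing that must come off.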

We apply this idea of untransforming in several contexts.

Positioning a Light in Eye Space

Suppose we want to allow the user to position a light source in eye space using the mouse, as if they were adjusting the position of the headlamp that they are wearing.

This is the pipeline that our coordinates go through:

$$\mathrm{position}_\mathrm{pixels} = \mathrm{pixelsFromClip} \times \mathrm{clipFromEye} \times \mathrm{eyeFromModel} \times \mathrm{position}_\mathrm{model}$$

On a mouse event, we have $\mathrm{position}_\mathrm{pixels}$ and want to figure out $\mathrm{position}_\mathrm{eye}$, which is $\mathrm{eyeFromModel} \times \mathrm{position}_\mathrm{model}$. We peel off the two matrices to its left by multiplying both sides by their inverses, outermost first:

$$\mathrm{clipFromEye}^{-1} \times \mathrm{pixelsFromClip}^{-1} \times \mathrm{position}_\mathrm{pixels} = \mathrm{position}_\mathrm{eye}$$

Since we don’t really have a $\mathrm{pixelsFromClip}$ matrix, it’s probably easier to turn mouse coordinates into clip coordinates by hand. We turn the mouse coordinates first into proportions of the window and then scale and bias them to fit into the [-1, 1] cube that WebGL maps to the viewport:

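function onMouseMove(event)
  // The mouse position is a 2-vector. Flip y, assuming event.mouseY counts down from the top as browser events do.
  mousePixels = new Vector2(event.mouseX, canvas.height - event.mouseY)
  // Normalize to [0, 1], then scale and bias to [-1, 1].
  mouseClip = mousePixels / new Vector2(canvas.width, canvas.height) * 2 - 1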
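The pseudocode leans on vector classes with overloaded operators, which plain JavaScript doesn't have. Here is a runnable sketch of the same conversion using bare numbers. The function name `clipFromPixels` is hypothetical. It assumes the mouse position is measured from the canvas's top-left corner, as `MouseEvent.offsetX` and `MouseEvent.offsetY` are, so it flips y to match clip space, where y points up.

```javascript
// Hypothetical helper: converts a mouse position in pixels to clip space x and y.
// Assumes x and y are measured from the canvas's top-left corner.
function clipFromPixels(x, y, width, height) {
  const flippedY = height - y;   // pixel rows count down; clip space y points up
  return [
    x / width * 2 - 1,           // [0, width] -> [-1, 1]
    flippedY / height * 2 - 1,   // [0, height] -> [-1, 1]
  ];
}

console.log(clipFromPixels(400, 300, 800, 600)); // center: [0, 0]
console.log(clipFromPixels(0, 0, 800, 600));     // top-left: [-1, 1]
console.log(clipFromPixels(800, 600, 800, 600)); // bottom-right: [1, -1]
```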

To go from clip space to eye space, we apply the inverse of the $\mathrm{clipFromEye}$ matrix. Recall that an orthographic projection matrix has this form:

$$\begin{bmatrix}\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\0 & 0 & \frac{2}{n-f} & \frac{n+f}{n-f} \\0 & 0 & 0 & 1\end{bmatrix}$$

This matrix is the product of a scale and a translation, so its inverse will be the product of the inverse translation and the inverse scale. The eyeFromClip matrix looks like this:

$$\begin{bmatrix}\frac{r-l}{2} & 0 & 0 & \frac{r+l}{2} \\0 & \frac{t-b}{2} & 0 & \frac{t+b}{2} \\0 & 0 & \frac{n-f}{2} & -\frac{n+f}{2} \\0 & 0 & 0 & 1\end{bmatrix}$$
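A good habit with any hand-derived inverse is to multiply it by the forward matrix and confirm that the identity comes out. The sketch below does that for one arbitrary view volume. The functions `ortho`, `inverseOrtho`, `multiply`, and `transform` are hypothetical helpers written for this example. The near plane of the view volume sits at $z = -n$ in eye space, so the clip space point with $z = -1$ should land there.

```javascript
// Hypothetical helper: the clipFromEye matrix for an orthographic projection.
function ortho(l, r, b, t, n, f) {
  return [
    [2 / (r - l), 0, 0, -(r + l) / (r - l)],
    [0, 2 / (t - b), 0, -(t + b) / (t - b)],
    [0, 0, 2 / (n - f), (n + f) / (n - f)],
    [0, 0, 0, 1],
  ];
}

// Hypothetical helper: the eyeFromClip matrix, the inverse of ortho.
function inverseOrtho(l, r, b, t, n, f) {
  return [
    [(r - l) / 2, 0, 0, (r + l) / 2],
    [0, (t - b) / 2, 0, (t + b) / 2],
    [0, 0, (n - f) / 2, -(n + f) / 2],
    [0, 0, 0, 1],
  ];
}

function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, element, k) => sum + element * b[k][c], 0)));
}

function transform(m, v) {
  return m.map(row => row.reduce((sum, element, i) => sum + element * v[i], 0));
}

const clipFromEye = ortho(-4, 6, -3, 3, 0.1, 10);
const eyeFromClip = inverseOrtho(-4, 6, -3, 3, 0.1, 10);

// The product should be the identity, up to floating-point error.
const product = multiply(eyeFromClip, clipFromEye);
const isIdentity = product.every((row, r) =>
  row.every((element, c) => Math.abs(element - (r === c ? 1 : 0)) < 1e-12));

// The front face of clip space should map back to the near plane at z = -n.
const nearEye = transform(eyeFromClip, [0, 0, -1, 1]);
console.log(isIdentity, nearEye[2]);
```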

Before we can multiply our mouse clip coordinates by this matrix, we need to turn the mouse coordinates into a 4-vector. Right now we have a 2-vector. The mouse doesn’t have a depth, so we need to add one. What should the z-coordinate be? Well, for this particular problem, we can use any value since we’re going to hardcode the z-coordinate in eye space in a moment. Let’s just set z = 0. We set the homogeneous coordinate to 1 since the coordinate is a position rather than a vector.

We add our untransformation to our pseudocode:

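function onMouseMove(event)
  // Flip y, assuming event.mouseY counts down from the top as browser events do.
  mousePixels = new Vector2(event.mouseX, canvas.height - event.mouseY)
  mouseClip = mousePixels / new Vector2(canvas.width, canvas.height) * 2 - 1
  // Pad with z = 0 and w = 1 so the 4x4 matrix can be applied.
  mouseClip = mouseClip.toVector4(0, 1)
  // Undo the projection, then drop the homogeneous coordinate.
  mouseEye = (eyeFromClip * mouseClip).toVector3()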

The mouseEye vector is the mouse’s location in eye space. Let’s put the light there, but let’s tweak its z-coordinate so that the light has the same z-coordinate as the eye. We also send the light position up as a uniform to our shader:

function onMouseMove(event)
  // Flip y, assuming event.mouseY counts down from the top as browser events do.
  mousePixels = new Vector2(event.mouseX, canvas.height - event.mouseY)
  mouseClip = mousePixels / new Vector2(canvas.width, canvas.height) * 2 - 1
  mouseClip = mouseClip.toVector4(0, 1)
  mouseEye = (eyeFromClip * mouseClip).toVector3()
  // The eye sits at the origin of eye space, so z = 0 puts the light at the eye's depth.
  mouseEye.z = 0

  shaderProgram.bind()
  shaderProgram.setUniform3f("lightPosition", mouseEye.x, mouseEye.y, mouseEye.z)
  shaderProgram.unbind()

  render()

Positioning a Model in World Space

Suppose we want to add a model to the world when the user clicks. Untransforming back into eye space won’t be enough. We have to peel back one additional layer. In our camera class, we set up the eyeFromWorld matrix as a product between a rotation and a translation:

this.matrix = rotation * Matrix4.translate(-this.from);

The inverse is the product of the inverse translation and the inverse rotation:

this.inverseMatrix = Matrix4.translate(this.from) * rotation.transpose()
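We can confirm that these two matrices cancel with a standalone sketch. It assumes a camera at (1, 2, 3) whose rotation is a 40-degree turn about the y-axis. Both values are arbitrary, and `rotateY`, `translate`, `transpose`, `multiply`, and `transform` are hypothetical stand-ins for the methods of our own Matrix4 class.

```javascript
// Hypothetical helpers; matrices are arrays of four rows.
function rotateY(degrees) {
  const radians = degrees * Math.PI / 180;
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]];
}

function translate(dx, dy, dz) {
  return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]];
}

function transpose(m) {
  return m[0].map((_, c) => m.map(row => row[c]));
}

function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, element, k) => sum + element * b[k][c], 0)));
}

function transform(m, v) {
  return m.map(row => row.reduce((sum, element, i) => sum + element * v[i], 0));
}

const from = [1, 2, 3];
const rotation = rotateY(40);

// Forward: move the camera to the origin, then rotate.
const eyeFromWorld = multiply(rotation, translate(-from[0], -from[1], -from[2]));
// Inverse: unrotate, then move the camera back. Note the reversed order.
const worldFromEye = multiply(translate(from[0], from[1], from[2]), transpose(rotation));

const product = multiply(worldFromEye, eyeFromWorld);
const isIdentity = product.every((row, r) =>
  row.every((element, c) => Math.abs(element - (r === c ? 1 : 0)) < 1e-12));

// The origin of eye space should map back to the camera's world position.
const cameraWorld = transform(worldFromEye, [0, 0, 0, 1]);
console.log(isIdentity, cameraWorld);
```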

Our code to find the new model’s world coordinates follows the same journey we took in finding the light’s eye coordinates. We just go one extra step back into world space using the camera’s inverse:

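function onMouseUp(event)
  // Flip y, assuming event.mouseY counts down from the top as browser events do.
  mousePixels = new Vector2(event.mouseX, canvas.height - event.mouseY)
  mouseClip = mousePixels / new Vector2(canvas.width, canvas.height) * 2 - 1
  mouseClip = mouseClip.toVector4(0, 1)
  // Undo the projection, then undo the camera.
  mouseEye = eyeFromClip * mouseClip
  mouseWorld = camera.inverseMatrix * mouseEye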

Once again, we have a choice in how we guess the clip space z-position. Here we just set z = 0, which will give us a position halfway between the near and far planes.

We use mouseWorld to construct the transform that will position the new model into world space such that its model space origin aligns with the clicked world position:

newModel.worldFromModel = Matrix4.translate(mouseWorld)
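To see the whole journey from pixels to world space with concrete numbers, here is a runnable sketch. Everything about the setup is an assumption made for illustration: an 800×600 canvas, an orthographic view volume with near and far distances of 1 and 9, and an unrotated camera at (10, 2, 5), so that the camera's inverse is just a translation. The helper names are hypothetical.

```javascript
// Hypothetical helpers; matrices are arrays of four rows.
function translate(dx, dy, dz) {
  return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]];
}

function inverseOrtho(l, r, b, t, n, f) {
  return [
    [(r - l) / 2, 0, 0, (r + l) / 2],
    [0, (t - b) / 2, 0, (t + b) / 2],
    [0, 0, (n - f) / 2, -(n + f) / 2],
    [0, 0, 0, 1],
  ];
}

function transform(m, v) {
  return m.map(row => row.reduce((sum, element, i) => sum + element * v[i], 0));
}

// Assumed setup for this example only.
const canvas = {width: 800, height: 600};
const eyeFromClip = inverseOrtho(-4, 4, -3, 3, 1, 9);
const worldFromEye = translate(10, 2, 5); // unrotated camera at (10, 2, 5)

function worldFromMouse(x, y) {
  const mouseClip = [
    x / canvas.width * 2 - 1,
    (canvas.height - y) / canvas.height * 2 - 1, // flip y so it points up
    0,                                           // guessed depth: middle of the view volume
    1,                                           // homogeneous coordinate of a position
  ];
  const mouseEye = transform(eyeFromClip, mouseClip);
  return transform(worldFromEye, mouseEye);
}

const mouseWorld = worldFromMouse(600, 150);
const worldFromModel = translate(mouseWorld[0], mouseWorld[1], mouseWorld[2]);
console.log(mouseWorld); // [12, 3.5, 0, 1]
```

The click lands right of center and above center, so the world position is to the right of and above the camera. The depth of 0 comes from the camera's z of 5 minus the 5 units between the eye and the middle of the view volume.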

TODO

Here’s your very first TODO list:

See you next time.

Sincerely,

P.S. It’s time for a haiku!

$\mathbf{M}$ takes $\mathbf{p}$ to $\mathbf{q}$
$\mathbf{M}$ inverse takes $\mathbf{q}$ to $\mathbf{p}$
Math takes “why” to see