Untransforming the Mouse

The mouse skates over the viewport, hovering over specific objects in the scene. Identifying which object or location is under the cursor is a common need in interactive renderers. WebGL won't help much with this task. The burden is ours as graphics developers. In particular, we'll need to work backward through the gauntlet of transformations that put our models on the screen. We must untransform the mouse.

Recall that a vertex passes through this chain of spaces:

model space
↓
world space
↓
eye space
↓
clip space
↓
normalized space
↓
pixel space

A vertex generally moves from one space to another by way of a matrix multiplication. The only exception is the transition from clip space to normalized space, which is the result of the perspective divide. All together, this is the mathematical gauntlet through which each vertex passes:

$$\begin{aligned} \mathbf{p}_\mathrm{world} &= \mathrm{worldFromModel} \times \mathbf{p}_\mathrm{model} \\ \mathbf{p}_\mathrm{eye} &= \mathrm{eyeFromWorld} \times \mathbf{p}_\mathrm{world} \\ \mathbf{p}_\mathrm{clip} &= \mathrm{clipFromEye} \times \mathbf{p}_\mathrm{eye} \\ \mathbf{p}_\mathrm{norm} &= \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\ \mathbf{p}_\mathrm{pixel} &= \mathrm{pixelFromNormalized} \times \mathbf{p}_\mathrm{norm} \\ \end{aligned}$$

We've been handling the first three transformations in the vertex shader, and WebGL has been handling the last two. To get the mouse position from pixel space into one of the earlier spaces, we must work backward through these operations, even the ones that WebGL has been handling. Let's step through the untransformation one space at a time.

From Pixel Space to Normalized Space

When a mouse event occurs, we query its coordinates, which are in pixel space, with this event listener:

window.addEventListener('pointerdown', event => {
  const mousePixel = new Vector3(
    event.clientX,
    canvas.height - 1 - event.clientY,
    0,
  );
});

window.addEventListener('pointerdown', event => {
  const mousePixel = new Vector3(
    event.clientX,
    canvas.height - 1 - event.clientY,
    0,
  );
});

We must step backward from pixel space to normalized space. The gauntlet above shows how we step forward via a matrix named $\mathrm{pixelFromNormalized}$, which is normally handled by WebGL. To step backward, we must undo this matrix multiplication. But what exactly does this matrix do?

The $\mathrm{pixelFromNormalized}$ matrix turns coordinates in the $[-1, 1]$ range of normalized space into the ranges $[0, \mathrm{width}]$ and $[0, \mathrm{height}]$ of pixel space. It jumps from a normalized x-coordinate to an x-coordinate in pixel space through this sequence of range-mapping operations:

$$\begin{align} [-1, 1] \xrightarrow{+~1} [0, 2] \xrightarrow{\div~2} [0, 1] \xrightarrow{\times~\mathrm{width}} [0, \mathrm{width}] \end{align}$$

To go from pixel space to normalized space, we apply the inverse operations in reverse order:

$$\begin{align} [-1, 1] \xleftarrow{-~1} [0, 2] \xleftarrow{\times~2} [0, 1] \xleftarrow{\div~\mathrm{width}} [0, \mathrm{width}] \end{align}$$

The y-coordinate is computed similarly using $\mathrm{height}$.

Soon we'll learn about an input system called a virtual trackball that operates in normalized space. Since it doesn't need to visit any early spaces, its work of untransforming the mouse position stops at this stage.

From Normalized Space to Clip Space

Sometimes we need to go farther back in the transformation pipeline. Perhaps we are building a 3D modeling tool and want to move a vertex to a new position in model space. Maybe we are trying to place a building or character in world space. To go farther back, we must pass from normalized space into clip space. The forward step from clip space into normalized space is the perspective divide:

$$\mathbf{p}_\mathrm{norm} = \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\$$

In the backward step, we have $\mathbf{p}_\mathrm{norm}$ and are trying to solve for $\mathbf{p}_\mathrm{clip}$, so we apply the inverse operation:

$$\mathbf{p}_\mathrm{norm} \times w_\mathrm{clip} = \mathbf{p}_\mathrm{clip} \\$$

This is a problem. We don't have $w_\mathrm{clip}$, which is the homogeneous coordinate of $\mathbf{p}_\mathrm{clip}$. If we had the clip space position, we wouldn't be trying to solve for it. The good news is that we don't actually have to undo the division. Positions $\mathbf{p}_\mathrm{clip}$ and $\mathbf{p}_\mathrm{norm}$ are technically equivalent. The only difference is that one has a homogeneous coordinate and one doesn't. Since they are equivalent, we can untransform the normalized position directly into eye space without ever calculating the intermediate position in clip space.

From Normalized Space to Earlier Spaces

Untransforming into eye, world, or model space is a matter of undoing the matrix multiplications that got us into the later space. But what does it mean to undo a matrix multiplication? To undo a scalar multiplication, we perform a scalar division. To undo a matrix multiplication, we can't perform a matrix division, as division is not defined for matrices. Instead, we undo by performing another matrix multiplication. In particular, we multiply by the inverse matrix, which applies the opposite transformation. If we multiply a matrix and its inverse together, they cancel each other out and form the identity matrix:

$$\begin{align} \mathbf{M} \times \mathbf{M}^{-1} &= \mathbf{I} \\ \mathbf{M}^{-1} \times \mathbf{M} &= \mathbf{I} \end{align}$$

A translation matrix adds an offset to a position vector and has this form:

$$\mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$

The opposite or inverse of $\mathbf{T}$ subtracts the offset and therefore has this form:

$$\mathbf{T}^{-1} = \begin{bmatrix} 1 & 0 & 0 & -\textrm{offset}_x \\ 0 & 1 & 0 & -\textrm{offset}_y \\ 0 & 0 & 1 & -\textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$

A scale matrix multiplies by scale factors and has this form:

$$\mathbf{S} = \begin{bmatrix} \textrm{factor}_x & 0 & 0 & 0 \\ 0 & \textrm{factor}_y & 0 & 0 \\ 0 & 0 & \textrm{factor}_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The inverse of $\mathbf{S}$ divides by the scale factors and has this form:

$$\mathbf{S}^{-1} = \begin{bmatrix} \frac{1}{\textrm{factor}_x} & 0 & 0 & 0 \\ 0 & \frac{1}{\textrm{factor}_y} & 0 & 0 \\ 0 & 0 & \frac{1}{\textrm{factor}_z} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$

To find the inverse of a rotation matrix, we recall the properties of rotation matrices that we discussed when building our camera abstractions:

the rows of the matrix are the axes of the incoming space that become the x-, y-, and z-axes of the outgoing space
the columns of the matrix are the axes of the outgoing space that the x-, y-, and z-axes of the incoming space become
all rows are perpendicular to each other

Suppose the top row of a rotation matrix $\mathbf{R}$ is $\begin{bmatrix}a & b & c\end{bmatrix}$, with the other rows named accordingly:

$$\mathbf{R} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

When we multiply the matrix and the axis $\begin{bmatrix}a & b & c\end{bmatrix}$, we have:

$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} a \\ b \\ c \\ 0 \end{bmatrix}$$

A normalized vector dotted with itself is 1, which explains the x-component of the product vector. The other rows are perpendicular to the axis, so their dot products are 0.

The inverse matrix, whatever it is, must work in reverse, turning the x-axis back into $\begin{bmatrix}a & b & c\end{bmatrix}$:

$$\begin{bmatrix} a \\ b \\ c \\ 0 \end{bmatrix} = \begin{bmatrix} ? & ? & ? & 0 \\ ? & ? & ? & 0 \\ ? & ? & ? & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

The second fact of rotation matrices tells us that the axis itself forms the first column of this inverse matrix:

$$\begin{bmatrix} a \\ b \\ c \\ 0 \end{bmatrix} = \begin{bmatrix} a & ? & ? & 0 \\ b & ? & ? & 0 \\ c & ? & ? & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

The other axes are treated similarly. Thus the inverse of a rotation matrix is just the matrix flipped about its diagonal so the rows become the columns:

$$\mathbf{R}^{-1} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} a & d & g & 0 \\ b & e & h & 0 \\ c & f & i & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

The flip of a matrix about its diagonal is called the transpose. So, the inverse of a rotation matrix is its transpose.

In general, point $\mathbf{p}$ in an earlier space is transformed by matrix $\mathbf{M}$ into point $\mathbf{p}'$ in a later space. The backward step, which untransforms $\mathbf{p}'$ into $\mathbf{p}$, is determined with some algebra:

$$\begin{align} \mathbf{p}' &= \mathbf{M} \times \mathbf{p} \\ \mathbf{M}^{-1} \times \mathbf{p}' &= \mathbf{M}^{-1} \times \mathbf{M} \times \mathbf{p} \\ \mathbf{M}^{-1} \times \mathbf{p}' &= \mathbf{p} \end{align}$$

There's only one hitch. Finding the inverse matrix is only straightforward when a matrix is a pure rotation, translation, or scaling. Knowing the inverses of the three standard transformations is of limited use.

Inverting an Arbitrary Matrix

Normally we transform through a complex chain of translations, scales, and rotations, not just a single transformation. Consider this chain of three transformations:

$$\mathbf{p}' = \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p}$$

Suppose we have $\mathbf{p'}$ and want to solve for $\mathbf{p}$. We start chipping away at the right-hand side by multiplying both sides by the inverse of $\mathbf{A}$:

$$\begin{aligned} \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{A}^{-1} \times \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \end{aligned}$$

After applying a few more inverses, we arrive at $\mathbf{p}$:

$$\begin{aligned} \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{C} \times \mathbf{p} \\ \mathbf{C}^{-1} \times \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{p} \\ \end{aligned}$$

This shows that if we know $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, then we may compute the inverse of their product by computing the product of their inverses in reversed order. However, we probably don't have our matrices broken down into separate transformations like this. We've been accumulating them up into a single matrix.

What we really need is a magic method that that will invert any invertible matrix without prior knowledge of how it was constructed. Here is that method, offered without explanation or justification:

class Matrix4 {
  // ...

  inverse() {
    let m = new Matrix4();

    let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0);
    let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0);
    let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0);

    let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1);
    let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1);
    let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2);

    let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0);
    let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0);
    let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0);

    let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1);
    let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1);
    let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2);

    let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0;

    if (determinant != 0) {
      let inverseDeterminant = 1 / determinant;
      m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant);
      m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant);
      m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant);
      m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant);
      m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant);
      m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant);
      m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant);
      m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant);
      m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant);
      m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant);
      m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant);
      m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant);
      m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant);
      m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant);
      m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant);
      m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant);
    } else {
      throw Error('Matrix is singular.');
    }

    return m;
  }
}

class Matrix4 {
  // ...

  inverse() {
    let m = new Matrix4();

    let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0);
    let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0);
    let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0);

    let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1);
    let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1);
    let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2);

    let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0);
    let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0);
    let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0);

    let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1);
    let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1);
    let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2);

    let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0;

    if (determinant != 0) {
      let inverseDeterminant = 1 / determinant;
      m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant);
      m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant);
      m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant);
      m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant);
      m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant);
      m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant);
      m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant);
      m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant);
      m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant);
      m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant);
      m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant);
      m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant);
      m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant);
      m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant);
      m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant);
      m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant);
    } else {
      throw Error('Matrix is singular.');
    }

    return m;
  }
}

Add this method to your Matrix4 class.

In summary, we untransform positions from a later space into an earlier space by multiplying by matrix inverses. If we have a mouse position in normalized space, we turn it into an eye, world, or model space position with this TypeScript:

const eyeFromClip = clipFromEye.inverse();
const worldFromEye = eyeFromWorld.inverse();
const modelFromWorld = worldFromModel.inverse();

const mouseEye = eyeFromClip.multiplyMatrix(mouseNormalized);
const mouseWorld = worldFromEye.multiplyMatrix(mouseEye);
const mouseModel = modelFromWorld.multiplyMatrix(mouseWorld);

const eyeFromClip = clipFromEye.inverse();
const worldFromEye = eyeFromWorld.inverse();
const modelFromWorld = worldFromModel.inverse();

const mouseEye = eyeFromClip.multiplyMatrix(mouseNormalized);
const mouseWorld = worldFromEye.multiplyMatrix(mouseEye);
const mouseModel = modelFromWorld.multiplyMatrix(mouseWorld);

Observe how the spaces in the names of the matrices flip when inverting them. For example, the inverse of clipFromEye is named eyeFromClip.

We now have a strategy for untransforming any position or vector. Let's examine several ways we can untransform the mouse to drive interaction with the scene.

← Pointer Events Trackball →

How to 3D

Untransforming the Mouse

From Pixel Space to Normalized Space

From Normalized Space to Clip Space

From Normalized Space to Earlier Spaces

Inverting an Arbitrary Matrix