Untransforming

The mouse cursor skates over the flat window, knowing nothing about the complex 3D scene behind it. When a pointer event occurs, we query its coordinates, which are in pixel space, with this event listener:

window.addEventListener('pointerdown', event => {
  const mousePixel = new Vector2(
    event.clientX,
    canvas.height - event.clientY
  );
});

Note that we take the complement of the y-coordinate. This action bridges between the two competing standards for arranging the origin and the y-axis. In the browser's pixel space, in which clientX and clientY are defined, the origin is at the top-left corner with the y-axis pointing down. In WebGL and our transformation system, the origin is at the bottom-left with the y-axis pointing up.
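The listener above implicitly assumes that the canvas fills the window and that one CSS pixel corresponds to one canvas pixel. If either assumption fails, a conversion along the following lines may help. This is only a sketch: the `pixelFromClient` function and the `Rect` type are hypothetical names, and the rectangle is assumed to come from `canvas.getBoundingClientRect()`.

```typescript
// Hypothetical helper; not part of the renderer described in this text.
// `rect` is assumed to be the canvas's on-screen rectangle in CSS pixels.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

function pixelFromClient(
  clientX: number,
  clientY: number,
  rect: Rect,
  canvasWidth: number,
  canvasHeight: number
): [number, number] {
  // Scale from CSS pixels to canvas pixels in case the two sizes differ.
  const scaleX = canvasWidth / rect.width;
  const scaleY = canvasHeight / rect.height;

  // Measure from the canvas's top-left corner rather than the window's.
  const x = (clientX - rect.left) * scaleX;
  const yDown = (clientY - rect.top) * scaleY;

  // Flip the y-axis so the origin is at the bottom-left, as in WebGL.
  return [x, canvasHeight - yDown];
}

// An 800x600 canvas displayed at 400x300, offset by (10, 20) in the window.
const mousePixel = pixelFromClient(
  110, 95,
  { left: 10, top: 20, width: 400, height: 300 },
  800, 600
); // [200, 450]
```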

If we wish to identify which object the mouse clicked on, we'll have to convert the 2D mouse position into one of the preceding 3D spaces. Let's figure out how to do that. Recall that a vertex passes through this chain of spaces:

model space
↓
world space
↓
eye space
↓
clip space
↓
normalized space
↓
pixel space

A vertex generally moves from one space to another by way of a matrix multiplication. The only exception is the transition from clip space to normalized space, which is the result of the perspective divide. All together, this is the mathematical gauntlet through which each vertex passes:

$$\begin{aligned} \mathbf{p}_\mathrm{world} &= \mathrm{worldFromModel} \times \mathbf{p}_\mathrm{model} \\ \mathbf{p}_\mathrm{eye} &= \mathrm{eyeFromWorld} \times \mathbf{p}_\mathrm{world} \\ \mathbf{p}_\mathrm{clip} &= \mathrm{clipFromEye} \times \mathbf{p}_\mathrm{eye} \\ \mathbf{p}_\mathrm{norm} &= \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\ \mathbf{p}_\mathrm{pixel} &= \mathrm{pixelFromNormalized} \times \mathbf{p}_\mathrm{norm} \\ \end{aligned}$$

To get the mouse position from pixel space into one of the earlier spaces, we must work backward through these operations. Let's step through one space at a time.

From Pixel Space to Normalized Space

The first step backward is from pixel space to normalized space. The gauntlet above shows how we step forward via a matrix named $\mathrm{pixelFromNormalized}$. To step backward, we must undo this matrix multiplication. But what exactly does this matrix do? We haven't used it in our renderers because WebGL performs this final transformation, not us.

The $\mathrm{pixelFromNormalized}$ matrix turns coordinates in the $[-1, 1]$ range of normalized space into the ranges $[0, \mathrm{width}]$ and $[0, \mathrm{height}]$ of pixel space. It jumps from a normalized x-coordinate to an x-coordinate in pixel space through this sequence of range-mapping operations:

$$[-1, 1] \xrightarrow{+~1} [0, 2] \xrightarrow{\div~2} [0, 1] \xrightarrow{\times~\mathrm{width}} [0, \mathrm{width}]$$

To go from pixel space to normalized space, we apply the inverse operations in reverse order:

$$[-1, 1] \xleftarrow{-~1} [0, 2] \xleftarrow{\times~2} [0, 1] \xleftarrow{\div~\mathrm{width}} [0, \mathrm{width}]$$

The y-coordinate is computed similarly using $\mathrm{height}$. The untransformation has this pseudocode form:

normalizedPosition.x = pixelPosition.x / width * 2 - 1
normalizedPosition.y = pixelPosition.y / height * 2 - 1
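The pseudocode might be fleshed out as follows. This is a minimal sketch that represents positions as two-element arrays; the `Vec2` type and both function names are assumptions made for illustration. The forward mapping is included only so that the round trip can be checked.

```typescript
// Hypothetical names; positions are [x, y] pairs.
type Vec2 = [number, number];

// Backward step: divide by the size, multiply by 2, subtract 1.
function normalizedFromPixel(pixel: Vec2, width: number, height: number): Vec2 {
  return [pixel[0] / width * 2 - 1, pixel[1] / height * 2 - 1];
}

// Forward step: add 1, divide by 2, multiply by the size.
function pixelFromNormalized(normalized: Vec2, width: number, height: number): Vec2 {
  return [(normalized[0] + 1) / 2 * width, (normalized[1] + 1) / 2 * height];
}

// On an 800x600 canvas, the center pixel lands on the normalized origin
// and the bottom-left pixel lands on (-1, -1).
const center = normalizedFromPixel([400, 300], 800, 600); // [0, 0]
const bottomLeft = normalizedFromPixel([0, 0], 800, 600); // [-1, -1]

// Stepping forward again recovers the original pixel position.
const roundTrip = pixelFromNormalized(center, 800, 600); // [400, 300]
```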

Soon you'll read about an input system that operates in normalized space. Since it doesn't need to visit any earlier spaces, its work of untransforming the mouse position stops at this stage.

From Normalized Space to Clip Space

Sometimes we need to go farther back in the transformation pipeline. Perhaps we are trying to move a vertex on a 3D model to a new position in model space. Maybe we are trying to place a building or character in world space. To go farther back, we must pass from normalized space into clip space. The forward step from clip space into normalized space is the perspective divide:

$$\mathbf{p}_\mathrm{norm} = \frac{\mathbf{p}_\mathrm{clip}}{w_\mathrm{clip}} \\$$

The backward step must undo this division, so we apply the inverse operation to the normalized coordinates:

$$\mathbf{p}_\mathrm{norm} \times w_\mathrm{clip} = \mathbf{p}_\mathrm{clip} \\$$

This is a problem. We don't have $w_\mathrm{clip}$, which is the homogeneous coordinate of $\mathbf{p}_\mathrm{clip}$. If we had the clip space position, we wouldn't be trying to solve for it. Let's postpone resolving this circular dependency. For the moment, we'll pretend that clip space doesn't exist and untransform from normalized space directly into eye space.
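A small numeric experiment may make the lost information concrete. In this sketch, the `Vec4` type and the `perspectiveDivide` function are hypothetical names. Two different clip space positions collapse onto the same normalized position, so the normalized position alone cannot tell us which one we started from.

```typescript
// Hypothetical names; positions are [x, y, z, w] arrays.
type Vec4 = [number, number, number, number];

// Forward step from clip space to normalized space.
function perspectiveDivide(clip: Vec4): Vec4 {
  const w = clip[3];
  return [clip[0] / w, clip[1] / w, clip[2] / w, 1];
}

// Two distinct clip space positions with different w components.
const clipA: Vec4 = [1, 2, 3, 4];
const clipB: Vec4 = [2, 4, 6, 8];

const normalizedA = perspectiveDivide(clipA); // [0.25, 0.5, 0.75, 1]
const normalizedB = perspectiveDivide(clipB); // [0.25, 0.5, 0.75, 1]
```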

From Normalized Space to Earlier Spaces

Untransforming into eye, world, or model space is a matter of undoing the matrix multiplications that got us into the later space. But what does it mean to undo a matrix multiplication? To undo a scalar multiplication, we perform a scalar division. To undo a matrix multiplication, we can't perform a matrix division, as division is not defined for matrices. Instead, we undo by performing another matrix multiplication. In particular, we multiply by the inverse matrix, which applies the opposite transformation.

A translation matrix adds an offset to a position vector and has this form:

$$\mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & \textrm{offset}_x \\ 0 & 1 & 0 & \textrm{offset}_y \\ 0 & 0 & 1 & \textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$

The opposite or inverse of $\mathbf{T}$ subtracts the offset and therefore has this form:

$$\mathbf{T}^{-1} = \begin{bmatrix} 1 & 0 & 0 & -\textrm{offset}_x \\ 0 & 1 & 0 & -\textrm{offset}_y \\ 0 & 0 & 1 & -\textrm{offset}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$

A scale matrix multiplies by scale factors and has this form:

$$\mathbf{S} = \begin{bmatrix} \textrm{factor}_x & 0 & 0 & 0 \\ 0 & \textrm{factor}_y & 0 & 0 \\ 0 & 0 & \textrm{factor}_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The inverse of $\mathbf{S}$ divides by the scale factors and has this form:

$$\mathbf{S}^{-1} = \begin{bmatrix} \frac{1}{\textrm{factor}_x} & 0 & 0 & 0 \\ 0 & \frac{1}{\textrm{factor}_y} & 0 & 0 \\ 0 & 0 & \frac{1}{\textrm{factor}_z} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\$$
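We can check these two inverses numerically. The sketch below stores matrices as plain arrays of rows instead of using the `Matrix4` class, and all of its function names are assumptions made for illustration. Multiplying each matrix by its claimed inverse should produce the identity matrix.

```typescript
// Hypothetical stand-in for Matrix4: a 4x4 matrix as an array of rows.
type Mat4 = number[][];

// Multiply two 4x4 matrices.
function multiply(a: Mat4, b: Mat4): Mat4 {
  return a.map(row =>
    [0, 1, 2, 3].map(c => row.reduce((sum, value, i) => sum + value * b[i][c], 0))
  );
}

function translation(x: number, y: number, z: number): Mat4 {
  return [
    [1, 0, 0, x],
    [0, 1, 0, y],
    [0, 0, 1, z],
    [0, 0, 0, 1],
  ];
}

function scale(x: number, y: number, z: number): Mat4 {
  return [
    [x, 0, 0, 0],
    [0, y, 0, 0],
    [0, 0, z, 0],
    [0, 0, 0, 1],
  ];
}

// True if every element is within epsilon of the identity matrix.
function isIdentity(m: Mat4, epsilon = 1e-9): boolean {
  return m.every((row, r) =>
    row.every((value, c) => Math.abs(value - (r === c ? 1 : 0)) <= epsilon)
  );
}

// The inverse translation negates the offsets.
const t = translation(3, -2, 5);
const tInverse = translation(-3, 2, -5);

// The inverse scale uses the reciprocal factors.
const s = scale(2, 4, 0.5);
const sInverse = scale(1 / 2, 1 / 4, 1 / 0.5);

const translationCancels = isIdentity(multiply(t, tInverse)); // true
const scaleCancels = isIdentity(multiply(s, sInverse)); // true
```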

You've seen four different rotation matrices at this point. If you know the axis and angle that were used to build the matrix, you can build the inverse in the exact same way using the same axis but the negated angle. For example, to undo a rotation of 45 degrees around the x-axis, you perform a rotation of -45 degrees around the x-axis.

If you don't know the axis or angle, you can make use of this handy mathematical truth: the inverse of a rotation matrix is its own transpose. You transpose a matrix by flipping its components about the diagonal running from its top-left corner to its bottom-right. The rows of the original matrix become the columns of the transpose. For example, suppose you have this rotation matrix:

$$\mathbf{R} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Its transpose is flipped about the diagonal:

$$\mathbf{R}^{-1} = \mathbf{R}^{T} = \begin{bmatrix} a & d & g & 0 \\ b & e & h & 0 \\ c & f & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The elements on the diagonal do not move.
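Both routes to the inverse can be compared numerically. This sketch again uses plain arrays of rows, hypothetical function names, and one common convention for a rotation about the x-axis; your own rotation matrices may be laid out differently. It checks that the transpose of a 45-degree rotation matches a rotation of -45 degrees and that the rotation multiplied by its transpose is the identity matrix.

```typescript
// Hypothetical stand-in for Matrix4: a 4x4 matrix as an array of rows.
type Mat4 = number[][];

function multiply(a: Mat4, b: Mat4): Mat4 {
  return a.map(row =>
    [0, 1, 2, 3].map(c => row.reduce((sum, value, i) => sum + value * b[i][c], 0))
  );
}

// Rotation about the x-axis, assuming a right-handed convention.
function rotationX(degrees: number): Mat4 {
  const radians = degrees * Math.PI / 180;
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return [
    [1, 0, 0, 0],
    [0, c, -s, 0],
    [0, s, c, 0],
    [0, 0, 0, 1],
  ];
}

// Rows become columns.
function transpose(m: Mat4): Mat4 {
  return m.map((row, r) => row.map((_, c) => m[c][r]));
}

// Compare element by element, tolerating floating-point error.
function approximatelyEqual(a: Mat4, b: Mat4, epsilon = 1e-9): boolean {
  return a.every((row, r) =>
    row.every((value, c) => Math.abs(value - b[r][c]) <= epsilon)
  );
}

const identity: Mat4 = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
];

const r = rotationX(45);
const sameAsNegatedAngle = approximatelyEqual(transpose(r), rotationX(-45));
const undoesRotation = approximatelyEqual(multiply(transpose(r), r), identity);
```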

In general, point $\mathbf{p}$ in an earlier space is transformed by matrix $\mathbf{M}$ into $\mathbf{p}'$ in a later space. The backward step, which untransforms $\mathbf{p}'$ into the earlier space, is determined with some algebra:

$$\begin{aligned} \mathbf{p}' &= \mathbf{M} \times \mathbf{p} \\ \mathbf{M}^{-1} \times \mathbf{p}' &= \mathbf{M}^{-1} \times \mathbf{M} \times \mathbf{p} \\ \mathbf{M}^{-1} \times \mathbf{p}' &= \mathbf{p} \end{aligned}$$

This works because the product of a matrix and its inverse is the identity matrix, which leaves $\mathbf{p}$ unchanged.

Inverting a Matrix

Knowing the inverses of the three standard transformations is of limited use. Normally you are applying a complex chain of translations, scales, and rotations, not just a single transformation. Consider this chain of three transformations:

$$\mathbf{p}' = \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p}$$

Suppose you have $\mathbf{p'}$ and want to solve for $\mathbf{p}$. You start chipping away at the right-hand side by multiplying both sides on the left by the inverse of $\mathbf{A}$:

$$\begin{aligned} \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{A}^{-1} \times \mathbf{A} \times \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{B} \times \mathbf{C} \times \mathbf{p} \\ \end{aligned}$$

After applying a few more inverses, you arrive at $\mathbf{p}$:

$$\begin{aligned} \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{C} \times \mathbf{p} \\ \mathbf{C}^{-1} \times \mathbf{B}^{-1} \times \mathbf{A}^{-1} \times \mathbf{p}' &= \mathbf{p} \\ \end{aligned}$$

This shows that if we know $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, then we may compute the inverse of their product by computing the product of their inverses in reverse order. However, we probably don't have our matrices broken down into separate transformations like this. We've been accumulating them into a single matrix.
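Here is a small numeric check of the reversed-order rule. It is a sketch with hypothetical helper names and arbitrarily chosen matrices: $\mathbf{A}$ is a translation, $\mathbf{B}$ is a uniform scale, and $\mathbf{C}$ is another translation. The inverses are built by hand from the forms shown earlier.

```typescript
// Hypothetical stand-ins: matrices are arrays of rows, positions are arrays.
type Mat4 = number[][];
type Vec4 = number[];

function multiply(a: Mat4, b: Mat4): Mat4 {
  return a.map(row =>
    [0, 1, 2, 3].map(c => row.reduce((sum, value, i) => sum + value * b[i][c], 0))
  );
}

// Multiply a matrix by a column vector.
function transform(m: Mat4, p: Vec4): Vec4 {
  return m.map(row => row.reduce((sum, value, i) => sum + value * p[i], 0));
}

function translation(x: number, y: number, z: number): Mat4 {
  return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
}

function scale(x: number, y: number, z: number): Mat4 {
  return [[x, 0, 0, 0], [0, y, 0, 0], [0, 0, z, 0], [0, 0, 0, 1]];
}

const a = translation(3, 0, 0);
const b = scale(2, 2, 2);
const c = translation(0, 1, 0);
const aInverse = translation(-3, 0, 0);
const bInverse = scale(0.5, 0.5, 0.5);
const cInverse = translation(0, -1, 0);

// Forward: p' = A x B x C x p.
const forward = multiply(multiply(a, b), c);
const p: Vec4 = [1, 2, 3, 1];
const pPrime = transform(forward, p); // [5, 6, 6, 1]

// Backward: the inverses appear in reversed order.
const backward = multiply(multiply(cInverse, bInverse), aInverse);
const recovered = transform(backward, pPrime); // [1, 2, 3, 1]

// Keeping the original order does not undo the chain.
const wrongOrder = multiply(multiply(aInverse, bInverse), cInverse);
const notRecovered = transform(wrongOrder, pPrime); // [-0.5, 2.5, 3, 1]
```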

What we really need is a magic method that will invert any invertible matrix without prior knowledge of how it was constructed. Here is that method, offered without explanation or justification:

class Matrix4 {
  // ...

  inverse() {
    let m = new Matrix4();

    let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0);
    let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0);
    let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0);

    let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1);
    let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1);
    let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2);

    let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0);
    let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0);
    let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0);

    let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1);
    let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1);
    let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2);

    let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0;

    if (determinant != 0) {
      let inverseDeterminant = 1 / determinant;
      m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant);
      m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant);
      m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant);
      m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant);
      m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant);
      m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant);
      m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant);
      m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant);
      m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant);
      m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant);
      m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant);
      m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant);
      m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant);
      m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant);
      m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant);
      m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant);
    } else {
      throw Error('Matrix is singular.');
    }

    return m;
  }
}

This method belongs in your Matrix4 class.
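Since the method is taken on faith, a sanity check may be worthwhile: a matrix multiplied by its inverse should come out as the identity matrix, give or take some floating-point error. The helper below is a sketch. The `Gettable` interface and the `isApproximatelyIdentity` function are hypothetical names, and the only thing assumed about your class is the `get(row, column)` accessor that the `inverse` method already uses.

```typescript
// Anything with a get(row, column) accessor, such as the Matrix4 class above.
interface Gettable {
  get(row: number, column: number): number;
}

// True if every element is within epsilon of the 4x4 identity matrix.
function isApproximatelyIdentity(m: Gettable, epsilon = 1e-6): boolean {
  for (let r = 0; r < 4; r++) {
    for (let c = 0; c < 4; c++) {
      const expected = r === c ? 1 : 0;
      if (Math.abs(m.get(r, c) - expected) > epsilon) {
        return false;
      }
    }
  }
  return true;
}
```

To use it, multiply a matrix by its inverse with whatever method your class provides for multiplying two matrices, and pass the product to `isApproximatelyIdentity`. For any invertible matrix, the result should be true.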

In summary, we turn positions from a later space into an earlier space by multiplying the coordinates in the later space by the matrix inverses. If we have a normalized space position, we can turn it into an eye, world, or model space position with this TypeScript:

const eyeFromClip = clipFromEye.inverse();
const worldFromEye = eyeFromWorld.inverse();
const modelFromWorld = worldFromModel.inverse();

const normalizedPosition = ...;
const eyePosition = eyeFromClip.multiplyMatrix(normalizedPosition);
const worldPosition = worldFromEye.multiplyMatrix(eyePosition);
const modelPosition = modelFromWorld.multiplyMatrix(worldPosition);

Observe how the spaces in the names of the matrices flip when inverting them. For example, the inverse of clipFromEye is named eyeFromClip.

We now have a mechanism for answering the question, “Where in the world is the mouse?” Let's examine several ways we can use these pointer events to drive interaction with the scene.
