Look Matrix

We wish to drop a camera into a virtual world. What information must we provide in order to unambiguously situate it? These two pieces of information probably come to mind first:

The location of the camera
The direction in which the camera is looking

For example, we might position a player at the top of a hill looking down at the charred ruins of a village. Or at the bottom of a mountain looking up to its clouded peak. Or at a table watching the eyes of a shifty card player from a neighboring planet.

Position and focal direction alone are not enough to uniquely specify the view. We don't know if the player has their feet on the ground or is hanging like a bat. To clarify which view we want, we must provide one additional piece of information:

Which way is up

With these three pieces of information, all of which are in world space coordinates, we build a matrix that transforms world space into the camera's own frame of reference. Let's call this the look matrix and provide a function with this interface to build it:

class Matrix4 {
  look(from: Vector3, forward: Vector3, worldUp: Vector3) {
    // build a matrix
  }
}

The from parameter places the camera, the normalized forward direction aims it, and the normalized worldUp direction spins the camera around its focal direction.

The matrix that transforms world space into eye space is composed of two operations: a translation that first puts the camera at the origin, and a rotation that then swings the focal direction to the negative z-axis. These two matrices combine to form the eyeFromWorld matrix.

The translation matrix must turn the camera's world space position into $\begin{bmatrix}0&0&0\end{bmatrix}$, which it does by subtracting away the camera's position.
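The translation step can be sketched as follows. This is only an illustration under stated assumptions: the tuple types Vec3, Row4, and Mat4 and the helpers translateToOrigin and transformPoint are hypothetical stand-ins for the course's own Vector3 and Matrix4 classes, and the matrix is assumed to be stored row by row.

```typescript
// Hypothetical stand-ins for the course's Vector3 and Matrix4 classes.
type Vec3 = [number, number, number];
type Row4 = [number, number, number, number];
type Mat4 = [Row4, Row4, Row4, Row4];

// Build the translation that subtracts away the camera's position.
function translateToOrigin(from: Vec3): Mat4 {
  return [
    [1, 0, 0, -from[0]],
    [0, 1, 0, -from[1]],
    [0, 0, 1, -from[2]],
    [0, 0, 0, 1],
  ];
}

// Apply a 4x4 matrix to a point, treating the point as [x, y, z, 1].
function transformPoint(m: Mat4, p: Vec3): Vec3 {
  const dot = (r: Row4) => r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3];
  return [dot(m[0]), dot(m[1]), dot(m[2])];
}

// The camera's own position should land on the origin.
const cameraPosition: Vec3 = [3, 5, -2];
const cameraInEye = transformPoint(translateToOrigin(cameraPosition), cameraPosition);
```

Every other point in the world shifts by the same offset, so the scene keeps its shape while the camera moves to the origin.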

The rotation matrix is more involved. To construct it, we must be aware of these helpful properties of all rotation matrices:

The first row of a rotation matrix corresponds to the axis of the incoming space that will become the x-axis of the outgoing space. For example, if we want the world vector $\begin{bmatrix}a&b&c\end{bmatrix}$ to become eye vector $\begin{bmatrix}1&0&0\end{bmatrix}$, then we'd form this rotation matrix:

$$\begin{bmatrix} a & b & c & 0 \\ \ldots & \ldots & \ldots & 0 \\ \ldots & \ldots & \ldots & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

The second row of a rotation matrix corresponds to the axis of the incoming space that will become the y-axis of the outgoing space.
The third row of a rotation matrix corresponds to the axis of the incoming space that will become the z-axis of the outgoing space.
All rows must be perpendicular to each other. Mathematicians call a set of independent perpendicular vectors a basis. Basis vectors form the axes of a coordinate system. This fact is important because if we have two rows of the matrix figured out, we can automatically compute the third as their cross product.
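To make the last property concrete, here is a small sketch that computes a third row from two known rows and checks that all three are perpendicular. The Vec3 type, the cross and dot helpers, and the sample rows are assumptions chosen for illustration, not part of the course's library.

```typescript
// Hypothetical tuple type standing in for the course's Vector3 class.
type Vec3 = [number, number, number];

// Cross product: a vector perpendicular to both a and b.
function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

// Dot product: zero when a and b are perpendicular.
function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Two known unit-length, perpendicular rows of some rotation matrix.
const firstRow: Vec3 = [0.6, 0, -0.8];
const secondRow: Vec3 = [0, 1, 0];

// In a right-handed system, the third row is the first crossed with the second.
const thirdRow = cross(firstRow, secondRow);

// Both of these dot products should come out as zero.
const thirdDotFirst = dot(thirdRow, firstRow);
const thirdDotSecond = dot(thirdRow, secondRow);
```

The order of the operands matters. Swapping them negates the result, which would flip the third axis.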

We derive the rows of the rotation matrix using these facts and the three parameters to look. Let's start with the third row. The world vector that maps to eye vector $\begin{bmatrix}0&0&1\end{bmatrix}$ is the normalized vector that leads from the object of focus to the camera's position. We already have the forward vector, which runs from the camera to the object of focus. It maps to the eye vector $\begin{bmatrix}0&0&-1\end{bmatrix}$. We drop its negation, the backward vector, into our matrix:

$$\begin{bmatrix} ? & ? & ? & 0 \\ ? & ? & ? & 0 \\ -\mathrm{forward}_x & -\mathrm{forward}_y & -\mathrm{forward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

We also need the right vector, which is the world vector that becomes eye vector $\begin{bmatrix}1&0&0\end{bmatrix}$. It drops into the first row of the matrix. This vector aligns with the viewer's outstretched right arm. At first blush, the parameters given to look don't seem to offer much information about this right direction. However, we do know the forward and world up directions. The right vector is perpendicular to both of these. If we cross forward with worldUp, in that order, and normalize the result, we'll have our right vector.
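The right vector computation might look like this sketch. The Vec3 tuple type and the cross, normalize, and computeRight helpers are hypothetical; the course's Vector3 class presumably offers its own equivalents.

```typescript
// Hypothetical tuple type standing in for the course's Vector3 class.
type Vec3 = [number, number, number];

function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function normalize(v: Vec3): Vec3 {
  const length = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / length, v[1] / length, v[2] / length];
}

// Right is perpendicular to both forward and worldUp. The operand order
// (forward first) makes it point to the viewer's right, not left.
function computeRight(forward: Vec3, worldUp: Vec3): Vec3 {
  return normalize(cross(forward, worldUp));
}

// A viewer looking down the negative z-axis with y up has +x on their right.
const right = computeRight([0, 0, -1], [0, 1, 0]);
```

Normalizing is necessary because forward and worldUp are not guaranteed to be perpendicular, so their cross product may be shorter than unit length. If forward is parallel to worldUp, as when looking straight up or down, the cross product is the zero vector and this sketch divides by zero; a robust implementation would need to handle that case.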

Our rotation matrix is a little more complete:

$$\begin{bmatrix} \mathrm{right}_x & \mathrm{right}_y & \mathrm{right}_z & 0 \\ ? & ? & ? & 0 \\ -\mathrm{forward}_x & -\mathrm{forward}_y & -\mathrm{forward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

The only row missing is the world vector that becomes eye vector $\begin{bmatrix}0&1&0\end{bmatrix}$. One of the parameters to our look function is the world's up vector, so it must be the one to form the middle row, right? No, not always.

What if the player is standing at the top of the hill looking down into the village below? The world's up vector is probably $\begin{bmatrix}0&1&0\end{bmatrix}$. But if the player is looking down, then the forward and up vectors are not perpendicular. In rotation matrices, all rows must be perpendicular. Generally we can't use worldUp in our matrix. We treat it as a general notion of upness, but not as a basis vector.

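The up vector that goes in our matrix must be perpendicular to the right and forward vectors we already have. We cross right with forward, in that order, to get the mathematically correct up vector. Because both vectors are unit length and perpendicular to each other, the result is already normalized.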
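Here is a sketch of that step for the hilltop scenario, with the player looking forward and down at 45 degrees. The Vec3 type and the helper names are hypothetical stand-ins for the course's Vector3 class.

```typescript
// Hypothetical tuple type standing in for the course's Vector3 class.
type Vec3 = [number, number, number];

function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

function normalize(v: Vec3): Vec3 {
  const length = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / length, v[1] / length, v[2] / length];
}

// Looking down the hill: forward tilts 45 degrees below the horizon.
const worldUp: Vec3 = [0, 1, 0];
const forward = normalize([0, -1, -1]);

// worldUp is not perpendicular to forward, so it cannot be a row.
const worldUpDotForward = dot(worldUp, forward);

// The camera's true up vector is right crossed with forward.
const right = normalize(cross(forward, worldUp));
const up = cross(right, forward);

// Unlike worldUp, this up vector is perpendicular to forward.
const upDotForward = dot(up, forward);
```

The resulting up vector leans toward the negative z-axis, which matches the intuition that the top of the player's head tips forward as they look down.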

The camera's up vector forms the second row of our matrix:

$$\begin{bmatrix} \mathrm{right}_x & \mathrm{right}_y & \mathrm{right}_z & 0 \\ \mathrm{up}_x & \mathrm{up}_y & \mathrm{up}_z & 0 \\ \mathrm{backward}_x & \mathrm{backward}_y & \mathrm{backward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$$

This rotation matrix effectively swings the world so that any objects along the focal direction fall onto the negative z-axis of eye space. When combined with the translation matrix described earlier, we have our eyeFromWorld matrix.
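One way the pieces could fit together is sketched below. It is not the course's Matrix4.look implementation: the tuple types, the free function look, and the helpers multiply and transformPoint are all hypothetical, and the matrices are assumed to be stored row by row and applied to column vectors. It also assumes forward is already normalized and not parallel to worldUp.

```typescript
// Hypothetical stand-ins for the course's Vector3 and Matrix4 classes.
type Vec3 = [number, number, number];
type Row4 = [number, number, number, number];
type Mat4 = [Row4, Row4, Row4, Row4];

function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function normalize(v: Vec3): Vec3 {
  const length = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / length, v[1] / length, v[2] / length];
}

// Multiply two 4x4 matrices. The product applies b first, then a.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const rowTimes = (r: Row4): Row4 => [
    r[0] * b[0][0] + r[1] * b[1][0] + r[2] * b[2][0] + r[3] * b[3][0],
    r[0] * b[0][1] + r[1] * b[1][1] + r[2] * b[2][1] + r[3] * b[3][1],
    r[0] * b[0][2] + r[1] * b[1][2] + r[2] * b[2][2] + r[3] * b[3][2],
    r[0] * b[0][3] + r[1] * b[1][3] + r[2] * b[2][3] + r[3] * b[3][3],
  ];
  return [rowTimes(a[0]), rowTimes(a[1]), rowTimes(a[2]), rowTimes(a[3])];
}

// Build eyeFromWorld: translate the camera to the origin, then rotate.
function look(from: Vec3, forward: Vec3, worldUp: Vec3): Mat4 {
  const right = normalize(cross(forward, worldUp));
  const up = cross(right, forward);
  const translation: Mat4 = [
    [1, 0, 0, -from[0]],
    [0, 1, 0, -from[1]],
    [0, 0, 1, -from[2]],
    [0, 0, 0, 1],
  ];
  const rotation: Mat4 = [
    [right[0], right[1], right[2], 0],
    [up[0], up[1], up[2], 0],
    [-forward[0], -forward[1], -forward[2], 0], // the backward vector
    [0, 0, 0, 1],
  ];
  return multiply(rotation, translation);
}

// Apply a 4x4 matrix to a point, treating the point as [x, y, z, 1].
function transformPoint(m: Mat4, p: Vec3): Vec3 {
  const dot = (r: Row4) => r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3];
  return [dot(m[0]), dot(m[1]), dot(m[2])];
}

// A camera at (1, 2, 3) looking along the world's positive x-axis.
const eyeFromWorld = look([1, 2, 3], [1, 0, 0], [0, 1, 0]);
const cameraInEye = transformPoint(eyeFromWorld, [1, 2, 3]);
const targetInEye = transformPoint(eyeFromWorld, [2, 2, 3]);
```

In this example the camera's position should map to the origin, and a point one unit ahead of the camera should map to one unit down the negative z-axis. The translation sits on the right of the product because it must be applied to world coordinates before the rotation.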

Assemble all the code snippets from this page into a complete Matrix4.look method in your lib/matrix.ts.
