Orthographic Projection

To go from eye space to clip space, we need to designate a chunk of the world and then develop a transformation that squeezes the chunk into the unit box that WebGL expects.

An orthographic projection matrix is built out of a custom box that we define using six eye space distances. Four of the distances make some physiological sense. We decide how far left the eye can see, how far right, how far down, and how far up. The other two parameters describe how far forward the eye starts seeing and how far forward it stops seeing. Without these near and far parameters putting bounds on the z-axis, the depths would span an infinite range.

The chunk of eye space that transforms into the unit box is called the viewing volume. Explore how the six parameters define the viewing volume in this interactive renderer:

The white sphere represents the viewer's eye. The wireframe box reflects the six parameters used to build the viewing volume. Move the box around by editing the inputs. The preview in the bottom-left shows what the viewer sees. Ask yourself a few questions:

How do we show just the right half of the torus?
How do we show just the bottom-left quadrant of the torus?
What happens when the near side of the box is beyond the torus?
What happens when the near or far side slices through the torus?
What happens when the near side of the box is behind the eye?

Recall that in eye space the viewer is located at the origin, looking down the negative z-axis. The six parameters are distances from the eye. That means that positive $\mathbf{near}$ and $\mathbf{far}$ values correspond to negative z-coordinates. Therefore the box spans these intervals:

$$\begin{aligned} x &\rightarrow \mathrm{left} \ldots \mathrm{right} \\ y &\rightarrow \mathrm{bottom} \ldots \mathrm{top} \\ z &\rightarrow -\mathrm{near} \ldots -\mathrm{far} \\ \end{aligned}$$

From these intervals, we compute the box's dimensions:

$$\begin{aligned} \mathrm{width} &= \mathrm{right} - \mathrm{left} \\ \mathrm{height} &= \mathrm{top} - \mathrm{bottom} \\ \mathrm{depth} &= \mathrm{far} - \mathrm{near} \\ \end{aligned}$$

The box must be transformed into the unit box of normalized space. This is done in two steps:

Translate the center of the box to the origin.
Scale the box so it fits into the range [-1, 1] on all dimensions.

To translate the box to the origin, we subtract away the box's midpoint. The midpoint is the average of its coordinates:

$$\begin{aligned} \mathrm{midpoint}_x &= \frac{\mathrm{right} + \mathrm{left}}{2} \\ \mathrm{midpoint}_y &= \frac{\mathrm{top} + \mathrm{bottom}}{2} \\ \mathrm{midpoint}_z &= -\frac{\mathrm{near} + \mathrm{far}}{2} \end{aligned}$$

The transformation that subtracts away the midpoint is a translation represented by this matrix:

$$\begin{bmatrix} 1 & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{2} \\ 0 & 1 & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{2} \\ 0 & 0 & 1 & \frac{\mathrm{near} + \mathrm{far}}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

After applying this translation, the box is centered around the origin, with half of the box on each side of the origin. That means the box now has the following bounds:

$$\begin{aligned} x &\rightarrow -\frac{\mathrm{width}}{2} \ldots \frac{\mathrm{width}}{2} \\ y &\rightarrow -\frac{\mathrm{height}}{2} \ldots \frac{\mathrm{height}}{2} \\ z &\rightarrow \frac{\mathrm{depth}}{2} \ldots -\frac{\mathrm{depth}}{2} \\ \end{aligned}$$

The near face of the box now intersects the positive z-axis, while the far face intersects the negative z-axis.

The bounds of the unit box into which we are trying to squeeze spans these intervals:

$$\begin{aligned} x &\rightarrow -1 \ldots 1 \\ y &\rightarrow -1 \ldots 1 \\ z &\rightarrow -1 \ldots 1\\\ \end{aligned}$$

Note that WebGL expects the near face to map to -1 and the far face to 1. This is flipped from the eye space convention.

We go from our box to the unit box by dividing by the half-width, half-height, and negative half-depth. This scale matrix performs that division by multiplying by the reciprocals of the half-dimensions:

$$\begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & 0 \\ 0 & \frac{2}{\mathrm{height}} & 0 & 0 \\ 0 & 0 & \frac{2}{\mathrm{-depth}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The z-factor is negated in order to flip the z-coordinates and match what WebGL expects. If only our computer graphics forebears had decided to make the viewer look along the positive z-axis, we wouldn't have to deal with these inconsistencies.

Since both the translation and scaling are represented as 4x4 matrices, we may chain them together as a product:

$$\begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & 0 \\ 0 & \frac{2}{\mathrm{height}} & 0 & 0 \\ 0 & 0 & \frac{2}{\mathrm{-depth}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{2} \\ 0 & 1 & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{2} \\ 0 & 0 & 1 & \frac{\mathrm{near} + \mathrm{far}}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The translation must be applied first, so it goes on the right. Multiplying them through yields this matrix:

$$\begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{\mathrm{width}} \\ 0 & \frac{2}{\mathrm{height}} & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{\mathrm{height}} \\ 0 & 0 & \frac{2}{\mathrm{-depth}} & -\frac{\mathrm{near} + \mathrm{far}}{\mathrm{depth}} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

You will often see this matrix expressed strictly in terms of the original six parameters:

$$\begin{bmatrix} \frac{2}{\mathrm{right} - \mathrm{left}} & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{\mathrm{right} - \mathrm{left}} \\ 0 & \frac{2}{\mathrm{top} - \mathrm{bottom}} & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{\mathrm{top} - \mathrm{bottom}} \\ 0 & 0 & \frac{2}{\mathrm{near} - \mathrm{far}} & \frac{\mathrm{near} + \mathrm{far}}{\mathrm{\mathrm{near} - \mathrm{far}}} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

This is the matrix that turns an orthographic viewing volume into the unit cube. In the renderers we've written so far, we've used an implicit orthographic projection with $\textrm{left} = \textrm{bottom} = \textrm{near} = -1$ and $\textrm{right} = \textrm{top} = \textrm{far} = 1$.

Add this method to your Matrix4 class.

Orthographic projections don't provide the perspective cues that viewers expect as they make sense of a 3D scene. They are mostly used when the viewer needs to precisely manipulate objects on a Cartesian grid, such as in a 3D modeling program or in a game with an isometric view.

← Spaces Non-square Viewports →

How to 3D

Orthographic Projection