Untransforming the Mouse
The mouse skates over the viewport, hovering over specific objects in the scene. Identifying which object or location is under the cursor is a common need in interactive renderers. WebGL won't help much with this task. The burden is ours as graphics developers. In particular, we'll need to work backward through the gauntlet of transformations that put our models on the screen. We must untransform the mouse.
Recall that a vertex passes through this chain of spaces:
A vertex generally moves from one space to another by way of a matrix multiplication. The only exception is the transition from clip space to normalized space, which is the result of the perspective divide. All together, this is the mathematical gauntlet through which each vertex passes:
We've been handling the first three transformations in the vertex shader, and WebGL has been handling the last two. To get the mouse position from pixel space into one of the earlier spaces, we must work backward through these operations, even the ones that WebGL has been handling. Let's step through the untransformation one space at a time.
From Pixel Space to Normalized Space
When a mouse event occurs, we query its coordinates, which are in pixel space, with this event listener:
window.addEventListener('pointerdown', event => {
const mousePixel = new Vector4(
event.clientX,
canvas.height - event.clientY,
0,
1
);
});
window.addEventListener('pointerdown', event => { const mousePixel = new Vector4( event.clientX, canvas.height - event.clientY, 0, 1 ); });
Note that we take the complement of the y-coordinate with respect to the canvas height. This action corrects for the browser's and WebGL's mismatched coordinate systems. In the browser's pixel space, in which clientX
and clientY
are defined, the origin is at the top-left corner with the y-axis pointing down. In WebGL's pixel space, the origin is at the bottom-left with the y-axis pointing up.
We must step backward from pixel space to normalized space. The gauntlet above shows how we step forward via a matrix named \(\mathrm{pixelFromNormalized}\), which is normally handled by WebGL. To step backward, we must undo this matrix multiplication. But what exactly does this matrix do?
The \(\mathrm{pixelFromNormalized}\) matrix turns coordinates in the \([-1, 1]\) range of normalized space into the ranges \([0, \mathrm{width}]\) and \([0, \mathrm{height}]\) of pixel space. It jumps from a normalized x-coordinate to an x-coordinate in pixel space through this sequence of range-mapping operations:
To go from pixel space to normalized space, we apply the inverse operations in reverse order:
The y-coordinate is computed similarly using \(\mathrm{height}\).
Soon we'll learn about an input system called a virtual trackball that operates in normalized space. Since it doesn't need to visit any early spaces, its work of untransforming the mouse position stops at this stage.
From Normalized Space to Clip Space
Sometimes we need to go farther back in the transformation pipeline. Perhaps we are building a 3D modeling tool and want to move a vertex to a new position in model space. Maybe we are trying to place a building or character in world space. To go farther back, we must pass from normalized space into clip space. The forward step from clip space into normalized space is the perspective divide:
In the backward step, we have \(\mathbf{p}_\mathrm{norm}\) and are trying to solve for \(\mathbf{p}_\mathrm{clip}\), so we apply the inverse operation:
This is a problem. We don't have \(w_\mathrm{clip}\), which is the homogeneous coordinate of \(\mathbf{p}_\mathrm{clip}\). If we had the clip space position, we wouldn't be trying to solve for it. Let's postpone resolving this circular dependency. For the moment, we'll pretend that clip space doesn't exist and untransform from normalized space directly into eye space.
From Normalized Space to Earlier Spaces
Untransforming into eye, world, or model space is a matter of undoing the matrix multiplications that got us into the later space. But what does it mean to undo a matrix multiplication? To undo a scalar multiplication, we perform a scalar division. To undo a matrix multiplication, we can't perform a matrix division, as division is not defined for matrices. Instead, we undo by performing another matrix multiplication. In particular, we multiply by the inverse matrix, which applies the opposite transformation. If we multiply a matrix and its inverse together, they cancel each other out and form the identity matrix:
A translation matrix adds an offset to a position vector and has this form:
The opposite or inverse of \(\mathbf{T}\) subtracts the offset and therefore has this form:
A scale matrix multiplies by scale factors and has this form:
The inverse of \(\mathbf{S}\) divides by the scale factors and has this form:
To find the inverse of a rotation matrix, we recall the properties of rotation matrices that we discussed when building our camera abstractions:
- the rows of the matrix are the axes of the incoming space that become the x-, y-, and z-axes of the outgoing space
- the columns of the matrix are the axes of the outgoing space that the x-, y-, and z-axes of the incoming space become
- all rows are perpendicular to each other
Suppose the top row of a rotation matrix \(\mathbf{R}\) is \(\begin{bmatrix}a & b & c\end{bmatrix}\), with the other rows named accordingly:
When we multiply the matrix and the axis \(\begin{bmatrix}a & b & c\end{bmatrix}\), we have:
A normalized vector dotted with itself is 1, which explains the x-component of the product vector. The other rows are perpendicular to the axis, so their dot products are 0.
The inverse matrix, whatever it is, must work in reverse, turning the x-axis back into \(\begin{bmatrix}a & b & c\end{bmatrix}\):
The second fact of rotation matrices tells us that the axis itself forms the first column of this inverse matrix:
The other axes are treated similarly. Thus the inverse of a rotation matrix is just the matrix flipped about its diagonal so the rows become the columns:
The flip of a matrix about its diagonal is called the transpose. So, the inverse of a rotation matrix is its transpose.
In general, point \(\mathbf{p}\) in an earlier space is transformed by matrix \(\mathbf{M}\) into \(\mathbf{p}'\) in a later space. The backward step, which untransforms \(\mathbf{p}'\) into the earlier space, is determined with some algebra:
There's only one hitch. Finding the inverse matrix is only straightforward when a matrix is a pure rotation, translation, or scaling.
Inverting a Matrix
Knowing the inverses of the three standard transformations is of limited use. Normally we are applying a complex chain of translations, scales, and rotations, not just a single transformation. Consider this chain of three transformations:
Suppose we have \(\mathbf{p'}\) and want to solve for \(\mathbf{p}\). We start chipping away at the right-hand side by multiply both sides by the inverse of \(\mathbf{A}\):
After applying a few more inverses, we arrive at \(\mathbf{p}\):
This shows that if we know \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\), then we may compute the inverse of their product by computing the product of their inverses in reversed order. However, we probably don't have our matrices broken down into separate transformations like this. We've been accumulating them up into a single matrix.
What we really need is a magic method that that will invert any invertible matrix without prior knowledge of how it was constructed. Here is that method, offered without explanation or justification:
class Matrix4 {
// ...
inverse() {
let m = new Matrix4();
let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0);
let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0);
let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0);
let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1);
let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1);
let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2);
let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0);
let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0);
let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0);
let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1);
let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1);
let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2);
let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0;
if (determinant != 0) {
let inverseDeterminant = 1 / determinant;
m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant);
m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant);
m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant);
m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant);
m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant);
m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant);
m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant);
m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant);
m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant);
m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant);
m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant);
m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant);
m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant);
m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant);
m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant);
m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant);
} else {
throw Error('Matrix is singular.');
}
return m;
}
}
class Matrix4 { // ... inverse() { let m = new Matrix4(); let a0 = this.get(0, 0) * this.get(1, 1) - this.get(0, 1) * this.get(1, 0); let a1 = this.get(0, 0) * this.get(1, 2) - this.get(0, 2) * this.get(1, 0); let a2 = this.get(0, 0) * this.get(1, 3) - this.get(0, 3) * this.get(1, 0); let a3 = this.get(0, 1) * this.get(1, 2) - this.get(0, 2) * this.get(1, 1); let a4 = this.get(0, 1) * this.get(1, 3) - this.get(0, 3) * this.get(1, 1); let a5 = this.get(0, 2) * this.get(1, 3) - this.get(0, 3) * this.get(1, 2); let b0 = this.get(2, 0) * this.get(3, 1) - this.get(2, 1) * this.get(3, 0); let b1 = this.get(2, 0) * this.get(3, 2) - this.get(2, 2) * this.get(3, 0); let b2 = this.get(2, 0) * this.get(3, 3) - this.get(2, 3) * this.get(3, 0); let b3 = this.get(2, 1) * this.get(3, 2) - this.get(2, 2) * this.get(3, 1); let b4 = this.get(2, 1) * this.get(3, 3) - this.get(2, 3) * this.get(3, 1); let b5 = this.get(2, 2) * this.get(3, 3) - this.get(2, 3) * this.get(3, 2); let determinant = a0 * b5 - a1 * b4 + a2 * b3 + a3 * b2 - a4 * b1 + a5 * b0; if (determinant != 0) { let inverseDeterminant = 1 / determinant; m.set(0, 0, (+this.get(1, 1) * b5 - this.get(1, 2) * b4 + this.get(1, 3) * b3) * inverseDeterminant); m.set(0, 1, (-this.get(0, 1) * b5 + this.get(0, 2) * b4 - this.get(0, 3) * b3) * inverseDeterminant); m.set(0, 2, (+this.get(3, 1) * a5 - this.get(3, 2) * a4 + this.get(3, 3) * a3) * inverseDeterminant); m.set(0, 3, (-this.get(2, 1) * a5 + this.get(2, 2) * a4 - this.get(2, 3) * a3) * inverseDeterminant); m.set(1, 0, (-this.get(1, 0) * b5 + this.get(1, 2) * b2 - this.get(1, 3) * b1) * inverseDeterminant); m.set(1, 1, (+this.get(0, 0) * b5 - this.get(0, 2) * b2 + this.get(0, 3) * b1) * inverseDeterminant); m.set(1, 2, (-this.get(3, 0) * a5 + this.get(3, 2) * a2 - this.get(3, 3) * a1) * inverseDeterminant); m.set(1, 3, (+this.get(2, 0) * a5 - this.get(2, 2) * a2 + this.get(2, 3) * a1) * inverseDeterminant); m.set(2, 0, (+this.get(1, 0) * b4 - this.get(1, 1) * b2 + this.get(1, 3) * b0) * inverseDeterminant); m.set(2, 1, (-this.get(0, 0) * b4 + this.get(0, 1) * b2 - this.get(0, 3) * b0) * inverseDeterminant); m.set(2, 2, (+this.get(3, 0) * a4 - this.get(3, 1) * a2 + this.get(3, 3) * a0) * inverseDeterminant); m.set(2, 3, (-this.get(2, 0) * a4 + this.get(2, 1) * a2 - this.get(2, 3) * a0) * inverseDeterminant); m.set(3, 0, (-this.get(1, 0) * b3 + this.get(1, 1) * b1 - this.get(1, 2) * b0) * inverseDeterminant); m.set(3, 1, (+this.get(0, 0) * b3 - this.get(0, 1) * b1 + this.get(0, 2) * b0) * inverseDeterminant); m.set(3, 2, (-this.get(3, 0) * a3 + this.get(3, 1) * a1 - this.get(3, 2) * a0) * inverseDeterminant); m.set(3, 3, (+this.get(2, 0) * a3 - this.get(2, 1) * a1 + this.get(2, 2) * a0) * inverseDeterminant); } else { throw Error('Matrix is singular.'); } return m; } }
Add this method to your Matrix4
class.
In summary, we revert positions from a later space into an earlier space by multiplying by matrix inverses. If we have a mouse position in normalized space, we turn it into an eye, world, or model space position with this TypeScript:
const eyeFromClip = clipFromEye.inverse();
const worldFromEye = eyeFromWorld.inverse();
const modelFromWorld = worldFromModel.inverse();
const mouseEye = eyeFromClip.multiplyMatrix(mouseNormalized);
const mouseWorld = worldFromEye.multiplyMatrix(mouseEye);
const mouseModel = modelFromWorld.multiplyMatrix(mouseWorld);
const eyeFromClip = clipFromEye.inverse(); const worldFromEye = eyeFromWorld.inverse(); const modelFromWorld = worldFromModel.inverse(); const mouseEye = eyeFromClip.multiplyMatrix(mouseNormalized); const mouseWorld = worldFromEye.multiplyMatrix(mouseEye); const mouseModel = modelFromWorld.multiplyMatrix(mouseWorld);
Observe how the spaces in the names of the matrices flip when inverting them. For example, the inverse of clipFromEye
is named eyeFromClip
.
We now have a mechanism for untransforming the mouse, or any position or vector. Let's now examine several ways we can use these pointer events to drive interaction with the scene.