Picking by Color
Untransforming involves some intricate mathematics. Sometimes all we want to do is figure out which object was clicked on, an operation called picking; we don't need an exact location of the mouse in 3D space. In such cases, we can take a very different, non-mathematical approach: render each object in a unique color and, on a mouse event, read the color under the mouse to identify the clicked object. This color-identifier rendering is never shown to the viewer.
Click on the spheres in this renderer to change which is selected:
This renderer has two render methods and two shaders. The default shader shades the spheres with diffuse lighting. The other gives each sphere a unique shade of red. Toggle the checkbox to see this second method. No lighting is applied; the shade of red must be constant across a sphere to serve as a reliable identifier.
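The identifier shader can be very small. Here's a minimal sketch of the fragment shader, assuming each sphere's index is uploaded as a sphereIndex uniform before the sphere is drawn; the uniform name and setup are illustrative rather than this renderer's actual code:

// Hypothetical picking shader: writes the sphere's index into the red
// channel with no lighting, so the shade is constant across the sphere.
const redFragmentShaderSource = `
precision mediump float;
uniform float sphereIndex;  // 0, 1, 2, ... set per sphere before drawing

void main() {
  // Color components are floats in [0, 1]; dividing by 255 maps the
  // index onto the red byte that readPixels later reports.
  gl_FragColor = vec4(sphereIndex / 255.0, 0.0, 0.0, 1.0);
}
`;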
On a mouse event, the scene is first rendered in red. WebGL's readPixels method is then called to get the color of the pixel under the mouse. The pixel's red component is the index of the sphere that was clicked on. The picking logic might look something like this pointerup event listener:
window.addEventListener('pointerup', event => {
  // Render each sphere in its identifying shade of red.
  renderRed();
  // Read the pixel under the mouse. readPixels measures y up from the
  // bottom of the drawing buffer, so the y-coordinate must be flipped.
  const pixel = new Uint8Array(4);
  gl.readPixels(event.clientX, canvas.height - event.clientY, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
  // The red byte is the index of the picked sphere.
  pickedIndex = pixel[0];
  // Replace the identifier rendering with the real scene.
  render();
});
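Note that clientX and clientY are measured relative to the viewport, so the listener above assumes the canvas fills the window and that one CSS pixel maps to one drawing-buffer pixel. If those assumptions don't hold, a sketch like this converts the event through the canvas's bounding rectangle first:

// Sketch: map a pointer event to drawing-buffer coordinates when the
// canvas doesn't fill the window or the drawing buffer's size differs
// from the canvas's CSS size (as on a high-DPI display).
function mouseToBufferCoordinates(event) {
  const rect = canvas.getBoundingClientRect();
  const x = (event.clientX - rect.left) * (canvas.width / rect.width);
  const y = (event.clientY - rect.top) * (canvas.height / rect.height);
  return [x, canvas.height - y];  // flip y for readPixels
}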
This code only works if there are no more than 256 spheres in the scene. That's the most that can be uniquely identified using only the red byte of the color. If we have more than 256 objects, we'll need to use more than just the red channel.
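One way is to pack the index across the red, green, and blue bytes, which extends the scheme to 256³ objects. A sketch of the encoding and decoding, with illustrative names:

// Sketch: spread an object index across three color bytes, supporting
// up to 256 * 256 * 256 = 16,777,216 objects. The encoder runs on the
// CPU and feeds a color uniform; the decoder reverses it after readPixels.
function indexToColor(index) {
  return [
    ((index >> 16) & 0xff) / 255,  // red byte
    ((index >> 8) & 0xff) / 255,   // green byte
    (index & 0xff) / 255,          // blue byte
  ];
}

function colorToIndex(pixel) {
  return (pixel[0] << 16) | (pixel[1] << 8) | pixel[2];
}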
Summary
A renderer without interaction is just a cut scene. To make the viewer feel like a participant, we respond to keyboard and mouse events. An input device's state is often discrete, and we smooth out the abrupt transitions with a low-pass filter before applying the state to the rendering. If a model responds to an input event by changing its animation, we must likewise blend smoothly between the old and new animations. Mouse inputs are directed at particular locations in the scene, but we must figure out what a 2D input event means for a 3D scene. To get a precise location, we untransform the mouse position back through the transformation pipeline with inverse matrices. Since the mouse doesn't have a z-coordinate, we represent it as a ray and detect its collisions with bounding spheres and boxes. Alternatively, we may render each model in a unique flat color and read the color back to determine what was clicked on.