## A simple path tracer on the GPU

There comes a time in any graphics programmer's life when the siren song becomes too loud to endure:

"You should write a raytracer"

For me, the moment came after writing a fragment-shader raytracer in Futureproof: it was a terrible raytracer, but a lot of fun to write.

After a few weeks down the rabbit hole, I'm proud to present `rayray`: a tiny, slightly less-terrible GPU raytracer.

To start, here are renderings of two classic images: a Cornell Box and the test scene from Ray Tracing in One Weekend.

(Click for high resolution, and keep scrolling for more pretty pictures)

## Architecture

`rayray` is a traditional forward path tracer. It casts rays from each pixel in the image, which scatter until they hit a light or exceed some max number of bounces. Averaging over thousands of samples creates the final image.

Drag the slider to compare 100 vs. 1000 samples while observing the orb:

The architecture is inspired by Ray Tracing in One Weekend, but unlike the reference implementation, it runs completely on the GPU!

This requires a data-driven design (rather than using inheritance), and leads to dramatically faster rendering.

## All you need is `trace(...)`

The core of my raytracer is a function with the signature

```glsl
bool trace(inout uint seed, inout vec3 pos, inout vec3 dir, inout vec4 color);
```

This function casts a ray to the next (nearest) object in the scene, starting at `pos` and traveling in direction `dir`. After finding this object, the ray scatters depending on material, modifying `pos`, `dir`, and `color`. It returns a flag indicating whether to terminate (if the ray hit a light or escaped the world).

This makes the actual raytracing function incredibly clean:

```glsl
#define BOUNCES 6
vec3 bounce(vec3 pos, vec3 dir, inout uint seed, vec4 color) {
    for (int i=0; i < BOUNCES; ++i) {
        // Walk to the next object in the scene, updating the system state
        // using a set of inout variables
        if (trace(seed, pos, dir, color)) {
            return color.xyz;
        }
    }
    return vec3(0);
}
```

The grid of images below shows `pos`, `dir`, and `color` after one call to `trace(...)`, then the result of a single `bounce(...)` sample.

Pixels which were lucky enough to hit the light are colored based on their paths, and everything else is black. Over a few thousand samples, things add up to the correct image!
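The post doesn't show how samples are combined, so here is a minimal CPU-side sketch in C of one common way to do it: keep a running mean per pixel and fold each new `bounce(...)` result into it. The `Rgb` type and the `accumulate` name are made up for illustration, and `rayray` may well accumulate differently (for example, by summing and dividing at display time).

```c
#include <stdint.h>

typedef struct { float r, g, b; } Rgb;

/* Fold one new sample into the running mean of `count` earlier samples.
 * This is the incremental form of mean = (s_1 + ... + s_n) / n. */
Rgb accumulate(Rgb mean, uint32_t count, Rgb sample) {
    float w = 1.0f / (float)(count + 1u);
    mean.r += (sample.r - mean.r) * w;
    mean.g += (sample.g - mean.g) * w;
    mean.b += (sample.b - mean.b) * w;
    return mean;
}
```

A pixel that hits the light on its first sample and misses on its second ends up at half brightness, which is exactly the "lucky pixels add up" behavior described above.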

## Rendering pipelines

Of course, we haven't yet specified how `trace(...)` is actually implemented!

While a scene is being edited, we use a preview implementation of `trace(...)`, which is compiled once (at startup). This implementation renders an encoded scene from a storage buffer. This means that updates to the scene are cheap: they're just a buffer write operation.

The scene is packed into a `vec4` array using a custom binary format:

```
// Header
[0] number of shapes | 0 | 0 | 0

// Shapes
[1] shape type | shape data offset | material data offset | material type
[2] shape type | shape data offset | material data offset | material type
[3] shape type | shape data offset | material data offset | material type
...

// Data section
[.] material data
...
[.] shape data
...
```

For example, encoding a scene with two spheres – one a diffuse red material, and the other a white light – takes 7 slots in the array (one header, two shape records, and four data slots), or 112 bytes:

```
// Header
[0] 2 | 0 | 0 | 0 // There are 2 shapes

// Shapes
[1] SHAPE_SPHERE | 5 | 3 | MAT_DIFFUSE
[2] SHAPE_SPHERE | 6 | 4 | MAT_LIGHT

// Data section
[3] 1.0 | 0.5 | 0.5 | 0.0 // Red (for diffuse material)
[4] 1.0 | 1.0 | 1.0 | 0.0 // White (for light)
[5] 0.0 | 0.0 | 0.0 | 0.5 // Sphere [x, y, z], r
[6] 0.5 | 0.5 | 0.0 | 0.3 // Sphere [x, y, z], r
```
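To make the layout concrete, here is a hedged host-side sketch in C that writes this two-sphere scene into an array of four-float slots. The numeric values of `SHAPE_SPHERE`, `MAT_DIFFUSE`, and `MAT_LIGHT` are invented (the post names the tags but not their values), and storing tags and offsets as plain floats is an assumption about the format.

```c
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical tag values, chosen only for this example. */
enum { SHAPE_SPHERE = 1 };
enum { MAT_DIFFUSE = 1, MAT_LIGHT = 2 };

enum { SCENE_SLOTS = 7 };

/* Writes the two-sphere scene into out[0..6] and returns the slot count. */
size_t encode_two_spheres(Vec4 out[SCENE_SLOTS]) {
    /* Header: number of shapes */
    out[0] = (Vec4){2.0f, 0.0f, 0.0f, 0.0f};
    /* Shapes: type | shape data offset | material data offset | material type */
    out[1] = (Vec4){SHAPE_SPHERE, 5.0f, 3.0f, MAT_DIFFUSE};
    out[2] = (Vec4){SHAPE_SPHERE, 6.0f, 4.0f, MAT_LIGHT};
    /* Material data */
    out[3] = (Vec4){1.0f, 0.5f, 0.5f, 0.0f};  /* red, for the diffuse sphere */
    out[4] = (Vec4){1.0f, 1.0f, 1.0f, 0.0f};  /* white, for the light */
    /* Shape data: x, y, z, radius */
    out[5] = (Vec4){0.0f, 0.0f, 0.0f, 0.5f};
    out[6] = (Vec4){0.5f, 0.5f, 0.0f, 0.3f};
    return SCENE_SLOTS;
}

/* Size of the buffer upload for a scene with the given number of slots. */
size_t scene_bytes(size_t slots) {
    return slots * sizeof(Vec4);
}
```

Following an offset is a single array lookup: `out[(int)out[1].y]` is the first sphere's position and radius, and `out[(int)out[2].z]` is the light's color.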

The preview shader implements `trace(...)` as a tiny scene interpreter. It iterates over each shape in the scene, finding the first hit along the current ray and updating the ray's `pos`. Then, that shape's material is used to update the ray's `dir` and `color`.
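As a rough illustration of the material half of that interpreter, the C sketch below looks up the hit shape's material through its offset and updates the ray color. Everything here is an assumption made for the example: the tag values, the `apply_material` name, and the rule that both materials simply multiply the color (the real shader also picks a new `dir` for diffuse hits, which is omitted).

```c
#include <stdbool.h>

typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical tag values, chosen only for this example. */
enum { SHAPE_SPHERE = 1 };
enum { MAT_DIFFUSE = 1, MAT_LIGHT = 2 };

/* The two-sphere scene from the encoding example. */
static const Vec4 SCENE[7] = {
    {2.0f, 0.0f, 0.0f, 0.0f},
    {SHAPE_SPHERE, 5.0f, 3.0f, MAT_DIFFUSE},
    {SHAPE_SPHERE, 6.0f, 4.0f, MAT_LIGHT},
    {1.0f, 0.5f, 0.5f, 0.0f},
    {1.0f, 1.0f, 1.0f, 0.0f},
    {0.0f, 0.0f, 0.0f, 0.5f},
    {0.5f, 0.5f, 0.0f, 0.3f},
};

/* Applies the material of the shape record at `shape_slot` to `color`.
 * Returns true when the path should terminate (the ray reached a light). */
bool apply_material(const Vec4 *scene, int shape_slot, Vec4 *color) {
    Vec4 shape = scene[shape_slot];
    Vec4 mat = scene[(int)shape.z];  /* follow the material data offset */
    color->x *= mat.x;
    color->y *= mat.y;
    color->z *= mat.z;
    switch ((int)shape.w) {
    case MAT_LIGHT:
        return true;   /* hit a light: the path is finished */
    default:
        return false;  /* diffuse: keep bouncing */
    }
}
```

Starting from white, a hit on shape record 1 tints the color red and keeps going; a later hit on record 2 ends the path with that tinted color.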

After the scene has stabilized (defined as no user interaction for one second), the application builds an optimized kernel, which encodes the scene data directly. This takes longer to build, because it's building a full compute pipeline, but is much faster to evaluate.

The generated shader unrolls the shape-hit loop, then packs normal and material calculations directly into a series of `switch` statements. Compare the interpreter against one example of the generated `trace(...)`.
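The generated shader itself isn't reproduced here, so the following is only a guess at the flavor of the code generation step: a C function that emits one hit test per sphere, with the sphere's data baked in as literals instead of being read from a buffer. The emitted GLSL names (`hit_sphere`, `best_dist`, `best_hit`) are hypothetical and not taken from `rayray`.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { float x, y, z, r; } Sphere;

/* Emits an unrolled sequence of hit tests into `buf`.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small. */
int emit_hit_tests(char *buf, size_t cap, const Sphere *s, int count) {
    if (cap == 0) {
        return -1;
    }
    buf[0] = '\0';
    size_t used = 0;
    for (int i = 0; i < count; ++i) {
        int n = snprintf(buf + used, cap - used,
            "d = hit_sphere(pos, dir, vec3(%f, %f, %f), %f);\n"
            "if (d > 0.0 && d < best_dist) { best_dist = d; best_hit = %d; }\n",
            s[i].x, s[i].y, s[i].z, s[i].r, i + 1);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;  /* output was truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

The trade-off matches the one described above: the text has to be regenerated and recompiled whenever the scene changes, but the resulting shader has no loop over shapes and no buffer reads for scene data.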

Building a scene-specific shader makes a big difference! Testing across various scenes, I see a 2-6x speedup between the interpreter and the scene-optimized pipeline.

## Focal blur

Focal blur is accomplished by jittering the ray at the camera plane, while maintaining focus at a particular distance:

This is another case where accumulating random samples just works: by picking a random jitter for each sample, we can get a lovely depth-of-field effect.

Antialiasing works the same way: every pixel is jittered within a 1-pixel radius from its nominal position, which smooths out edges.
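The focal-blur jitter can be sketched in a few lines of C. This is an illustration under stated assumptions, not the shader's code: the `defocus` name, the `right`/`up` camera-plane basis, and measuring the focus distance along the ray (rather than to a flat focal plane) are all choices made for the example, and the returned direction is left unnormalized.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 origin, dir; } Ray;

static Vec3 add(Vec3 a, Vec3 b) { return (Vec3){a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return (Vec3){a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return (Vec3){a.x * s, a.y * s, a.z * s}; }

/* Moves the ray origin by (dx, dy) within the camera plane, then re-aims the
 * ray so that it still passes through the point the original ray reached at
 * parameter `focus_dist`. Objects near that point stay sharp. */
Ray defocus(Ray ray, Vec3 right, Vec3 up, float dx, float dy, float focus_dist) {
    Vec3 focus = add(ray.origin, scale(ray.dir, focus_dist));
    Vec3 origin = add(ray.origin, add(scale(right, dx), scale(up, dy)));
    Ray out = { origin, scale(sub(focus, origin), 1.0f / focus_dist) };
    return out;
}
```

Feeding `dx` and `dy` from two fresh random values per sample gives a different lens position each time; points at the focus distance land on the same pixel regardless, while nearer and farther points smear out.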

## Spectral rendering

The final chapter of Ray Tracing: The Rest of Your Life suggests:

"If you want to do hard-core physically based renderers, convert your renderer from RGB to spectral. I am a big fan of each ray having a random wavelength and almost all the RGBs in your program turning into floats. It sounds inefficient, but it isn't!"

This sounded like fun, so I built a spectral mode into `rayray` (invoked by the `-s` command-line argument).

On each frame, the GPU uniformly selects a wavelength between 400 and 700 nm, converts it to an RGB value, then renders as before, with the wavelength stored in the `w` coordinate of the `vec4 color`.

(This is why `bounce(...)` takes a `vec4 color` but only returns a `vec3`)

In spectral mode, glass materials use the `w` coordinate to tweak their index of refraction, to imitate real-world dispersion.
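The post doesn't say how the index of refraction depends on wavelength, so the C sketch below substitutes a standard model, Cauchy's equation n(λ) = A + B/λ², with coefficients roughly those commonly quoted for BK7 borosilicate glass. Both function names are made up, and mapping a random value in [-1, 1) onto 400 to 700 nm is just one plausible way to do the uniform selection.

```c
/* Maps a uniform random value r in [-1, 1) to a wavelength in [400, 700) nm. */
float wavelength_nm(float r) {
    return 550.0f + 150.0f * r;
}

/* Cauchy's equation with the wavelength in micrometres.
 * A and B are illustrative values close to those for BK7 glass. */
float glass_ior(float lambda_nm) {
    float um = lambda_nm * 1e-3f;
    return 1.5046f + 0.00420f / (um * um);
}
```

With these numbers, blue light at 400 nm sees an index of about 1.531 and red light at 700 nm about 1.513, so blue bends more at each glass surface. That small difference is what spreads white light into a spectrum.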

This means we can make a prism:

(Aside: Rendering a 2D scene with a 3D raytracer was challenging, as rays scattered randomly off the rear plane were unlikely to hit the narrow light. I fixed this using a special material which scatters light perpendicular to its normal; this means after the first bounce, all of the rays are scattered in `XY` and are more likely to hit the prism and light)

One expected-but-neat side effect is chromatic aberration. Compare the glass sphere with normal and spectral rendering:

In the spectral image, the refracted blue sphere has blue and red distortions around the edges, and the caustic shows a faint rainbow.

## Editor GUI

Like all good graphics tools, `rayray` integrates Dear ImGui for a debug UI.

The UI is integrated into the raytracer's scene representation, so changes are instantly reflected in the image:

At the time of implementation, there was no canonical WebGPU backend for Dear ImGui, so I wrote one (there's now an experimental implementation in the main repository).

The editor is extremely rough – it's mostly useful for tweaking parameters of scenes defined in code, then copying those parameters back into the original source. Still, being able to drag colors around in real-time is entertaining!

## Importing `.mol` files

In search of more interesting models, I discovered MolView, which is an online database of chemicals. Each chemical can be downloaded as a `.mol` file, so I wrote a small importer which converts these files into `rayray` scenes.

For example, caffeine:

## Aside: RNG on the GPU

A raytracer needs a good random number generator (RNG) to model ray scatter off a diffuse material (among other things).

As it turns out, RNGs on the GPU are a fun and contentious topic.

You'll often see the one-liner

```glsl
float rand(vec2 co) {
    return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}
```

I have philosophical objections to this function, because it only works once, and only in a 2D grid. For raytracing, each ray requires a stream of random values as it bounces around the scene. You'd need some way to adjust `co.xy` each time `rand(...)` is called, and it would be easy to accidentally add a pattern to the noise.

Ideally, we'd have an RNG which takes a seed and generates a continuous stream of random values, ad infinitum. With that in mind, I decided to use a hash-based approach, where the seed is a `uint` transformed repeatedly by some hash function.

These two blog posts informed my final strategy:

• Generate an initial `uint` seed with a slow-but-good hash function
• Use a fast hash function to modify the seed and get the next random value
• Bithack the value from a `uint` directly into a `float`

Here's my slow hash function, used to generate the initial seed:

```glsl
// Jenkins hash function, specialized for a uint key
uint hash(uint key) {
    uint h = 0;
    for (int i=0; i < 4; ++i) {
        h += (key >> (i * 8)) & 0xFF;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

(Note that this is slightly different from the "Jenkins' OAT algorithm" in the second blog, which doesn't handle each byte; mine is faithful to the canonical definition)

Applying this hash function to values 0 through 65535, we see the following:

This seems suitably random as a per-pixel seed value!

My fast hash is based on a linear congruential generator using a particular magic number from the literature: it's simply `seed = 0xadb4a92d * seed + 1`.

Here's what repeated iterations look like, starting from `seed = 1` and mapping the top 24 bits to RGB values:

As you can see, this is pretty good: there aren't any obvious visual patterns or repetition.

It's important to use the high bits, because the low bits show more obvious patterns. Here's what the bottom 24 bits look like:

There's a clear cycle in the lowest 8 bits, visible in the blue channel!
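The cycle is easy to confirm on the CPU. In a power-of-two LCG the lowest k bits of the state never depend on the higher bits, so they form their own small generator with a period of at most 2^k. The C sketch below (helper names are mine) counts the steps until the low bits return to their starting value.

```c
#include <stdint.h>

/* One step of the LCG; unsigned arithmetic wraps modulo 2^32. */
uint32_t lcg_next(uint32_t seed) {
    return 0xadb4a92du * seed + 1u;
}

/* Number of steps until the lowest `bits` bits of the state return to their
 * starting value. Intended for small widths (1 <= bits <= 16) so the search
 * stays short. The loop always terminates because an odd multiplier makes
 * the low-bit update a permutation. */
uint32_t low_bits_period(uint32_t seed, unsigned bits) {
    uint32_t mask = (1u << bits) - 1u;
    uint32_t start = seed & mask;
    uint32_t steps = 0;
    do {
        seed = lcg_next(seed);
        ++steps;
    } while ((seed & mask) != start);
    return steps;
}
```

For this multiplier and increment, `low_bits_period(1, 8)` is 256: the bottom byte repeats every 256 draws no matter what the upper bits are doing, and the very lowest bit simply alternates.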

Here's the complete `rand(...)` function, including bithacking to convert the highest 23 bits of the hashed seed directly into a float mantissa:

```glsl
// Returns a pseudorandom value in the range [-1, 1)
float rand(inout uint seed) {
    // 32-bit LCG Multiplier from
    // "Computationally Easy, Spectrally Good Multipliers for
    //  Congruential Pseudorandom Number Generators" [Steele + Vigna]
    seed = 0xadb4a92du * seed + 1u;

    // Low bits have less randomness [L'ECUYER '99], so we'll shift the high
    // bits into the mantissa position of an IEEE float32, then OR in
    // the bit-pattern for 2.0
    uint m = (seed >> 9) | 0x40000000u;

    float f = uintBitsToFloat(m);   // Range [2, 4)
    return f - 3.0;                 // Range [-1, 1)
}
```

This function is tuned to return values in the range -1 to 1, to save an extra scaling step.
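For experimenting outside a shader, here is a CPU-side port of the same idea in C. It assumes `float` is an IEEE 754 binary32 value, and uses `memcpy` as the stand-in for `uintBitsToFloat`; the `rand_signed` name is mine.

```c
#include <stdint.h>
#include <string.h>

/* Advances *seed with the LCG step and returns a value in [-1, 1). */
float rand_signed(uint32_t *seed) {
    *seed = 0xadb4a92du * *seed + 1u;

    /* Top 23 bits become the mantissa; 0x40000000 supplies the sign and
     * exponent bits of 2.0, so the result lies in [2, 4). */
    uint32_t m = (*seed >> 9) | 0x40000000u;

    float f;
    memcpy(&f, &m, sizeof f);  /* reinterpret the bits, no numeric conversion */
    return f - 3.0f;
}
```

The extremes are easy to check by hand: an all-zero mantissa gives 2.0 and therefore -1, while an all-ones mantissa gives the largest float below 4.0 and therefore a value just under 1.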

I came up with one neat trick myself: By specifying `seed` as an `inout` value, the function returns random floats while mutating the seed automatically, so you can call `rand(seed)` repeatedly to get your stream of values. For example, generating a random `vec3` is simply

```glsl
vec3 rand3(inout uint seed) {
    return vec3(rand(seed), rand(seed), rand(seed));
}
```

Finally, we generate our initial seed based on the sample count (which changes every frame) and the invocation ID (which changes for each pixel):

```glsl
void main() {
    // Set up our random seed based on the frame and pixel position
    uint seed = hash(hash(u.samples) ^ hash(gl_GlobalInvocationID.x));
    ...
}
```
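The seeding step can be tried on the CPU with a direct C port of the hash. The function names are mine, and `invocation_x` stands in for `gl_GlobalInvocationID.x`.

```c
#include <stdint.h>

/* Port of the GLSL hash(...): one-at-a-time hash over the four key bytes. */
uint32_t jenkins_hash(uint32_t key) {
    uint32_t h = 0;
    for (int i = 0; i < 4; ++i) {
        h += (key >> (i * 8)) & 0xFFu;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Combines the sample count and the invocation index into one seed. */
uint32_t initial_seed(uint32_t samples, uint32_t invocation_x) {
    return jenkins_hash(jenkins_hash(samples) ^ jenkins_hash(invocation_x));
}
```

One property falls out of this construction: XOR is symmetric and `jenkins_hash(0)` is 0, so swapping the two inputs gives the same seed, and equal inputs give a seed of 0.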

## Performance

Performance data was captured on a 2017 MacBook Pro, with a Radeon Pro 560 GPU and a 2.9 GHz Intel Core i7 CPU.

Two figures are presented: steady-state absolute performance (in mega-rays per second), and equivalent speed when rendering a 1200 x 1200 image.

| Scene | Absolute speed (Mray/sec) | 1200² image (frames/sec) |
|---|---|---|
| Cornell Box | 304 | 211 |
| Ray Tracing in One Weekend | 19.1 | 13.3 |
| the orb | 33 | 22 |
| Golden sphere grid | 86 | 60 |
| Prism | 955 | 663 |
| Caffeine molecule | 193 | 134 |

To compare, the reference implementation for Ray Tracing in One Weekend takes about 577 seconds to render a 1200 x 1200 image at 500 samples per pixel (max depth of 6 bounces, compiled with `-O3`, output disabled).

`rayray` takes 43.4 seconds to render the same image, which is a 13x speedup (even including startup and shader compilation time).
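The two columns of the table and the timings above are related by simple arithmetic, sketched here in C. The conversion assumes each frame traces one path per pixel, which reproduces the table's figures; the function names are made up for the example.

```c
/* Frames per second at one path per pixel per frame. */
double frames_per_sec(double mrays_per_sec, int width, int height) {
    return mrays_per_sec * 1e6 / ((double)width * (double)height);
}

/* Seconds of pure rendering needed to reach `samples` samples per pixel. */
double render_seconds(double fps, int samples) {
    return (double)samples / fps;
}

/* Ratio of a reference time to a measured time. */
double speedup(double reference_seconds, double measured_seconds) {
    return reference_seconds / measured_seconds;
}
```

For the Ray Tracing in One Weekend scene, 19.1 Mray/sec over 1.44 million pixels is about 13.3 frames/sec, so 500 samples need roughly 37.6 seconds of rendering. If that rate held for the whole run, about 6 of the measured 43.4 seconds went to startup and shader compilation, and 577 / 43.4 gives the quoted speedup of about 13x.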

Of course, the reference implementation isn't optimized for performance – but then again, neither is `rayray`!

## Infrastructure

Like Futureproof, `rayray` is built on a stack of (unnecessarily) modern tools and technologies:

While building the scene-specific pipeline, I ran into a pathological complexity explosion in SPIRV-Cross. Coincidentally, it had been fixed a few days earlier, so I worked to push that fix all the way upstream to `wgpu-native` (and also `wgpu-rs`, to be polite).

This was a very positive experience: the `gfx-rs` and `wgpu` folks were consistently responsive and helpful, so props to that community.

(I also found a bug in `cbindgen` along the way, which was promptly fixed!)

## Future plans

Like Futureproof, this was a toy project / proof of concept, and I'm not planning to maintain it into the future.

As always, the code is on GitHub, and I'd be happy to link any fork which achieves critical momentum.

Now that `naga` is released, it may be time to transition back to Rust for graphics work – my motivation for using Zig was seamless interoperability with C libraries (mainly `shaderc`), and that's no longer needed in `wgpu` 0.7.0.