Gradients are the new intervals

At the New England Symposium on Graphics, James Tompkin compared graphics researchers to magpies: they're easily distracted by shiny objects and pretty renderings.

While this is true, the analogy also holds from a different angle: when I'm reading graphics papers, I'm constantly looking for ideas to steal and bring back to my nest.

Researchers at IRIT and Adobe Research recently published a paper that's full of interesting ideas, and I'd like to talk about it.

Picture of the paper, showing the abstract and title

This blog post assumes a vague understanding of implicit surface rasterization, and how interval arithmetic is used to both skip regions of space and simplify complex expressions. See this section of the Fidget writeup for a short introduction, or my colloquium talk for a longer explanation.

Here's the key sentence from the paper's abstract:

We introduce an efficient hierarchical tree pruning method based on the Lipschitz property of SDFs, which is compatible with hard and smooth CSG operators.

In this case, "the Lipschitz property" means that the gradient of the distance value is bounded, e.g. $||\nabla f(x, y, z)|| \le 1$. You'll also see it referred to as Lipschitz continuity.

Lipschitz continuity provides a bound on the maximum change in the distance value between two points: if you have points $\vec{p}$ and $\vec{q}$ in 3D space, then

$$|f(\vec{p}) - f(\vec{q})| \le ||\vec{p} - \vec{q}||$$

Given this requirement on the distance fields, the authors present two core tricks that can be applied to CSG-tree-flavored models:

- using a single point sample to prove an entire region of space inside or outside the shape
- using a single point sample to prune the CSG tree down to the primitives that are relevant within a region

These optimizations make rendering dramatically cheaper, allowing for interactive visualizations of complex models.

The first optimization also sounds familiar – I presented something similar back in 2020 (which itself builds on work from 1992):

To render a model, its underlying expression is evaluated in a shallow hierarchy of spatial regions, using a high branching factor for efficient parallelization. Interval arithmetic is used to both skip empty regions and construct reduced versions of the expression. The latter is the optimization that makes our algorithm practical: in one benchmark, expression complexity decreases by two orders of magnitude between the original and reduced expressions.

Here's the big difference: my implementation used interval arithmetic on arbitrary math expressions, while this new paper uses single-point evaluation on Lipschitz-continuous primitives.

Single-point evaluation is cheaper, and also doesn't suffer from the conservative nature of interval arithmetic (where intervals tend to grow over time). However, their shapes are limited to a set of well-behaved primitives and transforms; something as simple as scaling a model can break Lipschitz continuity if not handled carefully.
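To make that concrete (my own aside, not an example from the paper): scaling a model by a factor of $s$ is typically implemented as $f_s(\vec{p}) = f(\vec{p}/s)$, which scales the gradient by $1/s$, so any $s \lt 1$ breaks the bound $||\nabla f|| \le 1$. The usual fix is to evaluate $s \cdot f(\vec{p}/s)$ instead, which rescales the distance values and restores the bound.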

After thinking about it for a few days, I ended up taking home a slightly different conclusion than the paper presented:

If your distance field is Lipschitz-continuous, then you can use single-point evaluation to get interval-ish results, then apply all the usual tricks which normally require interval arithmetic.

I've now gone far too long without showing a pretty picture (the magpies in the audience are growing rowdy), so let's dig into some examples. I'm going to take this idea — of using single-point evaluations to get pseudo-intervals — and see how it applies to my usual box of tools.

Note that I'm not hewing to the implementation from the paper (which you should go read, it's great!); I'm taking this idea and running in a slightly different direction based on my background knowledge.


Basic rendering

Here's a circle of radius 1, $f(x, y) = \sqrt{x^2 + y^2} - 1$:

Picture of a circle with field lines shown

It's rendered with orange / blue indicating whether we're inside or outside the shape, and field lines showing the distance field values. I'm not sure who developed the original coloring strategy, but it's pervasive on Shadertoy; I borrowed it from Inigo Quilez.

My typical strategy for rendering is to do several rounds of evaluation using interval arithmetic, then pixel-by-pixel evaluation of the remaining ambiguous regions. The goal is to produce a binary image: each pixel is either inside or outside, depending on sign.

Interval evaluation takes intervals for $x$ and $y$, and returns an interval output.
For example, we can evaluate a region in the top-left quadrant:

$$f([0.5, 1.5], [0.5, 1.5]) = [-0.3, 1.12]$$

Picture of a circle with field lines and a rectangle region

Because the interval result is ambiguous (contains 0), we can't prove this region inside or outside the shape, so we subdivide and recurse. If it was unambiguously less than or greater than zero, we could stop processing right away!

The full algorithm produces a set of colored regions (proven inside / outside), and pixel-level results for ambiguous regions below a certain size:

Circle with interval regions shown

Intervals are shown as large uniformly-shaded squares; pixel-by-pixel evaluation is concentrated at the edge of the circle.
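To make this concrete, here's a minimal self-contained sketch of the whole loop: a toy interval type, interval evaluation of the circle, and a four-way subdivide-and-recurse pass. None of this is Fidget's actual API, and the real renderer uses a higher branching factor, simplifies expressions as it recurses, and hands small ambiguous tiles off to per-pixel evaluation.

```rust
/// A toy interval type: just a lower and upper bound
#[derive(Clone, Copy, Debug)]
struct Interval {
    lo: f64,
    hi: f64,
}

impl Interval {
    fn new(lo: f64, hi: f64) -> Self {
        Self { lo, hi }
    }
    /// Squaring is non-negative, so the lower bound clamps to zero if the
    /// input straddles zero
    fn square(self) -> Self {
        let (a, b) = (self.lo * self.lo, self.hi * self.hi);
        let lo = if self.lo <= 0.0 && self.hi >= 0.0 {
            0.0
        } else {
            a.min(b)
        };
        Self::new(lo, a.max(b))
    }
    fn add(self, rhs: Self) -> Self {
        Self::new(self.lo + rhs.lo, self.hi + rhs.hi)
    }
    fn sqrt(self) -> Self {
        Self::new(self.lo.max(0.0).sqrt(), self.hi.max(0.0).sqrt())
    }
    fn offset(self, v: f64) -> Self {
        Self::new(self.lo + v, self.hi + v)
    }
}

/// Interval evaluation of the circle, f(x, y) = sqrt(x² + y²) - 1
fn circle(x: Interval, y: Interval) -> Interval {
    x.square().add(y.square()).sqrt().offset(-1.0)
}

/// One square region of the image; `inside` is None if the region is still
/// ambiguous at the smallest tile size (and would go to per-pixel evaluation)
struct Tile {
    x: Interval,
    y: Interval,
    inside: Option<bool>,
}

/// Prove regions inside or outside where possible; subdivide ambiguous
/// regions into four quadrants until they reach the minimum size
fn render(x: Interval, y: Interval, min_size: f64, out: &mut Vec<Tile>) {
    let i = circle(x, y);
    if i.hi < 0.0 {
        out.push(Tile { x, y, inside: Some(true) }); // entirely inside
    } else if i.lo > 0.0 {
        out.push(Tile { x, y, inside: Some(false) }); // entirely outside
    } else if x.hi - x.lo <= min_size {
        out.push(Tile { x, y, inside: None }); // ambiguous, stop recursing
    } else {
        let (xm, ym) = ((x.lo + x.hi) / 2.0, (y.lo + y.hi) / 2.0);
        for &xs in &[Interval::new(x.lo, xm), Interval::new(xm, x.hi)] {
            for &ys in &[Interval::new(y.lo, ym), Interval::new(ym, y.hi)] {
                render(xs, ys, min_size, out);
            }
        }
    }
}

fn main() {
    // The worked example from above: x, y ∈ [0.5, 1.5] is ambiguous
    println!("{:?}", circle(Interval::new(0.5, 1.5), Interval::new(0.5, 1.5)));

    // Render the [-2, 2] × [-2, 2] region down to 1/64-unit tiles
    let mut tiles = Vec::new();
    render(
        Interval::new(-2.0, 2.0),
        Interval::new(-2.0, 2.0),
        1.0 / 64.0,
        &mut tiles,
    );
    println!("{} tiles", tiles.len());
}
```

Running this prints roughly `[-0.29, 1.12]` for the worked example above, then the number of proven and ambiguous tiles for the full image region.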

Using gradients instead

Our circle model has a gradient magnitude of 1 everywhere (proof is left as an exercise to the reader). If we want to evaluate an interval region $[x_\text{min}, x_\text{max}], [y_\text{min}, y_\text{max}]$, we can sample a single point in the center:

$$v = f\left((x_\text{min} + x_\text{max}) / 2, (y_\text{min} + y_\text{max}) / 2\right)$$

From this point, the maximum distance to a corner is given by $$d = \frac{\sqrt{(x_\text{max} - x_\text{min})^2 + (y_\text{max} - y_\text{min})^2}}{2}$$

Because our gradient is bounded, we know that the value anywhere in the region must be in the range $[v - d, v + d]$.

In other words, we can evaluate a single point in the center of the region, then construct a pseudo-interval result by adding the distance to a corner.
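Here's the same idea as a sketch (again my own toy code, not the paper's implementation or Fidget's API): a single point sample plus a padding term.

```rust
/// The circle SDF, f(x, y) = sqrt(x² + y²) - 1
fn circle(x: f64, y: f64) -> f64 {
    (x * x + y * y).sqrt() - 1.0
}

/// Pseudo-interval evaluation of a Lipschitz-continuous field: sample the
/// center of the region, then pad by the distance from the center to a corner
fn pseudo_interval(
    f: impl Fn(f64, f64) -> f64,
    x: (f64, f64),
    y: (f64, f64),
) -> (f64, f64) {
    let v = f((x.0 + x.1) / 2.0, (y.0 + y.1) / 2.0);
    // Half the diagonal: the farthest any point in the region can be from the
    // center, and so (with ||∇f|| ≤ 1) the largest possible change in the value
    let d = ((x.1 - x.0).powi(2) + (y.1 - y.0).powi(2)).sqrt() / 2.0;
    (v - d, v + d)
}

fn main() {
    // The same region as before: x, y ∈ [0.5, 1.5]
    let (lo, hi) = pseudo_interval(circle, (0.5, 1.5), (0.5, 1.5));
    println!("[{lo:.3}, {hi:.3}]"); // prints [-0.293, 1.121]: still ambiguous
}
```

For this particular region, the pseudo-interval happens to match the interval result exactly, because the region's nearest and farthest corners lie on the same diagonal as its center; that's a coincidence rather than the general case.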

Visually, this is like drawing a circle over the rectangular region:

Region with a circle around it

The red box is our target region (i.e. intervals in X and Y); the green elements show our center point, radius, and effective circle.

This is a drop-in replacement for our previous interval evaluator; we'll call it a "pseudo-interval evaluator". We can run the exact same algorithm to produce an image, swapping in this new evaluator. For the original circle, it produces the same picture:

Pseudo-interval evaluation of the circle SDF

By carefully positioning the circle, we can see a case where pseudo-interval evaluation requires more subdivision (interval evaluation on the left, pseudo-interval evaluation on the right):

Comparison

Because the pseudo-interval considers the green circle (rather than the red square of the interval evaluator), it can't prove this entire region outside of the shape, and therefore has to subdivide it.

Downsides to interval evaluation

There are also cases where the pseudo-interval evaluator has the upper hand!

Consider rotating a model by 45°: this is equivalent to a new evaluation

$$ f_\text{rot}(x, y) = f\left(\frac{x + y}{\sqrt{2}}, \frac{y - x}{\sqrt{2}}\right) $$

The "natural" axes of the model are now at 45° angles from our image axes. Unfortunately, our intervals are in terms of the image's $x$ and $y$ coordinates, which remain horizontal and vertical. During evaluation, this transform enlarges the sampled region by a factor of $\sqrt{2}$:

Rotated rectangle
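To put a number on that, reuse the region from earlier, $x, y \in [0.5, 1.5]$. The rotated evaluation feeds the following interval into the first argument of $f$:

$$\frac{[0.5, 1.5] + [0.5, 1.5]}{\sqrt{2}} = \frac{[1, 3]}{\sqrt{2}} \approx [0.71, 2.12]$$

That interval is $\sqrt{2} \approx 1.41$ units wide, even though the original region is only one unit wide; the same thing happens to the second argument.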

You can now imagine situations where pseudo-interval evaluation is more precise, because its circle circumscribes the original region:

Rotated rectangle with a circle

Sure enough, there are cases where interval arithmetic recurses down to individual pixels, while pseudo-interval evaluation succeeds in proving regions entirely inside the shape:

Comparison of rotated renderings

More generally, interval evaluation gets worse and worse as you stack up more transforms. One way to think about it is that interval arithmetic "forgets" any correlations between its inputs. For example, if you evaluate $x^2$ as $x \times x$ (instead of with a dedicated power operation), interval arithmetic produces a much wider result:

$$[-1, 1]^2 = [0, 1]$$ $$[-1, 1] \times [-1, 1] = [-1, 1]$$

We know that the value must be the same on each side of the multiplication, so it's impossible to get -1 as a result; however, the implementation of multiplication assumes that the two sides are totally uncorrelated.

You can work around this failure mode (to some extent) by automatically detecting and collapsing affine transforms, but you certainly don't get it for free.

Using point samples on a Lipschitz-continuous function means that this problem disappears: we know that the stack of transforms hasn't changed the bounds on the gradient, so any transforms of the model don't affect sampling!

Normalization

Here's a more complex example, built from about 400 math expressions:

SDF reading 'hello world'

Looking at the field lines, it's clear that this is a less well-behaved distance field! Specifically, the gradient magnitude around the w is very low; the field lines are extremely stretched out.

Interval evaluation shows the effects of the badly-behaved field: there's a bunch of regions around the w where the renderer recurses farther than necessary.

Interval evaluation of the above SDF

Having a badly-behaved field means that we can't assume Lipschitz continuity. In this model, most of the field has a gradient magnitude $\ge 1$:

Gradient magnitude

Our evaluation uses Fidget, which provides a forward-mode automatic differentiation evaluator. This suggests a dumb potential solution: find the (value, dx, dy, dz) tuple using automatic differentiation, then normalize by dividing the value by the magnitude of the partial derivatives.
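Here's a rough sketch of that idea using a toy dual-number type; the `Grad` type and its methods below are my own stand-ins, not Fidget's evaluator.

```rust
/// A toy forward-mode dual number carrying a value and its partial derivatives
#[derive(Clone, Copy, Debug)]
struct Grad {
    v: f64,  // value
    dx: f64, // ∂v/∂x
    dy: f64, // ∂v/∂y
}

impl Grad {
    /// The x coordinate at a sample point: value v, with ∂/∂x = 1
    fn x(v: f64) -> Self {
        Self { v, dx: 1.0, dy: 0.0 }
    }
    /// The y coordinate at a sample point: value v, with ∂/∂y = 1
    fn y(v: f64) -> Self {
        Self { v, dx: 0.0, dy: 1.0 }
    }
    fn square(self) -> Self {
        Self { v: self.v * self.v, dx: 2.0 * self.v * self.dx, dy: 2.0 * self.v * self.dy }
    }
    fn add(self, rhs: Self) -> Self {
        Self { v: self.v + rhs.v, dx: self.dx + rhs.dx, dy: self.dy + rhs.dy }
    }
    fn sqrt(self) -> Self {
        let s = self.v.sqrt();
        Self { v: s, dx: self.dx / (2.0 * s), dy: self.dy / (2.0 * s) }
    }
    fn offset(self, c: f64) -> Self {
        Self { v: self.v + c, ..self }
    }
    /// Normalize: divide the value by the gradient magnitude
    fn normalize(self) -> f64 {
        self.v / self.dx.hypot(self.dy)
    }
}

/// A circle squashed along y (input y scaled by 2), so its gradient magnitude
/// is no longer bounded by 1
fn squashed_circle(x: Grad, y: Grad) -> Grad {
    let y2 = Grad { v: y.v * 2.0, dx: y.dx * 2.0, dy: y.dy * 2.0 };
    x.square().add(y2.square()).sqrt().offset(-1.0)
}

fn main() {
    let g = squashed_circle(Grad::x(1.0), Grad::y(0.5));
    println!("value {:.3}, |∇f| {:.3}", g.v, g.dx.hypot(g.dy));
    println!("normalized {:.3}", g.normalize());
}
```

Applying this normalization across the whole image gives the following field: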

Normalized distance field (which is subtly wrong)

The gradient looks vaguely correct – field lines advance at the same rate everywhere – but there's a subtle issue here. Looking above the e, you can spot a dramatic shift in brightness, indicating that the distance field believes it's suddenly much further away from the model. This clearly won't work: a sample taken above the e could falsely report that it's very far from the model.

So, how did this go wrong?

Above the e, two fields are combined with min. We'll call these two fields the $e$ and $w$ fields, based on the letter that generated the field. At the point where the fields are exactly equal, they have the same value but different gradient magnitudes.

$$ \begin{aligned} f(x, y) &= \min(e(x, y), w(x, y)) \\ e(x, y) &= 0.1,\hspace{0.1in}||\nabla e(x, y)|| = 1.2 \\ w(x, y) &= 0.1,\hspace{0.1in}||\nabla w(x, y)|| = 0.5 \\ \end{aligned} $$

Now consider the modified $f_\text{norm}(x, y) = f(x, y) / ||\nabla f(x, y)||$, slightly above and below the $\min$ transition:

$$ \begin{aligned} f_\text{norm}(x, y - \epsilon) &= 0.1 / 1.2 = 0.083 \\ f_\text{norm}(x, y + \epsilon) &= 0.1 / 0.5 = 0.2 \end{aligned} $$

Normalization introduces a discontinuity in the value here! Even though our gradient magnitude is guaranteed to be 1 everywhere (because of normalization), there are instantaneous transitions in the value itself. This means that the Lipschitz property doesn't hold, so we can't use pseudo-interval evaluation.

Normalization, once more

What's to be done?

Non-rigorously, it seems like discontinuities in the gradient magnitude lead to discontinuities in the value after normalization. The only expressions which introduce discontinuities in the gradient magnitude are min and max, which suggests a simple solution: normalize before those operations.

Of course, this changes the values, but it doesn't change their sign. We're using min and max to perform union and intersection operations, so the sign is the only thing that matters.

min(a, b) =>     union(a, b) = min(normalize(a), normalize(b))
max(a, b) => intersect(a, b) = max(normalize(a), normalize(b))

Here's the new normalization:

Normalized distance field (which is fine)

Everything looks good: the field lines are evenly spaced, and there are no discontinuities.

Plugging this into our pseudo-interval evaluator, we get a perfectly reasonable result. It ends up evaluating slightly more tiles than the interval evaluator, but still produces a correct image:

Hello, world

Expression simplification

Interval evaluation has a second benefit: at each min and max node, we may be able to prove that one branch is always taken within a particular spatial region. If this happens, then we can simplify the shape, generating a shorter expression which is valid within that spatial region (and less expensive to evaluate).

With intervals, we check that they are not overlapping: min([0, 1], [2, 3]) will always pick the left-hand ([0, 1]) argument.

A similar strategy works for Lipschitz-bounded point samples: within a region of radius $r$, the value can only differ by $\pm r$ from a point sample in the center. The equivalent condition for selecting the left-hand argument of a min operation is

$$ \min(a, b) \text{ where } a + r \lt b - r$$

Similar rules apply for the right-hand argument and max expressions.
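As a sketch (with hypothetical helper names, not Fidget's API), the pruning decision for a single min or max node looks like this:

```rust
/// Which branch of a min/max node survives within a region
enum Branch {
    Left,  // only the left argument can ever be selected in this region
    Right, // only the right argument can ever be selected in this region
    Both,  // still ambiguous: keep both branches
}

/// Pruning decision for min(a, b), given point samples `a` and `b` taken at
/// the center of a region of radius `r` on Lipschitz-continuous fields: each
/// sample can drift by at most ±r anywhere in the region
fn prune_min(a: f64, b: f64, r: f64) -> Branch {
    if a + r < b - r {
        Branch::Left // a's maximum is still below b's minimum
    } else if b + r < a - r {
        Branch::Right
    } else {
        Branch::Both
    }
}

/// The same test for max(a, b), with the comparisons flipped
fn prune_max(a: f64, b: f64, r: f64) -> Branch {
    if a - r > b + r {
        Branch::Left // a's minimum is still above b's maximum
    } else if b - r > a + r {
        Branch::Right
    } else {
        Branch::Both
    }
}

fn main() {
    // Well-separated samples: min always picks the left branch,
    // max always picks the right branch
    assert!(matches!(prune_min(0.1, 1.0, 0.2), Branch::Left));
    assert!(matches!(prune_max(0.1, 1.0, 0.2), Branch::Right));
}
```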

I find it helpful to visualize how much the expression is simplified at each tile. For our hello, world model, here's a comparison of tape lengths in each final tile, with the original tape being 390 clauses:

Tape lengths in each tile

It's definitely working in both cases! There are a few tiles where they differ, but there's not a clear winner; average tape lengths are 120.2 (for interval evaluation) versus 121.5 (for pseudo-intervals).

Performance impacts

We've presented three different evaluation strategies, which can be used to both prove entire regions inside / outside and provide data for expression simplification.

Interval arithmetic is conservative and works on any kind of distance field. The evaluator has to do extra work for each arithmetic operation to track intervals, compared to floating-point (value-only) evaluation. If many transforms are stacked up, it tends to provide results that are overly broad.

For Lipschitz-continuous distance fields, you can sample at the center of the region, then add the radius to calculate a pseudo-interval. This is a single floating-point evaluation, which is cheaper than interval arithmetic. In addition, it doesn't have issues with stacked transforms.

Finally, for distance fields that are not Lipschitz-continuous, you can evaluate them using forward-mode automatic differentiation, then normalize by dividing by the magnitude of the gradient before min and max nodes, creating a new Lipschitz-continuous field. Like single-point sampling, this doesn't have issues with stacked transforms; however, automatic differentiation is more expensive than floating-point evaluation. Normalization also produces a different numerical result, so this strategy is only valid if you just care about the sign.

It's pretty obvious that single-point sampling will beat interval arithmetic: it's doing less work, and doesn't have issues with overly conservative estimates. (Indeed, the paper shows this in Figure 13)

However, it's not immediately obvious whether interval arithmetic or normalized single-point samples (using forward-mode automatic differentiation) would be more expensive! Obviously, the answer is going to be implementation-dependent, but I can provide one data point.

The Prospero Challenge presents a large expression (7668 clauses), and challenges readers to write an efficient renderer. The underlying expression is not Lipschitz-continuous, so we have to use the normalized evaluator.

Testing these two evaluators on this model, I see the following results:

| Image size | Interval | Normalized pseudo-interval |
|------------|----------|----------------------------|
| 512×512    | 12 ms    | 18 ms                      |
| 1024×1024  | 20 ms    | 28 ms                      |
| 1536×1536  | 29 ms    | 38 ms                      |
| 2048×2048  | 37 ms    | 48 ms                      |

It looks like interval arithmetic is slightly faster in this case, but they're relatively close to each other – there's certainly no order-of-magnitude difference here.

Wrapping up

This has been a fun exercise, and I'd encourage you to go read Lipschitz Pruning: Hierarchical Simplification of Primitive-Based SDFs yourself and see their full system laid out on paper.

Suggested further reading:

I've published the code used to generate figures in this blog post at mkeeter/gradients-and-intervals, along with run.py for post-processing. This is Research Quality Code™, so take it with a grain of salt!