At last year's SGP conference, researchers presented an interesting method for meshing NURBS surfaces. This piqued my interest, since I had written a NURBS mesher earlier that year (as part of Foxtrot, a STEP file viewer).

On a long train ride, I spent some time trying to understand the paper. I quickly realized that I'd need to go upstream: the NURBS meshing paper was the the latest in a series of related papers, and heavily relied on previous works for geometric infrastructure.

The lineage is roughly

This year, as part of my traditional Winter Break Graphics Hackathon, I dug into the first three papers. At the end, I could successfully raytrace the Utah Teapot directly from its Bézier patches:

I gave running commentary in a Twitter thread; this writeup goes into more depth, and should be helpful for others seeking to understand this line of research.

## The implicit matrix representation

Let's start in the middle.

These papers describe a way of converting parametric curves and surfaces into a function which converts a point $(x, y, z)$ into a matrix $M(x, y, z)$.

The matrix $M(x, y, z)$ is a linear combination of constant matrixes, e.g.

$$M(x, y, z) = M_0 + x M_X + y M_Y + z M_z$$

This matrix has an interesting property: its rank drops by one when you're exactly on the curve or surface!

Consider the cubic Bézier curve with sample points $\left[[0,0], [1,2], [2,1], [3,3]\right]$:

Through a mysterious process (to be explained later), we can calculate the $M$ matrices. They're not particularly pretty:

$$M_0 = \left[\begin{array}{cccccc} 0 & 0 & 0 \\ -0.47130973& -0.69441025& 0.27461067 \\ 0.50379062& -0.58034736& -0.5515659 \\ \end{array}\right]$$

$$M_X = \left[\begin{array}{cccccc} -0.25848205& 0.25319592& -0.65863269\\ -0.07003399& 0.27575858& -0.11849897 \\ -0.45427448& 0.088577 & -0.05392415\\ \end{array}\right]$$

$$M_Y = \left[\begin{array}{cccccc} 0.28634427& 0.10487212& 0.23777945\\ 0.28634427& 0.10487212& 0.23777945 \\ 0.28634427& 0.10487212& 0.23777945\\ \end{array}\right]$$

(since this is 2D, we're ignoring $M_Z$)

With these matrices, we can evaluate $M(3, 3)$, which should be on the curve, then look at its singular values (using the SVD):

$$\sigma(3, 3) = [1.6725, 0.76772, 2.77\times10^{-16}]$$

This seems to work: the last singular value is very close to zero. To confirm, we'll check a point that's off the curve:

$$\sigma(3, 4) = [1.6067, 1.183, 0.153]$$

In this case, the singular values are all non-zero, meaning the matrix has full rank. This is what we expect for a point that's not on the curve!

To go from singular values to a single pseudo-distance, we multiply them together.

\begin{aligned} \prod_i \sigma_i(3, 3) &= 3.57 \times 10^{-16} \\ \prod_i \sigma_i(3, 4) &= 0.29 \end{aligned}

As expected, this pseudo-distance goes to 0 on the curve.

Here's what pseudo-distance looks like around the curve (shown in white):

That the values go to zero is very obvious when plotted in log scale:

## Building the matrix representation

Now, let's take a step back: where does $M(x,y,z)$ come from?

Remember that we're dealing with parametric curves and surfaces. In the general case, we're taking some parameter $s$ (where $s$ is one-dimensional for a curve and $s = (s_1, s_2)$ for a surface), and generating a rational curve or surface

$$\left( \frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)} \right)$$

In our example above, we're generating a 2D Bézier curve, which means that we have a 1D parameterization with $f_0(s) = 1$ and $f_3(s) = 0$.

Consider a second set of functions $g_{0,1,2,3}(s)$ with the property

$$\sum_{i=0}^3 g_i(s)f_i(s) = 0$$

Right now, the $g_i(s)$ functions are underdefined; we don't know exactly what they are, just that the above property must hold for a given set of $f_i(s)$ functions.

Now, let's use the $g_i(s)$ functions to build a new function, which is parameterized by $s$ and evaluated at some point in space $\left(x, y, z\right)$

$$L(s; x, y, z) = g_0(s) + xg_1(s) + yg_2(s) + zg_3(s)$$

Again, $L$ is not a specific function; it's just a particular way of combining a set of $g_i(s)$ functions. However, it already has an interesting property: what happens if we evaluated it at a point on the curve or surface?

\begin{aligned} L\left(s; \frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)}\right) &= g_0(s) + \frac{f_1(s)}{f_0(s)}g_1(s) + \frac{f_2(s)}{f_0(s)}g_2(s) + \frac{f_3(s)}{f_0(s)}g_3(s) \\ &= \frac{g_0(s)f_0(s) + f_1(s)g_1(s) + f_2(s)g_2(s) + f_3(s)g_3(s)}{f_0(s)} \\ &= \frac{0}{f_0(s)} \\ &= 0 \end{aligned}

That's neat: $L(s; x, y, z)$ is 0 at the point on the surface given by $s$!

The paper gives more intuition for the behavior of $L$ by noting that once we've locked down a particular value for $s$, $L_s(x, y, z)$ defines a plane in 3D. From our observation above, that plane must pass through the point on the surface given by $s$.

Let's jump ahead very briefly, so as to break up the wall of math with a picture. In our Bézier curve example, we end up with three $L(s; x, y)$ functions which each define a line in 2D for a particular value of $s$. Here's what they look like as you sweep the value of the curve parameter $s$, with the point on the curve shown as a black dot:

The three lines always intersect at the point on the curve given by a particular $s$!

Okay, back to math! We've discussed a set of four functions $g_{0,1,2,3}(s)$, given then a particular property, and combined them into a new function $L(s; x, y, z)$.

Now, we'll lock things down a little more.

Consider a set of basis functions $\left(\psi_1(s), \psi_2(s), ..., \psi_{m_\nu}(s)\right)$, where $m_\nu$ is the number of basis functions of degree $\leq \nu$, and $\nu$ is an arbitrarly chosen degree.

We can construct $g_i(s)$ as a weighted sum of these basis functions, e.g.

$$g_0(s) = \sum_{i=1}^{m_\nu}\lambda_{0,i}\psi_i(s)$$

Because all of the $g_i(s)$ functions are using the same basis functions, we can represent their combination as

$$L(s; x, y, z) = \sum_{i=1}^{m_\nu}\left(\lambda_{0,i} + x\lambda_{1,i} + y\lambda_{2,i} + z\lambda_{3,i} \right) \psi_i(s)$$

We will collapse the weighted sum into its own function

$$\Lambda_i(x, y, z) = \lambda_{0,i} + x\lambda_{1,i} + y\lambda_{2,i} + z\lambda_{3,i}$$

(Did you know that $\Lambda$ is a capital $\lambda$? I didn't until just now; it's all Greek to me!)

Now, we can rewrite our previous function: $$L(s; x, y, z) = \sum_{i=1}^{m_\nu}\Lambda_i(x, y, z) \psi_i(s)$$

For reasons that will soon become clear, we can also write this as a matrix multiplication (albeit one where it's a single row multiplied by a single column):

$$\left[\begin{array}{cccc} \psi_1(s) & \psi_2(s) & ... & \psi_{m_\nu}(s) \end{array} \right] \left[\begin{array}{c} \Lambda_1(x, y, z) \\ \Lambda_2(x, y, z) \\ ... \\ \Lambda_{m_\nu}(x, y, z) \\ \end{array}\right] = \left[L(s; x, y, z)\right]$$

Let's take a step back: we've assumed the existance of a quartet of functions $g_{0,1,2,3}(s)$, given them certain properties, and built this infrastructure on top of them.

However, we haven't constrained them enough that there's only a single set of $g$ functions for a given shape. You can imagine generating multiple sets of $g$ functions, e.g.

\begin{aligned} &\left(g_{0,1}(s), g_{1,1}(s), g_{2,1}(s), g_{3,1}(s)\right) \\ &\left(g_{0,2}(s), g_{1,2}(s), g_{2,2}(s), g_{3,2}(s)\right) \\ &... \\ &\left(g_{0,r_\nu}(s), g_{1,r_\nu}(s), g_{2,r_\nu}(s), g_{3,r_\nu}(s)\right) \end{aligned}

Here, we're generating $r_\nu$ sets of $g$ functions, $r_\nu$ being another mystery value that we'll explain later.

Now that we've got multiple sets of $g$ functions, we can use the rest of our infrastructure and build a big matrix of $\Lambda$ values. As shorthand, let's just call it $M(x, y, z)$:

$$M(x, y, z) = \left[\begin{array}{cccc} \Lambda_{1,1}(x,y,z) &\Lambda_{1,2}(x,y,z) & ... & \Lambda_{1,r_\nu}(x,y,z) \\ \Lambda_{2,1}(x,y,z) &\Lambda_{2,2}(x,y,z) & ... & \Lambda_{2,r_\nu}(x,y,z) \\ ... & ... & ... & ... \\ \Lambda_{m_\nu,1}(x,y,z) &\Lambda_{m_\nu,2}(x,y,z) & ... & \Lambda_{m_\nu,r_\nu}(x,y,z) \\ \end{array}\right]$$

But wait! That's a confusing choice of variable, because we used $M$ as our "magic matrix with drop-of-rank property" earlier.

Well, surprise! This is our magic matrix with drop-of-rank property!

Let's see why. As before, we can multiply by the basis functions for $g$ to generate a family of functions $L(s; x, y, z)$:

\begin{aligned} &\left[\begin{array}{cccc} \psi_1(s) & ... & \psi_{m_\nu}(s) \end{array} \right] M(x, y, z) \\ = &\left[\begin{array}{cccc} \psi_1(s) & ... & \psi_{m_\nu}(s) \end{array} \right] \left[\begin{array}{cccc} \Lambda_{1,1}(x,y,z) & ... & \Lambda_{1,r_\nu}(x,y,z) \\ ... & ... & ... \\ \Lambda_{m_\nu,1}(x,y,z) & ... & \Lambda_{m_\nu,r_\nu}(x,y,z) \\ \end{array}\right] \\ = &\left[\begin{array}{ccc}L_1(s; x, y, z)& ... & L_{m_\nu}(s; x, y, z)\end{array}\right] \end{aligned}

Now we can see where the drop-of-rank property comes from. Consider evaluating $M$ at a point on the surface given by the parameter $s$:

$$M\left(\frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)}\right)$$

From all that has come before, we know that

\begin{aligned} & \left[\begin{array}{cccc} \psi_1(s) & ... & \psi_{m_\nu}(s) \end{array} \right] M\left(\frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)}\right) \\ &= \left[\begin{array}{ccc}L_1(s; \frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)})& ... & L_{m_\nu}(s; \frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)})\end{array}\right] \\ &= \left[\begin{array}{ccc}0 & ... & 0\end{array}\right] \end{aligned}

You can think of this multiplication as building a linear combination of the matrix rows, weighted by $\psi_1(s)...\psi_{m_\nu}(s)$.

It's trivial that there exists a linear combination of the matrix rows which sums to zero, because we just found it.

The fact that such a combination exists means that the rows aren't linearly independent, meaning the rank of the matrix isn't full!

(This explanation is a bit handwavey, because it also depends on $m_\nu$ and $r_\nu$, but this is at least the right intuition; see Section 3.2 of the paper for details)

## Building the matrix, part 2

At this point, we have a recipe for building this useful matrix $M(x, y, z)$ for a given curve or surface defined by

$$\left( \frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)} \right)$$

You need to find $r_\nu$ sets of functions $g_{0,1,2,3}(s)$ with the property

$$\sum_{i=0}^3 g_i(s)f_i(s) = 0$$

where each $g_i(s)$ is a weighted sum of $m_\nu$ basis functions $\psi_1(s), ..., \psi_{m_\nu}(s)$

Then, you collect the weights into a big matrix, and you're done!

Of course, this is one of those "now you have two problems" situations: how, exactly, are we going to find these sets of $g$ functions?

It depends on what kind of model you're dealing with, and the paper presents strategies for a few common geometric representations. Let's work through section 2.1, which explains how to solve the problem for rational Bézier curves.

For Bézier curves, $f$ will be using Bernstein polynomials as its basis functions; we'll declare that $g$ will do the same.

There are $d + 1$ Bernstein polynomials of degree $d$, numbered $0$ through $d$. The $i$'th Bernstein polynomial of degree $d$ has equation

$$B_i^d(s) = {d \choose i} t^i(1 - t)^{d - i}$$

where $0 \leq i \leq d$ and $d \choose i$ is the binomial coefficient

$${d \choose i} = \frac{d!}{i!(d - i)!}$$

In other words,

$$f_0(s) = \sum_{i=0}^d w_{0,i} B^d_i(s)$$

where $w_{0,i}$ is the $i$th weight in $f_0(s)$.

We'll use $\nu$ as the max degree for $g$ functions, e.g.

$$g_0(s) = \sum_{j=0}^\nu \lambda_{0,j} B^\nu_j(s)$$

where $\lambda_{0,j}$ is the $j$th weight in $g_0(s)$.

At this point, we know the values for weights in $f_{0,1,2,3}(s)$, since that's our curve, but we don't know the $\lambda_{...}$ weights for $g$.

Want to learn a cool fact about Bernstein polynomials? If you multiply them, the result will be a scaled Bernstein polynomial with a degree that's the sum of the other two degrees.

Let's walk through that math:

\begin{aligned} B^d_i(s)B^\nu_j(s) &= {d \choose i} t^i(1 - t)^{d - i} {\nu \choose j} t^j(1 - t)^{\nu - j} \\ &= {d \choose i}{\nu \choose j} t^{i + j}(1 - t)^{d + \nu - (i + j)} \\ &= {d \choose i}{\nu \choose j}\frac{B^{d + \nu}_{i + j}(s)}{{d + \nu}\choose{i + j}} \end{aligned}

This means that because $f(s)$ is built from $B^d(s)$ and $g(s)$ is built from $B^\nu(s)$, their product $f(s)g(s)$ can be expressed in the basis $B^{d + \nu}(s)$.

Now, we're going to do a lot of math. Consider multiplying one of our $f(s)$ functions by a polynomial from $B^\nu$:

$$B_j^\nu(s) f_0(s) = \sum_{i=0}^d w_{0,i} {d \choose i}{\nu \choose j}\frac{B^{d + \nu}_{i + j}(s)}{{d + \nu}\choose{i + j}}$$

Let's collect the modified weight factor as capital $W$:

$$W_0(i,j) = \frac{w_{0,i}{d \choose i}{\nu \choose j}}{{d + \nu} \choose {i + j}}$$

meaning our previous equation becomes

$$B_j^\nu(s) f_0(s) = \sum_{i=0}^d W_0(i,j)B^{d + \nu}_{i + j}(s)$$ We can rewrite this as a very awkward matrix multiplication:

$$B_j^\nu(s) f_0(s) = \left[\begin{array}{cccc} B_j^{d + \nu}(s) & B_{j + 1}^{d + \nu}(s) & ... & B_{j + d}^{d + \nu}(s) \end{array} \right]\left[\begin{array}{c} W_0(0, j) \\ W_0(1, j) \\ ... \\ W_0(d, j) \\ \end{array}\right]$$

To keep the row matrix the same for all $j$, we'd like to make it of length $d + \nu$. This means that we'll need to pad the column vector with zeros:

\begin{aligned} B_0^\nu(s) f_0(s) &= \left[\begin{array}{cccc} B_0^{d + \nu}(s) & B_{1}^{d + \nu}(s) & ... & B_{d + \nu}^{d + \nu}(s) \end{array} \right]\left[\begin{array}{c} W_0(0, 0) \\ W_0(1, 0) \\ W_0(2, 1) \\ ... \\ W_0(d, 0) \\ 0 \\ 0 \\ ... \end{array}\right] \\ B_1^\nu(s) f_0(s) &= \left[\begin{array}{cccc} B_0^{d + \nu}(s) & B_{1}^{d + \nu}(s) & ... & B_{d + \nu}^{d + \nu}(s) \end{array} \right]\left[\begin{array}{c} 0 \\ W_0(0, 1) \\ W_0(1, 1) \\ ... \\ W_0(d, 1) \\ 0 \\ ... \\ \end{array}\right] \end{aligned} ...and so on for $B_2^\nu(s)f_0(s)$ through $B_\nu^\nu(s)f_0(s)$.

Now, let's build out such a matrix for all values of $j$ between 0 and $\nu$, which we'll call $\mathbb{S}_{0,\nu}$:

$$\mathbb{S}_{0,\nu} = \left[\begin{array}{cccc} W_0(0, 0) & 0 & 0 & ...\\ W_0(1, 0) & W_0(0, 1) & 0\\ ... & ... & ... & ... \\ W_0(d, 0) & W_0(d - 1, 1) & ... & ...\\ 0 & W_0(d, 1) & ... \\ ... & ... & ... & ... \\ 0 & 0 & .... & W_0(d, \nu) \end{array}\right]$$

Consider multiplying the row vector of basis functions against this new matrix:

\begin{aligned} &\left[\begin{array}{cccc} B_0^{d + \nu}(s) & B_{1}^{d + \nu}(s) & ... & B_{j + i}^{d + \nu}(s) \end{array} \right] \mathbb{S}_ {0,\nu} \\ &= \left[\begin{array}{ccccc} B_ {0}^\nu(s) f_{0}(s) & B_{1}^\nu(s) f_{0}(s) & B_{2}^\nu(s) f_{0}(s) & ... & B_\nu^\nu(s) f_{0}(s) \end{array} \right] \\ &= f_0(s) \left[\begin{array}{ccccc} B_ {0}^\nu(s) & B_{1}^\nu(s) & B_{2}^\nu(s) & ... & B_\nu^\nu(s) \end{array} \right] \end{aligned}

(we're almost there, hold on!)

When we multiply this result against the weights of our $g_0(s)$ function, something magical occurs:

\begin{aligned} &f_0(s) \left[\begin{array}{ccccc} B_ {0}^\nu(s) & B_{1}^\nu(s) & B_{2}^\nu(s) & ... & B_\nu^\nu(s) \end{array} \right] \left[ \begin{array}{c} \lambda_{0,0} \\ \lambda_{0,1} \\ \lambda_{0,2} \\ ... \\ \lambda_{0,\nu} \\ \end{array} \right] \\ &= f_0(s) \left( B_ {0}^\nu(s) \lambda_{0,0} + B_ {1}^\nu(s) \lambda_{0,1} + ... + B_\nu^\nu(s) \lambda_{0,\nu}\right)\\ &= f_0(s)g_0(s) \end{aligned}

So, from only weights of $f_0(s)$ and a max degree $\nu$, we built a matrix $\mathbb{S}_{0,\nu}$ with the property that

$$\left[\begin{array}{ccccc} B_ {0}^{d + \nu}(s) & B_{1}^{d + \nu}(s) & B_{2}^{d + \nu}(s) & ... & B_{d + \nu}^{d + \nu}(s) \end{array} \right] \mathbb{S}_ {0,\nu} \left[ \begin{array}{c} \lambda_{0,0} \\ \lambda_{0,1} \\ \lambda_{0,2} \\ ... \\ \lambda_{0,\nu} \\ \end{array} \right] = f_0(s)g_0(s)$$

Notice that $\mathbb{S}_ {0,\nu}$ isn't parameterized by $s$ at all, it's just an array of constants! This means that if we can find a vector that's in the null space of $\mathbb{S}_ {0,\nu}$, the terms $\lambda_ {0,0}, \lambda_ {0,1}, ... \lambda_{0,\nu}$ of said vector define a $g_0(s)$ for which $f_0(s)g_0(s) = 0$, without knowing anything about $s$ at all!

(Finding the null space of a matrix is easily accomplished by your favorite linear algebra package, so just assume that's trivial)

Now we're closing in on our goal. We can similarly build $\mathbb{S} _ {1,\nu}$ through $\mathbb{S} _ {3,\nu}$ based on $f _ {1-3}(s)$ in our original curve, then arrange them as a matrix that's $4\times$ the width of $\mathbb{S} _ {0,\nu}$. Let's call this new matrix simply $\mathbb{S} _ \nu$:

$$\mathbb{S} _ \nu = \big[\begin{array}{c|c|c|c} \mathbb{S} _ {0,\nu} & \mathbb{S} _ {1,\nu} & \mathbb{S} _ {2,\nu} & \mathbb{S} _ {3,\nu} \end{array}\big]$$

Now, one more big matrix multiplication:

$$\left[\begin{array}{ccccc} B_ {0}^{d + \nu}(s) & B_{1}^{d + \nu}(s) & ... & B_{d + \nu}^{d + \nu}(s) \end{array} \right] \big[\begin{array}{c|c|c|c} \mathbb{S} _ {0,\nu} & \mathbb{S} _ {1,\nu} & \mathbb{S} _ {2,\nu} & \mathbb{S} _ {3,\nu} \end{array}\big] \left[ \begin{array}{c} \lambda_{0,0} \\ \lambda_{0,1} \\ ... \\ \lambda_{0,\nu} \\ \hline \lambda_{1,0} \\ ... \\ \lambda_{1,\nu} \\ \hline \lambda_{2,0} \\ ... \\ \lambda_{2,\nu} \\ \hline \lambda_{3,0} \\ ... \\ \lambda_{3,\nu} \\ \end{array} \right]$$

Notice that the column vector got 4x taller and includes $\lambda$ coefficients for every $g$ function, because $\mathbb{S}$ got 4x wider.

Let's evaluate this multiplication in two passes:

\begin{aligned} &\left[\begin{array}{ccccc} B_ {0}^{d + \nu}(s) & B_{1}^{d + \nu}(s) & ... & B_{d + \nu}^{d + \nu}(s) \end{array} \right] \big[\begin{array}{c|c|c|c} \mathbb{S} _ {0,\nu} & \mathbb{S} _ {1,\nu} & \mathbb{S} _ {2,\nu} & \mathbb{S} _ {3,\nu} \end{array}\big] \left[ \begin{array}{c} \lambda_{0,0} \\ \lambda_{0,1} \\ ... \\ \lambda_{0,\nu} \\ \hline \lambda_{1,0} \\ ... \\ \lambda_{1,\nu} \\ \hline \lambda_{2,0} \\ ... \\ \lambda_{2,\nu} \\ \hline \lambda_{3,0} \\ ... \\ \lambda_{3,\nu} \\ \end{array} \right] \\ = &\left[\begin{array}{ccc|ccc|cc|cc} B _ {0}^\nu f _ {0} & ... & B _ \nu^\nu f _ {0} & B _ 0^\nu f _ {1} & ... & B _ 0^\nu f _ {1} & ... & B _ \nu^\nu f _ {2} & ... & B _ \nu^\nu f _ {3} \end{array} \right] \left[ \begin{array}{c} \lambda_{0,0} \\ ... \\ \lambda_{0,\nu} \\ \hline \lambda_{1,0} \\ ... \\ \lambda_{1,\nu} \\ \hline ... \\ \lambda_{2,\nu} \\ \hline ... \\ \lambda_{3,\nu} \\ \end{array} \right] \\ = &\hspace{0.1in}f_0(s)g_0(s) + f_1(s)g_1(s) + f_2(s)g_2(s) + f_3(s)g_3(s) \end{aligned}

(In the middle expression, $B$ and $f$ are still functions of $s$, but I didn't have room to fit the $(s)$ while staying in one line)

Well, would you look at that! The result is the same expression from the start of this section, which we need to be zero!

This means that by calculating the null space of $\mathbb{S}_\nu$, we find coefficients $\lambda _ {0,0},...\lambda _ {3,\nu}$ which let us construct $g _ {0-3}(s)$ such that

$$\sum_{i=0}^3 g_i(s)f_i(s) = 0$$

The null space in $\mathbb{S} _ \nu$ probably contains more than one vector, which lets us find multiple sets of $g$ functions. This in turn tells us how we find $r _ \nu$, which (as you may have forgotten) is the number of columns in our matrix $M(x, y, z)$: $r _ \nu$ is the nullity of $\mathbb{S} _ \nu$.

From the coefficients on these $g$ functions, we use our existing infrastructure to build the desired matrix $M(x, y, z)$; remember that the terms of the matrix are given by

$$\Lambda_i(x, y, z) = \lambda_{0,i} + x\lambda_{1,i} + y\lambda_{2,i} + z\lambda_{3,i}$$

and the full matrix is

$$M(x, y, z) = \left[\begin{array}{cccc} \Lambda_{0,1}(x,y,z) &\Lambda_{0,2}(x,y,z) & ... & \Lambda_{0,r_\nu}(x,y,z) \\ \Lambda_{1,1}(x,y,z) &\Lambda_{1,2}(x,y,z) & ... & \Lambda_{1,r_\nu}(x,y,z) \\ ... & ... & ... & ... \\ \Lambda_{d,1}(x,y,z) &\Lambda_{d,2}(x,y,z) & ... & \Lambda_{d,r_\nu}(x,y,z) \\ \end{array}\right]$$

where each column represents a particular set of $g$ functions, i.e. a single vector in $\mathbb{S}_\nu$'s null space.

Mission accomplished!

## Calculating pre-images

If we have a point on the curve or surface, how do we calculate its corresponding $s$ parameter?

We know that given basis functions $\psi_1(s) ... \psi_{m_\nu}(s)$ for $f_{0-3}(s)$

\begin{aligned} & \left[\begin{array}{cccc} \psi_1(s) & ... & \psi_{m_\nu}(s) \end{array} \right] M\left(\frac{f_1(s)}{f_0(s)}, \frac{f_2(s)}{f_0(s)}, \frac{f_3(s)}{f_0(s)}\right) \\ &= \left[\begin{array}{ccc}0 & ... & 0\end{array}\right] \end{aligned}

This means, $\left[\begin{array}{cccc} \psi _ 1(s) & ... & \psi _ {m _ \nu}(s) \end{array} \right]^T$ is in the null space of $M\left(\frac{f _ 1(s)}{f _ 0(s)}, \frac{f _ 2(s)}{f _ 0(s)}, \frac{f _ 3(s)}{f _ 0(s)}\right)^T$

This is helpful, because we can calculate the null space using the aforementioned standard functions from your linear algebra package.

Going from the null space to $s$ depends on the exact form of your basis functions $\psi_i(s)$. In our example, they're Bernstein polynomials.

This means we can be more specific and say the null vector is

$$\left[\begin{array}{c} \psi _ 1(s) \\ \psi _ 2(s) \\ ... \\ \psi _ {m _ \nu}(s) \end{array} \right] = \left[\begin{array}{c} B _ 0 ^ \nu(s) \\ B _ 1 ^\nu(s) \\ ... \\ B _ \nu ^ \nu(s) \end{array}\right]$$

Now, consider the ratio of the first two terms of this expression:

\begin{aligned} \frac{\psi_2(s)}{\psi_1(s)} &= \frac{B_1^\nu(s)}{B_0^\nu(s)} \\ &= \frac{{\nu \choose 1} s^1(1-s)^{\nu-1}}{{\nu \choose 0} s^0(1 - s)^{\nu}} \\ &= \frac{\frac{\nu!}{1!(\nu - 1)!} s^1(1-s)^{\nu-1}}{\frac{\nu!}{0!(\nu - 0)!}s^0(1 - s)^{\nu}} \\ &= \frac{\nu s(1-s)^{\nu-1}}{(1 - s)^{\nu}} \\ &= \frac{\nu s}{(1 - s)} \end{aligned}

In other words, given the ratio of the first two terms in the null vector, we can calculate $s$ with simple algebra!

\begin{aligned} \frac{\psi_2(s)}{\psi_1(s)} &= \frac{\nu s}{(1 - s)} \\ s &= \frac{\psi_2(s)}{\nu \psi_1(s) + \psi_2(s)} \end{aligned}

If the ray is pathologically close to the surface or has multiple intersections, then the situation is more complex; one of the later papers has strategies for these cases (section 2.4), but I didn't dive into them.

## Ray intersections

Remember, the end goal is to do raytracing directly using this matrix representation. A ray can be defined as

$$R(t) = O + tD$$ where the origin $O = (x_0, y_0, z_0)$, the direction $D = (d_x, d_y, d_z)$, and $t$ is the distance travelled along the ray.

We can plug this into $M(x, y, z)$ and build $M(t)$, the m-rep matrix at distance $t$. Since $M(x,y,z)$ and $R(t)$ are both linear, let's declare

$$M(t) = A - tB$$ where $A = M(x_0, y_0, z_0)$ and $B = M(x_0 + d_x, y_0 + d_y, z_0 + d_z)$.

This is what's known as a (linear) matrix pencil.

Just like normal matrices, square matrix pencils have eigenvalues and eigenvectors; finding them is known as the generalized eigenvalue problem

By definition, at an eigenvalue $t_e$, then $\det (A - t_eB) = 0$. Notice that this means that the matrix pencil does not have full rank at the position $O + t_eD$, so this is a ray-surface intersection!

Unfortunately, you can only find the eigenvalues for a square matrix pencil, and our $A$ and $B$ are probably not square!

Here's where things go a little off the rails.

The papers have a strategy for reducing a rectangular matrix pencil to a square, which is laid out in section 2.3 of the third paper. It's an iterative algorithm which repeatedly shrinks the matrixes until the $B$ matrix is of full rank. The linked paper explains it clearly enough that I don't need to unpack it, although I have no idea why it works.

Also, it doesn't always work.

The paper suggests a backup strategy:

We mention that there are several methods to choose a rank $m$ square sub-pencil of $A − tB$, one of them being to compute a column-reduction of the pencil and then keep the pivot columns.

This may be obvious to the authors, but as someone that doesn't wrangle matrix pencils for breakfast, it was rather cryptic. In my raytracing experiments, I never ran into this failure mode, so I didn't go down that rabbit hole.

One more note: generalized eigenvalue solving is an exotic piece of linear algebra machinery. It exists in SciPy and Eigen, but not in NumPy or nalgebra (I opened an issue for the latter, and someone is working on it!).

## Going from curves to surfaces

If you've followed me this far, you should be able to work through section 2.3 of Busé '14; I'm not going to unpack it in quite as much detail.

In brief, the basis functions $\psi_i(s)$ for the $g_{0,1,2,3}(s)$ functions are products of Bernstein polynomials, i.e.

$$\left[\begin{array}{c} \psi_1(s)\\ \psi_2(s)\\ ...\\ \psi_{m_\nu}(s) \end{array}\right] = \left[\begin{array}{c} B_0^{\nu_1}(s_1)B_0^{\nu_2}(s_2)\\ B_1^{\nu_1}(s_1)B_0^{\nu_2}(s_2)\\ B_2^{\nu_1}(s_1)B_0^{\nu_2}(s_2)\\ ...\\ B_{\nu_1}^{\nu_1}(s_1)B_0^{\nu_2}(s_2)\\ B_0^{\nu_1}(s_1)B_1^{\nu_2}(s_2)\\ ...\\ B_{\nu_1}^{\nu_1}(s_1)B_{\nu_2}^{\nu_2}(s_2) \end{array}\right]$$

where $s = (s_1, s_2)$ (since this is now a 2D parameterization).

For ease of writing, I'm going to use the shorthand

$$B_i^{\nu_1}(s_1)B_j^{\nu_2}(s_2) = B_i^{\nu_1}B_j^{\nu_2}(s)$$

The rest of the process is similar: build $\mathbb{S}_\nu$, find its null space, and use that to build $M(x, y, z)$.

However, there's one interesting issue that arises when dealing with surfaces: projection from 3D back to surface coordinates (i.e. calculating pre-images).

(This isn't discussed in Busé '14; the following is an insight that I came up with myself)

For curves, we used the ratio of the first two $\psi(s)$ functions. For surfaces, remember that the first two basis functions are

\begin{aligned} &B_0^{\nu_1}(s_1)B_0^{\nu_2}(s_2)\\ &B_1^{\nu_1}(s_1)B_0^{\nu_2}(s_2) \end{aligned}

Their ratio cancels out to $\frac{B_0^{\nu_1}(s_1)}{B_1^{\nu_1}(s_1)}$, which seems the same as before.

However, think about a point at the very edge of one side of the patch, where $s_2 = 0 \rightarrow B_0^{\nu_2}(s_2) = 0$. This means that both basis functions will be zero, so their ratio won't tell us anything useful.

(In practice, this manifests itself as horrific numerical instability, since the values don't end up being exactly 0)

In fact, there are speckles in the original image due to this very issue!
Did you see them?

Let's drop back into the 1D (curve) case for a moment. Previously, we only considered the ratio of the first two basis functions, i.e. $\frac{\psi_1(s)}{\psi_2(s)}$.

Notice that we can also consider the ratio of the last two basis functions:

\begin{aligned} \frac{\psi_{m_\nu - 1}(s)}{\psi_{m_\nu}(s)} &= \frac{B_{\nu - 1}^\nu(s)}{B_\nu^\nu(s)} \\ &= \frac{{\nu \choose {\nu - 1}} s^{\nu - 1}(1-s)^{1}}{{\nu \choose \nu} s^\nu(1 - s)^0} \\ &= \frac{\nu (1-s)}{s} \\ s &= \frac{\nu \psi_{m_\nu}(s)}{\nu \psi_{m_\nu}(s) + \psi_{m_\nu - 1}(s)} \end{aligned}

Still in 1D, we now have two equations for $s$, and can combine them into a minimization problem:

$$\left[\begin{array}{c} \nu \psi_1(s) + \psi_2(s) \\ \nu \psi_{m_\nu}(s) + \psi_{m_\nu - 1}(s) \end{array}\right] s = \left[\begin{array}{c} \psi_2(s) \\ \nu \psi_{m_\nu}(s) \end{array} \right]$$

We can solve $As = B$ for $s$, minimizing squared error.

Back in the case of surfaces, the same idea holds: we want to generate multiple pairs of values with meaningful ratios, then use minimization to solve for the best parameter value.

For example, all of the ratios

\begin{aligned} &B_0^{\nu_1}B_0^{\nu_2}(s):B_0^{\nu_1}B_0^{\nu_2}(s) \\ &B_0^{\nu_1}B_1^{\nu_2}(s):B_1^{\nu_1}B_1^{\nu_2}(s) \\ &B_0^{\nu_1}B_2^{\nu_2}(s):B_2^{\nu_1}B_2^{\nu_2}(s) \\ &... \\ &B_0^{\nu_1}B_{\nu_1}^{\nu_2}(s):B_{\nu_1}^{\nu_1}B_2^{\nu_2}(s) \\ \end{aligned} should be equivalent to $B_0^{\nu_1}(s_1) : B_1^{\nu_1}(s_1)$, except in cases where $B_j^{\nu_2}(s_2) \approx 0$.

Similarly, all of the ratios \begin{aligned} &B_{\nu_1}^{\nu_1}B_0^{\nu_2}(s):B_{\nu_1 - 1}^{\nu_1}B_0^{\nu_2}(s) \\ &B_{\nu_1}^{\nu_1}B_1^{\nu_2}(s):B_{\nu_1 - 1}^{\nu_1}B_1^{\nu_2}(s) \\ &B_{\nu_1}^{\nu_1}B_2^{\nu_2}(s):B_{\nu_1 - 1}^{\nu_1}B_2^{\nu_2}(s) \\ &... \\ &B_{\nu_1}^{\nu_1}B_{\nu_2}^{\nu_2}(s):B_{\nu_1 - 1}^{\nu_1}B_{\nu_2}^{\nu_2}(s) \\ \end{aligned} should be equivalent to $B_{\nu_1}^{\nu_1}(s_1) : B_{\nu_1 - 1}^{\nu_1}(s_1)$, with the same caveat.

We can build a big matrix that uses all of these ratios, then solve for a value of $s_1$ that minimizes squared error. Because the ill-conditioned pairs are (nearly) zeros, they don't contribute to the error, so we end up choosing $s_1$ based on only the well-conditioned values!

After switching to this method of finding $s$, the glitches go away:

## Putting it all together

At this point, we have built three core pieces of mathematical machinery.

The first is a way to convert Bézier curves and surfaces into this matrix representation (called an "m-rep" in the original papers).

Then, using this m-rep, we can

• Convert a point in 3D space back into parameter space
• Calculate ray-surface intersections

With these tools, raytracing is simple:

Because the ray-patch intersection is expensive, it skips any patches with bounding boxes that cannot be hit by a particular ray, and sort based on distance to bounding box to exit early when possible.

Here's how many patches could intersect each pixel's ray:

Here's how many patches that are actually tested; once a single intersection is found, we skip any bounding boxes that are farther away that that intersection:

Given an intersection (in 3D space) and a particular patch, we use the pre-image calculation to find UV coordinates:

Then, it's relatively straightforward to convert from (parametric) $(u,v)$ coordinates to normals, using the closed-form derivatives of Bernstein polynomials. This produces our final shaded image:

## Performance

One of the first questions I was asked is whether this is faster than traditional raytracing of triangulated meshes.

The answer is an emphatic no.

My Python implementation renders a 400x400 image of the teapot in 99 seconds, on a 2021 Macbook Pro.

Admittedly, this is single-threaded Python, so it's not even close to optimal.

I did an experimental port to Julia, which I had never used before. This brought the speed down to 55 seconds, albeit with some new rendering glitches that I didn't care to debug:

I appreciate that the Julia profiler reports allocations as well as timing:

55.885217 seconds (47.47 M allocations: 50.370 GiB, 5.45% gc time, 0.62% compilation time)

This could probably be optimized, since 50 GB is a huge amount of churn.

I decided against going down the rabbit hole, and instead dove into the neighboring rabbit hole of rewriting everything in C++ using Eigen, which I've used extensively in the past.

This brought the speed for a 400x400 image to a mere 25 seconds, along with a different set of graphical glitches (rendered with UV coordinates):

Debugging this is very challenging: the $\mathbb{S}_\nu$ matrix is identical in C++ and Python, but NumPy and Eigen compute a different null space. This isn't wrong, because the null space is not unique, but this kind of divergence makes hunting down the glitches very hard!

(If you want to debug this yourself, I suspect that the matrix pencil reduction is misbehaving in C++, but it's hard to see past the matrices and find an actual root cause)

## Closing thoughts

Fundamentally, this is a very cool math trick. I enjoyed fighting my way through the wall of linear algebra to understand how it actually works.

However, it's very hard to implement, and quite slow. I don't know enough to judge whether it's actually more robust than other strategies, or whether the tradeoffs are worth it.

If you'd like to explore further, here are my three implementations:

(be warned: this is extremely research-quality code)