Swingline is a fast, pure-GPU implementation of Weighted Voronoi Stippling.


Weighted Voronoi Stippling is an iterative process built on Lloyd's algorithm. Two steps are repeated over and over until the stipple locations converge:

The original paper suggests using the GPU to calculate the Voronoi diagram; Swingline extends this by calculating the cell updates on the GPU as well.

This makes it alarmingly fast!

For small sample counts (1-2K samples), Swingline is between 10 and 20 times faster than StippleGen. For larger sample counts (e.g. 10K), Swingline takes 10s of seconds to converge (and StippleGen simply hangs).

Rendering the Voronoi diagram

To construct the Voronoi diagram, we'll be using a fairly old technique: it was first published at SIGGRAPH '99 and is also referenced in the Red Book.

The core idea is incredibly simple:
At every cell center, render a uniquely-colored cone with depth culling turned on.

From the side, a set of cones looks like this:
Cones side
And from the top (with flat shading), it looks quite Voronoi-ish:
Cones top

This works because the z-height of each cone represents Euclidean distance to the cell center. Turning on z-culling solves for the closest cell center for every pixel; coloring each cone uniquely lets us disambiguate which cell center was chosen.

Cones are drawn using instanced rendering. We create one vertex buffer that contains a single cone, another buffer that contains a list of offsets (representing each cell's location), and draw them using glDrawArraysInstanced.

The vertex shader is very simple:

layout(location=0) in vec3 pos;     /*  Absolute coordinates  */
layout(location=1) in vec2 offset;  /*  0 to 1 */
uniform vec2 scale;

out vec3 color_;

void main()
    gl_Position = vec4(pos.xy*scale + 2.0f*offset - 1.0f,
                       pos.z, 1.0f);

    // Pick color based on instance ID
    int r =  gl_InstanceID          % 256;
    int g = (gl_InstanceID / 256)   % 256;
    int b = (gl_InstanceID / 65536) % 256;
    color_ = vec3(r / 255.0f, g / 255.0f, b / 255.0f);

Now, you understand the faint color patches in the background of the demo images: each color patch represents a different Voronoi cell, and the black dots are the stipple locations.


Finding weighted cell centroids

With the Voronoi diagram rendered, we need to find cell centroids.

The centroid of cell $i$ is defined as $$ p_i = \frac{ \sum_{A} x w(x)}{ \sum_{A} w(x)} $$ where $A$ is the set of pixels in cell $i$, $x$ is a pixel's coordinate, and $w(x)$ is a weighting function sampled at point $x$.

A naive implementation iterates over every pixel in the Voronoi diagram, accumulating a position and weight for the given cell's index.

For a $W$ by $H$ image, the naive strategy has a running time of $O(W \times H)$.

Instead, I'll present an $O(W + H)$ strategy that takes advantage of the GPU's massive parallelism. As implied by the running time, this algorithm is a two-stage operation. The first stage is horizontal accumulation and the second is vertical summing.

Here's a visualization of the algorithm, which probably won't make sense yet:


As shown in the picture above, we'll be using an intermediate texture of height $H$ and width $N$, where $N$ is the number of samples. In this texture, a fragment's $x$ coordinate represents which Voronoi cell it's looking for; its $y$ coordinate is the row of the Voronoi image that it searches.

In the first stage, each fragment iterates across every column of its assigned row, checking pixels and accumulating weights and positions for the pixels that match the fragment's Voronoi cell index.

Expressed as a fragment shader, the logic is as follows:

layout (pixel_center_integer) in vec4 gl_FragCoord;
out vec4 color;

uniform sampler2D voronoi;
uniform sampler2D img;

void main()
    int my_index = int(gl_FragCoord.x);
    ivec2 tex_size = textureSize(voronoi, 0);
    color = vec4(0.0f);

    // Iterate over columns of the source image, accumulating a
    // weighted sum of the pixels that match our index
    for (int x=0; x < tex_size.x; x++)
        ivec2 coord = ivec2(x, gl_FragCoord.y);
        vec4 t = texelFetch(voronoi, coord, 0);

        // Unpack RGB value into a Voronoi cell index
        int i = int(255.0f * (t.r + (t.g * 256.0f)
                                  + (t.b * 65536.0f)));
        if (i == my_index)
            float weight = 1.0f - texelFetch(img, coord, 0)[0];
            weight = 0.01f + 0.99f * weight;

            color.xy += (coord + 0.5f) * weight;
            color.w += weight;
            color.z += 1.0f;

    // Normalize to the 0 - 1 range
    color.x = color.x / tex_size.x;
    color.y = color.y / tex_size.y;

Next, we accumulate down columns of the intermediate texture. Remember, each column in this texture represents a particular Voronoi cell. This summation collapses the per-row values into a single result for each Voronoi cell.

layout (location=0) in uint index;
out vec3 pos;

uniform sampler2D summed;

void main()
    ivec2 tex_size = textureSize(summed, 0);
    pos = vec3(0.0f, 0.0f, 0.0f);
    float weight = 0.0f;
    float count = 0.0f;
    for (int y=0; y < tex_size.y; ++y)
        vec4 t = texelFetch(summed, ivec2(index, y), 0);
        pos.xy += t.xy;
        weight += t.w;
        count += t.z;
    pos.xy /= weight;
    pos.z = weight / count;

This shader renders directly back into the VBO that's used for cone positioning, using transform feedback.

We also store the average pixel weight, which is used to adjust the dot's radius. This makes the images look better.

Here's an example comparing constant and variable-radius stipples:



Implementation and Usage

Swingline is a command-line program with the following usage:

Usage: swingline [-n samples] [-r radius]
                 [-o output] [-i iterations] image

Sets of stipples are saved to an SVG output file.

Swingline is implemented in under 900 lines of C, plus stb_image.h for image loading. It has GLFW and libepoxy as dependencies.

You can view the source on Github.

Further Reading

Thanks to Christian Brunschen, who pointed me to Parallel Banding Algorithm to Compute Exact Distance Transform with the GPU. This paper describes an advanced GPU-only algorithm to calculate Euclidean Distance Transforms (from which centroidal Voronoi diagrams can be extracted).