Efficiently updating implicit in-order forests
You may find yourself looking at an ordered list of values:
And you may ask yourself: how do I efficiently compute a range aggregation on this data, e.g. finding the `max` value for indices 1 through 4? Simple iteration works, but is O(N) in terms of range size.
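For concreteness, here's the baseline we're trying to beat: a straight scan over the range (the values here are my own example, not the ones from the diagrams below).

```rust
/// Naive range aggregation: scan every element, O(N) in the range size.
fn range_max(data: &[u32], lo: usize, hi: usize) -> u32 {
    data[lo..=hi].iter().copied().max().unwrap()
}

fn main() {
    let data = [3, 1, 4, 1, 5, 9, 2, 6];
    // Scans 4 items to aggregate a 4-item range
    assert_eq!(range_max(&data, 1, 4), 5);
    println!("ok");
}
```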
A more efficient strategy is an "implicit in-order forest", which interleaves the raw data with a pre-computed reduction tree. This is the kind of data structure that's been independently re-invented dozens of times, but I first learned about it through Tristan Hume's blog post.
(You may want to familiarize yourself with that post before continuing; I'll give a brief summary, but he does a great job presenting this data structure)
For the values above and the `max` operator, we'd build the following tree:
Then, to query it, we find a set of tree nodes that exactly covers our desired range. For example, to compute the aggregate value across indices 1-4 (inclusive, zero-indexed), we'd combine the tree nodes marked in red:
We only have to combine three values instead of four, because one node (7) is already an aggregate of two leaf nodes! I'll refer to such a node in the tree as a "range node", because it covers a range of the original values. As we query larger spans of data, we get a logarithmic runtime for computing aggregate values.
(In these diagrams, I'm going to focus on a single tree, but rest assured that you can build a forest and pack it into a flat array just like before)
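As a concrete sketch of that flat packing (restricted here to a single power-of-two tree; Tristan's post covers the general forest), leaves land at even indices, aggregates at odd indices, and a node's level is the number of trailing one bits in its index. This is one way the layout can be written; the example data is my own.

```rust
/// Build a flat, in-order reduction tree over `data` (length a power of
/// two), using `max` as the combine operator. Leaves sit at even indices;
/// the aggregate at odd index `i` has level `i.trailing_ones()`, with
/// children at `i - 2^(level-1)` and `i + 2^(level-1)`.
fn build(data: &[u32]) -> Vec<u32> {
    let n = data.len();
    let mut tree = vec![0; 2 * n - 1];
    for (j, &v) in data.iter().enumerate() {
        tree[2 * j] = v; // raw values, interleaved with the tree
    }
    // Fill in aggregates level by level, bottom-up
    for lvl in 1..=n.trailing_zeros() {
        let half = 1usize << (lvl - 1);
        let mut i = (1usize << lvl) - 1; // first node at this level
        while i < 2 * n - 1 {
            tree[i] = tree[i - half].max(tree[i + half]);
            i += 1 << (lvl + 1); // stride between same-level nodes
        }
    }
    tree
}

fn main() {
    // For [3, 1, 4, 1], raw data and aggregates interleave like so:
    assert_eq!(build(&[3, 1, 4, 1]), vec![3, 3, 1, 4, 4, 4, 1]);
    println!("ok");
}
```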
All of this is already covered by previous discussions of the data structure; now, let's push the boundaries.
In my use case, I need to update ranges, not just query them. For example, given our data above, how would we inject a new value of 9 across the range 0-4?
The naive approach is to merge the new value with every leaf node, then bubble up the results through the range nodes:
(This is how it's done in Michael Rojas's excellent TypeScript implementation.)
Unfortunately, this update operation has an O(N) running time based on the size of the range. If we're updating infrequently, then that's probably fine, but it offends my aesthetic sensibilities (and could easily lead to Accidentally Quadratic behavior).
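Sketched on a plain array (setting the tree aside for a moment), the naive approach does work proportional to the range:

```rust
/// Naive range update: fold the new value into every element, O(N) in the
/// range size. In the tree version, every affected range node would also
/// need its aggregate recomputed, bubbling results upward.
fn naive_update(data: &mut [u32], lo: usize, hi: usize, v: u32) {
    for x in &mut data[lo..=hi] {
        *x = (*x).max(v);
    }
}

fn main() {
    let mut data = [3, 1, 4, 1, 5, 9, 2, 6];
    naive_update(&mut data, 0, 4, 9); // touches all five leaves
    assert_eq!(data, [9, 9, 9, 9, 9, 9, 2, 6]);
    println!("ok");
}
```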
Let me tell you the trick right away: we double the node size. In our modified tree, each node stores two distinct values:

- Aggregate values (`A`), which accumulate upwards, same as before
- Regular values (`R`), which implicitly extend downwards
The rules are as follows:

- An aggregate value `A` must store the combination of the `R` value at that node and all `R` values below it in the tree. Operations on the tree must maintain this invariant.
- The true value at a given node is implicitly stored as the union of the `A` value for that node and all `R` values above it, as you walk to the root of the tree. This applies to both leaf nodes and range nodes.
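In code, a doubled node and the "true value" rule might look like this (a sketch, using `max` as the operator; the `rs_above` slice stands in for walking the ancestor chain):

```rust
#[derive(Clone, Copy)]
struct Node {
    a: u32, // aggregate: this node's R combined with every R below it
    r: u32, // regular: implicitly extends down over the whole subtree
}

/// The true value at a node is never stored directly: it's the node's A
/// combined with every R on the path from the node up to the root.
fn true_value(node: Node, rs_above: &[u32]) -> u32 {
    rs_above.iter().fold(node.a, |acc, &r| acc.max(r))
}

fn main() {
    // A node whose own aggregate is 4, sitting under an ancestor with R = 9:
    let node = Node { a: 4, r: 0 };
    assert_eq!(true_value(node, &[9, 0]), 9);
    println!("ok");
}
```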
Here's our original tree with both aggregate `A` values (first, in red) and regular `R` values (second, in blue). To start, the leaf nodes have regular `R` values, and all other nodes only have aggregate `A` values:
When updating a range, we write the new value to the `R` field, then update `A` fields recursively up the tree to maintain our invariants. Here's our previous example of writing the value 9 to the range 0-4 (inclusive):
This is much more efficient than touching every leaf node: we can write to range nodes that cover wide swaths of the tree! (In fact, we updated the exact same nodes that we'd read for the equivalent range query)
Computing the aggregate value of indices 1-4 (inclusive) now reads the following values from the tree:
As you can see, this is a little more expensive: each node now walks up the tree to accumulate regular values above it. More formally, the runtime for both queries and updates is O(log²(N)).
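The update and query walks can be sketched together. To keep the index arithmetic out of the way, this uses an ordinary recursive segment tree (1-indexed heap layout) rather than the flat in-order packing, but the `A`/`R` invariants are exactly the ones described above, with `max` as the operator and 0 as its identity:

```rust
use std::cmp::max;

#[derive(Clone, Copy)]
struct Node {
    a: u32, // aggregate: this node's R combined with every R below it
    r: u32, // regular: implicitly extends down over the whole subtree
}

struct Tree {
    nodes: Vec<Node>,
    n: usize, // number of leaves (power of two, for simplicity)
}

impl Tree {
    fn new(data: &[u32]) -> Self {
        let n = data.len();
        let mut t = Tree { nodes: vec![Node { a: 0, r: 0 }; 4 * n], n };
        t.build(1, 0, n - 1, data);
        t
    }

    fn build(&mut self, i: usize, lo: usize, hi: usize, data: &[u32]) {
        if lo == hi {
            // Leaves start out holding their value as both R and A
            self.nodes[i] = Node { a: data[lo], r: data[lo] };
            return;
        }
        let mid = (lo + hi) / 2;
        self.build(2 * i, lo, mid, data);
        self.build(2 * i + 1, mid + 1, hi, data);
        self.nodes[i].a = max(self.nodes[2 * i].a, self.nodes[2 * i + 1].a);
    }

    /// Write `v` across `qlo..=qhi`, touching only covering nodes
    fn update_range(&mut self, qlo: usize, qhi: usize, v: u32) {
        self.update(1, 0, self.n - 1, qlo, qhi, v);
    }

    fn update(&mut self, i: usize, lo: usize, hi: usize, qlo: usize, qhi: usize, v: u32) {
        if qhi < lo || hi < qlo {
            return; // no overlap
        }
        if qlo <= lo && hi <= qhi {
            // Fully covered: fold v into both fields and stop descending
            self.nodes[i].r = max(self.nodes[i].r, v);
            self.nodes[i].a = max(self.nodes[i].a, v);
            return;
        }
        let mid = (lo + hi) / 2;
        self.update(2 * i, lo, mid, qlo, qhi, v);
        self.update(2 * i + 1, mid + 1, hi, qlo, qhi, v);
        // Restore the invariant: A = combine(own R, children's A)
        self.nodes[i].a = max(
            self.nodes[i].r,
            max(self.nodes[2 * i].a, self.nodes[2 * i + 1].a),
        );
    }

    /// Aggregate across `qlo..=qhi`
    fn query_range(&self, qlo: usize, qhi: usize) -> u32 {
        self.query(1, 0, self.n - 1, qlo, qhi, 0)
    }

    fn query(&self, i: usize, lo: usize, hi: usize, qlo: usize, qhi: usize, above: u32) -> u32 {
        if qhi < lo || hi < qlo {
            return 0; // identity for max over these values
        }
        if qlo <= lo && hi <= qhi {
            // Covering node: combine its A with the R values above it
            return max(self.nodes[i].a, above);
        }
        let mid = (lo + hi) / 2;
        let above = max(above, self.nodes[i].r); // accumulate R on the way down
        max(
            self.query(2 * i, lo, mid, qlo, qhi, above),
            self.query(2 * i + 1, mid + 1, hi, qlo, qhi, above),
        )
    }
}

fn main() {
    let mut t = Tree::new(&[3, 1, 4, 1, 5, 9, 2, 6]);
    assert_eq!(t.query_range(1, 4), 5); // before the update
    t.update_range(0, 4, 9); // write 9 across indices 0-4
    assert_eq!(t.query_range(1, 4), 9);
    assert_eq!(t.query_range(6, 7), 6); // untouched region unaffected
    println!("ok");
}
```

Note that the update's fully-covered case writes to a single range node and returns without visiting its children, which is exactly why it beats the naive leaf-by-leaf approach.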
We've also doubled our memory usage, since each node in the tree now stores two values.
If you're paying close attention, you may have noticed one more subtle limitation: we can no longer use "any associative operator" with this data structure. Instead, the operation must be both commutative (f(a, b) = f(b, a)) and idempotent (f(a, f(a, b)) = f(a, b)). I've been using `max` in the examples above, which meets these definitions, but I'm actually using this to track set membership with fixed-size bitfields.
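These requirements are easy to check mechanically: bitwise OR (set union over a bitfield) satisfies them, as does `max`, while something like addition fails idempotence.

```rust
fn main() {
    // Exhaustively check the laws on 8-bit bitfields
    for a in 0u8..=255 {
        for b in 0u8..=255 {
            assert_eq!(a | b, b | a); // OR is commutative...
            assert_eq!(a | (a | b), a | b); // ...and idempotent
            assert_eq!(a.max(b), b.max(a)); // max is commutative...
            assert_eq!(a.max(a.max(b)), a.max(b)); // ...and idempotent
        }
    }
    // Addition is associative and commutative, but not idempotent:
    assert_ne!(1 + (1 + 2), 1 + 2);
    println!("ok");
}
```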
Here's how the original data structure compares to the new updatable variant:
| Operation | Associative | Associative, idempotent, commutative |
|-----------|-------------|--------------------------------------|
| Query     | O(log(N))   | O(log²(N))                           |
| Update    | O(N)        | O(log²(N))                           |
We could bring memory usage down to 3*N by storing the `R` values in a separate tree, since the leaf nodes don't need both `R` and `A`. I also suspect that there's a better way to do range and update queries that would bring us down to O(log(N)); in the current implementation, parent nodes are often visited more than once (e.g. every node checks the `R` value at the root of the tree).
I've written a relatively un-optimized implementation as part of Fidget (which is an ongoing project that I'm not quite ready to blog about). This code will be handling much smaller sequences than the Gigatrace project – 10³–10⁴ values, versus 10⁹ – so it's written for clarity and doesn't do as much clever bit-shifting.
Even so, it turns out that this code is hard to get right: it's an absolute nesting ground for off-by-one errors. After hearing co-workers rave about the powers of property-based testing, I wrote a proptest suite to automatically check my implementation.
There are two property-based tests:
- The first test is for basic functionality: we build a tree, then compare the result of `range_query` to item-by-item accumulation. Iterating over every item in the range gives a ground-truth reading to compare against.
- The second test is for updates: we build a tree, then update a range with a new value. This update is applied to both the tree (using `update_range`) and the original data. Then, we do the same test as above: check that a range query gives the same result between the tree and the list.
All of the parameters for these tests are chosen by proptest, meaning a wide variety of values are checked.
- For the basic test, we vary list length, list values, and the checked range.
- For the update test, we also vary the update range and update value.
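The real suite uses the proptest crate inside Fidget; as a self-contained stand-in, here's the same idea hand-rolled with a toy random generator: generate inputs, then check an invariant against a brute-force ground truth (proptest additionally shrinks failing cases for you). The `build` function is a simplified flat reduction tree, not Fidget's implementation.

```rust
/// Toy linear congruential generator, standing in for proptest's input
/// generation
fn lcg(state: &mut u64) -> u32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 33) as u32
}

/// Flat in-order reduction tree: leaves at even indices, `max` aggregates
/// at odd indices (power-of-two lengths only, for simplicity)
fn build(data: &[u32]) -> Vec<u32> {
    let n = data.len();
    let mut tree = vec![0; 2 * n - 1];
    for (j, &v) in data.iter().enumerate() {
        tree[2 * j] = v;
    }
    for lvl in 1..=n.trailing_zeros() {
        let half = 1usize << (lvl - 1);
        let mut i = (1usize << lvl) - 1;
        while i < 2 * n - 1 {
            tree[i] = tree[i - half].max(tree[i + half]);
            i += 1 << (lvl + 1);
        }
    }
    tree
}

fn main() {
    let mut seed = 0x5eed;
    for _ in 0..100 {
        // "Generate" a random input...
        let data: Vec<u32> = (0..8).map(|_| lcg(&mut seed) % 100).collect();
        let tree = build(&data);
        // ...then check the property: every aggregate node must equal a
        // brute-force scan over the leaves it covers
        for i in (1..tree.len()).step_by(2) {
            let half = (1usize << i.trailing_ones()) - 1;
            let (lo, hi) = ((i - half) / 2, (i + half) / 2);
            let expected = data[lo..=hi].iter().copied().max().unwrap();
            assert_eq!(tree[i], expected, "bad aggregate at node {i}");
        }
    }
    println!("ok");
}
```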
The proptest experience was great. I didn't have to think about writing dozens of test cases; instead, I wrote a single checking function, let the computer run it on hundreds of cases, and glared at my bit shifting operations until it passed.
I'm not sure how many people out there need efficient range queries and updates on time-series bitsets, but if you do, I hope this has been helpful!
Like the original data structure, I'm sure this has been independently re-invented dozens of times; if you've got any pointers, then let me know.