Efficiently updating implicit in-order forests
You may find yourself looking at an ordered list of values:
And you may ask yourself: how do I efficiently compute a range aggregation on this data, e.g. finding the `max` value for indices 1 through 4? Simple iteration works, but is O(N) in terms of range size.
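For concreteness, here's the baseline we're trying to beat: a straight scan over the range (the values here are my own example, not the ones from the diagrams below).

```rust
/// Naive range aggregation: scan every element, O(N) in the range size.
fn range_max(data: &[u32], lo: usize, hi: usize) -> u32 {
    data[lo..=hi].iter().copied().max().unwrap()
}

fn main() {
    let data = [3, 1, 4, 1, 5, 9, 2, 6];
    // Scans 4 items to aggregate a 4-item range
    assert_eq!(range_max(&data, 1, 4), 5);
    println!("ok");
}
```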
A more efficient strategy is an "implicit in-order forest", which interleaves the raw data with a pre-computed reduction tree. This is the kind of data structure that's been independently re-invented dozens of times, but I first learned about it through Tristan Hume's blog post.
(You may want to familiarize yourself with that post before continuing; I'll give a brief summary, but he does a great job presenting this data structure)
For the values above and the `max` operator, we'd build the following tree:
Then, to query it, we find a set of tree nodes that exactly covers our desired range. For example, to compute the aggregate value across indices 1-4 (inclusive, zero-indexed), we'd combine the tree nodes marked in red:
We only have to combine three values instead of four, because one node (7) is already an aggregate of two leaf nodes! I'll refer to such a node in the tree as a "range node", because it covers a range of the original values. As we query larger spans of data, we get a logarithmic runtime for computing aggregate values.
(In these diagrams, I'm going to focus on a single tree, but rest assured that you can build a forest and pack it into a flat array just like before)
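As a concrete sketch of that flat packing (restricted here to a single power-of-two tree; Tristan's post covers the general forest), leaves land at even indices, aggregates at odd indices, and a node's level is the number of trailing one bits in its index. This is one way the layout can be written; the example data is my own.

```rust
/// Build a flat, in-order reduction tree over `data` (length a power of
/// two), using `max` as the combine operator. Leaves sit at even indices;
/// the aggregate at odd index `i` has level `i.trailing_ones()`, with
/// children at `i - 2^(level-1)` and `i + 2^(level-1)`.
fn build(data: &[u32]) -> Vec<u32> {
    let n = data.len();
    let mut tree = vec![0; 2 * n - 1];
    for (j, &v) in data.iter().enumerate() {
        tree[2 * j] = v; // raw values, interleaved with the tree
    }
    // Fill in aggregates level by level, bottom-up
    for lvl in 1..=n.trailing_zeros() {
        let half = 1usize << (lvl - 1);
        let mut i = (1usize << lvl) - 1; // first node at this level
        while i < 2 * n - 1 {
            tree[i] = tree[i - half].max(tree[i + half]);
            i += 1 << (lvl + 1); // stride between same-level nodes
        }
    }
    tree
}

fn main() {
    // For [3, 1, 4, 1], raw data and aggregates interleave like so:
    assert_eq!(build(&[3, 1, 4, 1]), vec![3, 3, 1, 4, 4, 4, 1]);
    println!("ok");
}
```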
All of this is already covered by previous discussions of the data structure; now, let's push the boundaries.
In my use case, I need to update ranges, not just query them. For example, given our data above, how would we inject a new value of 9 across the range 0-4?
The naive approach is to merge the new value with every leaf node, then bubble up the results through the range nodes:
(This is how it's done in Michael Rojas's excellent TypeScript implementation.)
Unfortunately, this update operation has an O(N) running time based on the size of the range. If we're updating infrequently, then that's probably fine, but it offends my aesthetic sensibilities (and could easily lead to Accidentally Quadratic behavior).
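Sketched on a plain array (setting the tree aside for a moment), the naive approach does work proportional to the range:

```rust
/// Naive range update: fold the new value into every element, O(N) in the
/// range size. In the tree version, every affected range node would also
/// need its aggregate recomputed, bubbling results upward.
fn naive_update(data: &mut [u32], lo: usize, hi: usize, v: u32) {
    for x in &mut data[lo..=hi] {
        *x = (*x).max(v);
    }
}

fn main() {
    let mut data = [3, 1, 4, 1, 5, 9, 2, 6];
    naive_update(&mut data, 0, 4, 9); // touches all five leaves
    assert_eq!(data, [9, 9, 9, 9, 9, 9, 2, 6]);
    println!("ok");
}
```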
Let me tell you the trick right away: we double the node size. In our modified tree, each node stores two distinct values:

- Aggregate values (`A`), which accumulate upwards, same as before
- Regular values (`R`), which implicitly extend downwards
The rules are as follows:

- An aggregate value `A` must store the combination of the `R` value at that node and all `R` values below it in the tree. Operations on the tree must maintain this invariant.
- The true value at a given node is implicitly stored as the union of the `A` value for that node and all `R` values above it, as you walk to the root of the tree. This applies to both leaf nodes and range nodes.
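In code, a doubled node and the "true value" rule might look like this (a sketch, using `max` as the operator; the `rs_above` slice stands in for walking the ancestor chain):

```rust
#[derive(Clone, Copy)]
struct Node {
    a: u32, // aggregate: this node's R combined with every R below it
    r: u32, // regular: implicitly extends down over the whole subtree
}

/// The true value at a node is never stored directly: it's the node's A
/// combined with every R on the path from the node up to the root.
fn true_value(node: Node, rs_above: &[u32]) -> u32 {
    rs_above.iter().fold(node.a, |acc, &r| acc.max(r))
}

fn main() {
    // A node whose own aggregate is 4, sitting under an ancestor with R = 9:
    let node = Node { a: 4, r: 0 };
    assert_eq!(true_value(node, &[9, 0]), 9);
    println!("ok");
}
```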
Here's our original tree with both aggregate `A` values (first, in red) and regular `R` values (second, in blue). To start, the leaf nodes have regular `R` values, and all other nodes only have aggregate `A` values:
When updating a range, we write the new value to the `R` field, then update `A` fields recursively up the tree to maintain our invariants. Here's our previous example of writing the value 9 to the range 0-4 (inclusive):
This is much more efficient than touching every leaf node: we can write to range nodes that cover wide swaths of the tree! (In fact, we updated the exact same nodes that we'd read for the equivalent range query)
Computing the aggregate value of indices 1-4 (inclusive) now reads the following values from the tree:
As you can see, this is a little more expensive: each node now walks up the tree to accumulate regular values above it. More formally, the runtime for both queries and updates is O(log²(N)).
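The update and query walks can be sketched together. To keep the index arithmetic out of the way, this uses an ordinary recursive segment tree (1-indexed heap layout) rather than the flat in-order packing, but the `A`/`R` invariants are exactly the ones described above, with `max` as the operator and 0 as its identity:

```rust
use std::cmp::max;

#[derive(Clone, Copy)]
struct Node {
    a: u32, // aggregate: this node's R combined with every R below it
    r: u32, // regular: implicitly extends down over the whole subtree
}

struct Tree {
    nodes: Vec<Node>,
    n: usize, // number of leaves (power of two, for simplicity)
}

impl Tree {
    fn new(data: &[u32]) -> Self {
        let n = data.len();
        let mut t = Tree { nodes: vec![Node { a: 0, r: 0 }; 4 * n], n };
        t.build(1, 0, n - 1, data);
        t
    }

    fn build(&mut self, i: usize, lo: usize, hi: usize, data: &[u32]) {
        if lo == hi {
            // Leaves start out holding their value as both R and A
            self.nodes[i] = Node { a: data[lo], r: data[lo] };
            return;
        }
        let mid = (lo + hi) / 2;
        self.build(2 * i, lo, mid, data);
        self.build(2 * i + 1, mid + 1, hi, data);
        self.nodes[i].a = max(self.nodes[2 * i].a, self.nodes[2 * i + 1].a);
    }

    /// Write `v` across `qlo..=qhi`, touching only covering nodes
    fn update_range(&mut self, qlo: usize, qhi: usize, v: u32) {
        self.update(1, 0, self.n - 1, qlo, qhi, v);
    }

    fn update(&mut self, i: usize, lo: usize, hi: usize, qlo: usize, qhi: usize, v: u32) {
        if qhi < lo || hi < qlo {
            return; // no overlap
        }
        if qlo <= lo && hi <= qhi {
            // Fully covered: fold v into both fields and stop descending
            self.nodes[i].r = max(self.nodes[i].r, v);
            self.nodes[i].a = max(self.nodes[i].a, v);
            return;
        }
        let mid = (lo + hi) / 2;
        self.update(2 * i, lo, mid, qlo, qhi, v);
        self.update(2 * i + 1, mid + 1, hi, qlo, qhi, v);
        // Restore the invariant: A = combine(own R, children's A)
        self.nodes[i].a = max(
            self.nodes[i].r,
            max(self.nodes[2 * i].a, self.nodes[2 * i + 1].a),
        );
    }

    /// Aggregate across `qlo..=qhi`
    fn query_range(&self, qlo: usize, qhi: usize) -> u32 {
        self.query(1, 0, self.n - 1, qlo, qhi, 0)
    }

    fn query(&self, i: usize, lo: usize, hi: usize, qlo: usize, qhi: usize, above: u32) -> u32 {
        if qhi < lo || hi < qlo {
            return 0; // identity for max over these values
        }
        if qlo <= lo && hi <= qhi {
            // Covering node: combine its A with the R values above it
            return max(self.nodes[i].a, above);
        }
        let mid = (lo + hi) / 2;
        let above = max(above, self.nodes[i].r); // accumulate R on the way down
        max(
            self.query(2 * i, lo, mid, qlo, qhi, above),
            self.query(2 * i + 1, mid + 1, hi, qlo, qhi, above),
        )
    }
}

fn main() {
    let mut t = Tree::new(&[3, 1, 4, 1, 5, 9, 2, 6]);
    assert_eq!(t.query_range(1, 4), 5); // before the update
    t.update_range(0, 4, 9); // write 9 across indices 0-4
    assert_eq!(t.query_range(1, 4), 9);
    assert_eq!(t.query_range(6, 7), 6); // untouched region unaffected
    println!("ok");
}
```

Note that the update's fully-covered case writes to a single range node and returns without visiting its children, which is exactly why it beats the naive leaf-by-leaf approach.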
We've also doubled our memory usage, since each node in the tree now stores two values.
If you're paying close attention, you may have noticed one more subtle limitation: we can no longer use "any associative operator" with this data structure. Instead, the operation must be both commutative (f(a, b) = f(b, a)) and idempotent (f(a, f(a, b)) = f(a, b)). I've been using `max` in the examples above, which meets these definitions, but I'm actually using this to track set membership with fixed-size bitfields.
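These requirements are easy to check mechanically: bitwise OR (set union over a bitfield) satisfies them, as does `max`, while something like addition fails idempotence.

```rust
fn main() {
    // Exhaustively check the laws on 8-bit bitfields
    for a in 0u8..=255 {
        for b in 0u8..=255 {
            assert_eq!(a | b, b | a); // OR is commutative...
            assert_eq!(a | (a | b), a | b); // ...and idempotent
            assert_eq!(a.max(b), b.max(a)); // max is commutative...
            assert_eq!(a.max(a.max(b)), a.max(b)); // ...and idempotent
        }
    }
    // Addition is associative and commutative, but not idempotent:
    assert_ne!(1 + (1 + 2), 1 + 2);
    println!("ok");
}
```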
Here's how the original data structure compares to the new updatable variant:
| Operation | Associative | Associative, idempotent, commutative |
|-----------|-------------|--------------------------------------|
| Query     | O(log(N))   | O(log²(N))                           |
| Update    | O(N)        | O(log²(N))                           |
We could bring memory usage down to 3*N by storing the `R` values in a separate tree, since the leaf nodes don't need both `R` and `A`. I also suspect that there's a better way to do range and update queries that would bring us down to O(log(N)); in the current implementation, parent nodes are often visited more than once (e.g. every node checks the `R` value at the root of the tree).
I've written a relatively un-optimized implementation as part of Fidget (which is an ongoing project that I'm not quite ready to blog about). This code will be handling much smaller sequences than the Gigatrace project – 10³–10⁴ values, versus 10⁹ – so it's written for clarity and doesn't do as much clever bit-shifting.
Even so, it turns out that this code is hard to get right: it's an absolute nesting ground for off-by-one errors. After hearing co-workers rave about the powers of property-based testing, I wrote a proptest suite to automatically check my implementation.
There are two property-based tests:
- The first test is for basic functionality: we build a tree, then compare the result of `range_query` to item-by-item accumulation. Iterating over every item in the range gives a ground-truth reading to compare against.
- The second test is for updates: we build a tree, then update a range with a new value. This update is applied to both the tree (using `update_range`) and the original data. Then, we do the same test as above: check that a range query gives the same result between the tree and the list.
All of the parameters for these tests are chosen by proptest, meaning a wide variety of values are checked.
- For the basic test, we vary list length, list values, and the checked range.
- For the update test, we also vary the update range and update value.
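The real suite uses the proptest crate inside Fidget; as a self-contained stand-in, here's the same idea hand-rolled with a toy random generator: generate inputs, then check an invariant against a brute-force ground truth (proptest additionally shrinks failing cases for you). The `build` function is a simplified flat reduction tree, not Fidget's implementation.

```rust
/// Toy linear congruential generator, standing in for proptest's input
/// generation
fn lcg(state: &mut u64) -> u32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 33) as u32
}

/// Flat in-order reduction tree: leaves at even indices, `max` aggregates
/// at odd indices (power-of-two lengths only, for simplicity)
fn build(data: &[u32]) -> Vec<u32> {
    let n = data.len();
    let mut tree = vec![0; 2 * n - 1];
    for (j, &v) in data.iter().enumerate() {
        tree[2 * j] = v;
    }
    for lvl in 1..=n.trailing_zeros() {
        let half = 1usize << (lvl - 1);
        let mut i = (1usize << lvl) - 1;
        while i < 2 * n - 1 {
            tree[i] = tree[i - half].max(tree[i + half]);
            i += 1 << (lvl + 1);
        }
    }
    tree
}

fn main() {
    let mut seed = 0x5eed;
    for _ in 0..100 {
        // "Generate" a random input...
        let data: Vec<u32> = (0..8).map(|_| lcg(&mut seed) % 100).collect();
        let tree = build(&data);
        // ...then check the property: every aggregate node must equal a
        // brute-force scan over the leaves it covers
        for i in (1..tree.len()).step_by(2) {
            let half = (1usize << i.trailing_ones()) - 1;
            let (lo, hi) = ((i - half) / 2, (i + half) / 2);
            let expected = data[lo..=hi].iter().copied().max().unwrap();
            assert_eq!(tree[i], expected, "bad aggregate at node {i}");
        }
    }
    println!("ok");
}
```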
The proptest experience was great. I didn't have to think about writing dozens of test cases; instead, I wrote a single checking function, let the computer run it on hundreds of cases, and glared at my bit shifting operations until it passed.
I'm not sure how many people out there need efficient range queries and updates on time-series bitsets, but if you do, I hope this has been helpful!
Like the original data structure, I'm sure this has been independently re-invented dozens of times; if you've got any pointers, then let me know.