A Neat XOR Trick

There's a neat trick to solve Advent of Code, day 6 in a single pass.

Looking over the solutions mega-thread, it seems like not many people discovered this trick; when I posted about it, people found it to be noteworthy.

Let's start with a brief summary of the problem: you're given an input string, e.g.

nznrnfrfntjfmvfwmzdfjlvtqnbhcprsg

Within this string, you want to find the first W-character window in which every character is unique.

For this example, with a 4-character window, this occurs here:

nznrnf [rfnt] jfmvfwmzdfjlvtqnbhcprsg

Naively, you can do this with three nested loops:

fn run(s: &[char], window_size: usize) -> usize {
    for i in 0..s.len() - window_size {
        let mut unique = true;
        for j in 0..window_size {
            for k in (j + 1)..window_size {
                if s[i + j] == s[i + k] {
                    unique = false;
                }
            }
        }
        if unique {
            return i + window_size;
        }
    }
    panic!("No unique window found");
}

With an input string of length N and a window size W, the running time of this implementation is O(N * W * W).

There's an obvious algorithmic optimization: instead of comparing each element within the window, we could instead build a HashSet and check its size:

fn run(s: &[char], window_size: usize) -> usize {
    for i in 0..s.len() - window_size {
        let mut set = HashSet::new();
        for j in 0..window_size {
            set.insert(s[i + j]);
        }
        if set.len() == window_size {
            return i + window_size;
        }
    }
    panic!("No unique window found");
}

Insertion into the HashSet is O(1), so our running time is down to O(N * W).

(Whether this is faster in practice is left as an exercise to the reader, since it adds allocation into the mix)

However, there's still room to improve, with a little trickery.

First, let's eliminate the HashSet. We can consider each character as a bitmask:

a = 10000000000000000000000000000000
b = 01000000000000000000000000000000
c = 00100000000000000000000000000000
d = 00010000000000000000000000000000
...etc

Given a set of characters, we can see which ones are present by combining these bitmasks with or (which is written as |):

a | b | c = 11100000000000000000000000000000

If we have a set of W characters, the or of their bitmasks will have W bits set if and only if those characters are unique.

a | b | c = 11100000000000000000000000000000
a | b | b = 11000000000000000000000000000000

Hopefully, this is intuitively obvious: each character can only activate one bit in the result, so if we have W bits set, then we have W unique characters.

In Rust, we check how many bits are set with count_ones. With -C target-cpu=native passed to the compiler, it should compile down to 1-2 instructions:

popcnt on x86;
cnt.8b followed by uaddlv.8b on AArch64

We can use this bitmask to replace our HashSet:

fn run(s: &[char], window_size: usize) -> usize {
    for i in 0..s.len() - window_size {
        let mut set = 0u32;
        for j in 0..window_size {
            set |= 1 << (s[i + j] as u32 - 'a' as u32);
        }
        if set.count_ones() as usize == window_size {
            return i + window_size;
        }
    }
    panic!("No unique window found");
}

(and now there are zero allocations!)

The code is still O(N * W) – because we have an outer and inner loop – but it's prepared us for the final trick.

Consider a slight modification to our previous axiom:

If we have a set of W characters, the xor of their bitmasks will have W bits set if and only if those characters are unique.

a ^ b ^ c = 11100000000000000000000000000000
a ^ b ^ b = 10000000000000000000000000000000

This is less intuitively obvious: everyone that I've presented it to, including myself, has to convince themself that it's correct.

One reply on Reddit has a particularly good explanation (edited to match the convention of using W for window size):

This algorithm is just counting how many characters appear an odd number of times in a window of size W. That number will be W if-and-only-if they're all unique!

Using xor is more powerful than or, because of one particular property:

a ^ b ^ a = b

Applying xor twice will turn a bit on, then back off.

In other words, we can maintain the bitset for a sliding window in a single u32, turning bits on as they enter the window and off as they leave.

This eliminates the inner loop from our previous function:

fn run(s: &[char], window_size: usize) -> usize {
    let mut set = 0u32;
    for i in 0..s.len() {
        // Turn on bits as they enter the window
        set ^= 1 << (s[i] as u32 - 'a' as u32);

        // Turn off bits as they leave the window
        if i >= window_size {
            set ^= 1 << (s[i - window_size] as u32 - 'a' as u32);
        }

        // Check the current window and see if we're done
        if set.count_ones() as usize == window_size {
            return i + 1;
        }
    }
    panic!("No unique window found");
}

With this last trick, we're down to O(N) running time, with no dependence on the window size! I didn't do rigorous benchmarking, but one reply said that this trick sped up their code by almost 3x.

There's no moral to the story, other than "xor is cool". I hope everyone is enjoying Advent of Code!

Matt Keeter // blog

A Neat XOR Trick

Further reading