Program synthesis with Z3

Introduction

Advent of Code is an annual set of programming puzzles, released day by day in December (like an advent calendar).

On Day 21, we're given a tiny reprogrammable robot and asked to write a program so that it can walk across the hull of a ship, jumping over any missing gaps.

It looks something like this:

The program is up to 15 instructions in SpringScript, a fictional programming language. Each instruction contains an opcode (AND, OR, or NOT), a source register, and a target register.

The robot has two writeable target registers: T (the temporary register) and J (the jump register). At the end of program execution, if J is True, then the robot jumps four tiles ahead.

In addition, the robot provides sensor data in read-only registers A through I. These registers mark whether upcoming tiles are filled (True) or empty (False).

A naive solution jumps if the upcoming tile is empty.

Test it out in the simulation below:

This isn't going to work in all cases, and will fail for the map below.

Can you write a script to solve this map? Edit the textbox on the left, or press "Load solution" to see an answer.

Solving manually

Based on the Reddit Solutions Thread, many people took a similar approach:

Record a section of the map where the robot fell
Update your script to solve that section
Repeat until the whole map is solved

I started with a similar approach, building up a map that collects the "danger zones" where the robot has fallen. It looks something like this:

(As before, you can write a script and try to solve this map – it's not too hard!)

This is a relatively manual process, though some folks automated it with genetic algorithms and other search strategies.

What would a robust, automatic, general-purpose solver look like?

Phrasing our problem

Let's start from our goal:

If I were a jumping robot, I would simply make sure to be in the air whenever there's a pit below me.

Because a jump always covers four tiles, a robot is in the air if it has jumped within the last three tiles. At any given tile, a robot jumps if it is not already in the air, and its script tells it to jump given the sensor readings.

We want to find a program for which the goal above is true, given a particular map.

Enter Z3

Each instruction can be recorded as a 7-bit value:

Two bits for instruction (AND, OR, NOT)
Four bits for the input register (A through I, T, J)
One bit for the output register (T or J)

The entire program is therefore 7 * 15 = 105 bits of data, so there are roughly 40564819207303340847894502572031 possible programs.

This would take a little while to search by hand. Luckily, we have a large hammer available: we'll be using Z3, an automated theorem prover from Microsoft Research.

Our plan is to symbolically execute the program defined by our 105 bits, without fixing them to any particular value. Then, we'll be able to make statements about when the robot must jump, and ask the solver to find a program that meets those conditions.

Here's the code to do this symbolic evaluation:

def symbolic_execution(length=15):
    ''' Symbolically executes a program of a given length

        Returns (f, prog):
        - f is a lambda function which should be called with a map
         of sensor readings, and returns the symbolic result
        - prog is a list of (opcode, reg_in, reg_out) tuples
    '''
    # Initialize the T and J registers to False
    T = BoolVal(False)
    J = BoolVal(False)
    sensors = {c:Bool(c) for c in 'ABCDEFGHI'}
    program = []

    for i in range(length):
        op = BitVec('op_%i' % i, 2)
        reg_in = BitVec('reg_in_%i' % i, 4)
        reg_out = BitVec('reg_out_%i' % i, 1)

        # Pick out our left-hand argument based on reg_in
        lhs = T # default value
        for (i, r) in enumerate(list(sensors.values()) + [T, J]):
            lhs = If(reg_in == i, r, lhs)

        # Pick out our right-hand argument based on reg_in
        # This is ignored if the opcode is NOT, but we can't know
        # that going in, since we're executing symbolically.
        rhs = If(reg_out == 0, T, J)

        # Calculate the result based on opcode
        result = If(op == 0, And(lhs, rhs),
                 If(op == 1, Or(lhs, rhs),
                             Not(lhs)))

        # Write the result to either T or J; the other one
        # keeps the value from the previous iteration.
        (T, J) = (If(reg_out == 0, result, T),
                  If(reg_out == 1, result, J))

        program.append([op, reg_in, reg_out])

    # Helper function to bind sensor readings
    f = lambda vs: substitute(J,
        *[(v, BoolVal(vs[k]) if type(vs[k]) is bool else vs[k])
          for (k, v) in sensors.items()])

    return (f, program)

Each instruction is a tuple [opcode, input_reg, output_reg], which are each a BitVec value (i.e. integers of a fixed bit length, whose values are unknown until the solver picks them).

When executing, we track the temporary and jump registers T and J, updating them at each instruction based on whether they're the output of that instruction. The final value for J represents whether to jump at the end of of the program.

This value is generic in terms of the program, but also in terms of sensor readings, which isn't quite what we want – we'd like to make assertions given a fixed set of sensor readings, rather than letting the solver pick them.

To fix this, we wrap J in a lambda function which applies a particular set of sensor readings; think of this as partial evaluation with respect to the sensor registers.

We can immediately use this to do interesting things. Let's solve for a one-instruction program which jumps if the A sensor register is False.

>>> (f, prog) = symbolic_execution(1)
>>> solver = Solver()
>>> sensors = {'A': False, **{c: True for c in 'BCDEFGHI'}}
>>> solver.add(f(sensors))
>>> solver.check()
sat
>>> solver.model()
[reg_in_0 = 1, op_0 = 1, reg_out_0 = 1]

This is great: it prints sat, meaning that it found a solution, then showed the encoded values for the opcode and registers.

We'll write a helper function to make things human-readable:

def decode(prog, model):
    ''' Converts a program + solved model into SpringScript
    '''
    if len(model) == 0:
        return ' '
    ops = {0: 'AND', 1: 'OR', 2: 'NOT'}
    inputs = {i:c for (i,c) in enumerate('ABCDEFGHITJ')}
    outputs = {0: 'T', 1: 'J'}
    script = ''
    for (op, reg_in, reg_out) in prog:
        op = model.eval(op).as_long()
        reg_in = model.eval(reg_in).as_long()
        reg_out = model.eval(reg_out).as_long()

        op = ops.get(op, 'NOT')
        reg_in = inputs.get(reg_in, 'T')
        reg_out = outputs.get(reg_out, 'T')
        script += "%s %s %s\n" % (op, reg_in, reg_out)
    return script[:-1]

Running this on our solution, we see the resulting program:

>>> decode(prog, solver.model())
'OR B J'

This is an interesting solution: it actually ignores the hole, and uses the fact that sensor B is above a tile. Given the limited constraints which we provided, this is perfectly valid.

Bring me my 105 bits

Now, we can write our full solver, which takes a map string (e.g. ###..#..##) and asserts that the robot is airborn whenever it is above a hole:

def solve_for_map(m, length=15):
    (j, prog) = symbolic_execution(length)

    solver = Solver()
    jumps_at = []
    for (i,c) in enumerate(m):
        # We're in the air if we've jumped within 3 tiles
        airborn = Or(jumps_at[-3:])

        # If there's no tile underneath us, then we must be airborn
        if c != '#':
            solver.add(airborn)

        # Sensor readings are based on the upcoming map:
        sensors = {c: i + j + 1 >= len(m) or m[i + j + 1] == '#'
                   for (j,c) in enumerate('ABCDEFGHI')}

        # We jump if we're not airborn and the program says to jump
        jumps_at.append(And(Not(airborn), j(sensors)))

    r = solver.check()
    return decode(prog, solver.model()) if r == sat else False

Running this on a very simple map, we get a valid-but-unusual solution:

>>> print(solve_for_map('#####  #####'))
OR E T
NOT T J
OR J J
AND J T
AND T T
AND T T
AND T T
OR J T
OR T J
NOT T J
AND D T
OR B T
AND T T
NOT E J
NOT T J

Take a moment to think about this, and test it out in the simulator below:

Bring me... slightly fewer bits

Of course, this is because we asked for 15 instructions, so Z3 gave us 15 instructions. Many of them are nonsensical or have no effect, but the total behavior of the script matches our specification.

We can make the solutions tighter by solving for a minimal number of instructions. There's probably a clever way of doing this in Z3 itself, but I implemented it by just testing each program size, counting upwards:

def minimize_for_map(m):
    for i in range(0, 16):
        r = solve_for_map(m, i)
        if r:
            return r
    return False

Running this on our simple map, we find a hilarious one-instruction solution:

>>> minimize_for_map('#####  #####')
'NOT J J'

That's right: always be jumping.

Revisiting our map

Now, we can finally revisit the map from our problem input:

>>> print(minimize_for_map('##### ################ ## ############# #  ############# #### #  ######## ## ## # ######## # ##############  # ############# # ## ########### ## ##  #########   ##############  ## ## ####'))
OR H T
NOT C J
AND I T
OR T J
AND D J

This takes about three seconds on my laptop, finding a five-instruction solution.

Let's plug it in and see how it does:

It's worth taking a step back and recognizing that this is magical. In about 100 lines of Python, we've built a universal solver that pulls a SpringScript program out of thin air.

Caveat

This doesn't quite work for the Advent of Code bot simulator, which presents your bot with a series of smaller maps, rather than one long map.

To fix this, we can modify solve_for_map to take multiple inputs, i.e.

def solve_for_maps(maps, length=15):
    (j, prog) = symbolic_execution(length)
    solver = Solver()
    for m in maps:
        jumps_at = []
        for (i,c) in enumerate(m):
            # ...same as before...
    r = solver.check()
    return decode(prog, solver.model()) if r == sat else False

def minimize_for_maps(maps):
    for i in range(0, 16):
        # Call the multi-map function instead
        r = solve_for_maps(maps, i)
        if r:
            return r
    return False

Then pass it the full set of small maps:

>>> maps = ['#####.###########',
            '#####.##.########',
            '#####.#..########',
            '#####.####.#..###',
            '#####.##.##.#.###',
            '#####.#.#########',
            '#####..#.########',
            '#####.#.##.######',
            '#####.##.##..####',
            '#####...#########',
            '#####..##.##.####',
            '#####.#.###.#####',
            '#####.###.#..####',
            '#####...###...###',
            '#####.#.#.#.#.###']
>>> print(minimize_for_maps(maps))
NOT H J
OR C J
AND B J
AND A J
NOT J J
AND D J

This produces a slightly longer, six-instruction solution, which I believe to be minimal.

Upping the ante

Why are we given a whole 15 instructions?

Well, probably because humans aren't powerful boolean-crunching machines, so we're given some room for inefficiency.

This begs the question: what does a truly hard, adversarial map look like?

As it turns out, we can generate one with Z3!

Previously, we were evaluating over a symbolic program, but binding it to a particular set of sensor readings. Now, we take the next step into the Symbol-Verse and make the tiles in the map variables as well. Here's a modified version of solve_for_map which accepts tiles as variables, rather than a map string:

def solve_symbolic_map(m, length=15):
    ''' m is an array of BoolRef tiles, which may be variables
    '''
    (j, prog) = symbolic_execution(length)
    jumps_at = []
    conds = []
    for (i,t) in enumerate(m):
        # We're in the air if we've jumped within 3 tiles
        airborn = Or(jumps_at[-3:])

        # If there's no tile underneath us, then we must be airborn
        conds.append(Or(t, airborn))

        # Sensor readings are based on the upcoming map:
        sensors = {c: i + j + 1 >= len(m) or m[i + j + 1]
                   for (j,c) in enumerate('ABCDEFGHI')}

        # We jump if we're not airborn and the program says to jump
        jumps_at.append(And(Not(airborn), j(sensors)))

    return (prog, And(conds))

We can use this to solve for a matching map and program:

>>> m = [Bool('t_%i' % i) for i in range(5)]
>>> (p,c) = solve_symbolic_map(m, 1)
>>> solver = Solver()
>>> solver.add(c)
>>> solver.check(c)
sat
>>> model = solver.model()
>>> model
[t_0 = True,
 t_1 = True,
 t_4 = False,
 reg_in_0 = 0,
 t_2 = True,
 t_3 = True,
 reg_out_0 = 1,
 op_0 = 2]
>>> decode(p, model)
'NOT A J'
>>> ''.join(['#' if model.eval(t) else ' ' for t in m])
'#### '

This has produced a relatively boring map and a relatively boring program, but they're compatible!

How can we use this to generate hard maps?

We'll need to use a hithero-unused Z3 feature, qualifiers, to make an assertion of the form

This map can be solved with n instructions, but not with n - 1 instructions

This is translated into Z3 in a rather literal fashion:

>>> m = [Bool('t_%i' % i) for i in range(10)]
>>> s = Solver()
>>> (p, q) = solve_symbolic_map(m, 1)
>>> vars = sum(p, []) # flatten the list
>>> vars
[op_0, reg_in_0, reg_out_0]
>>> s.add(Not(Exists(vars, q))) # This is the tricky part!
>>> s.add(solve_symbolic_map(m, 2)[1])
>>> s.check()
sat
>>> model = s.model()
>>> ''.join(['#' if model.eval(t) else ' ' for t in m])
'###### #  '

Now we're getting somewhere! We've found a map which can't be solved in a single instruction:

>>> solve_for_map('###### #  ', 1)
False
>>> solve_for_map('###### #  ', 2)
'OR G J\nAND I J'

We could extend this to a huge map and let Z3 think for a while:

>>> m = [Bool('t_%i' % i) for i in range(100)]
>>> (p, q) = solve_symbolic_map(m, 14)
>>> s = Solver()
>>> vars = sum(p, [])
>>> s.add(Not(Exists(vars, q)))
>>> s.add(solve_symbolic_map(m, 15)[1])
>>> s.check()
# and now we wait...

In my rough testing, this takes approximately forever: existential qualifiers are a harder class of problem, and there are too many free variables.

It's easier to generate an adversarial map incrementally:

Find a map which can be solved in N instructions
Solve for a set of T new tiles, appended to the end of the map, which make it only solveable in N + 1 instructions
Increment N by 1 and repeat

My implementation looks something like this:

m = [BoolVal(True)]
for instructions in range(2, 16):
    print("Solving for %i instructions" % instructions)
    while True:
        # Add 5 fresh boolean variables to the end of the map
        m += [Bool('t_%i' % (i + len(m))) for i in range(5)]
        print("   map size: %i" % len(m))

        s = Solver()
        (p, q) = solve_symbolic_map(m, instructions - 1)
        s.add(Not(Exists(sum(p, []), q)))
        s.add(solve_symbolic_map(m, instructions)[1])

        if s.check() == sat:
            model = s.model()
            # Update the map with the new tiles
            m = [BoolVal(bool(model.eval(i))) for i in m]

            q = ''.join('#' if i else ' ' for i in m)
            print("Found map '%s'" % q)
            break

Running this overnight on a friend's server, it finds a 131-tile map which can be solved in no fewer than 11 instructions:

(please don't try to solve this one by hand)

Generating an adversarial map which requires a full 15 instructions is probably out of my reach, but if anyone has clever ideas (or few thousand CPU-hours to spare), let me know!

Related work

This paper (and associated slides) talk about building a Z3 model for a full RISC-V CPU (!!).

If you like pulling things out of thin air, check out this blog post about using AFL to synthesize JPEGS.

ICFPC 2013 is a similar style of problem, and they also used Z3 to compare program equality.

Credits

Generating the adversarial map was made possible by Martin Galese; thanks for not questioning why I had a Python script using 100% of a CPU core on your workstation for days on end.

Pixel art is from the Kenney 1-Bit Pack.

Matt Keeter // Program synthesis with Z3