Game of Sway

Quantum computing for the whole family!

Play Sway

Grab some dice and let’s play!

You’ll need a d20, a grid, and black and white pieces.


The rules are simple:

Try it out! Place a piece on the board to begin.


Place a stone on an empty cell.


Take a moment to see if you can figure out a winning strategy.

Think you’ve got it?

Strategy

Before you say this is pure unadulterated chaos, I want you to know that at first I also kept losing (whether I played as black or white) and thought there was no strategy here. It turns out [2] that's exactly the evidence that some strategy must exist; otherwise, I should have been winning about 50% of my games. If stronger players keep beating weaker ones, there must be some exploitable structure hiding under the randomness.

So, I dreamt up the following rules-based AI strategies to play against each other. Each one has a name and a rough description of how it decides on the best move:

Who do you think will win?

From the tournament results in the demo above, we see that while some strategies dominate others, the ranking among the top players is more nuanced. This “Go with dice” knockoff is getting interesting.

But these are all hand-crafted heuristics…

Can we find the optimal strategy? That’s going to require something a bit more… unconventional.

Solving the game

Those strategies above were just a shot in the dark. Time to throw some compute at this thing.

The board is only \(8 \times 8\), so there are at most 64 moves to choose from. But for each choice, there is a surprisingly large number of outcomes due to the environmental randomness. For example, after placing a piece on a board that already holds 36 pieces, all 37 stones could have a nonzero probability of flipping, resulting in \(2^{37}\) (over 137 billion) possible outcomes in just one look-ahead.
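To put that branching factor in perspective, here is the arithmetic spelled out:

```python
# After one move on a board with 36 existing pieces, the newly placed
# stone plus the 36 others may each independently flip or not.
stones = 37
outcomes = 2 ** stones
print(f"{outcomes:,}")  # 137,438,953,472 -- over 137 billion branches
```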

To build a strong AI player, several standard techniques are available, but not all of them are feasible here:

| Technique | Example | Application to Sway |
| --- | --- | --- |
| Compute the exact solution | Checkers, solved in 2007 by the Chinook project [3] | Enumerating all possible game states is intractable |
| Decompose into independent subgames | Late-stage Go, examples from the book Winning Ways [4] | Neighborhood influence prevents generic decomposition |
| Linear programming / minimax solvers | Heads-up limit Texas Hold’em, solved by Cepheus [5] | Intractable number of variables/constraints |
| Find a closed-form trick | Nim’s XOR-based formula [6] | Possible, but game-specific tricks are all-or-nothing: no incremental progress until a proof is found (if one exists at all), and no guarantee it transfers to similar games |
| Learn value and policy priors | Go / AlphaZero [7] | Possible, if we can mitigate the massive number of self-play iterations |
| Monte Carlo rollouts | Multi-armed bandits [8] | Possible, if we can mitigate the massive number of rollouts |

Monte Carlo rollouts seem promising [9]. The idea is simple: gather data to better model the outcome probabilities. The more data you gather (i.e., the more samples you take), the better.
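Concretely, a rollout-based move evaluator can be sketched in a few lines. This is a generic skeleton, not Sway's actual engine: the `step`, `is_terminal`, `score`, and `apply_move` hooks are hypothetical stand-ins for the game's real rules.

```python
import random

def rollout(state, step, is_terminal, score, rng=random):
    """Play one game to completion from `state`.

    `step(state, rng)` advances the game by one (stochastic) turn;
    `is_terminal` and `score` are hypothetical hooks for the rules.
    """
    while not is_terminal(state):
        state = step(state, rng)
    return score(state)

def estimate_move_value(state, move, apply_move, step, is_terminal, score, n=1000):
    """Average the outcome of `n` rollouts after playing `move`."""
    total = 0.0
    for _ in range(n):
        total += rollout(apply_move(state, move), step, is_terminal, score)
    return total / n
```

Picking a move is then just a matter of calling `estimate_move_value` for every legal move and taking the argmax.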

Rollouts

Consider the following game state.

Hit “sample” to visualize the tremendous number of immediate outcomes from a single state.

A single rollout plays the game to completion, but how many rollouts do we need to feel confident about picking a particular move?

Well, it depends on how chaotic the board state is. Board states with a higher variance of outcomes require more rollouts. Specifically, given outcome variance \(\sigma^2\) and a desired precision \(\epsilon\), the number of rollouts needed per move is on the order of [10]

\[ O\left(\frac{\sigma^2}{\epsilon^2}\right) \]

That could be a lot of rollouts.
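To get a feel for the numbers, here is a back-of-the-envelope estimate. The values of \(\sigma\) and \(\epsilon\) below are illustrative, not measured from Sway, and the constant hidden in the big-O is ignored:

```python
sigma = 2.0      # assumed std. dev. of rollout outcomes (illustrative)
epsilon = 0.01   # desired precision in the estimated expected score

rollouts_needed = (sigma / epsilon) ** 2
print(f"{rollouts_needed:,.0f} rollouts per candidate move")  # 40,000
```

And that is per candidate move; with up to 64 legal moves on the board, a single decision can demand millions of rollouts.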

Consider this early-game position. Press “Start rollouts” to run rollouts for every candidate move. Watch the top two moves emerge on the board and the gap between them narrow on the number line.

Black to move

The variance in Sway is gnarly. The gap between good moves is tiny. The number of samples needed grows fast!

What if there were a way to sample all those outcomes at once?

Quantum superposition

Here’s the wild part. While a classical computer performs rollouts one by one, a quantum computer puts all those possible games into superposition, which is a single quantum state that encodes every outcome at once.

Where classical Monte Carlo needs \(O(\sigma^2/\epsilon^2)\) rollouts per move to estimate an expectation, fault-tolerant quantum amplitude estimation [11] can do it in \(O(\sigma/\epsilon)\) oracle calls. That’s a quadratic reduction! For example, 160,000 rollouts reduce to just 400!
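The quadratic relationship means the quantum sample count is the square root of the classical one. A sketch of the comparison (constants in both big-O bounds are ignored; the inputs are illustrative):

```python
import math

def classical_rollouts(sigma, epsilon):
    # Monte Carlo: O(sigma^2 / epsilon^2) samples
    return (sigma / epsilon) ** 2

def quantum_oracle_calls(sigma, epsilon):
    # Amplitude estimation: O(sigma / epsilon) oracle calls
    return sigma / epsilon

sigma, epsilon = 4.0, 0.01
c = classical_rollouts(sigma, epsilon)     # ~160,000 rollouts
q = quantum_oracle_calls(sigma, epsilon)   # ~400 oracle calls
assert math.isclose(q, math.sqrt(c))
```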

It’s not exactly that simple, though. Even with a powerful fault-tolerant quantum processing unit (QPU), if the problem doesn’t have the right structure, the QPU adds overhead for zero gain. More on that below.

Consider the GPU: it earns its place in a system because certain workloads (matrix multiplications, convolutions, ray tracing) map onto its massively parallel architecture so well that no amount of CPU clock speed can compete.

| CPU | GPU |
| --- | --- |
| sequential branching, I/O offload | massively parallel cores |

Remove that structural match and the GPU sits idle or, worse, slows things down.

Similarly, a fault-tolerant QPU has its own strengths and weaknesses. Amplitude amplification [12], amplitude estimation, quantum walks, and a handful of linear-algebra routines let a QPU replace time-consuming sampling, as long as the problem already has the combinatorial structure those primitives exploit.

| CPU | QPU |
| --- | --- |
| sequential branching, I/O offload | superposition over stochastic outcomes |

While most problems do not need a quantum computer, this one does!

Quantum advantage

When does quantum mechanics reduce the fundamental work in AI search? Three conditions must hold at once:

  1. Sampling is inherent. The environment is stochastic and no classical trick can eliminate the need to average over random outcomes. (Like the environment’s turn in Sway)
  2. Precision is the bottleneck. The differences between good and great decisions are tiny, forcing enormous sample counts. (We’ve just seen this in action above)
  3. The oracle must be worth building. The dynamics are simple enough to compile into reversible, coherent transitions without drowning in overhead. (Sway’s rules are simple enough to be built in a quantum circuit)

The usual benchmark problems fail at least one:

Sway passes all three. The environment turn is irreducibly stochastic. The repeated Sway events attenuate the marginal impact of any single move. And the rules are local: each cell only checks its immediate neighbors, making the transition cheap to compile into reversible logic.
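Locality is what keeps the oracle cheap. As an illustration only (these parameters and probabilities are made up, not Sway's published rules), a neighbor-counting environment step might look like:

```python
import random

def environment_step(board, flip_base=0.5, support=0.15, rng=random):
    """Hypothetical local update: each stone's flip chance drops with
    each same-colored orthogonal neighbor. `flip_base` and `support`
    are invented parameters, not Sway's actual values.
    """
    n = len(board)
    new = [row[:] for row in board]  # read old state, write new one
    for r in range(n):
        for c in range(n):
            stone = board[r][c]
            if stone is None:
                continue
            # Count same-colored orthogonal neighbors (locality: 4 reads).
            allies = sum(
                1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n
                and board[r + dr][c + dc] == stone
            )
            p_flip = max(0.0, flip_base - support * allies)
            if rng.random() < p_flip:
                new[r][c] = "B" if stone == "W" else "W"
    return new
```

Because each cell reads only its four neighbors, the whole update is a shallow, local transition, which is exactly the kind of structure condition 3 asks for when compiling the dynamics into reversible logic.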

Can’t unsee it

Once you start looking for it, the pattern of “neighbors reinforce you, isolation makes you vulnerable, and the environment shakes things up” does seem to pop up everywhere [13]. Here are a few examples that caught my eye:

It’s a toy game with toy rules. But it’s interesting that the same tension between local stability and global randomness creates hard computational problems. If quantum speedups apply to one, maybe they apply to others. That’s a question I’d love to see explored.