Quantum computing for the whole family!
Grab some dice and let’s play!
You’ll need a d20, a grid, and black and white pieces.
The rules are simple:
Try it out! Place a piece on the board to begin.
Place a stone on an empty cell.
Take a moment to see if you can figure out a winning strategy.
Think you’ve got it?
Before you say this is pure unadulterated chaos, I want you to know that at first I also kept losing (no matter whether I played as black or white) and thought there was no strategy here. It turns out2 that’s exactly the evidence that some strategy must exist: if the game were pure chance, I should be winning about 50% of my games. If stronger players keep beating weaker ones, there must be some exploitable structure hiding under the randomness.
So, I dreamt up the following rules-based AI strategies to play against each other. Each one has a name and a rough description of how it decides on the best move:
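To give a flavor of what these look like in code (the actual strategies live in the demo; the names and signatures here are hypothetical stand-ins), a rules-based strategy is just a function from board state to move. Here’s a random baseline and a “stick together” heuristic that prefers the empty cell touching the most friendly stones:

```python
import random

SIZE = 8  # the board is 8x8, per the article

def neighbors(r, c):
    """Orthogonally adjacent cells that lie inside the board."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < SIZE and 0 <= c + dc < SIZE:
            yield r + dr, c + dc

def random_strategy(board, player, rng=random):
    """Baseline: place on a uniformly random empty cell."""
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE)
               if board[r][c] is None]
    return rng.choice(empties)

def clustering_strategy(board, player, rng=random):
    """Heuristic: prefer the empty cell adjacent to the most friendly stones."""
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE)
               if board[r][c] is None]
    def support(cell):
        return sum(board[nr][nc] == player for nr, nc in neighbors(*cell))
    best = max(support(cell) for cell in empties)
    return rng.choice([cell for cell in empties if support(cell) == best])
```

The demo strategies are richer than this sketch, but they all share this shape: look at the board, score the candidate cells, pick one.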
Who do you think will win?
From the tournament results in the demo above, we see that while some strategies dominate others, the ranking among the top players is more nuanced. This “Go with dice” knockoff is getting interesting.
But these are all hand-crafted heuristics…
Can we find the optimal strategy? That’s going to require something a bit more… unconventional.
Those strategies above were just a shot in the dark. Time to throw some compute at this thing.
The board is only \(8 \times 8\), so there are never more than 64 moves to choose from. But each choice leads to a surprisingly large number of outcomes due to the environmental randomness. For example, after placing a piece on a board with 36 pieces, all 37 stones could have a nonzero probability of flipping, resulting in \(2^{37}\) (over 137 billion) possible outcomes in just one look-ahead.
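That blow-up is just a power of two. For the worst case described above:

```python
stones = 37  # pieces on the board after the new placement
outcomes = 2 ** stones  # each stone independently flips or stays
print(f"{outcomes:,} possible outcomes")  # 137,438,953,472 possible outcomes
```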
To build a strong AI player, several techniques are available, but not all of them are feasible:
| Technique | Example | Application to Sway |
|---|---|---|
| Compute the exact solution | Checkers solved in 2007 by the Chinook project3 | ✘ Enumerating all possible game states is intractable |
| Decompose into independent subgames | Late-stage Go, Winning Ways4 book examples | ✘ Neighborhood influence prevents generic decomposition |
| Linear programming / minimax solvers | Heads-up limit Texas Hold’em solved by Cepheus5 | ✘ Intractable number of variables/constraints |
| Find a closed-form trick | Nim’s XOR-based formula6 | ? Possible, but game-specific tricks are all-or-nothing; no incremental progress until a proof is found, if one exists at all, and no guarantee it transfers to similar games |
| Learn value and policy priors | Go / AlphaZero7 | ? Possible, if we can mitigate massive number of self-play iterations |
| Monte Carlo rollouts | Multi-armed bandit8 | ? Possible, if we can mitigate massive number of rollouts |
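As a concrete taste of the closed-form row in the table: Nim collapses the entire game into one XOR. A position is a first-player win exactly when the XOR of the pile sizes is nonzero:

```python
from functools import reduce
from operator import xor

def nim_first_player_wins(piles):
    """Nim's closed form: the first player wins iff XOR of pile sizes != 0."""
    return reduce(xor, piles, 0) != 0

print(nim_first_player_wins([1, 2, 3]))  # False: 1 ^ 2 ^ 3 == 0
print(nim_first_player_wins([1, 2, 4]))  # True:  1 ^ 2 ^ 4 == 7
```

Nothing that tidy is known for Sway; as the table notes, such tricks are all-or-nothing.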
Monte Carlo rollouts seem promising.9 The technique, in essence, is to gather data to better model the outcome probabilities: the more samples you take, the better your estimate.
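In code, the estimate is a plain sample mean. Here’s a minimal sketch, where `simulate_game` is a hypothetical stand-in for playing one random Sway game to completion and returning 1 if the moving player wins:

```python
import random

def rollout_value(state, move, simulate_game, n_rollouts=1000, rng=None):
    """Estimate a move's win probability by averaging random playouts.

    `simulate_game(state, move, rng)` is assumed to play one full random
    game after `move` and return 1 for a win, 0 for a loss.
    """
    rng = rng or random.Random()
    wins = sum(simulate_game(state, move, rng) for _ in range(n_rollouts))
    return wins / n_rollouts

def best_move(state, legal_moves, simulate_game, n_rollouts=1000):
    """Pick the move with the highest estimated win probability."""
    return max(legal_moves,
               key=lambda m: rollout_value(state, m, simulate_game, n_rollouts))
```

The catch, as we’ll see, is how large `n_rollouts` has to be before the estimates are trustworthy.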
Consider the following game state.
Hit “sample” to visualize the tremendous number of immediate outcomes from a single state.
A single rollout plays the game to completion, but how many rollouts do we need to feel confident about picking a particular move?
Well, it depends on how chaotic the board state is. Board states with a higher variance of outcomes require more rollouts. Specifically, given variance \(\sigma^2\) and a desired precision \(\epsilon\), the number of rollouts needed per move is on the order of10
\[ O\left(\frac{\sigma^2}{\epsilon^2}\right) \]
That could be a lot of rollouts.
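To see how the variance drives the sample count, here’s the back-of-the-envelope version (the σ and ε values are illustrative, not measured from Sway):

```python
def rollouts_needed(sigma, epsilon):
    """Rough sample count for precision epsilon: O(sigma^2 / epsilon^2)."""
    return round((sigma / epsilon) ** 2)

epsilon = 0.01  # illustrative precision target
print(rollouts_needed(0.1, epsilon))  # calm board:    100
print(rollouts_needed(0.5, epsilon))  # chaotic board: 2500
```

A fivefold jump in the standard deviation means a twenty-five-fold jump in rollouts, per candidate move.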
Consider this early-game position. Press “Start rollouts” to run rollouts for every candidate move. Watch the top two moves emerge on the board and the gap between them narrow on the number line.
The variance in Sway is gnarly. The gap between good moves is tiny. The number of samples needed grows fast!
What if there were a way to sample all those outcomes at once?
Here’s the wild part. While a classical computer performs rollouts one by one, a quantum computer puts all those possible games into superposition: a single quantum state that encodes every outcome at once.
Where classical Monte Carlo needs \(O(\sigma^2/\epsilon^2)\) rollouts per move to estimate an expectation, fault-tolerant quantum amplitude estimation11 can do it in \(O(\sigma/\epsilon)\) oracle calls. That’s a quadratic reduction! For example, 160,000 rollouts reduce to just 400!
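The arithmetic behind that example (σ and ε here are one illustrative pair with \(\sigma/\epsilon = 400\), not values measured from the game):

```python
sigma, epsilon = 0.5, 0.00125  # illustrative values with sigma/epsilon = 400

classical = round((sigma / epsilon) ** 2)  # Monte Carlo rollouts: O(sigma^2/eps^2)
quantum = round(sigma / epsilon)           # amplitude-estimation oracle calls: O(sigma/eps)

print(classical, quantum)  # 160000 400
```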
It’s not exactly that simple, though. Even if a powerful fault-tolerant quantum processing unit (QPU) existed, if the problem doesn’t have the right structure, the QPU adds overhead for zero gain. More on that below.
Consider the GPU: it earns its place in a system because certain workloads (matrix multiplications, convolutions, ray tracing) map onto its massively parallel architecture so well that no amount of CPU clock speed can compete.
Remove that structural match and the GPU sits idle or, worse, slows things down.
A fault-tolerant QPU likewise has its own strengths and weaknesses. Amplitude amplification12, amplitude estimation, quantum walks, and a handful of linear-algebra routines let a QPU replace time-consuming sampling, as long as the problem already has the combinatorial structure those primitives exploit.
While most problems do not need a quantum computer, this one does!
When does quantum mechanics reduce the fundamental work in AI search? Three conditions must hold at once:
The usual benchmark problems fail at least one:
Sway passes all three. The environment turn is irreducibly stochastic. The repeated Sway events attenuate the marginal impact of any single move. And the rules are local: each cell only checks its immediate neighbors, making the transition cheap to compile into reversible logic.
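The locality argument is worth making concrete. The exact flip probabilities belong to the game itself; the rule below is a hypothetical stand-in (more friendly neighbors, less likely to flip), but it shows the key property: deciding one cell’s fate reads at most four neighboring cells, no matter how big the board is.

```python
SIZE = 8

def friendly_neighbors(board, r, c):
    """Count orthogonal neighbors holding the same color as (r, c)."""
    color = board[r][c]
    count = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE and board[nr][nc] == color:
            count += 1
    return count

def flip_probability(board, r, c, base=0.5):
    """Hypothetical local rule: support from neighbors suppresses flips."""
    return base / (1 + friendly_neighbors(board, r, c))
```

Because each such update touches only a constant number of cells, the whole environment turn compiles into a shallow circuit of small reversible gates, which is exactly what the quantum primitives above need.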
Come to think of it, the pattern of “neighbors reinforce you, isolation makes you vulnerable, and the environment shakes things up” does seem to pop up everywhere.13 Here are a few examples that caught my eye:
It’s a toy game with toy rules. But it’s interesting that the same tension between local stability and global randomness creates hard computational problems. If quantum speedups apply to one, maybe they apply to others. That’s a question I’d love to see explored.