Once more Gridlandia has gotten a little bigger, and now our task is to examine drawing seven
connected,
equal-size districts (we're going to skip the 6x6 grid because we don't want to have to worry
about ties). We'd
like to be able to do a similar analysis as in the 5x5 case, where we can examine a proposed
plan and a
distribution of voters to see if there is evidence that the plan was intentionally selected
to favor or disfavor
one of the parties. This time, however, we don't have a full list of all of the legal plans.
We know that
there are 158,753,814 of them, and even with a careful
encoding scheme, a
file containing all of these would be a couple of gigabytes. But this is okay! If we have
trouble writing down
all of the legal districtings of a 7x7 grid, there is no way we'll be able to do such a thing
for a real
jurisdiction, so we can use this as an opportunity to develop some **sampling
techniques**.

How do we generate a random, and hopefully representative, sample of plans? There are two really simple techniques which are terrible on their own, but when combined become very powerful. The first is to have a computer generate district assignments of the cells by a uniformly random process and keep the ones which satisfy the criteria for being legal districting plans. What's great about this is that each valid plan occurs with equal probability. What's not so great is that this probability is extremely tiny. While there are a very large number of legal plans, there are way, way, way more labellings than legal plans—about $10^{33}$ of them. Even if we had a computer that could generate 1,000 attempted plans per second, we wouldn't expect to hit a valid one within a human lifetime, or the duration of life on earth.

The second strategy is to use a human to produce plans. This has the advantage of not wasting time drawing and checking labellings which are not legal plans. The downside is that, in the grand scheme of things, humans are pretty slow at this task. Some of our brave researchers tried to write down all of the legal plans for the 4x4 grid and, although they succeeded, it did take quite a few hours of time.

How can we combine these? What we can do is **start with a human-drawn (or
human-assisted) plan**,
then use a computer to **randomly modify it**. Remember when we defined two
plans to be **adjacent
in the metagraph** if we could transform one into the other by swapping two cells in
adjacent districts?
Since there are only 1176 ways to randomly pick a pair of cells to try to swap in a 7x7 grid,
a computer can very easily move from one plan to the next, which is exactly what is
happening in the animation below. Every half-second, an algorithm randomly picks two cells
and checks if
exchanging their district assignments yields a legal districting plan. If so, it moves to
that plan. If not, it
tries again.

We are doing a **random walk** on the metagraph. Each plan is represented by a
node in the
metagraph. Every half-second, the walk takes a random step to one of the neighbors of the current
plan. This algorithm
is remarkable for three important reasons. The first is its simplicity—the demo above
is running live off
of
about 50 lines of JavaScript, in your browser, right now. The second is the effectiveness and
speed. While it
would take a long time to enumerate *all* of the legal districting plans, we can
definitely
enumerate a huge chunk of them very quickly. The third is that the process often rapidly
converges to a truly
representative sample of the full universe.

### Introducing MCMC: A Demo

This kind of sampling algorithm is called a * Markov chain Monte Carlo*
method, or MCMC
for short. These are incredibly powerful tools for sampling, approximation, and optimization
in settings where it
is difficult to get your hands on the object you're interested in, such as the space of all
valid districting
plans.

Once again, you can select a distribution of voters below. This time you can design a plan by clicking and right-clicking the cells to change their district assignment, or press the 'Random Plan' button to have the random walk choose one for you. Once you have a valid plan, press the 'MCMC Sampler' button to generate a histogram. The Blue bar represents the number of seats your distribution of voters the Hearts Party wins under your selected plan, and the histogram bars show the seat shares for a sample of 1,000 plans generated by the MCMC random walk. If the Blue bar is to the left of most of the mass of the histogram, it means that the nearby sample found a lot of plans that gave more seats to the Hearts Party than yours. If the Blue bar is to the right of most of the mass, it means that the sampler found a lot of plans which give more seats to the Clubs Party than yours.

#### Design a distribution $\Delta$

Click cells to toggle

#### Design a plan $\mathcal D$

Click cells change their district assignments

#### Is your plan an outlier for this distribution?

Number of ♥ seats

If you leave your distribution of voters alone and click the Sample button multiple times,
the histogram will
change, but not by too much, and the general shape will still be the same. This is because
even though 1,000 is a
very small number compared to 158,753,814—less than one percent of one
percent—it's still a large
enough sample of
plans to be a fairly **representative** of the whole collection of plans. How to
do this in the real
world where we need to draw districts on states and municipalities which may have hundreds or
thousands of cells
is a hot area of research in redistricting.

Sampling real-world districting plans is of course much more complex than sampling on the grid, but fundamentally the idea of the random walk is the same. When you change the details of how you conduct the random walk, you change the "stationary distribution" of the walk—in other words, it changes the probability of seeing any particular plan after you do the random walk for a long time. The scientific community is currently spending lots of energy to understand the details of how to conduct these walks to come to the strongest conclusions about the universe of redistricting possibilities.