Recombination and other samplers
Redistricting is the problem of partitioning a set of geographical units into a fixed number of districts, subject to a list of often-vague rules and priorities. In recent years, the use of randomized methods to sample from the vast space of districting plans has been gaining traction in courts of law for identifying partisan gerrymanders, and it was used as an analytical tool by several legislatures and independent commissions.
The mathematics of sampling graph partitions is interesting in its own right, and there has been enormous progress in recent years. Most of the new methods use an idea that comes from this research group: that spanning trees provide a computationally efficient way to partition a graph, because deleting a single edge cuts the graph into two pieces. We introduced the idea in 2018 and released a supporting software package called GerryChain the same year. Now the spanning tree partition step can be found in several different kinds of samplers, including those introduced by Jonathan Mattingly’s group at Duke and Kosuke Imai’s group at Harvard.
Here are a few key papers from the lab on samping graph partitions…
Recombination
This survey paper by DeFord-Duchin-Solomon introduces recombination (or “ReCom”), a big-step Markov chain method with a spanning-tree step for bipartitioning. We set up redistricting as a graph partition problem and introduce a new family of Markov chains on the space of graph partitions, all of which enable approximate sampling from the spanning tree distribution on partitions, which we argue is a natural target for the redistricting problem. The main point of comparison is the “Flip” chain that was in common use before this time, in which the proposal randomly changes the assignment label of a single node at a time. We present evidence that ReCom mixes efficiently, especially in contrast to the slow-mixing Flip, and provide experiments that demonstrate its qualitative behavior. We demonstrate the advantages of ReCom on real-world data and explain both the challenges of the Markov chain approach and the analytical tools that it enables. We close with a short case study involving the Virginia House of Delegates — paper
Reversible ReCom
The next paper by Cannon-Duchin-Randall-Rule introduces an updated Recombination chain, addressing the fact that the simpler recombination methods from the original survey had only an approximate description of their steady state. In this paper, we modify ReCom slightly to give it a property called reversibility, resulting in a new Markov chain, nicknamed RevReCom. This new chain converges to the simple, natural distribution that ReCom was originally designed to approximate: a plan’s stationary probability is proportional to the product of the number of spanning trees of each district. This spanning tree score is a measure of district “compactness” (or shape) that is also aligned with notions of community structure from network science. After deriving the steady state formally, we present diagnostic evidence that the convergence is efficient enough for the method to be practically useful. In addition to the primary application of benchmarking of redistricting plans (i.e., describing a normal range for statistics), this chain can also be used to validate other methods that target the spanning tree distribution — paper
Comparing Samplers
- In a preprint from 2024, Cannon-DeFord-Duchin study the McCartan-Imai Sequential Monte Carlo (SMC) method recently introduced for political redistricting — paper. In particular, we try to understand the reasons and consequences of districts being repeated frequently in SMC samples.
- DeFord-Najt-Solomon showed in 2019 that even approximate sampling from the uniform distribution is intractable unless RP=NP — paper
- Donnay-Kahle elaborate on the idea that the uniform distribution on partitions is a bad choice for real-world ensembles, showing the extent to which it favors non-compact plans — paper
Related Software
- gerrychain is the Lab’s Python codebase that implements MCMC methods for sampling and analyzing legislative maps — GitHub repo
- gerrytools is a companion package to gerrychain intended to help generate common forms of visualizations — GitHub repo
- pcompress helps store MCMC runs to allow for replicability — GitHub repo
- Binary Ensemble compression (BEN) and its extreme cousin XBEN go even farther than pcompress to improve storage of general ensembles of redistricting plans — GitHub repo
- FRCW (“Fastest ReCom Chain in the West”) is an implementation of ReCom in the high-performance language Rust, which allows for significant speed improvements over Python — GitHub repo