A newer, smarter geography package

We are excited to announce a major update of the Python package MAUP, our geospatial toolkit for redistricting data. The name of this package comes from the “modifiable areal unit problem” in geography: the same spatial data will look different depending on how you divide up a space into units.

In redistricting, different data often comes attached to different geographies: population data from the U.S. Census Bureau is reported at the level of census blocks, block groups, and/or census tracts, while election results are usually reported by voting precinct. The MAUP package contains tools to aggregate, disaggregate, and transfer data between units so that data from different sources can be aligned, often by using census blocks (the smallest units) as a kind of common denominator. This is a big improvement over the default spatial data transfer functions in standard software like ArcGIS, which tend to assign values in proportion to areas of overlap. That approach, known as areal interpolation, is often exactly wrong for the kinds of data we work with in redistricting, where the largest units at a given level often have the fewest people.
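To make the contrast concrete, here is a minimal sketch in plain Python, not MAUP's own API. The two blocks, their areas and populations, and the precinct vote total are all made-up numbers chosen for illustration. It splits one precinct's votes across its blocks twice, once weighted by area and once weighted by population.

```python
# Hypothetical precinct made of two census blocks: a small, dense block
# and a large, sparsely populated one. All numbers are invented.
blocks = {
    "A": {"area_km2": 1.0, "population": 800},
    "B": {"area_km2": 9.0, "population": 100},
}
precinct_votes = 900


def disaggregate(total, blocks, weight_key):
    """Split `total` across blocks in proportion to blocks[name][weight_key]."""
    weight_sum = sum(block[weight_key] for block in blocks.values())
    return {
        name: total * block[weight_key] / weight_sum
        for name, block in blocks.items()
    }


# Areal interpolation: the large rural block receives most of the votes.
by_area = disaggregate(precinct_votes, blocks, "area_km2")

# Population weighting: the dense block receives most of the votes.
by_population = disaggregate(precinct_votes, blocks, "population")

print("by area:      ", by_area)
print("by population:", by_population)
```

With these numbers, area weighting gives nine times as many votes to the block with one eighth of the people, which is the failure mode described above.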

The process of moving data between units is often complicated by topological issues in shapefiles (the common file format for representing geographic units). Precinct shapefiles, in particular, are often created by stitching together collections of polygon geometries sourced from different counties at different times, or are simply drawn without prioritizing mathematically precise nesting of subunits or matching of coincident boundaries. As a result, official shapefiles often have unintentional gaps or overlaps between precincts. These issues can cause problems when moving data between shapes. Version 2.x of MAUP provides two repair functions with a variety of options for fixing these issues:

1. quick_repair is the new name for the autorepair function from MAUP version 1.x (and autorepair still works as a synonym). This function makes straightforward repairs to gaps and overlaps.

2. smart_repair is a more sophisticated repair function designed to recover the “true” adjacency relations between geometries as accurately as possible, under the assumption that the shapefile is a “noisy” version of a clean ground-truth tiling. In the case of gaps that adjoin several geometries, this is accomplished by an algorithm that first subdivides the gap into pieces using ideas from convex geometry, instead of assigning the whole gap to a single adjacent unit. smart_repair also includes an option to ensure that repaired geometries nest cleanly into some larger units (e.g., to ensure that repaired precincts nest cleanly into counties).
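As a rough sketch of how the two repair functions might be called, the helper below checks a set of precinct geometries and repairs them only if needed. The helper name repair_precincts and its parameters are hypothetical. It assumes that maup.doctor returns True when no gaps or overlaps are found, that smart_repair accepts a nest_within_regions argument for the nesting option described above, and that the inputs are GeoDataFrames in a projected coordinate system. Please check these assumptions against the MAUP documentation for your installed version.

```python
def repair_precincts(precincts, counties=None, smart=True):
    """Return a repaired copy of `precincts` (hypothetical helper).

    precincts: GeoDataFrame of precinct polygons (assumed projected CRS).
    counties:  optional GeoDataFrame the repaired precincts should nest into.
    smart:     use smart_repair if True, otherwise quick_repair.
    """
    # Imported inside the function so the sketch can be defined
    # even where maup is not installed.
    import maup

    # Assumption: doctor() reports problems and returns True if none exist.
    if maup.doctor(precincts):
        return precincts

    if smart:
        # Assumption: nest_within_regions asks smart_repair to keep the
        # repaired precincts nested inside the given larger units.
        return maup.smart_repair(precincts, nest_within_regions=counties)

    # Simpler repair of gaps and overlaps (formerly autorepair).
    return maup.quick_repair(precincts)
```

A call might look like repair_precincts(precincts_gdf, counties=counties_gdf), where both GeoDataFrames have been loaded with geopandas and reprojected beforehand.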

MAUP 1.x was originated by Max Hully and substantially updated by Max Fan; MAUP 2.x is the work of Jeanne Clelland (jeanne.clelland@colorado.edu). More details are available in the updated documentation. Jupyter notebooks for exploring MAUP can be found on GitHub, and a scientific paper about the ideas in smart_repair is available on arXiv.