The analysis is based on the period from 2016 through 2021; exceptions are noted in specific articles.
The Marshall Project scraped criminal case records from the Cuyahoga County Clerk of Courts’ Search Selection and Entry site, which provides web-based access to basic information about criminal cases. The site includes defendant information such as race and home address, along with case dockets that describe events like sentencing and link to original PDF filing documents. The scraper ran intermittently from May 2021 through January 2022. The Marshall Project will never make public personally identifying information from this data.
This data was loaded into a PostGIS database. Defendants’ home addresses were geocoded with geocod.io and joined with geographies from other sources such as Cuyahoga Board of Elections precincts (to compare with election results), and U.S. Census places (to compare with population demographics). These spatial joins form the bulk of the analysis used in this piece.
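To illustrate what such a spatial join does, here is a minimal sketch in plain Python that assigns a geocoded point to the polygon containing it. It is not the code used in the analysis: in PostGIS this kind of join is typically expressed with a predicate such as ST_Contains, and the sketch swaps in a simple ray-casting test instead. The function names, the square "precincts" and the coordinates are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon.

    `polygon` is a list of (x, y) vertices in order. Points that sit exactly
    on a boundary are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_precinct(point, precincts):
    """Return the name of the first precinct containing the point, or None.

    `precincts` maps a (hypothetical) precinct name to its list of vertices.
    """
    x, y = point
    for name, polygon in precincts.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None
```

Real precinct boundaries are far more irregular than this, and a spatial database also uses indexes to avoid testing every polygon for every address.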
The court provided us with a list of case numbers spanning 2016 to 2021. We used this list to audit our database of scraped cases and ensure our scrape was a complete record of all cases seen by the court in this timeframe. Over 98% of our data matched the case numbers provided by the court. The handful of mismatches represented cases that had either been sealed, or, we believe, recently expunged. Since the details of expunged cases are no longer open to the public, the court was not free to confirm the status of certain cases that were missing from the court’s list of case numbers but captured by our scraper.
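The audit described above amounts to comparing two sets of case numbers. A minimal sketch in Python follows, assuming case numbers are plain strings. The function name and the returned keys are hypothetical, and this is not the code used in the analysis.

```python
def audit_case_numbers(scraped, court_list):
    """Compare scraped case numbers against the list supplied by the court.

    Returns the share of scraped cases that also appear in the court's list,
    plus the case numbers found on only one side.
    """
    scraped_set = set(scraped)
    court_set = set(court_list)
    matched = scraped_set & court_set
    return {
        # Share of scraped cases confirmed by the court's list.
        "match_rate": len(matched) / len(scraped_set) if scraped_set else 0.0,
        # Captured by the scraper but absent from the court's list.
        "only_in_scrape": sorted(scraped_set - court_set),
        # On the court's list but absent from the scrape.
        "only_in_court_list": sorted(court_set - scraped_set),
    }
```

The cases in `only_in_scrape` correspond to the kind of mismatch discussed above, where a case was captured by the scraper but did not appear on the court's list.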
To calculate the disparity in outcomes for common charges like theft and drug possession, we used multiple techniques. One approach used a natural language classifier to determine the outcome of each case. Another used a simple flag indicating whether a case ended with the defendant being sent to prison, and applied more restrictive criteria, considering only cases with a single count of the charge. A third approach employed a dataset of cases from 2009 to 2019, obtained and processed by Lawstata, that includes a count of each defendant’s prior cases. Using this dataset, we applied similar criteria and filtered on a maximum number of priors, looking at scenarios where the defendant had a maximum of zero priors, as well as scenarios with one and two prior cases. All techniques show similar variation between judges.
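The flag-based and priors-based approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the analysis: the record fields (`judge`, `charges`, `priors`, `sent_to_prison`) and the function name are hypothetical, and "a single count of the charge" is read here as a case whose only count is that charge.

```python
from collections import defaultdict


def prison_rate_by_judge(cases, charge, max_priors=None):
    """Share of qualifying cases per judge that ended in a prison sentence.

    A case qualifies if its only count is `charge` and, when `max_priors`
    is given, the defendant has at most that many prior cases.
    """
    totals = defaultdict(int)
    prison = defaultdict(int)
    for case in cases:
        # Restrictive criterion: exactly one count, and it is the charge of interest.
        if case["charges"] != [charge]:
            continue
        # Optional cap on the defendant's number of prior cases.
        if max_priors is not None and case["priors"] > max_priors:
            continue
        totals[case["judge"]] += 1
        if case["sent_to_prison"]:
            prison[case["judge"]] += 1
    return {judge: prison[judge] / totals[judge] for judge in totals}
```

Running the same function with `max_priors` set to 0, 1 and 2 gives the three priors scenarios, which can then be compared across judges.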
Voting data and precinct boundaries were obtained from the Cuyahoga County Board of Elections for the 2016, 2018 and 2020 general elections. Top-level voting figures such as who ran and total votes cast were also cross-referenced with Ballotpedia’s election results pages (2016, 2018, 2020).
The drop-off of people who voted for president but did not vote in a judicial race is precisely captured in the results for a single race. However, because there are multiple judges’ races on the ballot, participation varies between races. To calculate the drop-off of people who voted for president but did not vote for judicial candidates in a given election, we calculated the average drop-off among all contested judicial races. Uncontested races were excluded so that races in which voters have only a single choice would not distort the participation figures.
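A minimal sketch of this averaging step, in Python, is below. It assumes drop-off is expressed as a share of presidential voters and that a race counts as contested when it has more than one candidate. The field names and function name are hypothetical, and this is not the code used in the analysis.

```python
def average_judicial_dropoff(presidential_votes, judicial_races):
    """Average drop-off across contested judicial races in one election.

    Drop-off for a race is the share of presidential voters who did not
    cast a vote in that race. Uncontested races are skipped.
    """
    contested = [race for race in judicial_races if race["candidates"] > 1]
    if not contested:
        raise ValueError("no contested judicial races to average")
    dropoffs = [
        (presidential_votes - race["votes_cast"]) / presidential_votes
        for race in contested
    ]
    return sum(dropoffs) / len(dropoffs)
```

For example, with 1,000 presidential votes and two contested races drawing 700 and 800 votes, the drop-offs are 30% and 20%, for an average of 25%. An uncontested race on the same ballot would not enter the average.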
The 2020 county precinct map was used to identify the precinct where a defendant’s address was located.
To calculate the number of incarcerated people from Cuyahoga County, we used institutional census reports from the Ohio Department of Rehabilitation and Correction that detail the gender and race of incarcerated people, broken down by the county where they were convicted. The latest data is from January 2021.
Other data sources
Demographic data is drawn from the American Community Survey’s five-year estimates for 2016-2020 to best match the 2016-2021 timeframe used throughout the analysis. To ascertain the adult population in Cuyahoga County voting precincts, we used 2020 decennial census data from the U.S. Census Bureau.