
How We Analyzed the Justice Department’s Death in Custody Data

A rare look into data collected under the Death in Custody Reporting Act revealed serious problems.


A quarter-century ago, Congress passed the Death in Custody Reporting Act (DCRA), which mandated that the Justice Department collect information from the states about everyone who dies in law enforcement custody. The Marshall Project’s analysis of the data collected under that legislation shows serious, systematic deficiencies with real-world stakes that dramatically limit how DCRA data can be used to improve conditions and prevent future in-custody deaths.

How we got the data

On Nov. 20, 2024, The Marshall Project accessed a page on the Bureau of Justice Assistance’s website that displayed high-level, aggregated data about people who died in law enforcement custody. The data was displayed in tables showing, for example, the cumulative totals for all the different ways people died in custody in various years.

Screenshot of a table on the BJA's website showing data collected under the Death in Custody Reporting Act.
A table from the Bureau of Justice Assistance’s website on the page where The Marshall Project downloaded a full, unredacted dataset of in-custody death records collected under the Death in Custody Reporting Act.

This information was originally collected as individual-level data by state reporting agencies and sent to the Bureau of Justice Assistance, as required by the Death in Custody Reporting Act, which mandates an accounting of everyone who dies in America’s prisons and jails or during arrests by law enforcement officials.

However, any visitor to the page could view and download the raw records behind these high-level figures.

The individual-level data was left exposed in the tables the Bureau of Justice Assistance published. To obtain it, we right-clicked on the “Grand Total” row of a table on the page titled “Figure 1: Deaths by Location Type & Fiscal Year” and then clicked through to a menu for viewing the source data. From there, we clicked the “Full Data” button, selected the option to show all of the fields in the dataset and then downloaded the data.

The table displayed on the Bureau of Justice Assistance’s website that The Marshall Project used to download the data used in this analysis.

Officials from the Office of Justice Programs, the Justice Department agency that operates the Bureau of Justice Assistance, did not respond to questions about whether they intended to leave this data accessible to the website’s visitors.

Shortly after we downloaded the data, but before we told the agency we had obtained it, the page was changed so that individual-level records could no longer be accessed this way.

Data

The dataset we downloaded contained 25,393 rows, each representing a person who died in custody and whose death was reported to the federal government. The data covered deaths from Oct. 1, 2019, through Sept. 30, 2023 (federal fiscal years 2020 through 2023), and included the following fields:

Analysis

Our analysis largely centered on identifying data quality problems with the information as it appeared in our download.

We identified nearly 700 people who died in law enforcement custody but were missing from the dataset, as well as entire states, like Mississippi, that reported almost no deaths in their prisons or jails. Thousands of records lacked any basic information about the cause or location of death, and hundreds did not note the law enforcement agency that held the person in custody or the race or ethnicity of the person who died.

A review of a random sample of around 1,000 entries found that more than three-quarters did not meet the federal government’s own criteria for how a death should be recorded.

Our analysis, which is described in detail below, was largely conducted manually. It was verified through a combination of manual and programmatic methods.

Understanding missing deaths

When someone dies in custody, their death is first recorded by a local law enforcement agency, state prison or county jail. Under the law, that information is supposed to be passed up to a state-level agency, which aggregates those reports and sends them to the Justice Department.

To gauge how many deaths were missing from the dataset that should have been included, we compared the names in it with a list of people who died in law enforcement custody. We compiled that list from news reports collected by activists, from Loyola University New Orleans’ Incarceration Transparency project and from Marshall Project readers who contacted us to share stories of loved ones who died in custody.

We identified 681 deaths on our list that were not present in the DCRA dataset. Our comparison list focused largely on deaths in Louisiana, Alabama and South Carolina, the states for which Incarceration Transparency collects death information through public records requests. Slightly more than half of these missing records were for people who died in Louisiana.

We matched names manually, searching the dataset for each deceased person’s name while filtering by the state in which they died, and allowing for typos and nicknames.

Once the manual matching was complete, we used a fuzzy matching algorithm to try to match the names we had not been able to find in DCRA by hand.

We started by identifying pairs of names we believed represented the same person, but did not match exactly due to slight spelling differences. We confirmed that the slightly mismatched names represented the same person based on additional details, like the state where the death occurred and the year the person died.

To quantify similarity, we then measured the Damerau-Levenshtein distance between the names in these manually paired records and found that 0.5 was the largest distance between any pair we had considered a manual match. We then searched for all name pairs within that distance to find potential matches missed during the manual process, keeping only pairs that also matched exactly on the state of death, the year of death and at least the first or last name. This programmatic approach identified 34 additional matches, leaving 681 names missing from DCRA.
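A minimal Python sketch of this fuzzy-matching step follows. It is an illustration, not the code used in the analysis. It implements the restricted form of Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent characters. Because a threshold of 0.5 implies a scaled rather than raw edit count, the sketch assumes the distance is divided by the length of the longer name. The `Death` record type and all function names are hypothetical.

```python
from dataclasses import dataclass


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalized_distance(a: str, b: str) -> float:
    # ASSUMPTION: scale by the longer string so thresholds like 0.5 make sense.
    return osa_distance(a, b) / max(len(a), len(b), 1)


@dataclass(frozen=True)
class Death:  # hypothetical record type
    first: str
    last: str
    state: str
    year: int

    @property
    def full_name(self) -> str:
        return f"{self.first} {self.last}".lower()


def calibrate_threshold(manual_pairs: list[tuple[Death, Death]]) -> float:
    """Largest normalized distance among pairs a human already judged to match."""
    return max(normalized_distance(x.full_name, y.full_name) for x, y in manual_pairs)


def candidate_matches(
    unmatched: list[Death], dcra: list[Death], threshold: float
) -> list[tuple[Death, Death]]:
    """Pairs within `threshold` that also agree exactly on state, year,
    and at least the first or last name."""
    matches = []
    for person in unmatched:
        for record in dcra:
            if person.state != record.state or person.year != record.year:
                continue
            same_first = person.first.lower() == record.first.lower()
            same_last = person.last.lower() == record.last.lower()
            if not (same_first or same_last):
                continue
            if normalized_distance(person.full_name, record.full_name) <= threshold:
                matches.append((person, record))
    return matches


if __name__ == "__main__":
    pairs = [(Death("Jon", "Smith", "LA", 2021), Death("John", "Smith", "LA", 2021))]
    print(calibrate_threshold(pairs))  # 0.1
    dcra = [Death("Kathrine", "Doe", "AL", 2022), Death("Katherine", "Doe", "AL", 2021)]
    print(len(candidate_matches([Death("Katherine", "Doe", "AL", 2022)], dcra, 0.5)))  # 1
```

The exact-match filters on state, year and one name part do most of the work of preventing false positives; the distance threshold only decides among records that already agree on those details.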

In all, we checked 1,847 names from external sources gathered by academics, activists and journalists against the federal government’s list. Because our comparison list is far from a comprehensive accounting of everyone who died in custody nationwide, the share of its names missing from DCRA should not be treated as representative of all the records missing from the dataset.

Experts on tracking in-custody deaths told us that data quality dropped off when the DCRA program moved from the Bureau of Justice Statistics to the Bureau of Justice Assistance. We tried to evaluate this claim by comparing the proportion of missing names in the data we downloaded from the Justice Department’s website, which was collected by the Bureau of Justice Assistance, with recently released data from the program’s Bureau of Justice Statistics era.

Earlier this year, in response to a lawsuit filed by USA Today, the Justice Department publicly released DCRA records stretching from 2015 through 2019, when the data collection was done by the Bureau of Justice Statistics.

For consistency, we repeated the missing-name check on the Bureau of Justice Statistics-era data using Incarceration Transparency’s death information from Louisiana, Alabama and South Carolina. We found:

This suggests, at least by this measure, that the quality of DCRA data has degraded over time.
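The era comparison comes down to computing, for each era, the share of comparison-list names that could not be found in that era’s DCRA records. The minimal sketch below assumes each name has already been run through the same matching steps in both eras; the function names and the toy data are hypothetical and do not reflect the actual results.

```python
def missing_share(checked: list[str], found_in_dcra: set[str]) -> float:
    """Fraction of comparison-list names with no match in a DCRA extract."""
    if not checked:
        raise ValueError("comparison list is empty")
    missing = sum(1 for name in checked if name not in found_in_dcra)
    return missing / len(checked)


def compare_eras(eras: dict[str, tuple[list[str], set[str]]]) -> dict[str, float]:
    """Missing share per era. The comparison-list source and matching rules
    should be identical across eras so the shares are comparable."""
    return {era: missing_share(checked, found) for era, (checked, found) in eras.items()}


if __name__ == "__main__":
    toy = {
        "BJS": (["a", "b", "c", "d"], {"a", "b", "c"}),  # 1 of 4 missing
        "BJA": (["e", "f", "g", "h"], {"e", "f"}),       # 2 of 4 missing
    }
    print(compare_eras(toy))  # {'BJS': 0.25, 'BJA': 0.5}
```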

Data quality issues

Our initial exploration of the dataset suggested that its problems weren’t limited to missing records. We identified myriad instances where the data that was present seemed grossly inadequate. For example, there were 25 instances where the “Brief Circumstances” field, which according to federal standards should contain a short description of how the person died, contained only a single asterisk.

To gauge how widespread these issues were, we drew a random sample of 1,023 deaths from the dataset, and two Marshall Project journalists assessed the information in the brief circumstances field for each sampled record.
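A simple random sample like this is drawn without replacement, so no death is reviewed twice. The sketch below is one way to do it in Python, assuming each of the 25,393 records has an integer ID; the function name and the fixed seed are hypothetical, chosen so the same draw can be reproduced for verification.

```python
import random


def draw_review_sample(record_ids, k=1023, seed=12345):
    """Simple random sample of k record IDs, without replacement.

    The seed is a hypothetical value; fixing it makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(record_ids), k)


if __name__ == "__main__":
    sample = draw_review_sample(range(25_393))
    print(len(sample), len(set(sample)))  # 1023 1023
```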

A guide released by the Bureau of Justice Assistance states that each entry should include “the brief circumstances surrounding each decedent’s death.” The guide lays out that the following details should be included in every entry:

For example, the guide lists “Overdose” as an insufficient description, instead suggesting the following as an example of a sufficient alternative:

“On 02/08/2021 at 05:20 p.m., John Doe was found having a medical emergency. Officers notified on-site medical personnel who quickly responded to the scene and began life saving measures, performing CPR, and applying an AED. John Doe was transported to the local hospital but expired at 07:59 p.m. Autopsy reports list the cause of death as respiratory complications of COVID-19 with contributing factors of pulmonary calcifications and hypertensive cardiovascular disease.”

We counted as insufficient all cases where both human reviewers felt the brief circumstances description did not meet the minimum requirements as laid out in the guide — 786 out of the 1,023 we checked, or 77%. The two reviewers agreed on categorizations in 945 instances and disagreed in 78. We did not count as insufficient the records where the two reviewers disagreed.
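The counting rule above, in which a record is insufficient only when both reviewers say so, can be expressed as a small tally. In this minimal sketch the class and function names are hypothetical. The toy labels reproduce the reported counts (786 flagged by both reviewers; 159 judged sufficient by both, since the reviewers agreed on 945; and 78 disagreements), but how those 78 disagreements split between the two reviewers is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewTally:
    both_insufficient: int
    both_sufficient: int
    disagreed: int

    @property
    def total(self) -> int:
        return self.both_insufficient + self.both_sufficient + self.disagreed

    @property
    def agreement_rate(self) -> float:
        return (self.both_insufficient + self.both_sufficient) / self.total

    @property
    def insufficient_share(self) -> float:
        # Only records BOTH reviewers flagged count as insufficient;
        # disagreements stay in the denominator but not the numerator.
        return self.both_insufficient / self.total


def tally(reviewer_a: list[bool], reviewer_b: list[bool]) -> ReviewTally:
    """Each list holds True where that reviewer judged the description insufficient."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("reviewers must label the same records")
    pairs = list(zip(reviewer_a, reviewer_b))
    both_bad = sum(a and b for a, b in pairs)
    both_ok = sum((not a) and (not b) for a, b in pairs)
    return ReviewTally(both_bad, both_ok, len(pairs) - both_bad - both_ok)


if __name__ == "__main__":
    a = [True] * 786 + [False] * 159 + [True] * 40 + [False] * 38
    b = [True] * 786 + [False] * 159 + [False] * 40 + [True] * 38
    t = tally(a, b)
    print(t.total, t.disagreed, round(100 * t.insufficient_share))  # 1023 78 77
    print(round(100 * t.agreement_rate, 1))  # 92.4
```

Keeping disagreements in the denominator makes the 77% figure conservative: any disputed record that was in fact insufficient would only raise it.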

We identified other problems in the DCRA dataset with keyword searches and simple filters. For example, 76 of the 110 entries from Virginia listed the person’s name as “Decedent,” 50 people in Illinois were listed as “Unknown,” and around 4,000 entries in the brief circumstances field read simply “unknown,” “unavailable” or “N/A.”
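Checks like these are simple filters over the records. The sketch below assumes each record is a dictionary with hypothetical `name`, `state` and `brief_circumstances` keys, not necessarily the field names in the DCRA export, and counts placeholder descriptions, lone asterisks and placeholder names by state. It matches case-insensitively and ignores surrounding whitespace, a choice that could produce slightly different counts than exact-match searches.

```python
from collections import Counter

# ASSUMPTION: hypothetical field names and placeholder vocabularies.
PLACEHOLDER_CIRCUMSTANCES = {"unknown", "unavailable", "n/a"}
PLACEHOLDER_NAMES = {"decedent", "unknown"}


def audit(records: list[dict[str, str]]) -> dict[str, object]:
    placeholder_circumstances = 0
    lone_asterisks = 0
    placeholder_names = Counter()  # keyed by (state, placeholder name)
    for r in records:
        circ = (r.get("brief_circumstances") or "").strip()
        if circ.lower() in PLACEHOLDER_CIRCUMSTANCES:
            placeholder_circumstances += 1
        elif circ == "*":
            lone_asterisks += 1
        name = (r.get("name") or "").strip().lower()
        if name in PLACEHOLDER_NAMES:
            placeholder_names[(r.get("state", ""), name)] += 1
    return {
        "placeholder_circumstances": placeholder_circumstances,
        "lone_asterisks": lone_asterisks,
        "placeholder_names_by_state": placeholder_names,
    }


if __name__ == "__main__":
    toy = [
        {"name": "Decedent", "state": "VA", "brief_circumstances": "N/A"},
        {"name": "Unknown", "state": "IL", "brief_circumstances": "*"},
    ]
    print(audit(toy))
```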

How to work with us

We have decided not to publicly release the full list of names, out of privacy considerations for the families of incarcerated people. However, if you are a journalist or researcher interested in using this dataset to report on or research deaths in custody, please fill out this form.

If you’re interested in learning more about reporting on in-custody deaths, check out our guide for journalists, published as part of The Marshall Project’s Investigate This series.

Acknowledgements

Thanks to Andrea Armstrong (Loyola University New Orleans) and Michael Lavine (University of Massachusetts Amherst) for assistance in obtaining and analyzing the data in this analysis.
