As of 2020, millions of formerly incarcerated people in 13 states had recovered their right to vote. After the November general election, my co-reporter, Nicole Lewis, was curious to understand the impact formerly incarcerated voters had on the election — particularly local elections. She knew that millions of people were newly eligible to vote, but she had no idea how many had registered and how many turned out.
We both knew it wasn't going to be a straightforward request because few states actually track how many formerly incarcerated people register to vote.
We knew that we needed the voter rolls. Thankfully, Nicole was able to get the list of potentially eligible voters in some states. In states that did not have a list of eligible voters, a list of all people released from prison served as a proxy.
In four key states where we were able to obtain voter registration records along with some form of release records, we found that no more than 1 in 4 formerly incarcerated people had registered to vote in the recent election. Compare that to the general population, where 3 in 4 eligible voters registered to vote.
We initially thought the story was about formerly incarcerated people who turned out to vote in the most recent election. The story that emerged from the analysis was about the barriers to registration. We saw low numbers of registrants in our data analysis, and as we spoke to organizers about it, we repeatedly heard about newly eligible voters being unaware their rights had been restored.
From a technical standpoint, the biggest challenge was cleaning the data. We received records in various formats (PDF, CSV, TSV), preventing one-size-fits-all processing.
From a reporting standpoint, a challenge was identifying which state agency had the records we wanted and negotiating a release with them. In Arizona, for example, we received the release records — but the voter registration records would have cost about $6,000, meaning we had to exclude that state from the analysis. And in some cases, states don’t track who has been re-enfranchised — only who has been released.
This analysis is highly reproducible, and we hope to see journalists in other states asking these questions about a growing and important voter pool.
How can you get the data
Everything hinges on obtaining the right data.
You need to do some reporting to figure out who has the records that you want. The voter registration records should be easy enough to pinpoint: Your state’s Board of Elections, or its equivalent, should have them.
It’s a little trickier to get the list of newly eligible voters. The National Conference of State Legislatures website keeps an up-to-date page about the voting rights of people convicted of felonies, including a table with the current status in every state and a summary of recent changes. That page will give you a sense of the legal status in your state. From there, start by contacting organizers in your state who focus on voting rights and on increasing voter turnout among underrepresented groups. They will likely know about changes, whether in the law or through executive action, that affect the formerly incarcerated.
The state’s department of corrections may have the list. If it doesn’t, the Board of Elections might. If voting rights were restored through an executive order, as in Kentucky, the governor’s office might have a list to share.
If your state does not have a list of newly eligible voters, you will want to use prison releases as a proxy. The state’s department of corrections should have that list. At a minimum, you will need each person’s full name, either a birthdate or their age on the day of release, and the release date.
Each state sets its own criteria for who is eligible to vote. Some might specify that certain violent felony convictions, or out-of-state convictions, disqualify someone from re-enfranchisement. These nuances will be important to your reporting as you explore barriers to registration. Many people we interviewed described being confused or overwhelmed by these intricacies.
As for the data gathering, this level of detail isn’t critical since you will be asking for the same lists irrespective of the criteria for eligibility (either a list of the newly eligible or the recently released).
How can you analyze the data
With both the newly eligible voter records (or a list of prison releases) plus the voter registration records in hand, you will need to take the following steps:
1. Extract data from PDFs
If you received any of the records as PDFs, you will need to convert them into a structured data format (Excel, CSV, TSV, etc.). We used Tabula to convert the PDFs into CSVs.
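Because files can arrive as CSV or TSV (or come out of Tabula as CSV), it helps to load them all the same way. The following is a minimal sketch, assuming pandas and a hypothetical `load_records` helper, that picks the separator from the file extension, keeps every value as text, and normalizes column headers:

```python
from pathlib import Path

import pandas as pd


def load_records(path) -> pd.DataFrame:
    """Load a CSV or TSV file into a DataFrame, keeping every value as text.

    Reading everything as strings preserves leading zeros in IDs and stops
    pandas from guessing date formats before you have inspected them.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".tsv":
        sep = "\t"
    elif suffix == ".csv":
        sep = ","
    else:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    # keep_default_na=False leaves empty cells as "" instead of NaN.
    df = pd.read_csv(path, sep=sep, dtype=str, keep_default_na=False)
    # Normalize headers such as " First Name " to "first_name".
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df
```

Reading with `dtype=str` means you convert names and dates deliberately in later steps rather than letting pandas guess.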
2. Clean up the data
Once you have your structured data, you will probably need to do some cleaning and reorganizing of the columns to get all the values lined up under the right column names. For example, if there are first, middle and last name columns, you might find a first name in the middle name column and a middle name in the last name column. We used Python to whip the data into shape.
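The right fix depends on how your particular files came apart. As one hypothetical case, suppose PDF extraction pushed some rows one cell to the right, leaving the first-name cell blank and spilling the last name into an extra column (called `overflow` here; all column names are assumptions). A minimal pandas sketch that shifts only those rows back:

```python
import pandas as pd

# Hypothetical layout after extraction: "overflow" only fills up when a row
# has been pushed one cell to the right.
NAME_COLS = ["first_name", "middle_name", "last_name", "overflow"]


def realign_shifted_rows(df: pd.DataFrame, cols=NAME_COLS) -> pd.DataFrame:
    """Shift rows whose first-name cell is blank one column to the left."""
    out = df.copy()
    # Treat missing values as empty strings and trim stray whitespace.
    out[cols] = out[cols].fillna("").astype(str).apply(lambda s: s.str.strip())
    shifted_rows = out[cols[0]] == ""
    # Move each value in the flagged rows one column left; the last column empties.
    fixed = out.loc[shifted_rows, cols].shift(-1, axis=1).fillna("")
    out.loc[shifted_rows, cols] = fixed.to_numpy()
    return out
```

This rule would misfire on a record that genuinely has no first name, so after realigning, confirm that `overflow` is empty everywhere before dropping it.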
After converting all your data and getting it into the right columns, compare how each dataset handles names. Note whether the names are in all capital letters in one dataset but in standard capitalization in the other, and whether they sit in one column or are split across two or three. The goal is to determine whether you need to restructure the names in your newly eligible voter file or prison release file so they exactly match the structure of the names in the voter registration file.
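One way to bring both files into the same shape is to build a single uppercase full-name key from whatever name columns each file has. The sketch below assumes names arrive either as separate first, middle and last parts or as one "LAST, FIRST MIDDLE" string; the helper names are hypothetical, and dropping periods and commas is a simplifying assumption to check against your own data:

```python
import re


def normalize_part(text) -> str:
    """Uppercase a name fragment, drop periods and commas, collapse spaces."""
    text = (text or "").upper()
    text = re.sub(r"[.,]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def full_name_key(first, middle, last) -> str:
    """Join name parts into one 'FIRST MIDDLE LAST' key, skipping blanks."""
    parts = (normalize_part(p) for p in (first, middle, last))
    return " ".join(p for p in parts if p)


def split_last_comma_first(name: str):
    """Split a 'LAST, FIRST MIDDLE' string into (first, middle, last)."""
    last, _, given = name.partition(",")
    given_parts = normalize_part(given).split()
    first = given_parts[0] if given_parts else ""
    middle = " ".join(given_parts[1:])
    return first, middle, normalize_part(last)


# Both records below produce the key "DUANE A FELIPE".
print(full_name_key("Duane", "a.", "Felipe"))
print(full_name_key(*split_last_comma_first("Felipe, Duane A.")))
```

If one file carries full middle names and the other only initials, you may want to cut the middle name down to its first letter, or leave it out of the key entirely, before matching.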
Do the same for the birthdate column, making sure the structure of the data in both files matches. For example, consider whether the dates are written as MM-DD-YYYY or YYYY/MM/DD. (If either your eligible voter file or your voter registration file lacks a birthdate column, read the note below.)
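A simple target is ISO format (YYYY-MM-DD) in both files. The sketch below tries a short, hypothetical list of formats with Python's standard `datetime.strptime`; extend the list to match what you actually see in your files:

```python
from datetime import datetime

# Formats assumed for illustration. They cannot be confused with one another,
# but parsing alone cannot untangle a file that mixes MM-DD and DD-MM order.
DATE_FORMATS = ("%m-%d-%Y", "%m/%d/%Y", "%Y/%m/%d", "%Y-%m-%d")


def to_iso_date(value: str) -> str:
    """Convert a date string in any of DATE_FORMATS to YYYY-MM-DD."""
    cleaned = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {value!r}")
```

Raising an error on unrecognized values, rather than silently skipping them, surfaces oddly formatted rows so you can inspect them by hand.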
NOTE: If you do not have a birthdate column, as often happens, you will at the very least need an age at release and a date of release. Together, these let you approximate a birth year for each person in your prison release records, which you can then match against the birth year in the voter registration file. Because age is recorded in whole years, each person could have been born in either of two calendar years. We used relativedelta, from Python’s dateutil library, to calculate the time difference, since it accounts for calendar quirks such as leap years and months of different lengths.
At this point, you should have name columns with comparable structures in both files, plus either matching birthdate columns or birth year columns derived from age at release and date of release. If so, you are ready to move on to the matching.
3. Merge the data
In Python, we used Pandas to merge the two datasets. Expect a few complications: repeated names, and people who share a name but have different birthdays or birth years. You will need to use your own judgment about how best to continue winnowing down this list. One of our prison release datasets also included a unique identifier for each formerly incarcerated person, so we de-duplicated on that column. We also de-duplicated on full name.
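A minimal pandas sketch of this kind of merge, not the code we published. It assumes hypothetical columns: the release file has `person_id`, a normalized `full_name`, and `birth_years` (a list of one or two candidate years), while the voter file has `full_name` and `birth_year`. A released person counts as a possible registrant if any candidate year matches any voter with the same name:

```python
import pandas as pd


def count_possible_registrants(releases: pd.DataFrame, voters: pd.DataFrame):
    """Return (released people with at least one possible match, total released).

    Assumed columns (hypothetical):
      releases: person_id, full_name, birth_years (list of candidate years)
      voters:   full_name, birth_year
    """
    # Keep one row per released person.
    releases = releases.drop_duplicates(subset="person_id")
    # One row per (person, candidate birth year) pair.
    candidates = releases.explode("birth_years").rename(
        columns={"birth_years": "birth_year"}
    )
    candidates = candidates.dropna(subset=["birth_year"])
    candidates["birth_year"] = candidates["birth_year"].astype(int)
    # Distinct voter keys, so duplicate voter rows don't multiply matches.
    voter_keys = voters[["full_name", "birth_year"]].drop_duplicates()
    matches = candidates.merge(voter_keys, on=["full_name", "birth_year"], how="inner")
    # Any match counts, which overmatches on purpose and yields an upper bound.
    return matches["person_id"].nunique(), releases["person_id"].nunique()
```

Counting distinct `person_id` values among the matches, rather than matched rows, keeps one person who matches several same-named voters from being counted more than once.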
We’ve provided sample code for cleaning and matching so you can see how we prepared the data and matched the names.
Initially, we thought we might need to use some form of fuzzy matching tool like OpenRefine or dedupe. It turned out that the records were clean enough to match simply by cleaning up capitalization and restructuring the fields for consistency. It seems likely this is the case in most places, but if your names are particularly messy, you may need to use such advanced techniques.
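If you want a quick check for near misses before turning to OpenRefine or dedupe, Python’s standard-library `difflib` can flag names that are similar but not identical. This is a swapped-in technique, not part of the approach described here and no substitute for those tools. The 0.9 cutoff is an arbitrary assumption to tune, and every flagged pair still needs a human look:

```python
import difflib


def near_miss_names(name: str, voter_names, cutoff: float = 0.9, limit: int = 3):
    """Return up to `limit` voter names whose similarity to `name` is >= cutoff.

    Similarity is difflib's SequenceMatcher ratio. Exact matches are excluded,
    so the result lists only names worth a manual review.
    """
    pool = [v for v in set(voter_names) if v != name]
    return difflib.get_close_matches(name, pool, n=limit, cutoff=cutoff)
```

For instance, "DUANE FELIPE" would flag a voter record spelled "DUAN FELIPE" but not "ANA LOPEZ".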
4. Some caveats
After taking all these steps, you won’t get a perfect list. Some people may have been missed because of data entry issues, and if either dataset has only a birth year rather than a full birthdate, you lose some useful precision.
To address the uncertainty, we overmatched release records to voter records. If there was any chance that the formerly incarcerated person named Duane Felipe could be one of the Duane Felipes who registered to vote, we counted that as a match. In reality, there is a chance that none of the Duane Felipes who registered to vote is the one we’re looking for.
This approach allows you to set a conservative upper bound on how many formerly incarcerated people registered to vote. It does not allow you to say exactly how many people registered to vote in your state. We were careful to note in the story that there were no more than a certain number of people who had registered to vote. We were equally careful to describe numbers in rough, approximate terms such as “1 in 4.” That’s more reader-friendly, but it’s also more accurate because we know the data is approximate as well.
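Turning an upper-bound count into a rough "1 in N" phrase takes a little care: rounding N down keeps the phrase conservative, because 1/N is then never smaller than the true upper-bound rate. A small sketch with a hypothetical function name and made-up counts:

```python
def describe_upper_bound(possible_registrants: int, eligible: int) -> str:
    """Express an upper-bound registration rate as 'no more than 1 in N'.

    N is rounded down, so 1/N is always at least possible_registrants / eligible.
    """
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    if not 0 <= possible_registrants <= eligible:
        raise ValueError("possible_registrants must be between 0 and eligible")
    if possible_registrants == 0:
        return "none"
    n = eligible // possible_registrants  # floor keeps the phrase conservative
    return f"no more than 1 in {n}"
```

With 230 possible registrants out of 1,000 released people, `describe_upper_bound(230, 1000)` returns "no more than 1 in 4", since 1,000 divided by 230 is about 4.3 and rounds down to 4.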
Where to look for your story
The story you tell depends on what share of the people with felony convictions who are eligible to register in your state actually did register. A rate lower than your state’s general voter registration rate might indicate obstacles and barriers to registration. A higher rate might indicate that something is working, and perhaps the questions will focus more on the impact of these voters on local or state elections.
As more people are released from prisons and states continue to re-enfranchise the formerly incarcerated, it is important to report on what states are doing to notify them about their right to vote. In Kentucky, the department of corrections is notifying the newly eligible upon release and through their probation officers. Although organizers in Kentucky raised issues with the forms used by the department, there are corrections departments in other states, like Nevada, that aren’t doing anything to notify the formerly incarcerated about their eligibility.
The formerly incarcerated are a growing voter demographic, numbering in the millions. Experts expect that they will begin to play a prominent role in upcoming elections. As such, at The Marshall Project, we believe that tracking their participation and the efforts made to help these newly enfranchised voters take part in our democracy is an increasingly relevant beat for reporters.