This project is part of the Smart Chicago Collaborative's Civic Works Project, a program funded by the Knight Foundation and the Chicago Community Trust to spur and support civic innovation in Chicago.

Our partners for this project are the Chicago Justice Project, a nonprofit research organization, and FreeGeek Chicago's Supreme Chi-Town Coding Crew (SC3).

Yana Kunichoff is the project's analyst and editor. Geoffrey Hing and Brian Peterson were the developers.

The Illustrated Press created the illustrations.

All icons are from the Noun Project. The Judgment icon was designed by Bruno Gatjens Gonzalez.

You can find the code backing this website on GitHub. We used the Tarbell content management system to build our site. Content and code are under a Creative Commons Attribution 3.0 License.

Data Processing

To analyze the data and make it publicly available, the project team had to load the raw data, which was provided in comma-separated values (CSV) format, into a database. We also had to transform the data in several ways to make it easier to query for our analysis. A series of Python scripts performs the data loading and transformations. You can view the source code repository for the scripts here.

Fixing shifted columns

In CSV files, a value that contains commas is accommodated by wrapping it in quotation marks. Similarly, a quotation mark can appear within a value if it is escaped. However, improper escaping can cause the columns in a row to be shifted. During initial queries on the data, we identified these improperly escaped fields and updated the data loading scripts to fix the shifted fields.
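As an illustration of the problem described above, the following is a minimal sketch, not the project's actual loading code. It uses Python's standard csv module to flag rows whose field count differs from the header, which is one symptom of a shifted row. The sample data and the function names find_shifted_rows and round_trip are made up for this example.

```python
import csv
import io

# Hypothetical sample data. The third line contains a comma that was not
# wrapped in quotation marks, so the row parses into four fields, not three.
SAMPLE = (
    "case_number,offense,city\n"
    '1,"THEFT, RETAIL",CHICAGO\n'
    "2,THEFT, RETAIL,CHICAGO\n"
)


def find_shifted_rows(text):
    """Return (line_number, fields) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [(reader.line_num, row) for row in reader if len(row) != len(header)]


def round_trip(row):
    """Write a row with csv.writer and read it back, showing correct escaping."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return next(csv.reader(io.StringIO(buf.getvalue())))


if __name__ == "__main__":
    for line_number, fields in find_shifted_rows(SAMPLE):
        print(line_number, fields)
```

A field-count check only finds rows that gained or lost columns. How a flagged row should be repaired depends on the data, so that step is left out of the sketch.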

Creating cleaner records

Other than fixing the shifted columns, the data was loaded into the database in the same format as the raw data. In the next step of the data processing pipeline, we created new records in which values are parsed and cleaned.

Loading additional datasets

Spatial data describing Chicago community area boundaries, census tracts and census places was loaded into the database.

Tract-level demographic data from the American Community Survey (ACS) was exported from the United States Census Bureau's American FactFinder tool. This data was aggregated from the census tract level to Chicago community areas.
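The aggregation step can be sketched as follows. This is an illustrative example under assumptions, not the project's script: it assumes a lookup from each census tract to a single community area, and the identifiers in the usage example are placeholders rather than real tract or community area codes.

```python
from collections import defaultdict


def aggregate_to_community_areas(tract_population, tract_to_area):
    """Sum tract-level counts into community-area totals.

    tract_population: dict mapping tract id -> population count
    tract_to_area: dict mapping tract id -> community area name (assumed lookup)
    """
    totals = defaultdict(int)
    for tract_id, population in tract_population.items():
        area = tract_to_area.get(tract_id)
        if area is None:
            continue  # tract is not part of any community area in the lookup
        totals[area] += population
    return dict(totals)


if __name__ == "__main__":
    # Placeholder identifiers and counts, for illustration only.
    population = {"tract_a": 1200, "tract_b": 800, "tract_c": 500}
    lookup = {"tract_a": "Area 1", "tract_b": "Area 1", "tract_c": "Area 2"}
    print(aggregate_to_community_areas(population, lookup))
```

Summing is appropriate for counts such as total population. Medians and percentages cannot be combined this way.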

Geocoding and finding spatial relationships

The addresses in the cleaned records were geocoded using MapQuest's Open Geocoding API web service. Once the addresses were geocoded, we used the database's spatial capabilities to determine the Chicago community area or suburban place corresponding to the convicted person's address.
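The project did this lookup inside the database. To show the underlying idea, here is a stand-in sketch in plain Python that tests which polygon contains a geocoded point using the ray-casting method. The function names and the square "areas" in the usage example are hypothetical. Real boundaries are far more complex and are better handled by a spatial database or library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order, without repeating the
    first vertex at the end. Points exactly on an edge are not handled
    specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_area(x, y, areas):
    """Return the name of the first area whose polygon contains the point."""
    for name, polygon in areas.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


if __name__ == "__main__":
    # Two made-up square areas side by side.
    areas = {
        "Area A": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "Area B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    }
    print(find_area(1, 1, areas))
    print(find_area(3, 1, areas))
```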

Identifying convictions

The records in the data represent a series of events in a case. For example, there will be one record for someone's initial conviction and another record if their sentence is later changed. While this accurately describes a criminal case's proceedings, we can't simply count records to get a picture of how many people were convicted and how many convictions there are for any given crime.

We identified convictions like this:

Querying and exporting data

The data that drives the charts and maps in this project comes from code that implements database queries for statistics of interest. We also implemented management commands to export the results of these queries in machine-readable file formats, such as CSV and JSON, that can be presented on the web. A separate script exports the version of the data that we've made publicly available, which removes exact addresses and other personal identifiers from the records.
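The export step might look something like the sketch below. It is an illustration under assumptions rather than the project's management commands: the field names in PRIVATE_FIELDS and in the sample record are hypothetical, and the real list of removed identifiers is defined in the project's own export script.

```python
import csv
import io
import json

# Hypothetical names for fields that identify a person.
PRIVATE_FIELDS = {"name", "address", "date_of_birth"}


def strip_identifiers(record, private_fields=PRIVATE_FIELDS):
    """Return a copy of the record without personally identifying fields."""
    return {key: value for key, value in record.items() if key not in private_fields}


def to_csv(records):
    """Serialize a list of dicts (all sharing the same keys) to CSV text."""
    records = list(records)
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Serialize a list of dicts to JSON text."""
    return json.dumps(list(records), indent=2)


if __name__ == "__main__":
    raw = [{"case_number": "1", "offense": "THEFT", "address": "123 Example St"}]
    public = [strip_identifiers(record) for record in raw]
    print(to_csv(public))
    print(to_json(public))
```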

Mapping convictions

The data includes records for people convicted in Cook County's courts, regardless of their home address. Many of the convicted had home addresses in Chicago or Cook County, but some had addresses in other parts of Illinois, or other states. To focus on areas familiar to the audience of this site, we chose to map conviction rates only for the City of Chicago and suburban places in Cook County.

To compare the number of convictions, adjusted for population, we needed to use geographies for which American Community Survey population figures were available. For Chicago, these were community areas, which roughly correspond to neighborhoods. For the suburbs, we used census places. Some suburban places include portions that are outside of Cook County. Because the purpose of mapping conviction rates was to show overall spatial trends, and because of the challenges of separating both home addresses and population between the areas inside and outside Cook County, our maps reflect the entire area and population of these places.
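The population adjustment amounts to a simple rate calculation. The sketch below assumes a rate per 1,000 residents. That scale is an assumption made for illustration, as are the function name and the numbers in the usage example.

```python
def conviction_rate(convictions, population, per=1000):
    """Convictions per `per` residents, or None when population is not positive.

    The default of 1,000 is an assumed scale, not one stated by the project.
    """
    if population <= 0:
        return None
    return convictions * per / population


if __name__ == "__main__":
    # Made-up numbers: 50 convictions in an area with 10,000 residents.
    print(conviction_rate(50, 10000))
```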

Categorizing crimes

We charted the number of convictions for violent index, property index and drug offenses. We also counted convictions for offenses that disproportionately affect women, such as sexual assault and domestic violence. Violent index offenses, property index offenses and offenses that often affect women were identified by mapping a record's offense statute to an Illinois Uniform Crime Reporting (IUCR) code using this table from the Illinois State Police. This table helped the Chicago Justice Project define categories of IUCR codes. We do not list the codes that correspond to each category here. Instead, please refer to the source code that implements the categorical queries.

Drug crimes were also identified by statute, but there were no useful IUCR-delineated categories. Instead, we wrote queries that categorized the offenses by the offense class specified in the Illinois Compiled Statutes and by the type of drug indicated by the offense statute. The source code for these queries is here.

Handling ages

The age of convicted individuals was calculated by taking the difference between the charge disposition date field and the date of birth field. We did not expect to receive records for people under the age of 18, though some records for apparent juveniles do appear in the data. Because we were not sure whether we had received records for all cases involving juveniles, and because we had no way of differentiating between records for juveniles and incorrectly entered dates of birth, we excluded records of juveniles when doing age-based comparisons. These records were still included in total counts of convictions and in comparisons based on the type of crime.
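The age calculation and the juvenile filter can be sketched as follows. This is a minimal example under assumptions: it counts age in completed years, which the text above does not specify, and the function names are made up for illustration.

```python
from datetime import date


def age_on(date_of_birth, disposition_date):
    """Age in completed years on the charge disposition date (assumed unit)."""
    years = disposition_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred that year.
    if (disposition_date.month, disposition_date.day) < (
        date_of_birth.month,
        date_of_birth.day,
    ):
        years -= 1
    return years


def include_in_age_comparison(date_of_birth, disposition_date, minimum_age=18):
    """True if the record should be used in age-based comparisons."""
    return age_on(date_of_birth, disposition_date) >= minimum_age


if __name__ == "__main__":
    print(age_on(date(1990, 6, 15), date(2010, 6, 14)))
    print(include_in_age_comparison(date(1995, 1, 1), date(2010, 1, 1)))
```

Records that fail this check would be left out of age-based comparisons only. They would still count toward totals and crime-type comparisons, as described above.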