"I can build this - but should I?" Welcome to the AI crime prediction debate
- Summary: Can you predict crime with AI? What should be done with these predictions, and with the data that powers them? An AI model from Chicago raises all kinds of questions.
This article from New Scientist caught my eye: AI predicts crime a week in advance with 90 percent accuracy:
An artificial intelligence application that scours crime data can predict the location of crimes in the coming week with up to 90 percent accuracy. Still, there are concerns about how systems like this can perpetuate bias.
This is a cautionary tale about how technologists and researchers can build AI applications based on their assumptions about the software's utility, but with an inadequate understanding of how it will actually be used. Enterprises are subject to the same danger. The purported goal of the model was to predict where violent crime was likely to occur one week in advance.
At the University of Chicago, Ishanu Chattopadhyay and his colleagues created an AI model that analyzed historical crime data from Chicago, Illinois, from 2014 to the end of 2016, then predicted crime levels for the weeks that followed this training period. Chattopadhyay concedes that the data used by his model is somewhat biased, but says that efforts were taken to reduce the effect of that bias, and that the AI doesn't identify suspects, only potential sites of crime. "It's not Minority Report," he says. His bio lists him as an Assistant Professor at the University of Chicago in the Committee on Genetics, Genomics and Systems Biology, which doesn't sound like someone who would build a model without any reference to people.
But I took a closer look at the paper itself, not just the article, and found that the Chicago PD had used the algorithm to compile a list of potential perpetrators of gun violence and their victims. As per New Scientist:
Details of the algorithm and the list were initially kept secret, but when the list was finally released, it turned out that 56 percent of Black men in the city aged between 20 and 29 featured on the list.
My shields went up over a few things. They claim an AI can now predict the location and rate of crime across a city a week in advance with up to 90 percent accuracy. Similar systems have been shown to perpetuate racist bias in policing, and the same could be true in this case, but the researchers who created this AI claim that it can also be used to expose those biases. Chattopadhyay says the AI’s predictions could be more safely used to inform policy at a high level, rather than being used directly to allocate police resources. Could be.
This is AI ethics problem number one. Everyone involved, from specification all the way through to implementation, must think about how the application can be used and misused. Developers rarely take this responsibility seriously.
Let’s first look at some data (I filled in the years 2017-2021):
The first thing I noticed was that the training data covers the years 2014-2016, but the murder rate almost doubled in two years, fell back dramatically, and then started to increase again. I question whether training data that old was adequate to explain the current situation.
Now, it is possible that the model found explanatory variables regardless of the magnitude of the shift, but I'd like to understand that. Chattopadhyay has publicly released the data and algorithm used in the study so that other researchers can investigate the results. I hope to get a look at them; I will follow up.
The model predicted the likelihood of particular crimes occurring across the city, divided into squares roughly 300 meters on a side, a week in advance with up to 90 percent accuracy. It was also trained and tested on data from seven other major US cities with a similar level of performance.
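To make that setup concrete, here is a minimal sketch of how incident records might be binned into roughly 300-meter tiles per week. The column names, sample coordinates, and the degree-to-meter conversion are my assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical incident log: one row per reported crime.
# Column names and values are invented; this is not the paper's schema.
incidents = pd.DataFrame({
    "lat":  [41.8781, 41.8800, 41.7500],
    "lon":  [-87.6298, -87.6310, -87.5800],
    "date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"]),
})

TILE_METERS = 300
METERS_PER_DEG_LAT = 111_320                              # rough conversion
METERS_PER_DEG_LON = 111_320 * np.cos(np.radians(41.88))  # near Chicago's latitude

# Assign each incident to a ~300 m grid cell and a calendar week.
incidents["row"] = (incidents["lat"] * METERS_PER_DEG_LAT // TILE_METERS).astype(int)
incidents["col"] = (incidents["lon"] * METERS_PER_DEG_LON // TILE_METERS).astype(int)
incidents["week"] = incidents["date"].dt.to_period("W")

# The modeling target is then a count (or yes/no) per tile per week.
counts = incidents.groupby(["row", "col", "week"]).size().rename("events")
print(counts)
```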
What does 90% accuracy mean? That violent crime will occur within a week in a particular area? In each area? Or is it an aggregate? It doesn't take AI to predict a violent crime in Riverdale, Englewood, North Lawndale, Washington Park, West Garfield Park, South Chicago, or a host of others. On the other hand, the 90% accuracy rate may be bolstered by predictions that there will not be a murder in Edison Park, Forest Glen, Norwood Park, Lakeview, Westmont, West Lawn, Belmont, and dozens of others.
In summary, 90% can be a slippery number, heavily weighted by predictions of no crime in the no-crime areas. But more importantly, what's the point of the predictions at all? What does the prediction do? What do I do if I'm told, as a police officer, that there will be a murder in an area? I can't saturate a 300-meter square with police officers for a week. With violent crimes, police mostly investigate after the fact, so I don't see the point.
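As a toy illustration of how an aggregate accuracy figure can be inflated by the quiet areas, consider the sketch below. Every number is invented; nothing here comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy city: 1,000 tiles, with crime concentrated in a small set of
# "hot" tiles. None of this is the paper's data.
n_tiles = 1_000
hot = rng.random(n_tiles) < 0.05                      # 5% of tiles are hot
actual = np.where(hot, rng.random(n_tiles) < 0.80,    # crime very likely in hot tiles
                       rng.random(n_tiles) < 0.01)    # rare everywhere else

# A trivial "model" that flags the known hot tiles and nothing else.
predicted = hot

accuracy = (predicted == actual).mean()
print(f"aggregate accuracy: {accuracy:.1%}")   # ~98%, driven mostly by true negatives

# The operationally relevant questions are different: how often is a flagged
# tile right, and how much crime happens in tiles that were never flagged?
precision = (predicted & actual).sum() / predicted.sum()
missed = (actual & ~predicted).sum()
print(f"precision on flagged tiles: {precision:.1%}, crimes in unflagged tiles: {missed}")
```

The headline accuracy looks impressive even though the "model" does nothing more than restate where the hot spots already are.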
This brings up the whole problem of predictive policing. Sending more police into an area on the suspicion that a violent crime will occur tends to increase the number of less serious arrests (broken taillights, traffic infractions, etc.). The accumulated arrests then skew the system to send even more police to the area, both 1) depriving other areas of policing and 2) creating a continuous positive feedback loop.
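A back-of-the-envelope sketch of that loop, with an invented allocation rule and invented numbers (this models no real department's policy), shows how a small initial skew can run away:

```python
# Toy sketch of the predictive-policing feedback loop described above.
# All quantities and the update rule are purely illustrative assumptions.

TOTAL_PATROLS = 100
share_a = 0.55   # area A starts with a slight excess of patrols

for week in range(1, 9):
    # Recorded low-level arrests track patrol presence, not underlying offending,
    # so area A's share of recorded incidents simply mirrors its share of patrols.
    recorded_share_a = share_a

    # Allocation rule: shift patrols toward whichever area reports more incidents.
    share_a = min(1.0, share_a + 0.5 * (recorded_share_a - 0.5))

    print(f"week {week}: area A has {share_a * TOTAL_PATROLS:.0f} of "
          f"{TOTAL_PATROLS} patrols")
```

Within a few iterations the favored area absorbs essentially all of the patrols, even though nothing about underlying offending ever entered the loop.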
Previous efforts to use AI to predict crime have been controversial because they can perpetuate racial bias. If you are a Black man in the city aged between 20 and 29, you are likely to be the target of an investigation. Law enforcement resources are not infinite, so you do want to use them optimally. From New Scientist:
The researchers also used the data to look for areas where human bias affects policing. They analyzed the number of arrests following crimes in neighborhoods in Chicago with different socioeconomic levels. This showed that crimes in wealthier areas resulted in more arrests than in poorer areas, suggesting bias in the police response.
Lawrence Sherman at the Cambridge Centre for Evidence-Based Policing, UK, says he is concerned about the inclusion of reactive and proactive policing data in the study, or crimes that tend to be recorded because people report them and crimes that tend to be recorded because police go out looking for them. The latter type of data is very susceptible to bias, he says. “It could be reflecting intentional discrimination by police in certain areas.”
And the Chicago PD is not exactly a paragon of fairness.
My take
What’s the point? The 90% accuracy rate cited in this study doesn’t apply only to murder. It also covers Criminal Sexual Assault, Aggravated Battery, Burglary, Robbery, Theft, and Motor Vehicle Theft. All of these crimes combined amounted to 25,507 in 2021, eighty-eight times the murder rate. No information is given on the accuracy rate for each of these crimes separately. How is this information used? An application like this raises the fundamental ethical question: “I can build this, but should I?”
And one more thing: one should never put a black box in the hands of people who are guaranteed to misuse it. Racial disparities in the Chicago Police Department’s practices are stark. The Vera Institute of Justice’s Arrest Trends tool shows that in a city whose population is 30.1 percent Black, 72.5 percent of all arrests made by the CPD in 2016 were of African Americans. The disparity is even higher for violent offenses.
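For scale, the over-representation implied by those two quoted figures works out as follows:

```python
# Over-representation implied by the Vera Institute figures quoted above.
black_share_of_population = 0.301
black_share_of_arrests = 0.725

ratio = black_share_of_arrests / black_share_of_population
print(f"Black residents were arrested at roughly {ratio:.1f}x their population share")
# -> roughly 2.4x
```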
In our practice, we refer to this as subsequential bias. Any AI application you put into a social context obligates you to look over the horizon and consider how it can be used in unethical ways.
End note: the Chicago crime data table above includes data sourced from Wikipedia and biggestuscities.com.