
AI in criminal justice – is tech simply automating deep human biases?

By Chris Middleton, April 25, 2024
A Westminster legal policy forum debated the use of AI in criminal justice. The findings were fascinating – in some cases, troubling.


Statues of ‘Lady Justice’, or Roman goddess Justitia, show her carrying scales and a sword, illustrating the twin concepts of justice being balanced - with cases weighed and evidence measured - yet also swift, decisive, and sharp. 

Some versions depict her as blindfolded, which has come to mean ‘impartial’, but originally it was artists’ commentary on judges being blind to poverty, circumstance, and bias. Like the law itself, sometimes it is all a matter of interpretation.

So, which version of justice will AI bring about: a future of digital impartiality, balance, and pointed insight? Or one of societal biases and unfairness being automated, and given a veneer of machine neutrality? 

There is certainly evidence cautioning against the use of AI in justice. For example, a prosecutor might point to the US Correctional Offender Management Profiling for Alternative Sanctions algorithm, aka COMPAS. This was used in some US courts as a predictor of criminals’ risk of reoffending. A ProPublica investigation found that, by using the system in sentencing guidelines:

Black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism, while white defendants were more likely than black defendants to be incorrectly flagged as low risk.

The reason? Historic inequity in the justice system, in which black offenders or defendants were often treated more harshly than white - problems that few would argue have been eradicated today. (Black Americans are still overrepresented in the prison system, more likely to be arrested, and three times more likely to be shot by police - with the latter based solely on 2024 shootings so far.)
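To make the ProPublica finding concrete, here is a minimal sketch of the kind of check involved: comparing how often each group is wrongly flagged as high risk (false positives) and wrongly flagged as low risk (false negatives). The function name, the group labels, and every record below are made up for illustration. This is not COMPAS data, nor ProPublica's actual method.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted_high_risk, reoffended in records:
        c = counts[group]
        if predicted_high_risk and not reoffended:
            c["fp"] += 1  # flagged high risk, did not reoffend
        elif predicted_high_risk and reoffended:
            c["tp"] += 1
        elif not predicted_high_risk and reoffended:
            c["fn"] += 1  # flagged low risk, did reoffend
        else:
            c["tn"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


# Hypothetical toy records, invented purely to show the calculation.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 8 + [("A", False, True)] * 2
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", False, True)] * 5
)

rates = error_rates_by_group(toy_records)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

In this invented data, group A is wrongly flagged as high risk twice as often as group B, while group B is more often wrongly flagged as low risk. A single headline accuracy figure would hide that pattern.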

Historic bias becomes judgement, which then becomes training data - issues explored last year in my interview with author and IBM Senior Engineer Calvin Lawrence. Now fast forward to a hypothetical world of predictive policing, and consider the possible outputs.

Our prosecutor might also point to the Manhattan lawyer last year who prompted ChatGPT for precedents in a case against an airline, which he then presented in court. The problem? The supposed case law was an AI hallucination. 

In that example, the lesson is not so much that ChatGPT generated a fake - though that is important. It is more that the lawyer abdicated professional responsibility by trusting it, and so early in the AI Spring. He lacked the deep, first-hand expertise to spot false information, and trusted an AI to do the donkey work for him. 

Common sense suggests that these types of problems will only proliferate as AI becomes more widely adopted by harassed, time-poor lawyers who are keen to lose the drudgery of a document-based career.

So, what is the case for the defence? Alongside my interview this week with Eleanor Lightbody, CEO of legal AI specialist Luminance, this was the topic for a Westminster Legal Policy Forum on AI in criminal justice.

Chairing the first half was peer and magistrate Lord Ponsonby, who observed that perhaps the most important thing an AI could do is help establish identity - in complex fraud cases, for example. Then he said:

The second is the reading and writing of documents - in family court cases, particularly, where we have hundreds of them. And they all have to be written, and we have to read most of them! So, there may be ways in which that process can be enabled and sped up.

Certainly, that is Luminance’s perspective - and its opportunity in the market, as we explored earlier. But what else?

Phil Bowen is Director of UK non-profit the Centre for Justice Innovation, which seeks to be a policy accelerator for new technologies in the legal realm. Not to benefit vendors, he suggested, but to pursue a fairer and more transparent system that does minimal harm. He explained:

We felt that a lot of the conversation on technology was about cost savings and efficiencies, but was not focused on how we do things like reduce crime and reoffending. And how we make sure that the disposals of the system are felt to be fair - and, indeed, are fair. Especially for communities that are more heavily involved in the criminal justice system than others.

Second, we wanted to look at public attitudes to the use of those technologies and their application in the justice system. We felt it was really important that it shouldn't just be decision-makers in Whitehall, or in constabularies, choosing which technologies they bought. The public needs to be engaged too.

However, the challenge there is what information would the public base its judgement on? Vendor marcomms, perhaps? Take COMPAS as a hypothetical example. Citizens would doubtless have supported any technology that promised to reduce reoffending, punish recidivism, and keep their communities safe. But it took a deep investigation to uncover the reality of that algorithm’s effect.

That said, Bowen was at pains to say that the public is capable of making very nuanced, detailed judgements about these issues - if given the opportunity and data to do so.

Six years ago, the CJI produced a report, Just Technologies, on a range of innovations in justice. Despite now needing an update, it uncovered some interesting facts. Bowen explained:

Analytics and predictive policing, which come along for the ride [with AI]. 

The criminal justice system makes big, weighty decisions on the lives of our fellow citizens on a daily basis. So, it's not surprising that - when the use of AI was still relatively nascent in justice - we picked up a great deal of practitioner interest in how it could make their decision-making, their assessment of people, better informed and more predictive through the use of things like artificial intelligence.

In short, there was strong interest in predictive policing and sentencing. There is no doubt that interest remains, just as there is no end to the enthusiasm within the police for real-time facial recognition, despite the litany of negative reports on its risks to racial minorities due to inadequate training data.

Flawed human judgements

So, the challenge in industrializing or automating justice is as much people’s trust in a technology’s accuracy or transformative ability as it is any flaws in the systems themselves. (Remember that Manhattan lawyer, presenting an AI hallucination in court without a second thought.) In a sense, therefore, the real risk to humans is evangelism trumping common sense and critical thinking.

Referring back to the 2018 report, Bowen confirmed:

In the interviews we conducted with senior police and justice officials, we got a great sense of the curiosity, the sense of possibilities around AI.

Right back when I first started out as a policy official in the Ministry of Justice, we were rolling out something called the Offender Assessment System in the probation service, which is a risk assessment tool based on large data sets. It tries to make predictions and assess the risks about what's likely to happen if an individual gets this or that sentence. And we do the same in the justice system. 

We also noted what, at the time, seemed like a very significant document, which was an American paper looking at New York bail decisions. It applied a machine learning algorithm to judicial decision-making to see if better decisions could have been made. 

What it found was, had different decisions been made by judges on bail and remand, they could have reduced the number of people who were given remand without any increase in crime rates. 

And it suggested they could do that while reducing the percentage of African Americans and Hispanics in jail. And that, with real-life human decision-making versus the machine learning algorithm, New York City judges were treating some defendants as low risk when the algorithm suggested they were high risk. 

In other words, it suggested that in human decision-making, we were making the wrong risk decisions. At least, that was what it found at the time.

So, this is the flipside of the COMPAS example: systems that might counterbalance flawed human judgements. No one doubts that human decisions can be biased, of course - they generate the training data, after all.

But what if things go wrong? What if, lurking beneath the hood of systems designed to help tired, overworked, or biased officials is simply data from other tired, overworked, or biased officials - an infinite regress of bad decisions?

Bowen said:

While we're cautiously amenable to the use of tools to supplement human decision-making in justice, we also found when we spoke to practitioners, the public, and academics, a range of strong ethical and empirical issues that needed to be resolved.

So, if a decision-making tool issues a false negative, and guides a judge to send someone to prison who shouldn’t have gone, who is responsible? Who gets the blame? And how can that decision be appealed?

The perennial issue - not just in AI-based assessment tools, but in non-AI tools as well - is do they perpetuate racial and other disparities, given that the data that's locked inside them tends to come from the system itself?

So, things like previous convictions are heavily weighted against particular communities because they're more heavily policed. So how do we adjust for that?

How indeed? Then he added:

From what I see, the use of AI is growing mostly within policing. That's partly because you’ve got 43 forces [in the UK], and the police are more sold on the idea of using technology to improve their systems. So, with policing, you get 43 different versions of what growth looks like.

Chief Inspector Scott Lloyd is Biometrics Capability Lead in the South Wales Police, and an officer seconded to the National Police Chiefs’ Council, where he is Science and Innovation Coordinator. He said:

Since the days of Sir Robert Peel, the bread and butter of policing is two things: to locate people, and identify them. And ultimately, technology might assist us to do both. But when we adopt technology, there is always a balance to be struck between how we use it to protect our communities, and the privacy challenges. And it has to be both necessary and proportionate. 

For example, policing has used facial recognition technology in two ways: retrospective facial recognition, where we get an image of someone from CCTV, body-worn video, or dash cam. We use that to identify who a person is, often comparing footage against lawfully held custody images. 

And, at the moment, it's also within the intelligence chain, where it is not yet at evidential standard. But that might change as the technology continues to improve. It may become a forensic capability.

That sounds like an admission the system is inaccurate. Lloyd continued:

But live facial recognition is the other use case, and is probably the most contentious. At the moment, it’s primarily where we put a camera on top of the van, or a lamppost, and we look at people who are walking past in order to locate individuals - whether that is suspects of criminality or people who are particularly vulnerable - for example, high-risk missing persons.

So, what of the oft-cited flaws with live technology, which has often been found to misidentify people from ethnic minorities - so much so that its use in law enforcement was banned in California? He said:

Policing wouldn't want to use any technology that has a bias, or any concerns over race, gender, or age [data about children would also be gathered]. So, we've worked with the National Physical Laboratory to really focus on equitability and bias before we use the technology. 

And as tech changes, it’s fair to say that we need to revisit the legal landscape to ensure that we are legitimate, and that legislation is simple for our public to understand.

Really crude assumptions 

But is any of that enough? Silkie Carlo is Director of anti-surveillance and citizen rights organization Big Brother Watch. She said:

We urge caution around the adoption of AI in policing. The first lesson learned is that, just because we can, doesn't mean that we should, when it comes to AI in policing. Some of the smartest decisions that need to be made by government, and by police forces, are not just around what to use, but also what not to use. 

The second lesson is that the democratic process is critical. Parliament hasn't been involved anywhere near as much as it should have in the adoption of AI in policing. And I think that the adoption of new technologies has been poorer because of it.

So, what are the main issues for the organization? She acknowledged that there are claimed benefits, including efficiency, eliminating human bias, and improving security. Then she said:

However, the costs often engage people's fundamental rights, particularly privacy, because most of these systems are built on bulk data. The perpetuation of discrimination is therefore one risk. And freedom of thought - because, increasingly, data analysis is going deeper and deeper into people's [private lives and thoughts].

The spectres of thought crime and science fiction film Minority Report are never far from the table. Did she have examples of these intrusions? She did:

Big Brother Watch led the campaign against ‘digital strip searches’. This is where individuals were going to police with experiences of rape and other sexual offences, and we found that one of the first things that police asked them for was a complete download of their digital devices. 

In many cases, this led to victims withdrawing from the process, because not only did it mean they would be without their devices, but they also felt they were being treated as suspects. It wasn’t justified at all.

Quite. As a result of that campaign, there are now better protections around digital extraction. So, what about AI and other systems that predict recidivism and other future actions? Carlo said:

Take the HART system [Harm Assessment Risk Tool], which was used by Durham police to give a recidivism risk score [a profiler similar to the US COMPAS sentencing algorithm].

We looked into what the data variables were, and found that postcodes were among the significant variables that were being used to decide whether somebody was likely to reoffend or not.

Of course, postcode is often a proxy for socioeconomic status and, in some cases, race. And we found that not only is the postcode itself being used, but also other postcode variables, which are sociodemographic identifiers.

These rely on really crude assumptions. They also rely on masses of data - thousands of bits of data - but the identifiers that they were using had titles as crude as, for example, ‘Asian heritage’. And the kinds of assumptions that are being made about people that fall into those categories could be things like ‘low-cost housing,’ or ‘works in food delivery’. They really are based on biased assumptions.
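Carlo's point about postcode acting as a proxy can be shown with a small sketch. Everything here is hypothetical: the postcodes, the scores, the groups, and the function names are invented, and none of it reflects how HART actually works. The scoring function only ever sees a postcode, yet the average score still differs by group, because the groups are unevenly spread across postcodes.

```python
from collections import defaultdict

# Hypothetical risk scores attached to two made-up postcode areas.
HYPOTHETICAL_POSTCODE_SCORES = {"AA1": 0.75, "BB2": 0.25}


def postcode_only_score(postcode):
    """A toy scorer that uses postcode and nothing else."""
    return HYPOTHETICAL_POSTCODE_SCORES[postcode]


def mean_score_by_group(people, score_fn):
    """Average the scores per group.

    Group membership is used only to audit the output.
    It is never passed to score_fn.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for person in people:
        totals[person["group"]][0] += score_fn(person["postcode"])
        totals[person["group"]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Invented population: group X lives mostly in AA1, group Y mostly in BB2.
toy_people = (
    [{"group": "X", "postcode": "AA1"}] * 3
    + [{"group": "X", "postcode": "BB2"}] * 1
    + [{"group": "Y", "postcode": "AA1"}] * 1
    + [{"group": "Y", "postcode": "BB2"}] * 3
)

averages = mean_score_by_group(toy_people, postcode_only_score)
print(averages)
```

Leaving a sensitive attribute out of the inputs does not, on its own, stop a system's outputs from tracking that attribute.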

My take

Once again, the lesson is simple. AI and automation should always be seen as proxies for human intelligence, and as models of whatever human systems the technology is introduced to. 

If those human systems are themselves deeply flawed, then technology will only replicate those flaws. After all, the first things to be automated are our assumptions about the world.

And we must never lose sight of a simple principle: what a system predicts a human might do, and what they actually do, may not be the same thing. In justice, surely, actions speak louder than algorithms. 
