How can we measure fairness beyond bias, discrimination and other undesirable effects in AI?

Neil Raden, July 22, 2020
The question of AI ethics and bias remains a potent one - but are we framing these issues in the right way? A better approach would be centered around AI fairness. But can fairness be monitored?

We are already at the point where important decisions that affect people's lives are being automated, with growing concern that algorithms can replicate or amplify existing biases. There are widely reported complaints of discrimination in facial recognition and hiring systems, and of biased judicial systems that subject minorities to far longer prison sentences than non-minorities.

Those are just the ones that get a lot of press. Thousands of others are never reported. However, the problem is that people are not any better than algorithms at making decisions, so there are no general criteria that can be transferred to an algorithm.

AI ethics - a practical problem we have not solved

Instead of, or at least in addition to, trying to remove bias, discrimination or invasion of privacy, why not look at those criteria as derivative? Let's instead craft a notion of fairness that could drive the reduction of these undesirable outcomes.

I had a conversation with Anna Krylova, Chief Actuary of the State of New Mexico about ethics. Here is what she said:

Everyone knows what is ethical, or at least has a sense of it, even if they don't act on it. But New Mexico is a poor state (49th in per capita income), and auto insurance is mandatory and expensive. It's like a regressive tax. And if you are poor, it is more expensive because rate filings allow FICO scores as part of your rate. While FICO scores have a strong correlation with risk, it isn't causal. It's situational. And if you miss a payment or two because you can't afford it, you may get a ticket for hundreds of dollars. You may get your car impounded and be unable to get to work or pick your kids up at school. And of course, to get reinstated, your premium will go up substantially. So is that ethical? Well, the insurance company has to stay solvent or not be in business. But here is the question I ask with every filing I get: "Is it fair?"

This was an unusual situation because what she was saying was that, beyond the modeling of risks, expenses, and solvency, we need an evaluation of whether the models are fair given the situation of people in New Mexico. So I asked how she measured fairness. She said:

  • Procedural Justice - the perceived fairness of the procedures used to evaluate and modify a property and casualty actuary's quantitative models. For example, whether the experience of the class of drivers was assessed within the context of their situation. Or whether they as a group are allowed to challenge any appraisal decisions. One important conclusion she came to was that the use of FICO scores to underwrite the poor and working poor easily fits the definition of unfair. Poor people do not have poor credit because they are poor drivers; they have poor credit because they're poor.
  • Distributive Justice - whether the distribution of credits can be perceived as a fair evaluation of the class's experience. People perceive fairness by comparing their rewards to those of someone similar to them. Unfairness is perceived when people feel they are being taken advantage of while others with similar experiences receive greater rewards or recognition than they do.

When a claims adjuster denies a cancer patient's claim for drug therapy, they likely feel at least the slightest tinge of remorse. When a rules-based system makes that decision, there is undoubtedly no remorse, but if that decision is questioned, how that decision was made can be discovered through a trace of the rule firings. But when that decision is made by an inferencing algorithm generated by a machine learning model, there is neither remorse, nor code to trace. There is no code. In this case, it is impossible to determine if the decision was fair. 

There are only so many decisions like this that a human can make in a day. The number of decisions made by an algorithm is virtually endless. So not only are those decisions made without empathy and without internal review, they are also consistent. In Weapons of Math Destruction, Cathy O'Neil described a slightly bipolar college student looking for a summer job and being turned down by twelve supermarkets, which all used the same psychometric software for evaluation. It is good that the algorithm made the same decision each time, but is it fair? Is the psychometric robust? Is it biased? If he had gotten an interview, would one of the hiring managers have seen something in him and offered him the job?

Monitoring AI fairness

Suppose we can remove gender bias from our data, and we apply a learning model to select the best candidate for a job. If what we are going to monitor is parity or quota compliance to ensure the groups' representation is protected, fairness can be measured by counting people from different groups. However, when it comes to ensuring fairness in a process or decision, such as in a recruitment process or a trial, measurement is much more difficult. How do we measure whether the process or decision was fair and non-discriminatory?
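The "counting people from different groups" part is the easy half, and it can be sketched in a few lines of Python. This is only an illustration: the group names and outcome counts are invented, and the function names `selection_rates` and `parity_ratio` are hypothetical, not taken from any particular fairness library. It compares the share of candidates selected in each group and reports the lowest rate as a fraction of the highest.

```python
from collections import Counter


def selection_rates(records):
    """Return {group: share selected} from (group, was_selected) pairs."""
    totals = Counter()
    selected = Counter()
    for group, was_selected in records:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def parity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical hiring outcomes, invented purely for illustration.
outcomes = (
    [("group_a", True)] * 30 + [("group_a", False)] * 70
    + [("group_b", True)] * 18 + [("group_b", False)] * 82
)

rates = selection_rates(outcomes)
ratio = parity_ratio(rates)
print(rates)            # selection rate per group
print(round(ratio, 2))  # how far apart the two groups are
```

A ratio well below 1.0 flags a gap in outcomes, but it says nothing about why the gap exists or whether the process that produced it was fair, which is the harder question.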

Building confidence in AI-delegated or algorithm-based decisions requires three elements:

  • transparency in design and implementation
  • explaining how a decision was reached
  • accountability for its effects

In this context, performing and documenting a fairness analysis, along with the actions taken to address its findings, can be of great use.

Revisiting AI bias

Bias is a tricky term because it has so many meanings. In the context of AI, the word "bias" carries a heavy load of negativity. And it should, when dealing with people (or by extension, living things). Even a "positive" bias about one group typically implies a negative one about others. In What Scientific Idea Is Ready for Retirement, Tom Griffiths writes:

Being biased seems like a bad thing. Intuitively, rationality and objectivity are equated - when faced with a difficult question, it seems like a rational agent shouldn't have a predisposition to favor one answer over another. If a new algorithm designed to find objects in images or interpret natural language is described as being biased, it sounds like a poor algorithm. And when psychology experiments show that people are systematically biased in the judgments they form and the decisions they make, we begin to question human rationality.

But bias isn't always bad. For certain kinds of questions, the only way to produce better answers is to be biased. Inductive reasoning is a type of logical thinking that involves forming generalizations based on specific incidents you've experienced, observations you've made, or facts you know to be true or false. Griffiths adds:

Many of the most challenging problems that humans solve are known as inductive problems - problems where the right answer cannot be definitively identified based on the available evidence. Finding objects in images and interpreting natural language are two classic examples. An image is just a two-dimensional array of pixels - a set of numbers indicating whether locations are light or dark, green or blue. An object is a three-dimensional form, and many different combinations of three-dimensional forms can result in the same pattern of numbers in a set of pixels. Seeing a particular pattern of numbers doesn't tell us which of these possible three-dimensional forms are present: we have to weigh the available evidence and guess. Likewise, extracting the words from the raw sound pattern of human speech requires making an informed guess about the particular sentence a person might have uttered.

The only way to solve inductive problems well is to be biased. Because the available evidence isn't enough to determine the right answer, you need to have predispositions that are independent of that evidence. And how well you solve the problem - how often your guesses are correct - depends on having biases that reflect how likely different answers are.
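One way to make this concrete is to read "predispositions" as prior probabilities in Bayes' rule. That reading is an assumption added here for illustration, not something the passage spells out, and the phrases and numbers below are invented. The sketch shows an ambiguous sound that fits two sentences equally well: with no bias the evidence cannot separate them, while a prior that reflects how likely each sentence is tips the guess.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical: the raw sound is equally consistent with both sentences.
likelihoods = {"recognize speech": 0.5, "wreck a nice beach": 0.5}

# No predisposition: both sentences treated as equally likely beforehand.
flat_prior = {"recognize speech": 0.5, "wreck a nice beach": 0.5}

# A bias reflecting an assumed difference in how often each is said.
informed_prior = {"recognize speech": 0.9, "wreck a nice beach": 0.1}

unbiased_guess = posterior(flat_prior, likelihoods)
biased_guess = posterior(informed_prior, likelihoods)
print(unbiased_guess)  # evidence alone leaves the two tied
print(biased_guess)    # the prior breaks the tie
```

The guess is only as good as the prior: a bias that matches how the world actually behaves improves accuracy, and one that does not makes the answers systematically wrong.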

My take

Be careful when using the term "bias" because it has so many meanings. In AI today, those meanings are mostly negative, but not entirely. Fairness is a far more ineffable quality, but in the end, it's the most important one.