Removing bias from AI systems is a growing preoccupation. But is removing all bias from AI the proper goal? That provocative question recently sparked a vigorous LinkedIn debate.
After the debate, I had more to say on this topic - responses I'll share with diginomica readers. Donald Farmer, Principal of TreeHive Strategy, a BI and AI consultancy, was at the center of the debate. His comments kick off our article, followed by my responses. Farmer wrote:
As ever, I am frustrated when we use the term bias only to mean socially unacceptable biases. It is meaningless to build any decision-making system without bias - the purpose of machine learning and A.I. systems is to encode biases.
I disagree with that, and I have some credible company. My friend and former co-author James Taylor describes it this way:
AI is biased if (it) weights some factor as material in its prediction when that factor is not genuinely predictive, but is an artifact of its training dataset being distorted. This might be because a partial training dataset was selected in an unrepresentative way, or because there is a more general societal bias.
Mike Gualtieri of Forrester agreed with Taylor and said succinctly: "When AI is biased, it is because it has detected and reflected bias in the data set. Garbage in = garbage out." Peter Burris chimed in with:
AI bias is the basis for any bad outcome in an AI-related sociotechnical system. It can derive from a weak understanding of a sociotechnical problem, good data from bad historical events, bad data from any historical events, biased training, or nefarious designs.
The purpose of ML is to find patterns, relationships, and correlations. If a model classifies "climate" based on precipitation and seasonal temperature, does it expose a bias for those classifications? Farmer added to his argument:
We must enumerate, identify, model, and then test for the biases we want to use and the biases we want to reject... Without this clarity, diversity hardly helps at all. It is true that a more diverse team may be somewhat more likely to enumerate some biases in advance, but without the clarity needed to identify, model and test for biases, we'll not make much progress.
My first question: what is a bias we want to use? Diversity seems to be unrelated to the topic Farmer is raising, though I think it is related to the documentary ("Coded Bias") we were both talking about. I contend that a team composed of people with different backgrounds will be more effective in spotting negative biases.
As for Farmer's statement that "without the clarity needed to identify, model and test for biases, we'll not make much progress," I don't see what this has to do with the question of whether bias refers to both positive and negative aspects of AI. Farmer added:
We desperately need more diversity in data science and tech - because it is right and just. It also provably leads to better business decisions overall. Whether it leads to fewer socially unacceptable biases in systems, I am not sure.
This is an interesting comment because "diversity provably leads to better business decisions" seems to conflict with Farmer's contention that "whether it leads to fewer socially unacceptable biases in systems, I am not sure." He went on to say:
The language around all this is hopelessly fuzzy. I am rather tired of it. Perhaps we just have to give up and assume that we use "bias" to mean something negative, and the perfectly acceptable preferences of algorithms should be referred to some other way. Weightings? Preferences? I don't know.
Exactly. Bias in AI is considered to be negative. At the same time, the broader definition is indeed, as Farmer points out, not so narrow. I have not heard of "perfectly acceptable preferences of algorithms" referred to as biases. AI bias is widely understood to mean undesirable aspects in the social context when it affects people. Decisioning applications, such as rule-based systems, decision trees, predictive models, and data science/statistical modeling, existed long before the current wave of AI. I'm not sure why the burst of activity about ethics occurred with AI, but the domain of these ethical concerns is people.
Defining AI bias - a debate with real-world consequences
It's obvious where these applications affect people: credit, hiring, facial recognition, judicial, insurance, disinformation, digital phenotyping, and a host of other areas. Bias does not affect people, and we aren't concerned with it, in applications such as autonomous underground mining machines or preventative maintenance of remote devices. But there are grey areas, too: where to put an intra-city mass transit system, bypassing poorer neighborhoods, eminent domain, redlining (now illegal, but its effects persist). But most AI today is used to pierce your privacy veil to influence you and sell you things. Farmer asks for clarity, but the whole idea of lousy bias would be muddied by adding a "good bias" into the mix.
Methods for measuring bias and fairness are in their infancy. Bias can be meaningless or dangerous. For example, I am biased about the Philadelphia Phillies. If a Phillies pitcher competes for the Cy Young Award, I will rationalize why he should be favored. That's wrong if I'm in the Baseball Writers Association; otherwise, it makes no difference. It affects no one. The phenomenon of bias manifests itself whether it's about baseball or matters of gender, age, or race.
Hiring decisions based on ML trained on historical data that chronicles past biases are an ethical problem and harmful to the community. I understand Farmer's complaint about the misuse of the word bias in this narrow context, but everyone in AI understands it, and we have bigger fish to fry. Farmer had more to say about this:
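One of the simpler measures in use is the disparate impact ratio: the rate of favorable outcomes for one group divided by the rate for a reference group. The sketch below is only an illustration under stated assumptions. The group labels, the counts, and the function names are hypothetical, and the 0.8 cutoff is the commonly cited "four-fifths" rule of thumb, not something proposed in this debate.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the favorable-outcome rate per group.

    `records` is an iterable of (group, outcome) pairs, where outcome is
    True for a favorable decision (for example, "advance to interview").
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_ratio(records, protected, reference):
    """Favorable rate of the protected group divided by the reference group's."""
    rates = selection_rates(records)
    if rates[reference] == 0:
        raise ValueError("reference group has no favorable outcomes")
    return rates[protected] / rates[reference]


# Hypothetical hiring decisions: group "A" is the reference group.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

ratio = disparate_impact_ratio(decisions, protected="B", reference="A")
flagged = ratio < 0.8  # assumed threshold: the "four-fifths" rule of thumb
print(f"disparate impact ratio = {ratio:.2f}, flagged = {flagged}")
```

A single ratio like this says nothing about why the rates differ. It only signals that a decision process deserves a closer look.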
Of course, I recognize that the ship has sailed. And as I am a fairly rigorous descriptivist when it comes to language, I have to acknowledge this use of the word. But here's the problem: the methods of identifying and testing for undesirable biases can be the same as the methods of testing for acceptable preferences.
This is where Farmer loses me. When it comes to drift in data, mathematical models that can measure this include: SHAP, CDD, disparate impact, integrated gradients, adversarial robustness, counterfactual reasoning, slope bias, contrastive explanation, and others. But as I see it, a rigorous descriptivist would also consider that words' meanings can drift. The model is trained and tested to determine if it does rigorously test for goals. But not in production. Most ML algorithms are regressive; the absence of confounding factors is often overlooked; the non-goals are emergent. Non-goals=bias, goals=prediction/classification, even if measured the same way. Farmer added:
We gain a lot practically if we can enumerate in our specifications all the goals and non-goals of a system and rigorously test them.
First of all, I wish even a fraction of AI development did this. Nevertheless, I'm still not convinced we should list them as biases, delineated by good or bad. Farmer:
For example, an insurance algorithm may prefer experienced safe drivers who buy cars with good safety records. It may effectively penalise drivers of unsafe vehicles. But if poorer drivers are buying older, less safe, vehicles, and ethnic minorities are in general poorer ... you see where that leads.
Insurance is a complicated problem. Features like age, gender, location, etc. are potentially negative classifications in most practices. Still, for actuaries, a negative bias is a part of their business, and it's a thin line separating analysis and discrimination. See my recent diginomica article on this. The use of credit in insurance and other things is a screaming racist practice. One excuse I heard from a (white) actuary was, "People with higher FICO scores manage money better than those with lower scores, which implies that those with lower FICO scores don't manage other aspects of their lives." I gasped when he said that. That is a type of bias that you won't find directly in the data, just the results. Farmer:
A rigorous, repeatable framework - which can also monitor drift in these biases over time - is essential if we are to engineer out these problems. Diversity can't do it without a framework for codifying the problem.
Agreed. That's a good point. Monitoring the drift in the bias through a consistent framework is a great idea and one I hadn't encountered before. Farmer:
And to my mind, the fuzziness around "bias" gets in the way rather than helping us clearly articulate what can be done. Truth is, I want to fix this problem. I want to fix it at scale, repeatedly and with assurance.
I just don't see the connection. I've seen organizations make real progress on these issues without expanding the term "bias" to cover negative biases and positive outcomes. I just don't know how Donald made his case. Farmer:
The saddest example of unwanted algorithmic bias I have seen among my clients was put into production by the most diverse team. The fix was to write clearer specs and better test cases.
This is an unexpected comment. Why would the most diverse team put "unwanted algorithmic bias" into production? Would clear specs and better test cases improve the outcome? Definitely, but I don't see where the comments connect.
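To make the idea of clearer specs, test cases, and a repeatable check for drift a little more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the quarterly labels, the outcome counts, the function names, and the 0.1 tolerance are hypothetical and do not describe Farmer's framework or any particular tool. It recomputes an outcome-rate ratio between two groups for each period and flags any period that moves away from the baseline by more than the tolerance.

```python
def impact_ratio(protected_outcomes, reference_outcomes):
    """Ratio of favorable-outcome rates (1 = favorable, 0 = unfavorable)."""
    protected_rate = sum(protected_outcomes) / len(protected_outcomes)
    reference_rate = sum(reference_outcomes) / len(reference_outcomes)
    return protected_rate / reference_rate


def drift_report(periods, tolerance=0.1):
    """Compare each period's ratio with the first (baseline) period.

    `periods` is a list of (label, protected_outcomes, reference_outcomes).
    Returns the baseline ratio and a list of (label, ratio, drifted) tuples.
    """
    _, base_protected, base_reference = periods[0]
    baseline = impact_ratio(base_protected, base_reference)
    report = []
    for label, protected, reference in periods[1:]:
        ratio = impact_ratio(protected, reference)
        drifted = abs(ratio - baseline) > tolerance
        report.append((label, ratio, drifted))
    return baseline, report


# Hypothetical production data, one entry per quarter.
periods = [
    ("Q1", [1] * 45 + [0] * 55, [1] * 50 + [0] * 50),  # baseline period
    ("Q2", [1] * 43 + [0] * 57, [1] * 50 + [0] * 50),
    ("Q3", [1] * 30 + [0] * 70, [1] * 50 + [0] * 50),
]

baseline, report = drift_report(periods)
for label, ratio, drifted in report:
    print(f"{label}: ratio={ratio:.2f} baseline={baseline:.2f} drifted={drifted}")
```

The same function can serve as a test case before release and as a recurring check in production, which is the point of treating the specification and the monitoring as one framework.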
How much is there to know about developing systems in the social context? What is there to think about? Don't unfairly discriminate, don't engage in disinformation, don't invade people's privacy, don't conduct unfair computational classification and surveillance, don't combine clean files without AI of the combination. Just don't do this stuff. Thinking about the definition of bias isn't the problem. It is not the solution. It's like driving. You don't need physics. Don't pass on the right (or the left in the UK), don't run red lights, don't tailgate, don't drive under the influence.
This article isn't about Farmer's insight and knowledge, which are peerless. It is about a position he's taken that I simply disagree with. Not only in AI, but in general parlance, bias has a negative meaning. In the strictest sense, bias can take many forms. Some are innocuous and some, I suppose, are helpful. But if you scan the literature, all types of bias are negative:
- Confirmation bias
- The Dunning-Kruger Effect
- In-group bias
- Self-serving bias
- Availability bias
- Fundamental attribution error
- Hindsight bias
- Anchoring bias
And these are cognitive biases, all of which may be in play in the selection of data, feature engineering, and even the interpretation of results. But there are more specific biases based on age, race, gender, and machine-generated biases with names that can be mysterious. But nowhere can I find the use of positive, helpful associations in AI referred to as "bias."