An oddly titled article in KDNuggets, Stop Blaming Humans for Bias in AI, caught my attention. Who else would you blame? But that isn’t what the author meant. He was trying to absolve data scientists and AI as technology from developing biased applications. I found the premise preposterous:
Blaming data scientists for bias is not the right solution. Finding a way to combat bias at a systemic level is…bias in data sets results in systemic bias at a societal level. Individuals simply reflect that bias. They don’t create it.
I assume he means that systemic refers to the imprint of bias in the data. It isn’t the data that is systemic. It’s the people who put it there. So I ask myself, what separates the “societal level” from the individuals who “reflect bias?” Are these individuals excused from being part of the societal level because they're data scientists and/or AI generators? Do they assume a holy mantle once they are involved in this enterprise? But the article's murky propositions get clearer with this:
Those of varying ethnicities, age groups, genders, education levels, socio-economic backgrounds, and locations can more readily spot data sets that favor one set of values over another, thus weeding out unintended bias.
I think this is the central fallacy of the author’s proposition. Doesn’t this presuppose that there is a “clean” view of a dataset, scrubbed of bias by the diverse group?
All of this is premised on the author’s view that the problem is systemic bias buried in the data. That is, of course, true. But AI did not create bias. We did. In the US, there is a long tradition of using quantitative methods to justify everything from slavery, to Jim Crow, to discrimination in housing, criminal justice, education, zoning, healthcare and credit. If you want to deal with bias, don’t center your solution on AI; take a historical perspective.
How will the global talent pool deal with other-than-universal principles in ethics? A straight, white ontology defines prevailing (pervasive) “AI Ethics” as the bedrock of ethics. Even Spinoza was wise enough to see this, by weaving the teachings of Buddha into his philosophy. Western ethics identifies the privileged Western white male as the standard representative of humanity (Hume). Two main types of institutionalized bias are institutional racism and institutional sexism. Get realistic about what kind of society we are in. In 2022, for the first time, our Supreme Court removed a critical right (for women) that it had previously upheld, and did so out of purely political ideology. It is the most breathtaking act of institutional bias of the 21st century (so far). As I said, take a historical view.
The foundation of AI Ethics as we know it, the bulk of so-called “AI Ethics” literature, “ethicists,” institutes and conferences, is derived entirely from Western ethics and white Western ontologies. This discursive and repetitious echo chamber of facile ethics and the Western Canon washes over historical injustices and power asymmetries. If you want to deal with bias, envision an alternative future, and work toward it to bring about change. Scrubbing old datasets isn’t an answer.
AI has developed at three levels:
- Research organizations and academia and their spin-offs
- Commercial software and application vendors who embed AI in their products
- All the DIY organizations, large and small
Articles like this one convey an image of all three as being driven to eliminate bias and harmful results. But the world is full of selfish and greedy people. Step back and look at our world: gerrymandering, or the Astros stealing signs in a baseball game. Organizations are motivated by the success of their owners and shareholders. Pressure from management to cut corners and go faster can violate an organization's AI Ethics Principles.
The complexity of the data problem
Even if this global bias detection squad could spot bias in the data, the problem isn’t the data itself. It’s the context in which it is used. Can it be abused for bias, or is it useful for some analysis? Gender and age, for instance, are essential factors in insurance underwriting, pricing and reserving. There is also the issue of weak proxies and shortcut learning in the models, where innocent enough data is joined with other innocent data in ways that can have harmful emergent effects. And there is sequential bias, where an “AI for Good” operates as planned but inadvertently sets the stage for blatant bias in behaviors generated by the model.
Let’s think this through. Many attributes define a person beyond race and sex: religion, for example. Clearly, discriminating against a group because of their religion would be roundly criticized as a bias. So you wash out religion from datasets with PII by masking, deleting or encrypting. But any decent data scientist can join that file with others and, with high fidelity, identify each person's religion. But what if the purpose of the model was to identify, say, Muslims to solicit for a Muslim charity? That would be objectionable.
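To make the join problem concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the records are synthetic, and the field names (`zip`, `birth_year`, `organization_faith`) and the helper `reidentify` are hypothetical, not taken from any real dataset or tool. It assumes the two files happen to share a pair of quasi-identifiers, which is the kind of overlap that makes re-identification possible.

```python
# Synthetic, invented records. The "religion" column has been masked (set to None).
masked_customers = [
    {"customer_id": 1, "zip": "10001", "birth_year": 1980, "religion": None},
    {"customer_id": 2, "zip": "10002", "birth_year": 1975, "religion": None},
    {"customer_id": 3, "zip": "10003", "birth_year": 1990, "religion": None},
]

# A second, separately "innocent" file (hypothetical): a membership list
# that shares the same quasi-identifiers.
membership_list = [
    {"zip": "10001", "birth_year": 1980, "organization_faith": "faith_A"},
    {"zip": "10003", "birth_year": 1990, "organization_faith": "faith_B"},
]


def reidentify(masked, auxiliary, keys=("zip", "birth_year")):
    """Join the masked file to the auxiliary file on shared quasi-identifiers.

    Returns a dict mapping customer_id to the inferred attribute,
    or None where no auxiliary record matches.
    """
    lookup = {
        tuple(row[k] for k in keys): row["organization_faith"]
        for row in auxiliary
    }
    return {
        row["customer_id"]: lookup.get(tuple(row[k] for k in keys))
        for row in masked
    }


inferred = reidentify(masked_customers, membership_list)
print(inferred)
```

The masked column stays empty throughout, yet two of the three customers come back labeled. Masking one file does nothing about what the next file reveals.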
The data isn’t static. New instances and datasets, logs, etc. will flow into the corpus, requiring the diverse team of globally-based people to be on point 24/7. When you consider that more and more of this data will be streaming at considerable cost to the enterprise, developers won’t want to wait a few days for it to be blessed. And what if the group is deadlocked over conflicting ethical points of view? Which “ethics” mediates?
Human-in-the-loop AI is a project “safety net” that combines the strengths of people – and their diverse backgrounds with the fast computing power of machines.
This is optimistic. Madeleine Clare Elish, a cultural anthropologist examining the societal impacts of AI and automation, found the human-in-the-loop such an unworkably lousy idea that she gave it a name: the Moral Crumple Zone. Her piece, Moral Crumple Zones: Cautionary Tales in Human-Robot Interactions, describes how the human becomes the person to blame when things go wrong, and why that is so likely.
Algorithms are elevated as techniques to compensate for humans’ limits and biases. On the other hand, under this scheme, “people” are pressed into service to keep the algorithms from going off the rails. This inconsistent argument only proves that AI cannot be trusted to make high-stakes decisions, despite common proclamations about its benefits. While it is, without doubt, the data used in a model that leads to problems, it is just as likely to be the conceptual model (embodying prejudices and bias of the modelers), the choice of model elements, unanticipated ML divergence, and even the interpretation of the results.