The problem of algorithmic opacity, or "What the heck is the algorithm doing?"

By Neil Raden, January 13, 2021
Summary:
Opacity in AI used to be an academic problem - now it's everyone's problem. In this piece, I define the issues at stake, and how they tie into the ongoing discussion on AI ethics.

Man solving complex problem finds surreal keyhole to success © frankie's - shutterstock

Opacity in AI is a formal, academic description of what is more commonly referred to as, "What the heck is the algorithm doing?" It's a problem that is at the root of many ethical issues with AI.

It appears in robust classification and ranking mechanisms such as search engines, credit card fraud detection, market segmentation, and spam filters, and in the systems used for insurance or loan qualification, advertising, and credit. These classification mechanisms are built on computational algorithms, most often machine learning algorithms.

This is a bit of an oversimplification, but there are three broad categories of opacity: (1) deliberate opacity by corporations, governments and, increasingly, data brokers; (2) opacity when the investigator is not qualified to understand the process; and (3) opacity that is simply inevitable because of the scale of machine learning algorithms and a lack of tools to discern their operation. There is also much resistance among algorithm developers to inserting monitors in the code, ostensibly to avoid harming performance, though there may be other, less helpful reasons. In general, the third form of opacity results from the incongruity between optimization in high-dimensional machine learning and the demands of manual investigation and semantic interpretation.

This last form of opacity is hard to separate from the second, because the common impression is simply that algorithms are very complex. It's important to point out that dangerously complex code did not arise with AI. Here is an example:

In February 1991, during the First Gulf War, an Iraqi missile hit the US base at Dhahran in Saudi Arabia, killing 28 American soldiers. It was determined that the base's antiballistic system failed to launch because of a computer bug: the Patriot missile battery had been running for 100 hours straight. With every hour of operation, the internal clock drifted by a few milliseconds, and the accumulated error (about ⅓ of a second after 100 hours) had a huge impact on the system.
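The arithmetic behind that drift can be reproduced in a few lines. The sketch below relies on details that are not in this article but come from widely cited post-incident analyses: the system counted time in tenths of a second, and the value 0.1 was truncated to fit a fixed-point register. The figure of 23 fractional bits, and every name in the code, are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumptions for illustration (not stated in the article): time is counted in
# ticks of 0.1 s, and 0.1 is stored truncated to 23 binary fractional bits.
FRACTION_BITS = 23
TICKS_PER_SECOND = 10


def stored_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """Return 0.1 truncated (not rounded) to the given number of binary digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: int, bits: int = FRACTION_BITS) -> float:
    """Accumulated clock error after running continuously for `hours`."""
    ticks = hours * 3600 * TICKS_PER_SECOND
    error_per_tick = Fraction(1, 10) - stored_tenth(bits)
    return float(ticks * error_per_tick)


if __name__ == "__main__":
    print(f"drift after 1 hour:    {clock_drift_seconds(1) * 1000:.2f} ms")
    print(f"drift after 100 hours: {clock_drift_seconds(100):.3f} s")
```

Under these assumptions the error works out to roughly 3.4 milliseconds per hour and about 0.34 seconds after 100 hours, consistent with the figures above. The point for this discussion is that the bug, once suspected, could be found by reading the code and reasoning about it.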

There are thousands of examples like this, but what sets opacity in machine learning apart is that the logic isn't written as procedural code. It can't be examined line by line like the code running a Patriot missile battery.

A quick review of machine learning

Not all machine learning routines are classifiers, but most are. Machine learning algorithms are useful for making predictions and generalizations. Accuracy is always an issue, but it is assumed that accuracy improves with larger quantities of data. Part of the expansion of machine learning is due to the availability of substantial amounts of data, largely thanks to cloud storage and "big data" initiatives. However, there is some danger to this as well. If a model is not producing useful insight, the tendency is to find more data, which almost always brings new data quality issues.

What's in a machine learning algorithm? Two parallel processes are driven by two distinct algorithms: learners and classifiers. Inputs (features) are processed by the classifier, producing a result referred to as a category.

A medical diagnosis system takes many input variables, such as blood test results and clinical observations, and calculates a disease state diagnosis as output ('liver cancer,' 'heart disease,' 'multiple sclerosis'). To do this, the machine learning algorithms have to be trained on sample data. Training requires some skill to provide a proper mix of data for the model. The result of the training, which is in the form of a matrix of weights, is consumed by the classifier to categorize new input data.
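The split between learner and classifier can be sketched in code. This is a minimal illustration, not the method of any real diagnostic system: it uses a multiclass perceptron as a stand-in learner, and the data, category names, and function names are all invented for the example.

```python
# Hypothetical, made-up data: two normalized test readings per patient and
# invented category names. Nothing here reflects real clinical values.
CATEGORIES = ["condition_a", "condition_b", "healthy"]
SAMPLES = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 0.0], [0.1, 0.1]]
LABELS = ["condition_a", "condition_a", "condition_b", "condition_b", "healthy", "healthy"]


def classify(weights, features):
    """Classifier: score each category with its row of weights, pick the best."""
    inputs = list(features) + [1.0]  # constant 1.0 feeds the bias weight
    scores = {
        category: sum(w * x for w, x in zip(row, inputs))
        for category, row in weights.items()
    }
    return max(scores, key=scores.get)


def train_learner(samples, labels, categories, epochs=50):
    """Learner: a multiclass perceptron that returns a matrix of weights."""
    n_inputs = len(samples[0]) + 1  # features plus bias
    weights = {category: [0.0] * n_inputs for category in categories}
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            guess = classify(weights, features)
            if guess != label:
                inputs = list(features) + [1.0]
                for i, x in enumerate(inputs):
                    weights[label][i] += x  # pull the correct category closer
                    weights[guess][i] -= x  # push the wrong guess away
    return weights


if __name__ == "__main__":
    weights = train_learner(SAMPLES, LABELS, CATEGORIES)
    for category, row in weights.items():
        print(category, [round(w, 2) for w in row])
    print(classify(weights, [0.8, 0.2]))
```

Even in a toy like this, everything the system has "learned" is a small grid of numbers. Nothing in the weights states a rule a clinician would recognize, and a realistic model has vastly more of them.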

Another example is a classifier that filters email using features (such as email header information, words in the email body, etc.) and produces one of several output categories, such as 'urgent' or 'social.'
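A rough sketch of such an email classifier follows. It assumes a naive Bayes model over the words in the message body, which is one common choice and not necessarily what any particular mail service uses. The four training messages, the category names, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training messages, invented for this sketch.
EMAILS = [
    ("server outage respond now", "urgent"),
    ("deadline today respond now", "urgent"),
    ("party invitation this weekend", "social"),
    ("weekend dinner with friends", "social"),
]


def extract_features(text):
    """Features here are simply the lowercase words of the message body."""
    return text.lower().split()


def train_email_learner(emails):
    """Learner: count how often each word appears in each category."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in emails:
        words = extract_features(text)
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary


def classify_email(model, text):
    """Classifier: pick the category with the highest log-probability score."""
    word_counts, category_counts, vocabulary = model
    total_emails = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in category_counts.items():
        score = math.log(count / total_emails)
        # Add-one smoothing so a word unseen in a category never zeroes the score.
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in extract_features(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


if __name__ == "__main__":
    model = train_email_learner(EMAILS)
    print(classify_email(model, "respond to outage today"))
    print(classify_email(model, "dinner this weekend"))
```

The classifier never "reads" a message in any human sense. It adds up per-word numbers, so the reason a given email landed in a category is a sum of small statistical contributions, not a stated rule.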

In this article, I wanted to delve in detail into the opacity of the machine learning algorithms behind the two examples above. But the descriptions are too long for an article like this. I'll publish them as separate articles in the future.

We are at the stage where high-level discussions are needed about what is ethical and what isn't in the opacity of machine learning algorithms, and about their effect on social interests in classification and discrimination. This is part of my research on 'digital inequality,' which has too often focused on computational resources and skills. Instead, it should address what's become known as "AI Ethics," the question of how people may be subject to abuses from AI. These problems include privacy invasions from digital phenotyping, unfair computational classification, and surveillance. A litmus test of these abuses is unequal scrutiny across the population.

There is a growing chorus of opinion that argues for auditors who can evaluate the code and determine if the models comply. Alternatively, it's possible to educate a wider population of developers, alleviating the problem of a homogeneous and elite class of technical people being the only ones who conceive of consequential algorithms. However, machine learning algorithms are opaque and ineffable at a fundamental level, and that poses a harder challenge than either of these options addresses.

Machine optimizations derived from training data are not necessarily in agreement with human semantic comprehension. The "machinations" of machine learning algorithms can be mysterious, even for computer scientists and others with specialized training.

My take

Alleviating black-boxed classification problems will not be accomplished by a single tool or process, but by some combination of regulation and audit. Audits can work on the code itself, but more emphasis is needed on the algorithms' operations and on more transparent alternatives, such as open source, education of the general public, and a kick in the butt for those empowered to develop code of such consequence.