The so-called "black-box" aspect of AI, usually referred to as the explainability problem, or XAI for short, arose slowly over the past few years. Still, with the rapid development of AI, it is now considered a significant problem.
How can you trust a model if you cannot understand how it reaches its conclusions? Whether for commercial benefit, ethical concerns, or regulatory considerations, XAI is essential if users are to understand, appropriately trust, and effectively manage AI results. In researching this topic, I was surprised to find almost 400 papers on the subject.
What is the motivation for having an AI application explain itself? I believe there are, effectively, four major reasons that capture the different motivations for explainability, as follows:
- EXPLAIN TO JUSTIFY - the need for reasons or justifications for a particular outcome, rather than a description of the inner logic of the decision-making process, particularly when unexpected decisions are made. Justification also ensures there is an auditable and provable way to defend algorithmic decisions as fair and ethical, building trust and demonstrating that the model complies with regulation.
- EXPLAIN TO CONTROL - Explainability can prevent things from going wrong. Understanding system behavior gives visibility into unknown vulnerabilities and flaws, and allows errors to be corrected in low-criticality situations.
- EXPLAIN TO IMPROVE - Models, whether built with AI, data science, or any other digital code, must be subject to continuous improvement. If you know why the system produced specific outputs, you will also learn how to make it smarter.
- EXPLAIN TO DISCOVER - Explanation yields new facts, and the value is that the machine can explain its learned strategy (knowledge) to us. In the future, XAI models will teach us about new and hidden laws in biology, chemistry and physics. Explainability is a powerful tool for justifying AI-based decisions. It can help verify predictions, improve models, and gain new insights into the problem at hand.
What could explainability look like?
- Analytic (didactic) statements: in natural language that describe the elements and context that support a choice
- Visualizations: that directly highlight portions of the raw data that support a choice and allow viewers to form their own perceptual understanding
- Cases: that invoke specific examples or stories that support the choice
- Rejections of alternative choices: (or "common misconceptions") that argue against less preferred answers based on analytics, cases and data
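One hypothetical way to make these four forms concrete is a small data structure an application could return alongside each decision. The field names and example values below are invented for illustration and are not drawn from any product:

```python
from dataclasses import dataclass, field

# Hypothetical container pairing a decision with the four presentation
# forms above; all field names and example values are illustrative.
@dataclass
class Explanation:
    decision: str
    analytic_statement: str = ""                  # natural-language rationale
    visualizations: list = field(default_factory=list)   # e.g. saliency-map paths
    cases: list = field(default_factory=list)            # supporting examples
    rejected_alternatives: dict = field(default_factory=dict)  # alternative -> reason

denial = Explanation(
    decision="loan denied",
    analytic_statement="Debt-to-income ratio of 0.62 exceeds the 0.40 limit.",
    visualizations=["plots/dti_distribution.png"],
    cases=["applicant #1042: similar profile, also denied"],
    rejected_alternatives={"loan approved": "income below the required minimum"},
)
print(denial.decision)
```

A structure like this lets a user interface render whichever form a given audience needs: the statement for a regulator, the visualization for a data scientist, the cases for an affected customer.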
A sampling of products and toolkits for AI explainability
H2O.ai offers explainability within its "Driverless AI" platform. Its method is not clear, nor is the scope. Still, it claims that "Four charts are generated automatically including K-LIME, Shapley, Variable Importance, Decision Tree, Partial Dependence, and Disparate Impact Analysis," which seems like six. And there is no material online about its types of explanation.
IBM AI Explainability 360 is an open-source software toolkit with a unified API that brings together algorithms helping people understand how machine learning makes predictions, along with guides, tutorials, and demos, in one interface.
Google Explainable AI: explanation scores show how much each factor contributed to the model's predictions in AutoML Tables, inside your (Google) AI Platform Notebook, or via the (Google) AI Platform Prediction API. The explanations service currently supports only models trained in TensorFlow.
DataRobot supports interpretable models, including a model blueprint, which shows the preprocessing steps that each model uses to make its conclusions. It is a useful feature for teams building models that must comply with regulatory agencies. DataRobot also has prediction explanations, which show the top variables impacting the model's outcome for each record.
Microsoft Azure's toolkit offers feature importance values for raw and engineered features, interpretability on real-world datasets at scale, during both the training and inference stages, plus interactive visualizations to find patterns within data.
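Feature importance values of the kind these toolkits report can be approximated without vendor tooling. As a rough sketch, the code below computes permutation importance - the drop in accuracy when one feature's values are shuffled - against a toy stand-in model; the model rule, dataset, and thresholds are all invented for this example:

```python
import random

# Toy stand-in for a trained black-box model: approves a loan
# when income outweighs twice the debt (rule invented for this sketch).
def model(income, debt):
    return 1 if income - 2 * debt > 50 else 0

# A tiny labeled dataset: (income, debt, actual outcome).
data = [(120, 10, 1), (80, 30, 0), (200, 40, 1), (60, 2, 1), (90, 40, 0)]

def accuracy(rows):
    return sum(model(i, d) == y for i, d, y in rows) / len(rows)

def permutation_importance(feature_index, trials=200, seed=0):
    """Average drop in accuracy when one feature's column is shuffled:
    the larger the drop, the more the model relies on that feature."""
    rng = random.Random(seed)
    base = accuracy(data)
    total_drop = 0.0
    for _ in range(trials):
        col = [row[feature_index] for row in data]
        rng.shuffle(col)
        shuffled = [
            (col[k], d, y) if feature_index == 0 else (i, col[k], y)
            for k, (i, d, y) in enumerate(data)
        ]
        total_drop += base - accuracy(shuffled)
    return total_drop / trials

print("income importance:", permutation_importance(0))
print("debt importance:  ", permutation_importance(1))
```

Because it only queries the model's inputs and outputs, this technique is less intrusive than approaches that instrument the model's internals, which connects to the point about intrusiveness below.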
All of these are, to a certain extent, intrusive. In other words, they open the black box or embed code to gather information about the models' operation. This diagram is from DARPA, illustrating the concern of tool vendors and application builders that explainability has too great an impact on performance and function:
There is some contention that producers of AI apps have no real obligation to explain how their models work. The diagram's quantitative accuracy is probably imprecise, but it is only meant to describe the tradeoff between model performance (the quality of its output) and the degree to which its operation is explainable. What do we mean by explanation? The industry has identified four types, as listed above.
Explainability outside the AI black box - counterfactuals
Counterfactual: A counterfactual assertion is a conditional whose antecedent is false and whose consequent describes how the world would have been if the antecedent had been true. "If the moon had been made of green cheese, Armstrong and Aldrin would have come back with some."
There is an intense desire on the part of AI developers not to have to open the black box for explainability. One intriguing idea is that counterfactuals could be used as a means to provide explanations for individual decisions. Unconditional counterfactual explanations can be given for positive and negative automated decisions, regardless of whether they are solely automated or produce legal or other significant effects. People subjected to the inference of an AI model can be given meaningful explanations of a decision, facts with which to contest it, and advice on how they can change their behavior or situation to possibly receive the desired decision (e.g., loan approval).
A typical "explanation" is an attempt to convey an algorithm's internal state or logic that leads to a decision. Counterfactuals instead rely on the external facts that led to that decision. This is an important distinction. In AI, the algorithm's internal state can consist of millions of variables intricately connected in a massive web of dependent behaviors. Giving a layperson this information in this form makes it impossible for them to reason about the behavior of the algorithm.
Counterfactual explanations follow this form: "You were denied a life insurance policy because your blood pressure was 150/85. If your blood pressure had been under 130/80 you would have been offered a policy." (The counterfactual is the second sentence.)
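A counterfactual of this form can be generated by searching for the smallest change to the inputs that flips the model's decision, without inspecting the model's internals. The sketch below assumes a hypothetical underwriting rule (approve when blood pressure is under 130/80) and searches for the nearest approved reading; the model and thresholds are invented for illustration:

```python
# Hypothetical underwriting model: approve when blood pressure
# is under 130/80 (thresholds invented for this example).
def approve_policy(systolic, diastolic):
    return systolic < 130 and diastolic < 80

def nearest_counterfactual(systolic, diastolic, max_delta=60):
    """Search outward by total change in the readings: return the
    closest approved (systolic, diastolic) pair, or None if none
    is found within max_delta points of change."""
    for total in range(1, max_delta + 1):
        for ds in range(total + 1):
            dd = total - ds
            candidate = (systolic - ds, diastolic - dd)
            if approve_policy(*candidate):
                return candidate
    return None

# The denied applicant from the example above.
systolic, diastolic = 150, 85
cf = nearest_counterfactual(systolic, diastolic)
print(f"Denied at {systolic}/{diastolic}; "
      f"approved if it had been {cf[0]}/{cf[1]}.")
```

This toy search only decreases the readings, since lower blood pressure is the relevant direction here; published counterfactual methods generalize this by minimizing a distance function over all features and weighting how mutable each one is.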
Counterfactual explanations can narrow the divide between the interests of affected people and those of model owners.
People have an apparent aversion to black-box decisions that affect them financially, health-wise, and in dozens of other ways, while at the same time being oblivious to certain other kinds of decisions. And it's not just affected individuals. A distributor of a consumer product wants to know why it has been placed on allocation. Students who have not been accepted to a college would like to know why (though at this point, that is not likely). In supply chains, various participants want to understand why their material is delayed or why the price has changed. When AI makes these decisions, the demand for explainability can be heard.
In my research, I found that almost every solution is technical and requires AI modeling expertise to implement and set up. The only novel approach I see is the discussion around focusing on the externals rather than the internals - like the counterfactual example - which is still a research subject at this point, but I believe it has merit.