AI has a black box explainability problem - can outcome analysis play a role?

By Neil Raden, January 7, 2020
Summary:
One of AI's major stumbling blocks is explainability. But can we address AI's black box by evaluating outcomes? One example from the insurance industry pushes this debate forward.

So why is there so much concern about black boxes?

According to an article on Medium, The ‘Black Box’ Problem of AI:

We need to ensure that organizations that deploy and utilize these systems remain legally responsible for any damages caused. However, legislation cannot solve everything, partly because it takes much more time to generate law and norms than it does to generate code. It is, therefore, vital that the 'architects' of our digital society…should be fully aware of the potentially harmful effects of their technology on society and that they should make positive efforts to limit these.

When you put an AI model in production, it can fire thousands or millions of times, evaluating people's resumes, deciding whether to issue a mortgage or underwrite someone for insurance. In some cases, the decisions made by the AI model are pretty straightforward (though how the training process developed the criteria can be a little murky). Other models, particularly those built with so-called "deep learning" neural networks, are devilishly difficult to understand. The culprits today are machine learning technologies, which include neural networks, but not so-called AGI, Artificial General Intelligence.

According to the analytical software company SAS:

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.

That's a pretty high-level definition. In actual practice, it isn't that sophisticated. What machine learning doesn't do is pose a question to solve, select and prepare the data, choose the algorithms to use, or decide when the model is good enough to use. People do that. Machine learning itself is just math: finding nearest neighbors, doing regressions, calculating gradient descent against a specified cost function. In other words, machine learning isn't "intelligent" at all, as we understand intelligence.

But the algorithm is the antidote to latency and the engine of efficiency.
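To make the "just math" point concrete, here is a minimal sketch of gradient descent fitting a one-variable linear model against a squared-error cost. The toy data, learning rate, and step count are purely illustrative; nothing here comes from any particular vendor's implementation.

```python
# Gradient descent on a mean-squared-error cost for y ~ w*x + b.
# The toy data, learning rate, and step count are illustrative only.

def fit_line(xs, ys, lr=0.01, steps=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# y is roughly 3x + 1 with a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 3.9, 7.2, 9.8, 13.1]
print(fit_line(xs, ys))  # approaches (3.0, 1.0)
```

Useful, certainly, but nothing in that loop is "understanding" anything; it is arithmetic repeated until a number stops moving.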

When James Taylor and I wrote the book Smart (Enough) Systems over a decade ago, we dealt with the same issues - just different technology. The process of building a decision model used the same statistical methods as machine learning to generate "inferencing" models such as decision trees or scoring models, which were then implemented as rules and which also fired for each new input.

Good models tend to degrade over time, so we proposed an idea called “adaptive control.” In the diagram below, we depicted the best profit trajectory and continually tested it against alternatives (A/B testing, or champion/challenger, is the general term). The various curves illustrate that by changing models, you achieve the best outcome:

Figure 1 - Adaptive control of decision models (Copyright 2007 James Taylor and Neil Raden)
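As a rough illustration of that adaptive control loop, here is a minimal champion/challenger sketch in Python. The 10% challenger share, the callable model objects, and the plain mean-outcome comparison are assumptions made for the sake of the example, not the mechanics from the book.

```python
import random

# Champion/challenger routing: send a small slice of traffic to the challenger
# model, record outcomes for both arms, and promote the challenger only if its
# observed outcomes beat the champion's. Split size and metric are illustrative.

class AdaptiveController:
    def __init__(self, champion, challenger, challenger_share=0.10):
        self.models = {"champion": champion, "challenger": challenger}
        self.challenger_share = challenger_share
        self.outcomes = {"champion": [], "challenger": []}

    def decide(self, case):
        arm = "challenger" if random.random() < self.challenger_share else "champion"
        return arm, self.models[arm](case)

    def record_outcome(self, arm, profit):
        self.outcomes[arm].append(profit)

    def challenger_wins(self, min_cases=500):
        champ, chall = self.outcomes["champion"], self.outcomes["challenger"]
        if len(champ) < min_cases or len(chall) < min_cases:
            return False  # not enough evidence yet
        return sum(chall) / len(chall) > sum(champ) / len(champ)
```

In practice the comparison would be a proper statistical test on business-specific measures, but the shape is the point: the model is never left to run unobserved.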

The point is that you don’t push a model out on the public and let it run until someone starts to complain. The issue of explainability in black-box AI models is concerning, but one approach to dealing with it, as above and as explained below, is to evaluate the outcomes.

It’s this last point that leads to an interesting discussion about the ethics of black boxes. Daniel Schreiber, CEO & Co-Founder at Lemonade Inc., an Insurtech startup, recently wrote a blog post: AI Can Vanquish Bias: Algorithms We Can’t Understand Make Insurance Fairer. I wrote about a similar claim recently regarding resume bots and AI ethics.

First, a few definitions: Insurtech is an exploding field backed by billions of dollars of venture capital. The idea is to revolutionize the industry with technology. An open question is: Is an Insurtech company a tech company that does insurance, or an insurance company designed around tech? There are basically three types of Insurtech companies. The first, Aggregators, aren't insurance companies at all; they use the cloud and AI to market for insurers and offer applicants a service to find the best deal. This sort of business is no different from other aggregators that find car or truck deals, or a travel aggregator like Orbitz or Expedia.

The second type is more like an insurance company in that it handles the marketing, sales, application, underwriting, and policy issuance, and may or may not process claims too. However, these companies assume no risk, as their policies are backed by insurers or reinsurers. For this reason, they can grow as fast as they like, because the capital requirements that regulators place on insurers for solvency do not constrain them.

The third type is a functioning insurance company, licensed by the jurisdictions it serves and subject to the same regulatory and statutory burdens as any other insurance company. Lemonade is this third type, an Insurtech company that offers Homeowners and Renters Insurance. Schreiber begins to support his premise by asking the operative question:

Insurance is the business of assessing risks and pricing policies to match. As no two people are entirely alike, that means treating different people differently. But how to segment people without discriminating unfairly?

This isn't a new issue for insurance companies; they were dealing with it long before the term "Insurtech" was born. But the problem Schreiber poses is central to companies like Lemonade. In insurance terms, Schreiber lays out the issues in three phases:

Phase 1: Every customer pays the same amount for each unit of coverage. That seems fair. It eliminates discrimination. Or does it? Suppose we're looking at auto insurance. Flat pricing favors those who are careless and penalizes those who are careful. It also tends to make the insurance more expensive than that of competitors who evaluate risk more closely, which has the effect of sending the better drivers away - a concept in insurance called "adverse selection."

Phase 2: The population is divided into subgroups according to risk profile. This is where the insurance industry mostly is today. It seems more reasonable than Phase 1, but it is entirely possible that these risk groups, created from data, may be proxies for protected classes. Also, in reality, it is just more than one group like Phase 1, where some members still bear the burden of the poorer risks in the group.

Phase 3: Using machine learning to break the groups down to a group of one. As Schreiber says:

Insurance remains the business of pooling premiums to pay claims, but now each person contributes to the pool in direct proportion to the risk they represent – rather than the risk represented by a large group of somewhat similar people.

Schreiber claims that Phase 3, which is only a proposal at this point (state Insurance Commissioners regulate personal lines insurance, so such a radical change in rating would have to be approved), offers an opportunity to use AI, even black-box AI, to develop a genuinely fair and non-discriminatory way of assessing premiums. But how would one know that?

Just as with the adaptive control diagram above: by evaluating outcomes. Schreiber uses differential loss ratios as the answer. A loss ratio is the ratio of claims paid to total premiums. There are many variants, but for brevity, we can leave it at that. What Schreiber proposes is this: once you've charged people a differential premium (which means you can evaluate their risk that precisely), you can then examine differential loss ratios - meaning, any schema of grouping should yield uniform loss ratios. Use machine learning to create groups of people in novel ways, and if the loss ratios are not the same for any arrangement of grouped policies, then you've made a mistake.
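Here is a minimal sketch of that outcome test in Python. The policy record fields, the grouping functions, and the 10% tolerance are assumptions made for illustration; Schreiber's post does not specify a particular threshold or data layout.

```python
from collections import defaultdict

# Outcome check on differential pricing: for any way of grouping policies,
# the loss ratio (claims paid / premiums collected) should come out roughly
# the same in every group. Field names and the tolerance are illustrative.

def loss_ratios_by_group(policies, group_fn):
    premiums = defaultdict(float)
    claims = defaultdict(float)
    for p in policies:
        g = group_fn(p)
        premiums[g] += p["premium"]
        claims[g] += p["claims"]
    return {g: claims[g] / premiums[g] for g in premiums if premiums[g] > 0}

def loss_ratios_uniform(policies, group_fn, tolerance=0.10):
    overall = sum(p["claims"] for p in policies) / sum(p["premium"] for p in policies)
    ratios = loss_ratios_by_group(policies, group_fn)
    # A grouping "fails" if any group's loss ratio drifts too far from the book's.
    return all(abs(r - overall) <= tolerance for r in ratios.values())

# Example groupings to test: postal code, age band, or an ML-generated segment label.
# policies = [{"premium": 1200.0, "claims": 300.0, "zip": "10001", "age_band": "30-39"}, ...]
# loss_ratios_uniform(policies, lambda p: p["zip"])
```

If a grouping by, say, postal code fails the check, the pricing model is systematically over- or under-charging that group, which is exactly the kind of mistake the outcome test is meant to surface.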

This is all speculative, but not unreasonable. Still, how does it help us with the broader problem of black boxes? Is turning to outcomes the method for handling black boxes?

My take

This approach is plausible for insurance that deals with individuals, but it remains to be seen whether it would work. The driving factor for AI automation is speed, and retrospective analytics may not uncover a problem until it is too late. However, I think this is on the right track. As the Zen Master Suzuki said, "If your cow is unruly, give it a bigger pasture." Perhaps we should look to AI to police AI.