Most chatter about AI outside research and academic institutions centers on Machine Learning (ML) and various forms of neural nets and deep learning. Natural language technologies (speech recognition, language generation, text analytics) are appearing more often as well. These approaches use algorithms that sift through data to find relationships (and software products that assist in the creation and maintenance of semantic metadata are a rapidly improving approach to harmonizing data from many sources). All of these disciplines are "bottom-up" processes that apply algorithms to vast sets of data. The drawback is that data rarely speaks for itself – or by itself.
There are also powerful and mature "top-down" methods for modeling and understanding uncertainty, reasoning about probable outcomes, and even addressing causation. Top-down techniques, like Bayesian models, address key issues of uncertainty and causation. A Bayesian model produces probabilities of outcomes and, in most cases, a fairly clear understanding of causation too, something that is missing in ML until the operator offers an interpretation of the results. Bayesian statistics attacks statistical problems with probabilities and lets people update their conclusions as new data enters the model.
Judea Pearl, the "inventor" of Bayesian Belief Networks, quoted in a recent diginomica article, described his perception of the difference between Bayes nets and ML:
AI is currently split. First, there are those who are intoxicated by the success of machine learning and deep learning and neural nets. They don’t understand what I’m talking about. They want to continue to fit curves. But when you talk to people who have done any work in AI outside statistical learning, they get it immediately. I have read several papers written in the past two months about the limitations of machine learning.
Realistically, to refer to everything other than Bayes nets as just "curve fitting" is an exaggeration (but Pearl did win the Turing Award, so his opinion is worth considering). However, many AI/ML algorithms are very productive and can offer significant value, provided organizations are willing (and able) to incorporate them into decision-making activities. In fact, Pearl later modified his statement to allow that many useful results can be, and have been, produced by the application of ML. His point is that Bayesian inference, in particular, can add crucial clarity to complex decisions.
Understanding Bayesian Inference
In my practice, I find that most people involved with advanced analytics, such as predictive modeling, data science, and ML, are familiar with the name Bayes, and can even reproduce the simple theorem below, but very few have any experience implementing Pearl's Bayesian Belief Networks:

P(A|B) = P(B|A) × P(A) / P(B)
The simple interpretation of the equation is: to determine the probability that A is true given that B is true, look at all the outcomes where B is true, then find the fraction of those where A is also true. To illustrate the motivation for this approach, as opposed to the "frequentist" statistics behind ML, here is a summary of an excellent article from the Better Explained series:
- Tests are not the event. …a test is separate from the event
- Tests are flawed… detect things that don’t exist (false positive), and miss things that do exist (false negative).
- False positives skew results
- People prefer natural numbers… “Of those 100, 80 will test positive”
- Even science is a test… experiments are "potentially flawed tests"
Bayes’ theorem converts the results of tests into the probability of the event:
- Correct for measurement errors.
- Relate the actual probability to the measured test probability.
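The points above can be sketched with the classic diagnostic-test calculation. The numbers here (1% prevalence, 80% sensitivity, 9.6% false-positive rate) are illustrative, not taken from any particular study:

```python
# Bayes' theorem applied to a diagnostic test (illustrative numbers):
# 1% of the population has the condition, the test catches 80% of true
# cases (sensitivity) and falsely flags 9.6% of healthy people.

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = prior * sensitivity                 # P(positive and sick)
    false_pos = (1 - prior) * false_positive_rate  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"P(condition | positive) = {p:.3f}")  # roughly 0.078
```

Note the corrective effect: even after a positive result from a decent test, the probability of actually having the condition is under 8%, because false positives from the healthy majority swamp the true positives.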
In her article, An AI That Knows the World Like Children Do, Alison Gopnik provides a plain-language description of Bayesian inference:
Bayesian models, named after the 18th-century statistician and philosopher Thomas Bayes, combine generative models (your ability to create hypotheses) with probability theory using a technique called Bayesian inference. A probabilistic generative model can tell you how likely it is that you will see a specific pattern of data if a particular hypothesis is true. A Bayesian model combines the knowledge you already have about potential hypotheses with the data you see, letting you calculate, quite precisely, just how likely it is that the hypothesis in question is true.
If logical deduction gives you proofs of truths, Bayesian inference gives you the probabilities of possibilities. You start with what you know: the hypotheses. As results flow into the model, even hypotheses you didn't initially favor can rise to become the most probable. It's true that Bayes nets have a certain aura of mystery about them, and building them is not simple, but they can be applied to problems that are not at all esoteric, for example:
“Starting with what you know and formulating a hypothesis versus crunching through the data to see what it may reveal, Bayesian networks have been successfully applied to create consistent probabilistic representations of uncertain knowledge in diverse fields such as medical diagnosis, factory and other mechanical diagnosis, HR skills diagnosis, root cause analysis, trade-off analysis, image recognition, language understanding and search algorithms.” PR-OWL Bayesian Networks
And one more succinct description by Alison Gopnik:
"Bayes nets search out causes, not just associations. They assume that you can derive abstract knowledge from concrete data because we already know a lot. They give you a way to understand both things that just happen and interventions – things you observe others doing to the world or things you do to the world." Alison Gopnik - Wikipedia.
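The "probabilities of possibilities" idea can be sketched in a few lines: competing hypotheses, each reweighted as data arrives. The hypotheses (a coin's bias) and the data stream below are invented purely for illustration:

```python
# Sequential Bayesian updating: three competing hypotheses about a
# coin's bias, revised as each flip arrives.

hypotheses = {"fair": 0.5, "heads-biased": 0.8, "tails-biased": 0.2}
priors = {h: 1 / 3 for h in hypotheses}  # start with no preference

def update(priors, flip):
    """One Bayes step: weight each hypothesis by its likelihood, renormalize."""
    unnorm = {h: priors[h] * (hypotheses[h] if flip == "H" else 1 - hypotheses[h])
              for h in hypotheses}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

for flip in "HHTHHHHH":  # observed data stream
    priors = update(priors, flip)

best = max(priors, key=priors.get)
print(best, round(priors[best], 3))  # → heads-biased 0.915
```

Eight flips are enough for "heads-biased", which started as just one of three equally weighted possibilities, to dominate; a single tail dented but did not eliminate it.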
Applying Bayesian Inference to complex decisions
To repeat, data rarely speaks for itself (a topic for another article: all data has context and meaning, and it may not be an accurate representation of events). To add to that, people often do a poor job of speaking for data. To successfully apply Bayesian inference to decisions, you need to consider three key issues.
Key Issue 1: Fundamentally, you have to have some understanding of the subject and probabilities
The use of Bayes nets has two drawbacks. First, operators must have some understanding of the subject matter, unlike most unsupervised ML (although, in practice, even unsupervised ML requires analysis of the outputs). Second, and more importantly, many managers and executives have no experience in decision-making based on probabilities. If the highest probability from the net is a change in pricing policy that is a departure from current practice, how is that to be evaluated? However, because Bayes can explain causation to a certain extent (as opposed to a deep learning neural net, whose conclusions are completely opaque), it provides more than just a probability; it also reveals important features of the reasoning that generated that probability.
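That inspectability can be made concrete with a tiny hand-rolled Bayes net, queried by brute-force enumeration. The structure is Pearl's classic rain/sprinkler/wet-grass example; the probability tables are illustrative. The point is that every number in the answer traces back to an explicit, inspectable assumption:

```python
# A minimal Bayes net queried by enumerating the joint distribution.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}  # independent of rain in this sketch
P_wet = {  # P(grass wet | sprinkler, rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(s, r, w):
    """P(sprinkler=s, rain=r, wet=w) from the tables above."""
    pw = P_wet[(s, r)]
    return P_sprinkler[s] * P_rain[r] * (pw if w else 1 - pw)

# Diagnostic query: the grass is wet -- how likely is rain the cause?
num = sum(joint(s, True, True) for s in (True, False))
den = sum(joint(s, r, True) for s, r in product((True, False), repeat=2))
print(f"P(rain | wet grass) = {num / den:.3f}")  # → 0.695
```

A real net would use a library and smarter inference than enumeration, but the structure is the same: named causes, explicit conditional probabilities, and a query whose answer can be audited table by table.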
Key Issue 2: The goal is probability of possibilities, not “Yes” or “No”
Regression analysis, and even the classification and clustering algorithms of ML, assume the useful conclusions are in the data themselves. Any conclusions about causation are subjective, drawn after evaluating the results; no notion of probability is included. With Bayes nets, you can decide that a hypothesis is worth acting on at 90% probability, or move forward at 80%, or hold off unless it hits 99%. The types of ML generally being brought to market today do not deal with uncertainty.
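A minimal sketch of what acting on probabilities rather than yes/no flags looks like; the thresholds here are invented policy choices, not outputs of any model:

```python
# Mapping a posterior probability to an action under explicit thresholds.

def decide(p_hypothesis, act_at=0.90, investigate_at=0.80):
    """Return an action given the current probability of the hypothesis."""
    if p_hypothesis >= act_at:
        return "act"
    if p_hypothesis >= investigate_at:
        return "investigate further"
    return "hold"

print(decide(0.93))  # → act
print(decide(0.84))  # → investigate further
print(decide(0.55))  # → hold
```

The thresholds become a stated, debatable policy rather than an invisible default buried in an algorithm.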
Consider this regression analysis of the Dow Jones Index and frequency with which a certain actress appears in magazines:
Comparing the Dow and the Appearance of Jennifer Lawrence In Magazines
At first glance it would appear that Jennifer Lawrence's magazine appearances and the stock market move together. Run the stats, and a correlation coefficient of 0.8 pops out of the data; further "proof," right? Is there some mysterious relationship between Jennifer Lawrence and the stock market? Watch out for spurious explanations that may immediately lead your team to formulate ideas about what underlying causes are at play. A Bayesian model would quickly show that there is, in fact, no relationship; the correlation is merely a misapplication of time-series analysis.
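The mechanism is easy to demonstrate with synthetic stand-ins for the two series: when two unrelated series both trend upward, the shared trend, not any causal link, produces the big correlation number, and differencing each series (a standard time-series step) makes the "relationship" vanish:

```python
# Two independent trending series: high correlation in levels,
# roughly zero correlation once the trend is removed by differencing.
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
dow = t + rng.normal(0, 5, n)             # upward trend + noise
magazine = 0.5 * t + rng.normal(0, 5, n)  # independent upward trend + noise

level_corr = np.corrcoef(dow, magazine)[0, 1]
diff_corr = np.corrcoef(np.diff(dow), np.diff(magazine))[0, 1]
print(f"levels: {level_corr:.2f}, differences: {diff_corr:.2f}")
```

The level correlation comes out far above 0.8 even though the two series share nothing but a clock.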
Key Issue 3: Bayes models can be lightweight and update in real-time
Bayes models are continuously updated as new data streams in. We see broad application of Bayes nets in all types of sensor applications, especially IoT. Their lightweight code can fit into the limited capacity of IoT devices at the edge and act on data streaming from the sensor. The canonical applications of IoT are sensors used to monitor and alert in factories, oil wells, aircraft and other critical real-time settings. But Bluetooth beacons that cost $50 or less can communicate with smartphones in stores to offer personalized, hyper-local, in-store retail promotions through a smartphone app. Existing personalization and recommendation engines act on analytics that are not updated in real time, unlike a Bayes net. Ultimately, your business can start generating real benefits by investing in promoting the use of Bayesian nets and inference among decision makers.
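How lightweight can a streaming Bayesian update be? A conjugate Beta-Bernoulli model amounts to two counters, which is why this style of update fits on a beacon or edge device. The sensor readings below are invented for illustration:

```python
# Streaming estimate of the rate at which a sensor reads out of spec:
# each reading costs one addition, and the estimate is always current.

class OutOfSpecTracker:
    """Conjugate Beta-Bernoulli update: two counters, one addition per reading."""

    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0  # uniform prior over the rate

    def observe(self, out_of_spec):
        if out_of_spec:
            self.alpha += 1
        else:
            self.beta += 1
        return self.alpha / (self.alpha + self.beta)  # posterior mean

tracker = OutOfSpecTracker()
stream = [False, False, True, False, True, True, False, True, True]
for reading in stream:
    rate = tracker.observe(reading)

print(f"estimated out-of-spec rate: {rate:.2f}")  # → 0.55
```

Unlike a batch-trained model, the estimate is revised on every reading, with no retraining step and no round trip to a server.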
An Example: Operational risk in a financial services firm
Regulators require banks and other financial institutions to quantify their exposure to operational risk in much the same way as they quantify credit and market risk. While there is a wealth of data and well-established statistical methods for calculating credit and market risk, no such data or methods exist for operational risk; there is no established approach to predicting rare, high-consequence operational loss events. The major banks need to develop methods that combine the small amount of relevant historical loss data with more subjective data about processes and controls.
One bank needed to develop an operational risk solution that satisfied Basel II, with the additional constraint that any solution had to integrate with the organization's existing data and IT infrastructure.
The bank developed a class of risk maps created dynamically from the bank’s existing database of risk and control information. The solution quantifies and rates qualitative and numeric risk and integrates self-assessment questionnaires and operational risk models. It also takes account of dependencies when modeling total losses from external and internal risks. Finally, it deals with the credibility of information and uncertainty including differences in expert opinions.
The Bayes net provides quantitative predictions even when data is unavailable, because expert judgment is built into the models. It helps reduce, manage and mitigate risks, which leads to reduced costs, a better reputation and increased profits. A most valuable benefit is that it aggregates total loss forecasts over business lines, taking account of risk dependencies, to forecast the capital charge in the form of a value-at-risk (VaR) distribution, along with "what-if?" scenario analysis.
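The loss-aggregation step can be sketched as a Monte Carlo simulation: annual loss = a Poisson event count times lognormal severities, with the 99.9% quantile read off as VaR. The distributions and parameters below stand in for expert judgment and are purely illustrative, not the bank's actual model:

```python
# Aggregating operational losses into a VaR estimate by simulation.
import numpy as np

rng = np.random.default_rng(42)
n_years = 20_000  # simulated years

counts = rng.poisson(lam=3.0, size=n_years)  # loss events per year
losses = np.array([
    rng.lognormal(mean=10.0, sigma=1.2, size=k).sum()  # sum of severities
    for k in counts
])

var_999 = np.quantile(losses, 0.999)  # 99.9% annual VaR
print(f"mean annual loss {losses.mean():,.0f}, 99.9% VaR {var_999:,.0f}")
```

In the Bayes-net version, the frequency and severity parameters are themselves nodes conditioned on control assessments and expert opinion, so a "what-if?" scenario is just a query with different evidence.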
Bayes nets are inappropriately overlooked in the current swell of AI. ML and neural nets drive sales of more software, more capacity and more consulting, which is why the industry is so enamored of them. A performing Bayes net can run on a good desktop computer, or even on edge devices. (Personal note: I've developed Bayes nets for medical diagnosis and treatment recommendation and for nuclear waste disposal safety.)
There is one complication, however. Decision makers who do not understand probability will have to adjust to being informed with probabilities of possibilities rather than yes-or-no answers. It will be a culture shock for your organization to start thinking about analytics in terms of probabilities, so start small and gain some noticeable wins. Bayesian network software is plentiful, and models can be developed with little to no code.
There are a multitude of good learning case studies of real-world examples using Bayes nets. Leave a comment if you would like to see more.
Appendix: Further Reading
An excellent overview of Bayes Networks: A Tutorial on Inference and Learning in Bayesian Networks: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf