In part one of this series, we covered some basic principles of probability theory and compared Machine Learning approaches to Bayesian Belief Nets (Can Bayesian Networks provide answers when Machine Learning comes up short?). In this article, we'll dig a little deeper into Bayesian Belief Networks and how they can be applied to complex decisions.
Understanding Bayesian Inference
In my practice, I find most people involved with advanced analytics, such as predictive modeling, data science, and ML, are familiar with the name Bayes and can even reproduce the simple theorem below. Still, very few have any experience implementing Judea Pearl's Bayesian Belief Networks:
P(A|B) = P(B|A) * P(A) / P(B)

One way to explain Bayes' Theorem is that ascertaining the truth of A depends on the truth of B. In other words, something we already know, the probability of B, can help determine the probability of A.
One would read this "the probability of A given B." The next step is to look at the outcomes where B is true and, within that set, evaluate the outcomes where A is also true.
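To make that concrete, here is a minimal sketch of the theorem in code. The numbers are invented for illustration (a rare condition and an imperfect test), not taken from any real dataset:

```python
# Illustrative numbers only: a rare condition (A) and a positive test (B).
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): test is positive when the condition is present
p_b_given_not_a = 0.05  # false-positive rate

# Total probability of a positive test, P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # -> 0.161
```

Even with a 95% accurate test, the posterior probability is only about 16%, because the prior, what we already knew, carries real weight in the calculation.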
And that's it. The difference from ML is that Bayes starts with the knowledge you already have. As new information streams into the model, hypotheses that once seemed unlikely can come to display the greatest probability. Bayesian Nets have a certain aura of mystery about them, and building one of any complexity is not simple. Despite that complexity, however, they can be set to solving well-known problems such as anomaly detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.
Logical deduction gives you proofs of truths, and Bayesian inference tells you the probabilities of possibilities.
For a more detailed explanation, see An Introduction to Pearl's Do-Calculus. In an article, An AI That Knows the World Like Children Do, Alison Gopnik provides a plain-language description of Bayesian inference:
Bayesian models, named after 18th-century statistician and philosopher Thomas Bayes, combine generative models (your ability to create hypotheses) with probability theory using a technique called Bayesian inference. A probabilistic generative model can tell you how likely you are to see a specific data pattern if a particular hypothesis is true. A Bayesian model combines the knowledge you already have about potential hypotheses with the data you see to let you calculate, quite precisely, just how likely it is that the issue in question makes sense.
And from pr-owl.org, on Bayesian Networks:
Starting with what you know and formulating a hypothesis versus crunching through the data to see what it may reveal, Bayesian networks have been successfully applied to create consistent probabilistic representations of uncertain knowledge in diverse fields such as medical diagnosis, factory and other mechanical diagnosis, HR skills diagnosis, root cause analysis, trade-off analysis, image recognition, language understanding and search algorithms.
And one more succinct description by Alison Gopnik:
Bayes Nets search out causes, not just associations. They assume that you can derive abstract knowledge from concrete data because we already know a lot. They give you a way to understand both observations and interventions: things you observe others doing to the world, or things you do to the world yourself.
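A Bayes Net of this kind can be sketched in a few lines of code. The example below uses the textbook rain/sprinkler/wet-grass network, which is not from the passages above; the probability tables are illustrative values a modeler would supply:

```python
# A minimal discrete Bayesian network: Rain -> Sprinkler, and both -> WetGrass.
# All probabilities are illustrative textbook values.

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {          # sprinkler usage depends on rain
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.40, False: 0.60},
}
P_wet = {                # grass wet depends on (sprinkler, rain)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.0,
}

def joint(r, s, w):
    """Chain rule along the network's arrows: P(R) * P(S|R) * P(W|S,R)."""
    p_w = P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * (p_w if w else 1 - p_w)

# Inference by enumeration: P(Rain | grass is wet)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(num / den, 3))  # -> 0.358
```

Notice what the structure buys you: the arrows encode the causal story (rain discourages sprinkler use; either can wet the grass), and the same small tables answer any probabilistic query over the variables.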
Applying Bayesian Inference to complex decisions
Data rarely speaks for itself (a topic for another article: all data has context, meaning it may not necessarily be an accurate representation of events). To add to that, people often do a poor job of speaking for data. To successfully apply Bayesian inference to decisions, you need to consider three key issues.
Key issue 1: fundamentally, you have to have some understanding of the subject and probabilities.
The use of Bayes Nets has two drawbacks. First, you have to start with a hypothesis, with some understanding of the subject matter; unsupervised ML, on the other hand, requires no hypothesis up front, only analysis of the outputs. Second, and more importantly, many managers and executives have no experience in decision-making based on probabilities. If the net's highest probability is a change in pricing policy that is a departure from current practice, how is it evaluated? However, because Bayes can explain causation to a certain extent (as opposed to a Deep Learning Neural Net, whose conclusions are entirely opaque), it provides more than just a probability; it also reveals essential features of the reasoning that generated that probability.
Key issue 2: the goal is the probability of possibilities, not "Yes" or "No."
Regression analysis and even classification and clustering algorithms, all parts of ML, assume the useful conclusions are in the data themselves. Any conclusions about causation are subjective, drawn after evaluating the results; no notion of probability is involved. With Bayes Nets, you can decide a hypothesis is useful at 90% probability, or move forward at 80%, or hold out until it hits 99%. The types of ML generally being brought to market today do not deal with uncertainty.
Consider this regression analysis of the Dow Jones Index and the frequency with which a certain actress appears in magazines:
Comparing the Dow and the appearances of Jennifer Lawrence in magazines, via svds.com.
At first glance, it would appear that Jennifer Lawrence magazine appearances and the stock market move together. Run the stats, and a correlation coefficient of 0.8 pops out of the data; further "proof," right? Is there some mysterious relationship between Jennifer Lawrence and the stock market? Watch out for spurious explanations that may immediately lead your team to formulate ideas about what underlying causes are at play. A Bayesian model would quickly show that there is, in fact, no relationship. The correlation is merely an artifact of misapplied time-series analysis: two independently trending series will often appear strongly correlated.
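You can reproduce this trap in a few lines. The sketch below generates two independent random walks (a common stand-in for trending series like an index or a publicity count); correlating their raw levels often produces a large coefficient, while correlating their period-to-period changes, which really are independent, does not:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two independent random walks: by construction, neither causes the other.
a = np.cumsum(rng.normal(size=500))
b = np.cumsum(rng.normal(size=500))

# Correlating raw levels often yields a large spurious coefficient...
r_levels = np.corrcoef(a, b)[0, 1]

# ...while correlating the step-to-step changes does not, because the
# underlying increments are genuinely independent.
r_diffs = np.corrcoef(np.diff(a), np.diff(b))[0, 1]

print(f"levels: r = {r_levels:.2f}, differences: r = {r_diffs:.2f}")
```

Differencing (or otherwise detrending) before correlating is the standard guard against this misapplication of time-series analysis.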
Key Issue 3: Bayes models can be lightweight and update in real-time.
Bayes models are continuously updated as new data streams in. We see broad application of Bayes Nets in all types of sensor applications, especially IoT. Their lightweight code can fit into IoT's limited capacity at the edge and act on data streaming from the sensors. Canonical IoT applications are sensors used to monitor and alert in factories, oil wells, aircraft, and other critical real-time settings. But Bluetooth beacons that cost $50 or less can also communicate with smartphones in stores to offer personalized, hyper-local, in-store retail promotions through a smartphone app. Existing personalization and recommendation engines act on analytics that are not updated in real time, unlike a Bayes Net. Ultimately, your business can start generating real benefits by promoting the use of Bayesian Nets and inference among decision-makers.
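The "lightweight, continuously updated" claim is easy to see in the simplest possible case. This sketch uses a Beta-Bernoulli model, a single-node special case rather than a full net, tracking a hypothetical probability that a sensor reading is anomalous, with observations folded in one at a time as they arrive:

```python
# Minimal sketch of streaming Bayesian updating: a Beta-Bernoulli model.
# The "anomaly" framing and the data stream are illustrative, not real.

def update(alpha, beta, observation):
    """Fold one new 0/1 observation into the Beta(alpha, beta) posterior."""
    return alpha + observation, beta + (1 - observation)

alpha, beta = 1.0, 1.0            # uniform prior: no opinion yet
stream = [0, 0, 1, 0, 1, 1, 1]    # readings arriving one at a time

for obs in stream:
    alpha, beta = update(alpha, beta, obs)
    estimate = alpha / (alpha + beta)   # posterior mean, always available

print(round(estimate, 3))
```

Each update is two additions and keeps only two numbers of state, which is why this style of inference fits comfortably on constrained edge hardware; a full Bayes Net generalizes the same idea across many linked variables.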
Example: operational risk in a financial services firm
Regulators require banks and other financial institutions to quantify their exposure to operational risk in much the same way as they quantify credit and market risk exposure. While there is a wealth of data and well-established statistical methods for calculating credit and market risk, no such data or procedures exist for operational risk. There are no official methods for the problem of predicting rare, high-consequence loss events. Banks need to develop strategies that combine the small amount of relevant historical loss data with more subjective data about processes and controls.
One bank needed to develop an operational risk solution that satisfied Basel II, with the additional constraint that any solution had to integrate with the organization's existing data and IT infrastructure.
The bank developed a class of risk maps created dynamically from the bank's existing risk and control information database. The solution quantifies and rates qualitative and numeric risk and integrates self-assessment questionnaires and operational risk models. It also takes account of dependencies when modeling total losses from external and internal threats. Finally, it deals with the credibility of information and uncertainty, including differences in expert opinions.
The Bayes Net provides quantitative predictions even when data is unavailable, because expert judgment is built into the models. It helps manage, mitigate, and reduce risks, leading to reduced costs, a better reputation, and increased profits. A most valuable benefit is that it aggregates total loss forecasts over business lines, taking account of risk dependencies, to forecast the capital charge in the form of a value-at-risk (VaR) distribution and to support "what-if?" scenario analysis.
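A toy version of that aggregation step can be sketched with a Monte Carlo simulation. Everything here is invented for illustration: two hypothetical business lines whose annual losses share a common driver (say, a control failure), so their losses are dependent rather than independent, and the 99% VaR is read off the simulated total-loss distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # simulated years; all distributions and parameters are illustrative

# A shared risk driver makes the two business lines' losses dependent.
shared = rng.lognormal(mean=1.0, sigma=0.5, size=n)
line_a = shared * rng.lognormal(0.5, 0.3, n)   # annual loss, line A
line_b = shared * rng.lognormal(0.2, 0.4, n)   # annual loss, line B
total = line_a + line_b

# Capital charge as a quantile of the total-loss distribution.
var_99 = np.quantile(total, 0.99)  # 99% value-at-risk
print(f"99% VaR of total loss: {var_99:.1f}")
```

A real operational-risk net would derive these loss distributions from the risk-map nodes and expert-supplied conditional probabilities rather than fixed lognormals, but the aggregation logic, simulate jointly, then take a tail quantile, is the same.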
Bayes Nets are inappropriately overlooked in the current swell of AI. ML and neural nets are drivers of more software, more capacity, and more consulting, which is why the industry is so enamored of them. A working Bayes Net can run on a good desktop computer, or even on edge devices. (Personal note: I've developed Bayes Nets for medical diagnosis and treatment recommendation, and for nuclear waste disposal safety.)
There is one complication, however. Decision-makers who do not understand probability will have to adjust to being informed by probabilities of possibilities rather than yes-or-no answers. It will be a culture shock for your organization to start thinking about analytics in probabilities, so start small and gain some noticeable wins. Bayesian Network software is plentiful, and models can be developed with little to no code.
Appendix: further reading - an excellent overview of Bayes Networks: A Tutorial on Inference and Learning in Bayesian Networks.