Premise. Hype about artificial intelligence (AI) and machine learning (ML) for data-related activities obscures reasonable, obtainable, and useful top-down methods for modeling and understanding uncertainty, dealing with probable outcomes, and even reasoning about causation. Top-down techniques, like Bayesian models, address these key issues of uncertainty and causation.
The term machine learning (ML) has become ubiquitous and generic. An executive of an AI research firm told Wikibon recently that, “announcements of added ML capabilities in existing software products are as common as twisters in trailer parks.” This follows a long trend of vendor buzzwords: analytics, advanced analytics, predictive analytics, and now ML. ML includes a wide variety of tools and techniques. The most common additions to commercial software tend to group around algorithms that sift through data to find relationships and that assist in the creation and maintenance of semantic metadata. The drawback is that data rarely speaks for itself, or by itself. As a result, vendor marketing usually positions the benefits of these technologies in terms of significant hardware, software, and services costs.
However, many of the most productive AI/ML algorithms are quite mature and, properly applied, can generate significant returns today – if business executives take steps to incorporate them into decision making activities. Wikibon believes that Bayesian inference, in particular, can add crucial clarity to complex business decisions, but that requires execs to:
- Understand Bayesian inference.
- Address three challenges to applying it to business decisions.
- Use a case model approach to promulgate adoption.
Understanding Bayesian Inference
Bayesian models, named after the 18th-century statistician and philosopher Thomas Bayes, combine generative models (your ability to create hypotheses) with probability theory using a technique called Bayesian inference. A probabilistic generative model can tell you how likely it is that you will see a specific pattern of data if a particular hypothesis is true. A Bayesian model combines the knowledge you already have about potential hypotheses with the data you see to let you calculate, quite precisely, just how likely it is that each hypothesis is true.
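In symbols, this combination of prior knowledge and observed data is Bayes’ theorem. The notation is ours, not the original discussion’s: H stands for a hypothesis, D for the observed data, and the H_i for the competing hypotheses under consideration.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)},
\qquad
P(D) = \sum_{i} P(D \mid H_i)\, P(H_i)
```

Here P(H) is what you believed before seeing the data, P(D | H) is what the generative model says about how likely the data is under the hypothesis, and P(H | D) is the updated belief.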
If logical deduction gives you proofs of truths, Bayesian inference tells you the probabilities of possibilities. You start with what you know, the hypothesis. You won’t jump to an unlikely hypothesis right away. Still, if enough data accumulates, even an initially unlikely hypothesis can turn out to be right. Because they start with what you know and a formulated hypothesis, rather than crunching through the data to see what it may reveal, Bayesian networks have been successfully applied to create consistent probabilistic representations of uncertain knowledge in diverse fields such as medical diagnosis, factory and other mechanical diagnosis, HR skills diagnosis, root cause analysis, trade-off analysis, image recognition, language understanding and search algorithms.
Bayes nets search out causes, not just associations. They assume that you can derive abstract knowledge from concrete data because we already know a lot. They give you a way to understand both things that just happen and interventions: things you observe others doing to the world or things you do to the world yourself.
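The way accumulating data can rescue an initially unlikely hypothesis can be sketched in a few lines of Python. This is a minimal illustration, not a production model: the function names and all the numbers (a 1% prior, and observations that are three times as likely when the hypothesis is true) are assumptions chosen for the example.

```python
def update(prior: float, p_data_if_true: float, p_data_if_false: float) -> float:
    """Return P(H | data) from P(H) using Bayes' theorem for H versus not-H."""
    numerator = p_data_if_true * prior
    evidence = numerator + p_data_if_false * (1.0 - prior)
    return numerator / evidence


def accumulate(prior: float, p_true: float, p_false: float, n_observations: int) -> list:
    """Apply the same update once per observation and record each belief."""
    beliefs = [prior]
    for _ in range(n_observations):
        beliefs.append(update(beliefs[-1], p_true, p_false))
    return beliefs


# Hypothetical numbers: the hypothesis starts at 1%, and each supporting
# observation is three times as likely if the hypothesis is true (0.6 vs 0.2).
beliefs = accumulate(prior=0.01, p_true=0.6, p_false=0.2, n_observations=8)
for step, belief in enumerate(beliefs):
    print(f"after {step} observations: P(H) = {belief:.3f}")
```

With these made-up numbers the belief stays below 50% through the first four observations and ends above 98% after eight, which is the pattern described above: no jump to the unlikely hypothesis at first, then a steady shift as evidence piles up.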
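The difference between watching something happen and intervening can be made concrete with a toy network. The sketch below is an illustration under assumed numbers, not a model from any real engagement: a hidden factor Z influences both X and Y, and X has no effect on Y at all. All variable names and probabilities are hypothetical.

```python
# Hypothetical three-variable net: Z -> X and Z -> Y, with no arrow X -> Y.
P_Z = {1: 0.5, 0: 0.5}             # P(Z = z)
P_X1_GIVEN_Z = {1: 0.9, 0: 0.1}    # P(X = 1 | Z = z)
P_Y1_GIVEN_Z = {1: 0.8, 0: 0.2}    # P(Y = 1 | Z = z); X plays no role


def p_y1_given_observed_x(x: int) -> float:
    """P(Y=1 | X=x): observing X changes what we believe about Z."""
    def p_x_given_z(z: int) -> float:
        return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]

    p_x = sum(p_x_given_z(z) * P_Z[z] for z in P_Z)
    return sum(P_Y1_GIVEN_Z[z] * p_x_given_z(z) * P_Z[z] / p_x for z in P_Z)


def p_y1_given_do_x(x: int) -> float:
    """P(Y=1 | do(X=x)): forcing X cuts the Z -> X arrow, so Z keeps its prior.

    The argument x is unused on purpose: in this net X has no path to Y.
    """
    return sum(P_Y1_GIVEN_Z[z] * P_Z[z] for z in P_Z)


for x in (1, 0):
    print(f"observe X={x}: P(Y=1) = {p_y1_given_observed_x(x):.2f}")
    print(f"force   X={x}: P(Y=1) = {p_y1_given_do_x(x):.2f}")
```

Observing X=1 raises the probability of Y to 0.74 and observing X=0 lowers it to 0.26, yet forcing X either way leaves it at 0.50. The association is real, but acting on X would accomplish nothing, and that is the distinction a causal network lets you draw.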
Address Three Considerations to Applying Bayesian Inference to Complex Business Decisions
To repeat, data rarely speaks for itself. To add to that, people often do a poor job of speaking for data. To successfully employ Bayesian inference, execs need to consider three key issues to apply it to business decisions.
Key Issue 1: You Still Have to Know the Subject and Probabilities
The use of Bayes nets has two drawbacks. First, operators must have some understanding of the subject matter, unlike with most unsupervised ML (although even unsupervised ML requires analysis of the outputs). Second, and more importantly, many managers and executives have no experience in making decisions based on probabilities. If the highest-probability recommendation from the net is a change in pricing policy that departs from current practice, how is that to be evaluated? However, because a Bayes net can explain causation to a certain extent (as opposed to a deep learning neural net, whose conclusions are completely opaque), it provides more than just a probability; it also reveals important features of the reasoning that generated that probability.
Key Issue 2: The Goal Is Probability of Possibilities, Not “Yes” or “No”
Regression analysis and even classification and clustering algorithms, all parts of ML, assume the useful conclusions are in the data themselves. Any conclusions about causation are subjective judgments made after evaluating the results, and as typically packaged these methods attach no probability to a hypothesis. With Bayes nets, you can decide that a hypothesis is useful at 90% probability, or move forward at 80%, or hold off unless it hits 99%. The types of ML generally being brought to market today do not deal with uncertainty in this way.
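Turning a probability into a decision comes down to choosing the threshold in advance. The following is a minimal sketch; the function name `decide`, the 0.85 posterior, and the wording of the two outcomes are assumptions for illustration, while the 80%, 90%, and 99% thresholds come from the text.

```python
def decide(posterior: float, threshold: float) -> str:
    """Return 'act' when the hypothesis is probable enough, otherwise 'wait'."""
    if not 0.0 <= posterior <= 1.0 or not 0.0 < threshold <= 1.0:
        raise ValueError("probabilities must lie between 0 and 1")
    return "act" if posterior >= threshold else "wait"


# Hypothetical posterior reported by a Bayes net for some hypothesis.
posterior = 0.85
for threshold in (0.80, 0.90, 0.99):
    print(f"threshold {threshold:.2f}: {decide(posterior, threshold)}")
```

The same 85% belief leads to action for a team willing to move at 80% and to waiting for teams that require 90% or 99%. The threshold is a business choice about risk, not something the data supplies.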
Consider this regression analysis of the Dow Jones Index and frequency with which a certain actress appears in magazines:
Figure 1: Comparing the Dow and the Appearance of Jennifer Lawrence In Magazines
Source: https://svds.com/avoiding-common-mistakes-with-time-series/
A quick eyeballing of the graph seems to suggest that Jennifer Lawrence magazine appearances and the stock market move together. Run the stats, and a correlation coefficient of 0.8 pops out of the data; further “proof,” right? Is there some mysterious relationship between Jennifer Lawrence and the stock market? A result like this may immediately lead your team to try to formulate ideas about what underlying causes are at play. A Bayesian model would quickly show that there is, in fact, no relationship. The correlation is merely a product of misapplied time-series analysis: two series that both trend over time will correlate strongly whether or not they have anything to do with each other.
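The time-series trap is easy to reproduce. The sketch below is not a Bayesian model and does not use the data behind Figure 1; it generates two synthetic, completely independent series that both drift upward, then compares the correlation of their levels with the correlation of their period-to-period changes. The helper names, the seed, and the drift and noise settings are all assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drifting_series(rng, n, drift):
    """Random walk with drift: each step adds `drift` plus independent noise."""
    level, series = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def differences(series):
    """Period-to-period changes, which remove the shared upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
series_a = drifting_series(rng, 200, drift=1.0)
series_b = drifting_series(rng, 200, drift=1.0)

corr_levels = pearson(series_a, series_b)
corr_changes = pearson(differences(series_a), differences(series_b))
print(f"correlation of levels:  {corr_levels:.2f}")
print(f"correlation of changes: {corr_changes:.2f}")
```

The levels correlate very strongly only because both series trend upward. Once the trend is removed by looking at changes, the apparent relationship largely disappears.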
Key Issue 3: Access to the Data Reveals the Patterns, Not Access to Hardware
Bayes models are continuously updated as new data streams in. We see broad application of Bayes nets in all types of sensor applications, especially IoT. Their lightweight code can fit into the limited capacity of IoT devices at the edge and act on data streaming from the sensor. The canonical applications of IoT are sensors used to monitor and alert in factories, oil wells, aircraft, and other critical real-time settings. But beacons that cost $50 or less can also communicate with smartphones via Bluetooth in stores to offer personalized, hyper-local retail promotions through a smartphone app. Existing personalization and recommendation engines act on analytics that, unlike a Bayes net, are not updated in real time. Ultimately, your business can start generating real benefits by investing in promulgating the use of Bayesian nets and inference among decision makers, rather than in a great deal of hardware or other infrastructure to build, run, and maintain complex ML systems.
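To give a sense of how little code a continuously updated estimate needs, here is a minimal sketch of a streaming update. It uses a Beta-Bernoulli model, the simplest conjugate Bayesian update, as a stand-in for a full Bayes net; the class name `StreamingRate`, the uniform starting prior, and the sample readings are assumptions.

```python
class StreamingRate:
    """Beta-Bernoulli estimate of an event rate, updated one reading at a time."""

    def __init__(self, prior_events: float = 1.0, prior_non_events: float = 1.0):
        # Beta(1, 1) is a uniform prior: no opinion about the rate yet.
        self.alpha = prior_events
        self.beta = prior_non_events

    def update(self, event_occurred: bool) -> None:
        """Fold a single sensor reading into the current belief."""
        if event_occurred:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean of the event rate."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical stream: 18 normal readings followed by 2 alert readings.
estimate = StreamingRate()
for reading in [False] * 18 + [True] * 2:
    estimate.update(reading)
print(f"estimated alert rate: {estimate.mean:.3f}")
```

The state is two numbers and each update is a single addition, which is why this style of model suits constrained edge devices: nothing needs to be stored or retrained as readings arrive.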
Three Case Studies: Using Bayesian Nets
To help you generate excitement for applying the technique (as opposed to buying a bunch of ML-related products and services), we offer three case studies for your consideration.
Case #1: Operational Risk in a Financial Services Firm
Problem
Regulations require banks and other financial institutions to quantify their exposure to operational risk in much the same way as they quantify credit and market risk exposure. While there is a wealth of data and well-established statistical methods for calculating credit and market risk, no such data or methods exist for operational risk. In particular, there are no established methods for predicting rare, high-consequence operational loss events. The major banks need to develop methods that combine the small amount of relevant historical loss data with more subjective data about processes and controls.
One bank needed to develop an operational risk solution that satisfied Basel II, with the additional constraint that any solution had to integrate with the organization’s existing data and IT structure.
Solution
The bank developed a class of risk maps created dynamically from the bank’s existing database of risk and control information. The solution quantifies and rates qualitative and numeric risk and integrates self-assessment questionnaires and operational risk models. It also takes account of dependencies when modeling total losses from external and internal risks. Finally, it deals with the credibility of information and uncertainty including differences in expert opinions.
Benefits
The Bayes net provides quantitative predictions even when data is unavailable because expert judgement is built into the models. It helps reduce, manage, and mitigate risks, and hence leads to reduced costs, a better reputation, and increased profits. A most valuable benefit is that it aggregates total loss forecasts over business lines, taking account of risk dependencies, to forecast the capital charge in the form of a value-at-risk (VaR) distribution and to support “what-if?” scenario analysis.
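How expert judgement can stand in for missing loss data, and then give way to data as it arrives, can be sketched with a Gamma-Poisson model of annual loss-event frequency. This is not the bank’s model and it does not compute a VaR distribution; it is a deliberately small illustration, and the function names, the expert’s prior, and the loss history are all hypothetical.

```python
def posterior_params(prior_shape, prior_rate, events, years):
    """Gamma(shape, rate) prior on events per year, updated with Poisson counts."""
    return prior_shape + events, prior_rate + years


def expected_events_per_year(shape, rate):
    """Mean of the Gamma belief about the annual event frequency."""
    return shape / rate


def p_at_least_one_event_next_year(shape, rate):
    """Predictive probability of one or more events in the coming year."""
    return 1.0 - (rate / (rate + 1.0)) ** shape


# Hypothetical expert view: about 0.5 events per year, held with the
# weight of roughly four years of experience (shape=2, rate=4).
prior = (2.0, 4.0)

# No loss history at all: the forecast rests on expert judgement alone.
no_data = posterior_params(*prior, events=0, years=0)
# Hypothetical history: three loss events recorded over two years.
with_data = posterior_params(*prior, events=3, years=2)

for label, (shape, rate) in (("expert only", no_data), ("expert + data", with_data)):
    print(f"{label}: {expected_events_per_year(shape, rate):.2f} events/year, "
          f"P(at least one next year) = {p_at_least_one_event_next_year(shape, rate):.2f}")
```

With no data the forecast is simply the expert’s 0.5 events per year. After three events in two years it moves to about 0.83, a blend of judgement and evidence weighted by how much of each there is.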
Case #2: Retaining Talent in China
Problem
Talent retention has become a significant issue for international firms in China. High quality local employees often quickly switch companies, and employees who are trained by their employers often seek better positions once their skill set has been enhanced. The goal is to systematically hire and retain quality employees.
Solution
A Bayesian net was developed in eight steps: 1) establish variables of interest, 2) collect data, 3) encode domain knowledge, 4) create a predictive model, 5) estimate utilities collaboratively, 6) test the model, 7) implement the access system, and 8) post-implementation, establish tasks and variables of interest.
The first task was to establish which variables may be of interest. The list of variables collected included employment history, education, language ability, age, sex, relationship status, demand for applicant’s skills, salary, professional training, language training, overseas opportunities, position type, hours, career path opportunities, Chinese managerial viability (including head office), prestige, outcome, length of employment – and average company satisfaction with employee over length of employment.
Questions to be answered were:
- Should the prospective employee be hired?
- If he/she is hired, what sort of contract and conditions should he/she be given to maximize the expected value of the hire, measured in terms of satisfaction with his/her contribution and retention whilst minimizing costs?
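The two questions above amount to an expected-value comparison once the net supplies outcome probabilities. The sketch below shows the arithmetic only. Every name and number in it is hypothetical: in practice the probabilities would come from the trained network and the values and costs from the collaborative utility estimation step.

```python
# Hypothetical P(outcome | contract) for one applicant, as a net might report.
OUTCOME_PROBS = {
    "standard": {"stays_performs": 0.40, "stays_underperforms": 0.20, "leaves_early": 0.40},
    "with_training": {"stays_performs": 0.55, "stays_underperforms": 0.15, "leaves_early": 0.30},
}
# Hypothetical value of each outcome and cost of each contract, in the same units.
OUTCOME_VALUE = {"stays_performs": 100.0, "stays_underperforms": 10.0, "leaves_early": -40.0}
CONTRACT_COST = {"standard": 20.0, "with_training": 35.0}


def expected_value(contract: str) -> float:
    """Probability-weighted value of the hire under a contract, net of its cost."""
    gross = sum(p * OUTCOME_VALUE[outcome] for outcome, p in OUTCOME_PROBS[contract].items())
    return gross - CONTRACT_COST[contract]


best_contract = max(OUTCOME_PROBS, key=expected_value)
# Not hiring is worth zero here, so hire only if the best offer beats that.
should_hire = expected_value(best_contract) > 0.0

for contract in OUTCOME_PROBS:
    print(f"{contract}: expected value {expected_value(contract):.1f}")
print(f"hire: {should_hire}, offer: {best_contract}")
```

In this made-up case the more expensive contract with training wins because it shifts enough probability away from early departure to pay for itself, the kind of trade-off the second question asks about.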
Benefits
Hiring managers saw a marked increase in hiring and retention of qualified applicants by continually refining their model and hypotheses based on the experience over a two-year period.
Case #3: Railroad Customer Satisfaction
Problem
A large railway transport organization in Australia considers customer satisfaction one of its most significant business Key Performance Indicators. To gauge current customer satisfaction levels, Queensland Rail uses traditional questionnaire-based survey methods. Although the surveys provide detailed information, they are rarely translated into a model, and therefore the capability to draw inferences from the survey data is limited. For example, it is difficult to understand how individual factors affect overall customer satisfaction, make predictions about satisfaction levels, or analyze various cause-and-effect scenarios in order to plan customer service management. Bayesian Network models can successfully meet these requirements.
Solution
The principal objective of this model was identified as the level of Customer Satisfaction among the customers, which translated into a top level node Customer Satisfaction in the Bayesian Network. The data made available by the railroad for the purpose of developing the model consisted of a questionnaire. Customers were asked about their experiences regarding various attributes (factors) of customer service. The survey was conducted with a large number of passengers traveling at peak and off peak times. For each service factor, the customer responses were categorized as Positive, Neutral or Negative experiences.
Benefits
The results showed that any probability change in a particular node affects all of its subsequent child nodes, while parallel nodes and their child nodes remain unaffected. The model’s capability to analyze changes in multiple nodes makes Bayes net models a powerful tool for decision support. A manager can use such a tool in two ways:
- As a planning tool, whereby the manager can create hypothetical scenarios and simulate the outcomes before finalizing an action plan. This provides the manager with quantitative and visual comparisons between decision options.
- As a performance management tool, whereby the manager evaluates the changes in the overall performance based on completed actions. This provides the manager with the capacity to compare planned and achieved goals.
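The planning use can be illustrated with a toy network in which two service factors feed a single Customer Satisfaction node. This is a sketch under assumptions, not the railway’s model: the factor names, the two-state simplification (the survey used Positive, Neutral, and Negative), and every probability below are hypothetical.

```python
from itertools import product

# Hypothetical P(customer satisfied | punctual, clean) for the top-level node.
SATISFACTION_CPT = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}


def p_satisfied(p_punctual: float, p_clean: float) -> float:
    """Marginal P(satisfied), summing over the states of both parent nodes."""
    total = 0.0
    for punctual, clean in product((True, False), repeat=2):
        weight = (p_punctual if punctual else 1.0 - p_punctual) * \
                 (p_clean if clean else 1.0 - p_clean)
        total += weight * SATISFACTION_CPT[(punctual, clean)]
    return total


# Baseline beliefs about the two parallel factors.
baseline = p_satisfied(p_punctual=0.7, p_clean=0.8)
# What-if scenario: an action plan lifts punctuality; cleanliness is untouched.
scenario = p_satisfied(p_punctual=0.9, p_clean=0.8)

print(f"baseline satisfaction: {baseline:.3f}")
print(f"scenario satisfaction: {scenario:.3f}")
```

Raising the punctuality probability moves the child node from 0.714 to 0.798, while the parallel cleanliness node keeps its 0.8 because nothing connects the two factors directly. Rerunning the function with figures achieved after the fact gives the comparison of planned and achieved goals described in the second bullet.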
Action Item. Regard claims of ML capability in software with some skepticism. These capabilities will no doubt improve, but for the present, they are more talking points than features. Investigate the use of Bayesian models. It will be a culture shock for your organization to start thinking about analytics in terms of probabilities, so start small and gain some noticeable wins. Bayesian network software is plentiful, and models can be developed with little to no code.
Appendix: Further Reading
An excellent overview of Bayes Networks: A Tutorial on Inference and Learning in Bayesian Networks: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf