Is the data-driven enterprise an oxymoron?
- Summary:
- Does the data-driven concept make sense? Is Big Data the right approach to decision-making? We tend to focus on the obstacles to data quality, but a deeper look at the tensions in enterprise decision-making is in order.
Previously on diginomica, I asked the question, "If causal emergence is a viable alternative to reductionism, perhaps it's an answer to that 1996 question, 'Why do we need all this data?'"
In a thread on LinkedIn, my friend Matt Watkinson (author of "The Ten Principles Behind Great Customer Experiences" and "The Grid") asked, "What is the point of a business? Is it to earn a profit? To make and keep a customer? To maximize returns to shareholders? To provide meaning and purpose for employees? To contribute to society? To have some fun with your friends? To seek personal fulfillment? To feed your family?"
Watkinson labels these questions the "philosophical realm" of a business: in effect, what is the philosophy of your business? I was deep into philosophy early in college until I realized that if I switched to math, I wouldn't have to write papers. One day, while I was hitchhiking, a truck driver picked me up. As we talked during a long stretch, he asked, "What do you do?" I responded, "I'm a college student." "Good for you. What are you studying?" I said, "Philosophy." In all seriousness and enthusiastically, he said, "That's great. Let me tell you my philosophy of driving trucks."
I suppose it's not unreasonable to consider the question, "Why are we in business?" within the realm of philosophy.
Watkinson asserts that facts, evidence and objectivity, the bedrock of science, are not the drivers of business decision-making. Instead, "subjectivity, opinion, and values" are. Elaborating on this premise, he observes that "data gathering, analyzing, structured experimentation, and formulating of principles" play a minor role compared to the philosophical realm, "which dominates our decision-making, the activities we undertake, and the reasons we undertake them."
Reading between the lines, Watkinson is saying that business decisions aren't science. It's a provocative assertion. It goes to the heart of the current proposition that companies need to be data-driven. What is a data-driven company? There are many ways to define one, but the consensus is a company that makes strategic and tactical decisions rooted in data analytics.
There was something Watkinson said in the thread, one word: Gigerenzer. Gerd Gigerenzer is a German psychologist and a fellow at the Max Planck Institute for Human Development who has studied the use of bounded rationality and heuristics in decision-making. Gigerenzer sparked a controversy by attempting to debunk the Nobel Prize-winning duo of Kahneman and Tversky (see their 1974 article, Judgment Under Uncertainty: Heuristics and Biases) in a very public dispute (see Gigerenzer versus Kahneman and Tversky: The 1996 face-off). Henry Kissinger once opined on why arguments between academics are so vicious: "Because the stakes are so low." But reviewing that dispute here would be a diversion.
Bounded rationality is a human decision-making process in which we attempt to satisfice rather than optimize. In other words, we seek a decision that will be good enough rather than the best possible.
Herbert Simon proposed the theory of bounded rationality, which states that individuals do not make perfectly rational decisions because of cognitive limits (the difficulty in obtaining and processing all the information needed) and social limits (personal and social ties among individuals). I assume this is why Matt made this cryptic mention of Gigerenzer.
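As a minimal sketch of the distinction (my own illustration, not Simon's or Gigerenzer's formal models), an optimizer scores every option before choosing, while a satisficer accepts the first option that clears an aspiration level:

```python
# Sketch of optimizing vs. satisficing. The vendor quotes are hypothetical.

def optimize(options, score):
    """Examine every option and return the one with the highest score."""
    return max(options, key=score)

def satisfice(options, score, aspiration):
    """Return the first option whose score meets the aspiration level."""
    for option in options:
        if score(option) >= aspiration:
            return option
    return None  # nothing was "good enough"

if __name__ == "__main__":
    # Hypothetical vendor quotes: (name, price); lower price scores higher.
    quotes = [("A", 120), ("B", 95), ("C", 99), ("D", 90)]
    value = lambda q: -q[1]

    print(optimize(quotes, value))           # ("D", 90): the best, but only after scanning all quotes
    print(satisfice(quotes, value, -100))    # ("B", 95): the first quote under 100
```

The satisficer may miss the cheapest quote, but it decides after examining far fewer options, which is the trade-off bounded rationality describes.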
An article Gigerenzer recently authored in Behavioral Scientist, with the beguiling title One Data Point Can Beat Big Data, uses the example of Google Flu Trends, a model released in 2008 to predict the spread of flu a week in advance, to illustrate his point: the value of recency in decision-making. The Law of Recency (Thomas Brown, 1838) states that recent experiences prevail over those in the past. Contemporary research indicates that people do not automatically rely on what they recently experienced, but only do so in unstable situations where the distant past is not a reliable guide to the future:
A group of economists, including Nobel laureate Joseph Stiglitz, showed for instance that the recency heuristic can predict consumer demand in evolving economies better than traditional "sophisticated" models. And the great advantage of simple rules is that they are understandable and are easy to use.
At first, the model's accuracy was quite good, but Gigerenzer argued that it didn't require the machinery Google built. As he recounts, Google's engineers analyzed:
50 million search terms and calculated which of these were associated with the flu. Then they tested 450 million different algorithms to find the one that best matched with the data and came up with a secret algorithm that used 45 search terms (also kept secret). The algorithm was then used to predict flu-related doctor visits in each region on a daily and weekly basis.
The problem was the emergence of a "black swan," Swine Flu, in 2009, which shifted the flu's peak from winter to summer. The model crashed, and Google scrambled to improve it. Gigerenzer argued that Google continued with the Big Data/AI approach:
To do so, there are two possible approaches. One is to fight complexity with complexity. The idea is that complex problems need complex solutions, and if a complex algorithm fails, it must be made more complex. The second approach follows the stable-world principle—complex algorithms work best in well-defined, stable situations where large amounts of data are available. Human intelligence has evolved to deal with uncertainty, independent of whether big or small data are available.
There is no guarantee that a complex algorithm using data from the past will yield good predictions in uncertain conditions. But Google's engineers went for more complexity. Instead of 45 search terms (features), they used "about 160 (the exact number has not been made public) and continued to bet on big data." No matter how much historical data was applied or how many algorithms were tested, they could not develop an accurate prediction.
Gigerenzer goes to some lengths to show that a forecast based on the same data used by the Google model, but built on a simple recency rule, did better. Simply using the previous week's reported numbers as next week's forecast provided a superior prediction to Google's model, which was eventually withdrawn. This is undoubtedly counter-intuitive if you buy into the AI/big data/data science approach. In an unstable world, reducing the amount of data and complexity can lead to more accurate predictions.
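To make the rule concrete, here is a minimal sketch of that one-data-point forecast, using made-up weekly numbers rather than the CDC flu data Gigerenzer actually evaluated:

```python
# Recency heuristic: next week's forecast is simply this week's observed value.
# The numbers below are invented for illustration only.
import numpy as np

def recency_forecast(observed):
    """Predict each week as equal to the previous week's reported value."""
    return observed[:-1]  # forecasts for weeks 1..n are the values for weeks 0..n-1

# Hypothetical weekly flu-related doctor visits (per 100,000 people).
visits = np.array([12, 15, 22, 35, 60, 58, 41, 30, 18, 14], dtype=float)

forecast = recency_forecast(visits)
actual = visits[1:]

mae = np.mean(np.abs(forecast - actual))
print(f"Recency-heuristic mean absolute error: {mae:.1f} visits per 100,000")
```

The rule has no fitted parameters and uses only the most recent observation, so there is no elaborate historical pattern for a black swan like Swine Flu to invalidate.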
My take
Watkinson referenced Gigerenzer to point out that business decisions are often not driven by data but are heavily influenced by the philosophy of the business. The nuances of that argument belong in a separate article, but the industry is enamored with the complexity approach. In 2008, in The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, Chris Anderson, editor-in-chief of Wired, announced:
"Correlation supersedes causation, and science can advance even without coherent models … It's time to ask: 'What can science learn from Google?'" Getting it right with data-driven decision-making depends on many important factors. Only the first relates to data: bad data, weak analysis, flawed recommendations, poor communication, poor interpretation, wrong decision, faulty execution and no learning. That's why, in my opinion, the data-driven enterprise is an oxymoron.