Weekend rant: big data is a bad joke, gimme context
Summary: As an antidote to the blether around Big Data, I muse on the importance of nuanced context. Without it, all the Big Data algorithms in the world will never solve our deep-seated business problems. Nor will they relieve us of the tyranny of old benchmarks.
From the time I first heard the expression a couple of years ago, I have found it difficult to keep a straight face as vendor after vendor jumped on this latest fashionista bandwagon and earnestly bent my ear about Hadoop, MapReduce and other arcane technologies.
The thought that there is an explosion of data holds no interest to me whatsoever. Especially as social media has been named as one of the really big contributors. Well...if you can find value in the inane ramblings on YouTube, Twitter or Facebook - go knock yourself out.
So as last week came and went, I was briefed on some 'stuff' that's coming up where...Big Data...will loom large. The purpose of the briefing was to sense-test a presentation. I get that. It's a valuable back-and-forth that hopefully leads to better results for listeners. In another discussion, this time around improving retail bank portfolio trading, my correspondent used the BD expression. My take?
Welcome to Nobody Cares.
I have never ever heard an exec from any company ask for Big Data, nor do they care. Instead they want outcomes - as Ray Wang of Constellation Research opines. I say they want to solve problems and don't give a rat's ass what the data is or where it comes from, provided it is meaningful and contextually relevant.
Often, that means solving the problems of understanding what I call Little Data, the stuff we already have strewn across multiple systems. But that's not to say the data that is 'out there', i.e. outside our firewalls, isn't of use and value.
So what's this context thing all about?
Context is king
Once again, our good friend Vijay Vijayasankar comes up with a mind-bender to keep us on our toes. He uses the everyday analogy of driving instructions to demonstrate that context is variable but vital. Data just isn't enough. I totally get that.
A few weeks ago, my wife was in another car trying to follow me home. She has been on this route a hundred times but never as a driver. She took a wrong turning and got 'lost.' Now 'lost' is a relative term: she knew where she was; I didn't, though it turns out we were no more than 400 metres away from each other. I had a rough idea, but because she could not describe the location with precision, nor for that matter give me any idea which direction she was facing, it took me a good 30 minutes to track her down. (Note to GPS people: is there a tracking thingy out there which I could use, like they do in Aliens, so I can find my missus from signals on her smartphone?)
You get the point. Being in a location you know is meaningless unless that information can be communicated in a way that makes contextual sense. Simply saying 'we're behind the supermarket on the side road' could describe almost anywhere. And so it often is with business problems.
Vijay argues that:
Answer to every question has a core (which has great precision) and a context (less precise, but without it - core cannot be meaningfully interpreted).
Solving these basic problems is a serious intellectual undertaking. But all too often I see people falling into the trap of reducing the problem to presumed knowledge. This from RedMonk analyst James Governor:
@dahowlett meh. seriously- understanding the right questions to ask is absolutely key
— monkchips (@monkchips) September 7, 2013
Ummm not quite. In most business scenarios, we start out with the following: "I see we have an exception (out-of-range result) in our (pick your metric) - what's that about?" A high-level answer may easily be found, such as: "We ran out of widgets so sales fell." If that is all you want to know, then fine. But most often it is the next question that sets the cat among the pigeons. "How did that happen?" And then you're in trouble.
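To make that first step concrete, here's a minimal sketch in Python of the kind of out-of-range check that triggers the question in the first place. The metric, the figures and the tolerance are all invented for illustration, not anyone's actual system:

```python
# A minimal sketch of out-of-range detection on a weekly sales metric.
# The metric name, data and tolerance are illustrative assumptions.

weekly_sales = {
    "2013-W32": 1180,
    "2013-W33": 1215,
    "2013-W34": 1190,
    "2013-W35": 640,   # the exception that prompts "what's that about?"
}

def mean(values):
    return sum(values) / len(values)

def flag_exceptions(series, tolerance=0.25):
    """Flag any period deviating more than `tolerance` from the mean of the rest."""
    flags = []
    for period, value in series.items():
        baseline = mean([v for p, v in series.items() if p != period])
        if abs(value - baseline) / baseline > tolerance:
            flags.append((period, value, round(baseline)))
    return flags

for period, value, baseline in flag_exceptions(weekly_sales):
    print(f"{period}: {value} vs baseline ~{baseline} - what's that about?")
```

Notice what a check like this can and cannot do: it tells you that something is out of range, and nothing more. The "how did that happen?" part is precisely where it stops helping.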
Questions upon questions?
There could be 101 reasons why the particular outcome occurred. Everything from a freak of nature all the way through to an as-yet hidden but systemic process failure. Our initial question is valid but where will the answer come from? Who is asking the question and what are they trying to discover? Do they even know? What patterns of activity should we be considering? What business conditions existed at the time? Who was running the show at the time? What pressures were there elsewhere that impacted that outcome? Who was sick at the time? What were the raw material inventories and capacity planning assumptions at the time? Was there a fire at the plant? And on...and on...and on.
In many situations, you can consider multiple starting points based upon past knowledge as indicators of where to go, but what happens if that doesn't work? What happens if we have already exhausted many of the process efficiency opportunities? How, for instance, can you factor in the sudden impact of a competitor entering the market from under your nose? That's a good example of where the introduction of other data sources becomes valuable - maybe a Chatter feed or a tibbr flow.
Enter the data scientist?
We've been hearing a lot about the new role of the data scientist in these complex scenarios. These folk are essentially stats geeks with a penchant for modelling. To date, many of them have been hanging out in dealing rooms inside financial institutions. I'm not so sure they are a panacea.
Earlier in the year I met with Warwick Analytics, who have done some amazing work around root cause analysis in manufacturing failure. Apparently, many manufacturers have come to accept that there will always be a certain level of failure and that the job at hand is therefore to minimise the impact and factor in the cost. It varies from industry to industry but is often based upon historically assembled benchmarks. Warwick goes way further. It asks the question of why something fails. The impact is enormous. But they readily agree that this is often a trial-and-error effort. As Simon Wardley said:
@monkchips : answer - we don't know. We have to ask a lot of wrong questions before we gain of model of understanding /cc @dahowlett
— swardley (@swardley) September 7, 2013
Side note: Don't you just love a laser focus on a problem?
Out-of-the-box thinking?
My main concern in all this is that the answers to so many problems may be incapable of being modelled and reduced to an algorithm. An answer like: "Jim blew a fuse and pissed off a major customer without realizing it until it was too late" is not the sort of thing for which it is easy to build a discovery model. But it is the kind of thing that could be inferred from the intuitive use of additional data sources and the application of sensitivity analysis.
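To be concrete about what I mean by sensitivity analysis, here's a toy sketch. The margin model and its inputs are entirely invented; the technique is simply to nudge each input and see which one moves the outcome most:

```python
# A toy sensitivity analysis over an invented margin model.
# The model, input names and values are illustrative assumptions only.

def margin(price, unit_cost, churn_rate, volume=1000):
    """Crude margin model: contribution from retained customers."""
    retained = volume * (1 - churn_rate)
    return retained * (price - unit_cost)

baseline_inputs = {"price": 20.0, "unit_cost": 12.0, "churn_rate": 0.08}
base = margin(**baseline_inputs)

# Nudge each input by 5% in turn and measure the effect on margin.
for name, value in baseline_inputs.items():
    nudged = dict(baseline_inputs, **{name: value * 1.05})
    delta = margin(**nudged) - base
    print(f"{name}: +5% input -> margin change of {delta:+.0f}")
```

The limitation is the point: a model like this can only rank the inputs someone thought to include. "Jim blew a fuse" never makes it into the list, which is exactly my concern.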
Nevertheless, I think Vijay is onto something. The question becomes: how readily can we package up methods of contextualizing questions without burying ourselves in dross, or without spending so much time that the answers are no longer relevant? Where I disagree with Vijay is around the complexity of the core. I think that's a done deal with today's tried and tested reporting solutions, high-speed in-memory databases, commodity hardware and massive compute power. But I could be wrong.
Moving one step on: how do we evolve the quality of question from 'How did X happen?' to 'How do we achieve an improved outcome in this range of scenarios?' I suspect the answer lies somewhere in among a firm understanding of both macro and micro contexts, enriched by selected external data. Or maybe the question gets nuanced again to something like: 'Under what conditions can we improve margin in X region by 5 percent?' Perhaps that's where the answers to Vijay's questions point.
Over to you. But in the meantime...Big Data? Bleh!!!
Image credit: Gapingvoid cartoon. Featured image credit: SAP.
Bonus points: Adrian Bridgwater has had a swipe at this topic too. Get your geek hat on!!
More bonus points: Sameer Patel has dinged this in the past as well - talking beef burritos. Don't ask - just read!