On my way home from two events this week, I realized I had heard the term Big Data bandied about quite liberally. That got me thinking about last year, when I opened a Big Data conference in the valley by saying ‘There is no such thing as Big Data.’ I think I got the conference organizers more than a bit worried – until I explained to the audience what I meant.
Of course, Big Data will have its run, like Service-Oriented Architecture (SOA) did a few years ago. As much as I don’t buy into the term, I’ve used it in more than one presentation as a common frame of reference. A heavy Shakespearean influence in my formative years lets me manage this duality easily – after all, ‘what’s in a name?’
A rose by any other name
The term I prefer is All Data. While it’s not perfect either, it is more inclusive than Big Data. And similar to SOA, the concepts and thinking behind Big Data/All Data are solid and not all hype. These concepts get us thinking about changing the design principles of data management as data processing technologies get cheaper, faster, and safer.
So let’s break down the key dimensions of the new world of data management:
1. Deep Data – All the data is stored at the highest level of granularity, no data left behind. The highest level of granularity enables the greatest flexibility in answering complex questions of the data. Big Data talks about high volumes of data without much consideration for the small pieces of information distributed across the organization (e.g. config files – the “.ini” files of yesteryear). The key here is to capture all such data – once.
2. Broad Data – All the types of data. Here we most often think of text, tabular data, blobs, binary formats, etc. – hence the term variety is often used with Big Data. But breadth also includes all the data distributed broadly across the organization, across supply chain networks, and the web.
3. Speed – The term seems self-explanatory: information coming at you at breakneck speeds, streaming data captured in a millionth of a second. Big Data promotes this with the concept of velocity, and frankly I’ve fallen prey to this as well, using “real-time” to highlight the capability (or, as the famous quote goes, “Better three hours too soon than a minute too late”). In doing so, we ignore the idea of all the data at rest and in motion (at different speeds) – slowly changing or static historical data is just as important, and perhaps more so, in establishing context than recent streaming data.
4. Interactive – Ask any question of your data and receive immediate answers, so you can ask the next question and iterate toward the answers you are looking for. We are already expanding beyond humans asking the questions. With the Internet of Things (IoT – another fun term) on the rise, we expect this to include machines and sensors talking to humans and to each other, and making decisions based on responses. While IoT is often tied to Big Data, it clearly needs all the data to be effective.
5. Simple – This is a key aspect of data management – we can now store only the actual data and derive what we need from its simplest form. In other words, think on-the-fly creation of aggregates with no data preparation or tuning. Yes, there can be dimensions, facts, data marts, etc. – but these are logical; there is no need to materialize aggregates or store snapshots of data, then spend most of our time keeping them current and struggling to maintain one source of the truth. The Big Data term doesn’t capture this characteristic well, although some technologies in the market today deliver this simplicity.
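The idea behind the “Simple” point can be sketched in a few lines of code: keep only the granular records, and derive any aggregate on demand along whatever logical dimension the question calls for. This is a minimal illustration, not a real system – the records, field names, and figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical granular sales records: every transaction kept at full detail,
# with no pre-built aggregates or snapshots to maintain.
sales = [
    {"region": "west", "product": "widget", "day": "2014-03-01", "amount": 120.0},
    {"region": "west", "product": "gadget", "day": "2014-03-01", "amount": 80.0},
    {"region": "east", "product": "widget", "day": "2014-03-02", "amount": 200.0},
    {"region": "west", "product": "widget", "day": "2014-03-02", "amount": 50.0},
]

def aggregate(records, dimension):
    """Roll up granular records along any logical dimension, on the fly."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["amount"]
    return dict(totals)

# The same raw data answers different questions with no materialized views:
by_region = aggregate(sales, "region")    # {'west': 250.0, 'east': 200.0}
by_product = aggregate(sales, "product")  # {'widget': 370.0, 'gadget': 80.0}
```

Because nothing is materialized, there is only one source of truth – the granular records themselves – and “dimensions” exist only as questions asked of them.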
Words without thoughts never to heaven go
Data, big or small, without context – just like words without thoughts – doesn’t serve much purpose. Context is perhaps the most important characteristic of information management. Without context, how can any data-based analysis, simulation, or prediction be meaningful? And yet Big Data ignores context, or it is vaguely implied in the combination of volume, variety, and velocity – and I’ve often heard the term ‘value’ feebly bandied about to make a fourth ‘v.’ Establishing context across all data is critical, and who delivers context wins!
Context can be arrived at in different ways, whether through data semantics or a blending of different characteristics that makes the data relevant for consumption. At a very basic level, think of sales data. Take a specific data item, the number 29. To understand it, you need to know the unit of measure (e.g. thousands or millions) and the time period (e.g. hourly, daily, weekly, or monthly) – or, for all you know, 29 could be the number of days in February in a leap year. You get the picture.
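The “number 29” example can be made concrete with a small sketch: a bare value is ambiguous until it is wrapped with its context. The fields and values below are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    """A raw value plus the context that gives it meaning."""
    value: float
    unit: str     # e.g. "thousands of units", "days"
    period: str   # e.g. "monthly", "February 2012"
    subject: str  # what the number actually describes

# The same raw value, 29, means entirely different things in context:
monthly_sales = Measurement(29, "thousands of units", "monthly", "widget sales")
february_days = Measurement(29, "days", "February 2012", "calendar length")

def describe(m: Measurement) -> str:
    """Render a measurement with its full context."""
    return f"{m.subject}: {m.value} {m.unit} ({m.period})"
```

Only the contextualized form is fit for analysis; the naked 29 tells you nothing.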
Consider derived context – e.g. FICO credit scores for issuing loans, blended aptitude-test and academic scores for school admissions, and, more recently, Klout scores denoting social influence – these are contextual, blended scores based on a number of criteria and data points. As you can see, context is not a new idea, but with the advent of data management technologies and the growth in popularity of data science, establishing, managing, and delivering context to make sense of it all has become critical.
As you like it
In summary, Big Data, All Data, etc. are terms we will eventually tire of, though the concepts behind them will prevail. In my humble opinion, Big Data is ‘All Data with Context.’ Those who can deliver the best context and help drive business decisions and outcomes will win…you can draw your own conclusions.
Editor’s note: Aiaz Kazi will be speaking on the Smart Data panel at DEMO Enterprise on Thursday, April 3 – if you are there, look him up. SAP AG Executive Board Member Vishal Sikka published a related post on the five dimensions of data on saphana.com, and he also covered this topic in part three of his Insights on HANA video series.