Earlier this year I wrote a post discussing how the perspective of IT needs to shift from a foundation in scarcity to one based on abundance. Exponential growth (so prevalent in technology) is something I’ve thought about for a while, but I ran into a discussion this week that really brought home how deep a change in thinking is required. I worked extensively with business intelligence activities at the start of this century. While we were working with what seemed like large quantities of data at the time, the goal was always to have a controlled amount of very clean data. This allowed our limited algorithms and computing capabilities to extract the most meaning from the data. We gathered the raw data in an operational data store, loaded it into our star schema and started our analysis, dumping the raw data before the next load. It was effective for its time but clearly based on a ‘scarcity perspective.’
With big data, you want to start with as much wild and untamed data as you can get. There is an abundance of storage, computing and algorithms that can be brought to bear on it. With additional data, you have more observations and a greater understanding of the context of the real world, not some purified fantasy world. Sure, it can be messy, but you also have the opportunity to see those anomalous clusters that used to be sanitized out of existence. The fact that your ‘nice’ normal distribution actually appears to be bimodal when you look at the raw, unfiltered data points to something unexpected. That second little bump may be the most important element of your contextual understanding. After all, people don’t make decisions off data; they make decisions off the context the data describes.
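To make that concrete, here is a small illustrative sketch (the data and the cleaning rule are entirely made up for the example): a traditional "drop the outliers" rule applied to a genuinely bimodal population sanitizes the second mode almost out of existence, while the raw data preserves it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical raw measurements: a main population plus a small
# second cluster -- the kind of anomalous group the post describes.
main_mode = rng.normal(loc=50, scale=5, size=9000)
second_mode = rng.normal(loc=80, scale=3, size=1000)
raw = np.concatenate([main_mode, second_mode])

# A typical 'scarcity era' cleaning rule: treat anything more than
# two standard deviations from the mean as noise and drop it.
mean, std = raw.mean(), raw.std()
cleaned = raw[np.abs(raw - mean) <= 2 * std]

# Count observations in the region of the second bump (above 70).
# It survives in the raw data but is largely cleaned away.
raw_bump = int((raw > 70).sum())
cleaned_bump = int((cleaned > 70).sum())
print(raw_bump, cleaned_bump)
```

With these made-up numbers, roughly a thousand observations form the second bump in the raw data, while only a handful survive the cleaning rule. The "cleaned" distribution looks comfortably unimodal, and the most interesting part of the context is gone.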
And if, a few years down the line, you want to apply new algorithms or compare the data against new observations, keeping the raw data set means you can go back to that well over and over, answering questions that you can’t even formulate today.
Another example I’ve played with when talking about ‘thinking out of the box with data’ is based on the use of genomic information. In 2017, you should be able to store the genomic information of every person on earth using about $140 million worth of storage. By 2020 it will cost about $25 million, based on current storage improvement trends and assuming there is no disruptive technology that takes storage onto a whole new learning curve. Assuming all that data is available, that cost is well within the reach of large companies and will be increasingly available to mid-sized companies in a few years’ time.
It is unlikely that you’ll be able to get all that information, but let’s say you can reliably get 1%, a goal that companies like 23andMe are focused on meeting. That means that for $250,000, a figure within the budget of most medical businesses, we could fundamentally change how healthcare is delivered.
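The back-of-envelope arithmetic behind those figures, using only the numbers quoted in the post, is simple enough to check:

```python
# Figures from the post itself (projections, not measured costs).
cost_everyone_2017 = 140_000_000  # USD to store every genome on earth, 2017
cost_everyone_2020 = 25_000_000   # projected USD for the same storage, 2020
coverage = 0.01                   # the 1% a 23andMe-scale effort might reach

# Storing 1% of the world's genomes at the 2020 price point.
cost_for_one_percent = cost_everyone_2020 * coverage
print(cost_for_one_percent)  # 250000.0

# For comparison: roughly a 5.6x price drop over three years.
price_drop = cost_everyone_2017 / cost_everyone_2020
```

So the $250,000 figure is just 1% of the projected $25 million: the scale of the ambition changes, but the arithmetic stays trivial.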
These are a few simple examples of the shift an abundance perspective enables. New devices will enable techniques like augmented reality to present the derived context in new ways for new roles. Predictive analytics, machine learning and automation will consume our ever-increasing capabilities to act upon those well-understood situations, freeing up scarce resources for even greater discoveries. Fortunately, IT organizations have been dealing with these issues for decades, and we’ve used these advancing capabilities to automate IT operations. We can use our experience to apply the same levers to a new mass of business processes.
But we have to get away from the notion that scarcity is good. It never was; it was simply the best we could do at the time. All that has changed. We now live in an era of abundance. Let’s get over our preconceptions and learn to enjoy the messiness of abundant data, with the prospect it brings of making new discoveries that were hitherto impossible. I for one am looking forward to that. Are you?
Image credit: © Mark Carrel – Fotolia.com