This is a subject, however, that plays an important part in the thinking of Databricks CEO Ali Ghodsi, and he sees fundamental, structural changes ahead in the way data will be exploited over the next few years.
Finding new, meaningful ways of describing the rate of data growth is one of the interesting by-products of all this. For example, most data-driven businesses have generated the majority of all their data in their most recent year of operations. According to Ghodsi, the figure for Facebook is estimated at 70%. And an increasing amount of that data, for all businesses, now sits in the cloud, which is a key component of the changes he sees coming:
So if you just decide to move to the cloud this year, in about a year's time the majority of your data will be in the cloud. But usually the data you have isn't good enough. You need to combine it with other data and enrich it. Well, guess where all the other data that you want to enrich it with is - it's in the cloud.
He sees a new economy of data providers emerging, where data is the 'new oil'. Because they hold the data, they will be in a position to trade it: vast datasets that can give any company all the background and historical data it could wish for about its chosen market sector or technology development. These new data providers are already a key target for Databricks.
As a marker of the potential size of such markets, he cited one unnamed customer that charges its own customers $200 million to access one particular dataset. That customer keeps control of the process throughout, because the dataset is so large that the only practical way to make it available is in the cloud.
Let's make money
The new model here is that the question now goes to the source of the answer, like Greeks visiting the Oracle of Delphi. With this in mind, and given the continued exponential growth in data volumes, Ghodsi reckons that forecasts of the cloud services market, currently estimated at $50 billion, growing to $250 billion in fairly short order may not be overly optimistic:
I think it's going to be a $1 trillion market, not just the $250 billion. I think nearly every large corporation on the planet will outsource all of IT more or less to the Amazons of the world.
It makes pretty obvious sense as well. Ghodsi reckons that most in-house IT departments would now acknowledge that the big cloud service providers are much better resourced. With individual on-premises budgets always under pressure, in-house departments will inevitably lose out on both systems investment and attracting new talent and expertise, so outsourcing IT to the big providers can only make more sense:
But the big companies that are outsourcing their IT still own the data. They're not giving it to Amazon, they're storing it in Amazon but Amazon is not allowed to look at it. And presumably, they're not looking at it, because if they were or it leaked that they were, they'd probably go out of business. So that's the silver lining.
Refining the 'crude oil' is where the answers will emerge
So if data is the 'new oil', the people who own the new 'oil wells' are still going to be the big, established corporations that have been around for years. But, as with actual oil wells, the real money will be in selling the refined products extracted from the crude oil. This does give cloud service providers like Amazon the chance to move in, if they can start negotiating the rights to depersonalise specialist cuts of some customers' data and sell access to that as well.
Whatever happens here, he sees vendors like Databricks being on the winning side, because they offer a platform that doesn't look at the data but simply provides a toolkit that unifies data science and data engineering. There is, Ghodsi suggests, the added advantage that it also makes limited staff resources much more productive:
What if I make two people like ten people, because the unified platform is raising the abstraction level, making their lives easier so they don't have to do all the detail stuff? So that's what we're doing. They don't want to outsource the part that is their real IP, their secret sauce.
Using Databricks then becomes a key part of building the questions businesses need answers to. It gives them the chance to combine their own data with those large, valuable datasets and patent the process. It also helps them hold on to their data scientists, whose names they may keep secret and whom they may lock in with contracts, huge salaries and equity so they can't leave the company. That is the way he sees the future developing.
He also sees Databricks as the intermediary between those huge data 'wells' and a company's need to develop the AI functionality it requires:
That's exactly what we are. We are the layer between the data lake and the data scientists of those companies which are building this technology, and the cloud providers below us are basically the commodity storage and so on.
And that layer will become more important the more big companies move towards a multi-cloud strategy to avoid getting locked in to one cloud provider. To this end, Databricks already has a partnership with Microsoft Azure, not least because Microsoft has such deep relationships with enterprises: well over 75% of Fortune 2000 enterprises have an enterprise agreement with the company.
It is also talking with AWS and Google about similar partnerships, which is an obvious move all round. If Databricks can help businesses use all three, they are very likely to want to do so, a situation that reduces straight competition between the providers and opens up scope for them to specialise in the types of services they offer.
This also complements the way he sees businesses wanting to exploit their data, building AI environments from a large number of small projects rather than aiming straight at a single huge meta-AI project. Ghodsi concludes:
Basically, the transformative companies of the future will all be doing this.
If anyone thought IT was the place to make real money, it is likely soon to be seen as chicken feed compared with the revenues that will be generated by holding and exploiting data. And expect some old, established names to come out on top in the long run.