Will a new data economics govern how infrastructures grow?

By Martin Banks, August 18, 2020
As businesses become ever more data-centric and data-heavy, the associated costs could shift from marginal to punitive, even to the point of affecting decisions on where businesses locate themselves.


In all the brouhaha that blew up earlier this year over Huawei and 5G service provision, an important side issue surfaced, though only obliquely. That subject was data economics and its long-term impact on the way businesses will be structured in the future.

The basis of that specific discussion point was the cost of moving data around, particularly across mobile networks, and the effect on that cost which could come from the greatly expanded bandwidth 5G would give over 4G. There is a useful metric for this: cost per Gbyte of data. This may not seem that important to some business leaders – and in particular CIOs. They may not yet consider their data volumes to be that significant.

But as AI, IoT, data analytics, Virtual Reality and other graphics applications become the standard business tools for all companies, even the smallest, every business will become heavily data-centric, and the cost of moving, storing and working with data will become a key component of every business budget. Indeed, it may play a major role in where businesses locate themselves, or at least those divisions/operations with particular data gravity.

For example, earlier this year it was reported that the cost of moving data in the UK was around $1.30 per Gbyte, while in China it was just $0.60. In the USA, however, it was $8.00. In the great scheme of things these may seem to be marginal costs. But grow the numbers a bit, to just a Terabyte, and the cost rises to $1,300 in the UK and $8,000 in the USA. In China the cost is just $600, or roughly one thirteenth of the USA cost.
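The arithmetic behind those Terabyte figures can be checked with a minimal Python sketch. The per-Gbyte prices are the ones quoted above; the function name `transfer_cost` and the use of a decimal Terabyte (1,000 Gbytes), which is what the quoted totals imply, are assumptions made for illustration.

```python
# Per-Gbyte cost of moving data, in US dollars, as quoted in the article.
COST_PER_GB = {"UK": 1.30, "China": 0.60, "USA": 8.00}

# Assumption: a decimal Terabyte, which matches the article's totals.
GB_PER_TB = 1000


def transfer_cost(region: str, gigabytes: float) -> float:
    """Return the cost in dollars of moving `gigabytes` of data in `region`."""
    return COST_PER_GB[region] * gigabytes


if __name__ == "__main__":
    for region in COST_PER_GB:
        cost = transfer_cost(region, GB_PER_TB)
        print(f"{region}: ${cost:,.0f} per Terabyte")

    # How many times more expensive the USA is than China per Terabyte.
    ratio = transfer_cost("USA", GB_PER_TB) / transfer_cost("China", GB_PER_TB)
    print(f"USA/China ratio: {ratio:.1f}")
```

Because the cost is linear in volume, the ratio between regions stays the same at any scale; only the absolute gap grows.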

As the use of AI, IoT, data analytics, VR and the rest is combined with increasing levels of edge computing, and as 5G opens consumers up to a growing need to access Gbytes of data as a matter of course, the majority of users will be running close to the Tbyte level, at least, most days of the week, even if they think they are not big consumers of data.

For small to medium sized businesses, especially those operating in traditional business areas, the data costs associated with their location may not be punitive enough to become a major stumbling block. But for large enterprises – or cloud-based startups with a totally data-centric business plan involving the frequent movement of very large volumes of data – the global location of that data, and maybe of the very business itself, may well be directed, even governed, by the local cost of working with data, because it will be costing real money.

The tripod of data economics    

The chance to chat to Matt Watts, Chief Technology Evangelist for the Worldwide Enterprise & Commercial Field Organisation at storage systems specialist NetApp, gave me the opportunity to get a vendor view of this trend and its part in the way businesses will need to structure themselves and their operations in the future. He told me:

There are three scenarios that jump to mind. The first is where you put a data center. There is an inherent cost in having a data center facility that stores the data which different organizations will use and that is different depending on geographies. The second aspect is what are we trying to do with that data? For example, with an IoT environment, where that data gets transmitted to suddenly becomes a very important aspect. It’s probably still cheaper to do that in-country. And then the third aspect of it is how do you move the minimum amount of data to be able to do what you need to do?

Perhaps it should be no surprise that these three areas are also of particular interest to Watts and NetApp. The company is already well into the first area, targeting greater efficiency in the way data centers manage and store data. It is interesting to note that he suggests the major hyperscalers, especially those delivering cloud services, are not amongst the most efficient in this respect, and they are now a prime target. Business users that find themselves moving large volumes of data between the source and the data center, especially if it is then crossing national boundaries, can rapidly find themselves disadvantaged.

Ways for users to start addressing the second part, cutting back the generation of data traffic, should now, he suggests, certainly include Virtual Desktop services, where the data and applications remain on the server and the user receives an interactive representation of what is being performed there. NetApp has also started work on data caching. The question it is addressing is how data can be located so that users can cache it out to any number of remote locations, benefiting from the lower cost of wherever the majority of the data is stored while still providing low-latency access in different regions.

The third part is where 5G communications will play a big part. This will depend on closely managing what data is sent where over the network, with the fundamental goal being to reduce the volume. This will include data compression, compaction, and selective duplication, together with management of where that data is moved to. Caching, and its management, is likely to be a core part of the overall data economy as edge-based services develop.

Watts also sees a real role for the virtualisation and distribution of the ‘data center’ around the network in order to bring the compute functions and processes nearer to where the data is created. These functions involve large volumes of data, but how much needs to be retained? He used the classic example of a manufacturing AI application designed to check the quality of welds produced by a robotic system.

The data produced here covers several areas. The data governing the robot welder needs to be local. The weld quality check also needs to be local, but anomalies and failures need to be reported higher up the chain, together with a statement of the failure mode and the solution that local services arrived at:

I think companies will do a lot more of this kind of edge compute intelligence, and where IoT is becoming AI. It's great to have all this IoT information, but we need to be able to respond to it, work on it and make decisions in real time. You can't do that if you're trying to take data back to some core location for some sort of Central Processing to then send results back again.
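The weld-checking pattern described above, where everything is evaluated locally and only anomalies travel up the chain, can be sketched in a few lines of Python. This is an illustration only, not NetApp's or any vendor's implementation: the `WeldReading` type, the 0.0 to 1.0 quality score and the 0.8 pass mark are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeldReading:
    weld_id: int
    quality_score: float  # hypothetical 0.0-1.0 score from a local inspection model


# Assumed pass mark; a real system would tune this per process.
QUALITY_THRESHOLD = 0.8


def select_for_upload(
    readings: List[WeldReading], threshold: float = QUALITY_THRESHOLD
) -> List[WeldReading]:
    """Return only the readings that failed the local check.

    Passing readings stay at the edge; only failures are sent upstream.
    """
    return [r for r in readings if r.quality_score < threshold]


if __name__ == "__main__":
    readings = [WeldReading(1, 0.95), WeldReading(2, 0.42), WeldReading(3, 0.88)]
    anomalies = select_for_upload(readings)
    print(f"{len(anomalies)} of {len(readings)} readings forwarded upstream")
```

The point of the design is that the decision is made where the data is created, so the volume crossing the network scales with the failure rate rather than with the number of welds.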

New architectures for new data types

In his view, this points to a fundamental re-architecting, both physical and logical, of the whole network over a period of time. And as many elements of that network are increasingly likely to be located at optimal sites where the economics of data play an important contributing part, the thought seems pertinent that a new formula may be required. 

The objective would be to create a formula to help CIOs and CDOs decide not only what they do, but where they do it, in an environment where being cost efficient will be one of the key drivers. This will have to include not just the cost of storing and processing data, or the cost of moving it, but also factors such as the consequential cost of the latency in all actions undertaken with data.
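To make the idea concrete, here is one possible shape such a formula could take, sketched in Python. It is an assumption for illustration, not an established model: it simply adds the storage, processing and transfer costs to a latency penalty priced per millisecond per request. The names `SiteCosts`, `monthly_cost` and `cheapest_site`, and every figure in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteCosts:
    storage_per_gb: float     # dollars per Gbyte stored per month
    processing_per_gb: float  # dollars per Gbyte processed
    transfer_per_gb: float    # dollars per Gbyte moved
    latency_ms: float         # typical round-trip latency to users


def monthly_cost(
    site: SiteCosts,
    stored_gb: float,
    processed_gb: float,
    moved_gb: float,
    requests: float,
    cost_per_ms_per_request: float,
) -> float:
    """Sum storage, processing, transfer and a linear latency penalty."""
    return (
        site.storage_per_gb * stored_gb
        + site.processing_per_gb * processed_gb
        + site.transfer_per_gb * moved_gb
        + site.latency_ms * requests * cost_per_ms_per_request
    )


def cheapest_site(sites: Dict[str, SiteCosts], **workload: float) -> str:
    """Return the name of the site with the lowest total monthly cost."""
    return min(sites, key=lambda name: monthly_cost(sites[name], **workload))


if __name__ == "__main__":
    # Entirely made-up figures, chosen only to exercise the formula.
    sites = {
        "site_a": SiteCosts(0.02, 0.01, 1.00, 10.0),
        "site_b": SiteCosts(0.05, 0.01, 0.50, 50.0),
    }
    workload = dict(
        stored_gb=1000,
        processed_gb=1000,
        moved_gb=100,
        requests=1000,
        cost_per_ms_per_request=0.001,
    )
    print("Cheapest:", cheapest_site(sites, **workload))
```

The hard part in practice is the latency term: putting a price on each millisecond of delay is a business judgment, and a linear penalty is only the simplest possible assumption.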

Watts agreed with the principle, but indicated that this is already starting to emerge in the form of tools. For example, NetApp is moving in this direction with its Cloud Insights application, a monitoring tool that gives visibility across a complete infrastructure. The company has also recently acquired Spot, a company that delivers Application-Driven Infrastructures (ADIs). These are defined as cloud infrastructures that use analytics and machine learning to adapt resources to the needs of applications in real time. The company claims this can deliver significant savings on cloud-delivered infrastructure, for both compute and storage, without affecting required SLAs:

It's about looking at historical pricing trends across the hyperscale providers, and helping companies make decisions or even making decisions for them. Then if you want to bring online a particular service, then we can make a recommendation as to where would be the cheapest location that you could run that service and then monitor that service over time.

He pointed to one important factor he suggested CIOs and CDOs need to address to exploit data economics: the need to create a new understanding of what data is. File-based data is still the norm, yet he sees it as just a massive ‘bucket’ of data. There is a need to increase the tracking, intelligence and attributes around that data. Much greater use of metadata can change a ‘blob of data’ into a more intelligent object with particular attributes that can be linked to particular projects:

There is a need to evolve data into something like an object platform where you can associate a piece of data with anything with certain characteristics. I think that will start to give us a lot more scope to create more intelligence, more awareness to make better data-driven decisions over the top. That then starts to give you the foundations: you can apply formulas where everything that is this type of data from this type of device would be set to these characteristics. It would be so much cheaper to create policy-driven engines.

My take

It would not surprise me if, for the majority of businesses, the actual cost of moving data around is not seen as a high priority when compared to the purchase of storage technologies, compute technologies and the 101 software tools that seem to be needed to manage the movement. But all businesses are bound to become ever more data-heavy, even if they don’t want to be. Single invoices long ago started to grow from being a few kilobytes of text and are now multi-megabyte extravaganzas of branding and interactivity. Businesses, even very modest, very ordinary ones, now move Terabytes around every day.

For bigger companies, new questions, such as where to store data and in what relationship to its generation and processing, join forces with those concerning the infrastructure, hardware and software needed to contain it. The economics of data are becoming a real issue, and will require real answers in the not too distant future.