Cloud power shifts as mega vendors abstract everything up to running code
- Summary:
- Highly abstracted cloud infrastructure services are creating a clear space for enterprise innovation to thrive. Here's how.
Cloud services are shifting the competitive landscape among server vendors and their suppliers. It's a trend I've mentioned many times: it has been underway for several years, is acknowledged by industry executives and is reflected in respected market estimates.
For example, IDC's recent Q1 2017 market estimates showed that the top five big-name server vendors lost over 9 points of market share in the past year to ODMs and other whitebox suppliers, which saw consolidated sales grow by almost 36%. Gartner has similar numbers. The more consequential question is what's driving the change. According to one of IDC's analysts:
We may be witnessing a shift in how workloads are deployed in the future, and what architectural choices will be made around modularity, operating environments, software, and cloud services. As indicated by this quarter's results, one large server customer appears to be betting on a major transition to cloud services, as it alone accounted for approximately a quarter of a million servers deployed in the first quarter.
In last week's column making a case for renewed competitive urgency at Intel, I was less equivocal:
The exploding growth of mega cloud providers that are both extremely sophisticated buyers and looking for every iota of competitive edge has significantly changed server market dynamics. Add in new, workload-optimized system architectures to power their infrastructure.
The fact that Amazon, Google and Microsoft spend billions of dollars a year on data center equipment and have armies of the smartest engineers and developers optimizing their infrastructure and processes is a significant catalyst for the change.
These companies have the financial and intellectual resources to customize systems to their precise requirements and see less and less value from middlemen like Cisco, Dell, HPE and Lenovo. Their factory-direct, hands-on approach to data center procurement, from design through delivery, has fed growth at the Asian ODMs and in alternative equipment standards under cooperative groups like the Open Compute Project (OCP), the Open Network Operating System (ONOS) and the OpenPOWER Foundation.
However, there's another factor fueling change that is more important in the long run, and it results from something I covered earlier in July: the higher level of abstraction that cloud services provide shields service consumers from hardware implementations.
Decoupling IT applications and services from infrastructure
Although the oldest cloud services are little more than VMs-by-the-hour in which the user must still worry about OS-level details like processor architecture and source code compatibility, more advanced services operate at higher levels of abstraction by delivering application, network, database and AI services that insulate the user from the implementation. As I recently wrote:
...AWS has spent the intervening decade relentlessly developing higher-value database, data analytics, AI, application, serverless, DevOps and automation services that progressively decouple users from the details of the underlying server OS and hardware implementation. … Each cloud service portfolio fits the textbook definition of a good abstraction by generalizing things that can be made abstract while enabling customization and integration through configurable parameters and APIs.
Using the cloud service delivery model, the user neither cares whether the provider is using a server from HPE or Inventec, nor does it matter whether the system is powered by an Intel or AMD x86 processor, an ARM or POWER chip, or a GPU.
Google drove home this point when it revealed last year that it has been using a proprietary TPU chip for many deep learning algorithms, a detail that was previously both unknown and irrelevant to users.
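To make the abstraction argument concrete, here is a minimal Python sketch of a service whose caller cannot tell which hardware does the work. Every name in it (InferenceBackend, CpuBackend, AcceleratorBackend, TextService) is hypothetical and invented for illustration; it does not represent any cloud provider's actual API.

```python
from abc import ABC, abstractmethod


class InferenceBackend(ABC):
    """Hypothetical hardware backend. Service consumers never see this type."""

    @abstractmethod
    def run(self, text: str) -> str:
        ...


class CpuBackend(InferenceBackend):
    """Stand-in for a general-purpose processor."""

    def run(self, text: str) -> str:
        return text.upper()


class AcceleratorBackend(InferenceBackend):
    """Stand-in for custom silicon: different internals, same contract."""

    def run(self, text: str) -> str:
        return "".join(ch.upper() for ch in text)


class TextService:
    """The only surface the service consumer touches."""

    def __init__(self, backend: InferenceBackend) -> None:
        # The provider picks the backend; the caller has no way to inspect it.
        self._backend = backend

    def shout(self, text: str) -> str:
        return self._backend.run(text)


# The provider can swap hardware without any change on the caller's side.
on_cpu = TextService(CpuBackend()).shout("hello")
on_accelerator = TextService(AcceleratorBackend()).shout("hello")
```

Both calls return the same result, which is the point: the contract lives in the service interface, so the provider is free to change what sits behind it.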
The cloud is particularly challenging for traditional IT equipment powerhouses that historically catered to internal IT departments operating private infrastructure and for whom consistency, standardization and vendor hand-holding are highly valued.
Not only do the mega clouds have little need of legacy vendor hand-holding, but by abstracting away infrastructure minutiae, the cloud also shifts influence over digital initiatives and implementations away from traditional IT to application developers and business analysts.
Consequently, the focus of the architects of enterprise digital technology is on application, data and AI services, not system specifications and CPU instruction sets. By moving the decision from speeds, feeds and instruction sets to services, features and APIs, the cloud serves to catalyze a renaissance in innovative system designs using chip internals optimized for different workloads.
Cracks in x86 hegemony
Intel can legitimately channel Twain, since any rumors of the death of the x86 are greatly exaggerated. However, at the mega cloud providers, the versatile, do-it-all Xeon will be increasingly displaced by platforms that are superior at specific workloads.
Whether on price, performance, efficiency, system density or power consumption, alternatives both within the x86 ecosystem, like AMD's new EPYC chips, and outside it, using vastly different architectures like ARM, POWER, NVIDIA GPUs or custom silicon, will increasingly displace Intel workhorses as the engines powering cloud services.
Earlier this decade, as Google pondered expanding its use of deep learning algorithms for things like image tagging, voice search and chatbots, it reached an alarming conclusion:
The computational expense of using these models had us worried. If we considered a scenario where people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to double the number of Google data centers!
There had to be a better solution than throwing more money at the problem, and, by inference, at Intel. Google answered that problem by establishing a project to develop a processor optimized for the deep learning algorithms that are the heart of such intelligent services. Instead of using a processor designed for a broad spectrum of traditional applications, Google tailored its hardware to the problem.
In short, the TPU design encapsulates the essence of neural network calculation, and can be programmed for a wide variety of neural network models. To program it, we created a compiler and software stack that translates API calls from TensorFlow graphs into TPU instructions.
The result is a product that improves performance, lowers cost and has become the foundation of Google's AI services:
TPUs allow us to make predictions very quickly, and enable products that respond in fractions of a second. TPUs are behind every search query; they power accurate vision models that underlie products like Google Image Search, Google Photos and the Google Cloud Vision API; they underpin the groundbreaking quality improvements that Google Translate rolled out last year; and they were instrumental in Google DeepMind's victory over Lee Sedol, the first instance of a computer defeating a world champion in the ancient game of Go.
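As a rough illustration of what "the essence of neural network calculation" in Google's description refers to, the sketch below computes one fully connected layer in plain Python. It is an assumption-laden teaching example, not TPU code: the function name dense_relu and the sample numbers are invented here, and real workloads run this same multiply-accumulate pattern over far larger matrices.

```python
def dense_relu(weights, bias, inputs):
    """One fully connected layer, relu(W @ x + b), using plain Python lists."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for w, x in zip(row, inputs):
            acc += w * x  # multiply-accumulate, the dominant operation
        outputs.append(max(0, acc))  # ReLU activation clamps negatives to zero
    return outputs


# Hypothetical 2x2 layer with integer values so the arithmetic is exact.
weights = [[1, -3], [3, 4]]
bias = [0, -1]
inputs = [2, 1]
result = dense_relu(weights, bias, inputs)  # [0, 9]
```

Because almost all of the work is the inner multiply-accumulate loop, a chip built around that one operation can skip much of the general-purpose machinery a conventional CPU carries.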
As I detailed in covering the NVIDIA developer conference this spring, cloud AI services and the algorithms underpinning them have also fueled explosive growth in the use and technological evolution of GPUs. Not only does each of the major cloud services offer GPU-powered compute instances for DIY development, but GPUs are also behind higher-level products like Azure Cognitive Services, Amazon Alexa and Google Cloud Machine Learning.
Not just for AI
Although the cloud providers are guarded about releasing infrastructure details that provide a competitive advantage, they say enough to show active work with other non-x86 platforms that promise better performance for particular workloads. For example:
- Google Cloud has worked with Rackspace to develop an OCP motherboard for POWER9 CPUs as it looks "forward to a future of heterogeneous architectures within our cloud."
- Microsoft is porting Windows Server to the ARM platform and committing to deploy ARM servers in Azure. The company has also released an OCP server design (Project Olympus) that is compatible with ARM server processors for its "cloud services, specifically our internal cloud applications such as search and indexing, storage, databases, big data and machine learning. These workloads all benefit from high-throughput computing."
- Microsoft has developed and deployed an FPGA-based system to accelerate Bing and Azure searches. As the head of Azure's Cloud Silicon team put it, "FPGAs used to be relegated to the back room, performing tasks sent to them. Now, the FPGAs are the first to see every message going into the server, enabling them to both make decisions on how to handle each message and perform the work, often without the processor’s involvement."
- Amazon's 2015 acquisition of a startup that develops custom ARM-based processors along with rumors of a major expansion demonstrate a commitment to building custom, ARM-based SoCs. Although the company has said little about its plans, it is likely developing proprietary silicon for consumer products like smart home devices, industrial IoT devices (using Greengrass) and AWS servers.
My take
The disruptive effects of cloud services are multifaceted, spanning IT operations and budgets, application design and deployment, data center system architecture and equipment supply chains.
The conceptual foundation for much of the dislocation results from increased levels of service abstraction that isolate users of IT applications and infrastructure from the implementation.
My focus earlier this month was on the implications of public cloud services for private cloud products. However, by turning IT services into a utility, the mega clouds are also upending the data center equipment market. Just as a homeowner doesn't care if their electricity comes from a hydro turbine or photovoltaic farm, users of cloud services don't care if they run on x86 hardware or custom silicon.
Enterprise IT has always lived on inertia, so change is never rapid. However, those organizations that can exploit the new economics of cloud-optimized hardware will have an advantage. Similarly, incumbent system and component suppliers that can best adapt to the needs of infrastructure driven by service requirements, not legacy compatibility, can thrive, while those that persist in milking cash cows will see customers go elsewhere.