A case for cloud repatriation, but let’s be careful before extrapolating to mainstream enterprises
- Summary:
- Challenging some of the Silicon Valley influencers who've made the case for cloud repatriation...
Technology leaders usually listen when one of the largest and most influential venture capital firms in Silicon Valley opines on a topic. Thus, when two partners at Andreessen Horowitz published a lengthy article on runaway cloud spending, it created quite a discussion. In The Cost of Cloud, a Trillion Dollar Paradox, Sarah Wang and Martin Casado analyze the financial results of several dozen software companies and demonstrate the deleterious consequences of cloud spending on operating expenses and gross margins as companies scale.
Their analysis found that the committed cloud spending at public software companies amounts to about half of their cost of revenue (COR), an aggregate $8 billion for the 50 largest software companies that disclose cloud spending in their financial reports. Furthermore, Wang and Casado figure these companies could cut their infrastructure costs in half by bringing workloads in-house to a privately owned and operated cloud.
A significant caveat to this estimate is the limited subset of companies reporting cloud expenses, since many, perhaps most, of the largest do not share such detail in their SEC 10-Q filings. For example, Adobe, one of the largest SaaS vendors, only itemizes a broad category of “hosting services and data center costs,” making it impossible to determine how much might be spent with hyperscale cloud providers like AWS or Google Cloud. As documented in the appendix, most of the companies considered in the a16z analysis are born-in-the-cloud SaaS firms, of which only Adobe ranks among the 20 largest software companies.
The a16z analysis was immediately cheered by cloud skeptics and jeered by cloud advocates. However, as with everything in technology, the details are nuanced and easily manipulated to support one’s existing beliefs and preconceptions. Although the article makes a compelling case for software companies operating a business from the cloud, its analytical assumptions don’t allow extrapolating the conclusions to enterprises operating business processes in the cloud.
SaaS companies and the strong case for in-sourcing
Dropbox is the archetype that those making the cloud repatriation argument invariably highlight to bolster their case. Thus, it’s not surprising that Wang and Casado use the company to demonstrate the degree of waste SaaS companies incur by continuing to use cloud infrastructure. However, the reason Dropbox makes such a compelling case for internalizing cloud operations also illustrates why its example doesn’t translate to the typical enterprise.
Five years ago, when Dropbox announced its “Magic Pocket” project to build and operate internal storage infrastructure, it explained that its scale and storage architecture meant that “our use case for block storage is unique” and that at its scale, migrating off AWS S3 would result “in better unit economics.” Indeed, the announcement details just how unique Dropbox is (emphasis added):
We knew we’d be building one of only a handful of exabyte-scale storage systems in the world. It was clear to us from the beginning that we’d have to build everything from scratch, since there’s nothing in the open source community that’s proven to work reliably at our scale. Few companies in the world have the same requirements for scale of storage as we do. And even fewer have higher standards for safety and security. We built reliability and security into our design from the start, ensuring that the system stores the data in a safe and secure manner, and is highly available. The data is encrypted at rest, and the system is designed to provide annual data durability of over 99.9999999999%, and availability of over 99.99%.
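To put the reliability targets in that quote into concrete terms, here is a minimal sketch that converts "nines" into annual figures. It assumes a 365-day year and, as a simplification that Dropbox does not state, treats the durability figure as an independent per-object annual loss probability. The function names are hypothetical.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_nines: int) -> Fraction:
    """Yearly downtime allowed by an availability target.

    availability_nines=4 means 99.99% availability.
    """
    return Fraction(MINUTES_PER_YEAR, 10 ** availability_nines)


def expected_annual_losses(object_count: int, durability_nines: int) -> Fraction:
    """Expected objects lost per year.

    Assumption: durability is an independent per-object annual loss
    probability. durability_nines=12 means 99.9999999999% durability.
    """
    return Fraction(object_count, 10 ** durability_nines)


downtime = max_downtime_minutes(4)
losses = expected_annual_losses(10 ** 12, 12)
print(f"99.99% availability allows about {float(downtime):.2f} minutes of downtime per year")
print(f"A trillion stored objects at twelve nines: about {float(losses):.0f} expected loss per year")
```

Exact fractions are used instead of floats because subtracting 0.999999999999 from 1 in floating point introduces rounding error at this precision.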
It is understandable how a company like Dropbox, whose storage requirements are within an order of magnitude of the cloud operators themselves, can save costs by eliminating the service provider and its associated mark-up. Other storage-centric SaaS providers studied by Wang and Casado, like Box, Fastly, MongoDB and Snowflake, that have reached 9- or 10-figure revenue numbers would likely see similar cost efficiencies by insourcing some of their infrastructure.
With half of COR (what widget makers call cost of goods sold, or COGS) going to cloud service providers, these are companies whose main input cost is the IT infrastructure necessary to run their business, something that is not true of enterprises outside the SaaS market. In contrast, most enterprises have significant input expenses for components, equipment, labor, transportation, real estate and energy that often dwarf the amount spent on IT. More importantly, the costs of cloud services come with benefits in flexibility, capital efficiency, scalability, reserve capacity and innovative, hard-to-duplicate capabilities that can outweigh any savings that might accrue from repatriating infrastructure for mature, predictable, well-understood workloads.
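A rough sketch of that arithmetic shows why the SaaS result doesn't carry over. The 50% cloud share of COR and the 50% infrastructure saving are the a16z figures cited above; the 5% share for a non-SaaS enterprise is a purely illustrative assumption, not a number from the article, and the function name is hypothetical.

```python
def savings_share_of_cor(infra_share_of_cor: float, infra_cost_reduction: float) -> float:
    """Fraction of total cost of revenue saved by cutting infrastructure costs."""
    for value in (infra_share_of_cor, infra_cost_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and reductions must be between 0 and 1")
    return infra_share_of_cor * infra_cost_reduction


# a16z figures: cloud is about half of COR, repatriation might halve it.
saas_savings = savings_share_of_cor(0.50, 0.50)

# Hypothetical enterprise where infrastructure is 5% of COR (assumption).
enterprise_savings = savings_share_of_cor(0.05, 0.50)

print(f"SaaS vendor:        {saas_savings:.1%} of COR saved")
print(f"Non-SaaS example:   {enterprise_savings:.1%} of COR saved")
```

Under these assumptions the same 50% infrastructure saving is worth a quarter of COR to the SaaS vendor but only a few percent to the enterprise, before counting any of the cloud benefits that repatriation gives up.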
Cloud infrastructure isn’t all for predictable and well-defined systems
Repatriation cheerleaders waving their Dropbox pom-poms have forgotten the original poster child for cloud in-sourcing (the analyst community had yet to coin the repatriation buzzword), Zynga. The online video game pioneer gained fame and fortune with FarmVille and other titles and made a splash back in 2012 by heralding its migration off AWS, which it used to launch the company, to an internally owned and operated zCloud optimized for gaming workloads. Although the Zynga story gave fuel to early cloud skeptics convinced that shared services were a poor option for companies at scale, the story doesn’t end there.
An impending technological shift to mobile gaming, coinciding with the operational realities of running cloud-scale infrastructure, made clear that operating an internal cloud wasn’t all gain and no pain. As Zynga’s CIO wrote in a case study (emphasis added):
We were maintaining equipment over a typical three-year lifecycle. In our East Coast data center, which had the oldest equipment, we were replacing between 80 and 100 hard drives a month. The move to mobile gaming had begun, suggesting a very different way of doing things. Our data center infrastructure was originally built for massive Facebook games that gradually declined as players shifted to mobile. Mobile clients allowed for a richer and smarter experience and we realized that we didn’t need nearly as much equipment infrastructure to serve the emerging mobile game market.
To Zynga’s credit, it didn’t succumb to the sunk-cost fallacy and soon realized that returning to AWS was the best strategy. As its CIO rightly observed (although I would add that almost no company, not even AWS’s competitors, can keep up):
AWS is innovating on technology at a pace that we simply cannot keep up with, while the flexibility and the learnings we achieved from operating our own cloud allowed us to find a path back to AWS.
For example, the scale and modernity of the AWS hardware allowed Zynga to cut the size of a zCloud analytics cluster in half, to 115 nodes. It shaved another 40% off after redesigning the application around AWS services. Zynga also boosted game performance by using AWS database services, cutting query times for its Poker app by 97%.
Using Zynga to declare the death of cloud infrastructure in 2012 was embarrassingly premature. Not only did Zynga do a u-turn and end up closing its two data centers, but AWS revenue has increased 15-fold since Zynga’s foray into private cloud infrastructure.
As seasoned SV VCs, Wang and Casado understand cloud technology. However, Rich Hoyer, SADA's Director of Customer FinOps, suggests their column contains several misconceptions he often sees in consulting with those new to the cloud. The fundamental mistake people make is assuming that saving money is the primary reason to use cloud infrastructure. Hoyer says:
Well-architected and well-operated cloud deployments will be highly successful compared to data center deployments, period. However, ‘highly successful’ may or may not mean less expensive. A singular comparison between the ‘cost’ of cloud versus the ‘cost’ of a data center shouldn’t be made as an isolated analysis. Instead, the differential ROI of one set of costs versus the alternative should be analyzed.
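One way to read "differential ROI" is as the return earned on the extra money one option costs over the other. The sketch below takes that interpretation, which is an assumption rather than Hoyer's stated method, and every figure in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    annual_cost: int      # total yearly cost of running the workload
    annual_revenue: int   # yearly revenue attributable to the workload

    def net_return(self) -> int:
        return self.annual_revenue - self.annual_cost


def differential_roi(costlier: Option, cheaper: Option) -> float:
    """Net gain from choosing `costlier`, per extra dollar it costs."""
    extra_cost = costlier.annual_cost - cheaper.annual_cost
    if extra_cost <= 0:
        raise ValueError("first option must cost more than the second")
    extra_revenue = costlier.annual_revenue - cheaper.annual_revenue
    return (extra_revenue - extra_cost) / extra_cost


# Hypothetical figures for illustration only.
data_center = Option("data center", annual_cost=1_000_000, annual_revenue=3_000_000)
cloud = Option("public cloud", annual_cost=1_500_000, annual_revenue=4_000_000)

roi = differential_roi(cloud, data_center)
print(f"{cloud.name} costs more, but each extra dollar returns {roi:.0%} net")
```

In this made-up case the cloud deployment is 50% more expensive, yet it is still the better choice because the extra spend more than pays for itself in additional revenue. A cost-only comparison would have picked the data center.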
Focusing on costs neglects the revenue side of the ROI analysis. Hoyer says cloud services can accelerate product cycles and time-to-revenue and expand revenue opportunities through scalable and geographically distributed infrastructure that can serve previously untapped markets. Furthermore, as all organizations adopting a hybrid approach realize, the choice between public and private cloud isn’t binary. Instead, Hoyer says organizations should identify and migrate workloads that most benefit from the cloud’s scalability, geographic flexibility and technical innovation. Hoyer suggests using the following questions as a guide:
Which workloads benefit from the elasticity, geo-flexibility, or technological innovation that cloud offers? Which workloads can really ‘take off’ if migrated, or currently rely on innovative new services only offered in the cloud? These are the best candidates to be run on a public cloud.
My take
Wang and Casado’s paper was addressed to software entrepreneurs, not mainstream enterprise IT executives. Its extensive analysis of the correlation between P/E multiples and gross margins, and of the potential for significant equity price expansion through cost cutting by insourcing cloud infrastructure, makes that clear. Nonetheless, when influential industry voices opine on a subject as fraught with agendas as cloud computing, there’s bound to be some over-generalizing and recriminations by the cloud cynics.
Two quotes from the a16z paper provide clues to essential lessons all cloud users should internalize:
If you’re operating at scale, the cost of cloud can at least double your infrastructure bill.
And:
You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it.
The key points being:
- Cloud services should be the default choice for both startups and new enterprise projects where access to technology, the flexibility to quickly try new designs and technologies, and the ability to scale and replicate resources are paramount.
- When well-characterized and understood workloads scale far beyond the conditions under which the cloud decision was first made, it’s time to reconsider the ROI of operating private infrastructure.
- Extrapolating usage assumptions and cost data drawn from high-scale SaaS vendors to enterprise IT in other industries will lead to faulty conclusions about the optimal public-private cloud balance since the costs and benefits of cloud services aren’t equally weighted across organizations and markets.