Cloud services have grown into multi-billion dollar businesses for Amazon and Microsoft. Together with Apple, Facebook and Google, each of which serves upwards of a billion users, these cloud titans require infrastructure on a truly staggering scale. They build warehouse-sized data centers near cheap hydropower plants, supplemented with acres of solar panels and wind farms, often in cold climates and, soon, even underwater, all to maximize efficiency and reduce operating costs at facilities that use as much energy as a small city. The upshot of these design decisions is that hyper-scale cloud facilities differ from the traditional enterprise data center not merely in scale, but in kind.
That same specialization, in which the incessant drive for efficiency and scale changes the nature of hyper-scale implementation, now manifests itself inside the data center, where cloud providers can’t drive continued improvements in cost, performance and scale merely by adding more and larger off-the-shelf systems. Instead, they must rethink the basic design of servers, storage arrays and network interconnects to wring every wasted watt, unnecessary chassis and redundant cable out of the system. This translates into different system requirements, which, given the large and growing buying power of the cloud collective, are pushing the data center hardware industry to embrace radically new designs.
Cloud builders dominate OCP Summit
The public face of this warehouse-scale approach to hardware is the Open Compute Project (OCP). Founded when Facebook published specifications for a new type of rack-scale system over six years ago, the project’s stated mission is “to design, use, and enable mainstream delivery of the most efficient designs for scalable computing,” which it does by “openly sharing ideas, specifications, and other intellectual property” intended to increase innovation and reduce the complexity of data center components. The most comprehensive update on the OCP’s progress came last week at its annual summit, where the burgeoning OCP ecosystem had plenty to show in both technological improvements and industry acceptance.
As evidenced by Summit attendance exceeding 2,000, OCP has grown to include hundreds of participating companies supporting dozens of projects. Initially noteworthy for developing standards around Facebook’s unconventional 21-inch racks and associated equipment, the project has since incorporated a 19-inch design from Microsoft. But the real purpose of OCP isn’t so much redoing the data center floor plan as disaggregating server and storage components so that they can be mixed and matched as a rack-scale system.
A presentation by Google’s data center guru, Urs Hölzle, in which he pitched the company’s 48 V DC power distribution system, epitomized the design ethos. By consolidating AC-to-DC power conversion at the rack level, rather than at individual components, Google claims it can improve efficiency by 30%. Although Google has been doing rack-level DC power distribution for years, one of its goals in finally joining OCP is to gain broad industry support for a standard that will incentivize vendors to produce compliant equipment in volume and lower costs for everyone, or at least for those fellow OCP members that require racks and racks of servers and storage.
Cloud fuels growth in hardware sales
That stipulation is key to understanding the dichotomy between warehouse/hyperscale operators and the rest of enterprise IT. Most of what happens at OCP is irrelevant to the average enterprise IT shop, since the technologies apply only to those operating hundreds or thousands of racks, not a few dozen servers. The exception, of course, is a shop that has already migrated to the cloud, in which case OCP helps keep consumption rates on a downward path.
The growing gulf between the needs of the cloud operators and the rest of IT is apparent not just at OCP, but in industry sales figures. As I wrote last month, cloud spending is still a tiny fraction of the total IT market; look inside the equipment numbers, however, and you’ll see its influence. According to Gartner’s Q4 2015 server estimates, overall revenue grew 8.2%. As Jeffrey Hewitt, research vice president at Gartner, says:
The real growth driver for the quarter in terms of absolute value was the Other Vendors category. This collection of unspecified vendors that includes original design manufacturers (ODMs), like Quanta and Wistron, contributed over $750 million in revenue and over 170,000 server unit shipments for the period. This demonstrates that the growth of hyperscale data centers, like those of Facebook, Google and Microsoft, continues to be the leading contributor to physical server increases globally.
According to a Bloomberg estimate, the white box ODM segment now represents 11% of global x86 server revenue, yet has grown 22% over the past year. OCP intends to be the vehicle driving design decisions for these ODMs supplying the cloud. Indeed, my colleagues at The Next Platform report that IDC expects “that by 2020 almost half of the servers sold into hyperscale accounts will be based on OCP designs” and that already, 80% of the new servers at financial giants Fidelity and Goldman Sachs follow OCP specifications.
Cloud equipment specialization
OCP and the cloud-scale operators are also driving changes in network switches, where Facebook, Google, Microsoft and others were instrumental in making 25 and 50 Gbps Ethernet interfaces a standard and motivating vendors like Broadcom to develop compliant switch silicon. Facebook had previously donated its Wedge switch to OCP and provided more details this year, including how it aggregates 6-packs of the base switch to interconnect its backbone network fabric.
Other evidence of cloud-scale specialization comes from Intel, a longtime OCP member and primary arms merchant to data centers large and small. The company announced a variant of its Broadwell-architecture Xeon system-on-a-chip (SoC) that was originally built to ward off ARM competition in high-density cloud data centers. According to Intel Cloud Platforms Group VP Jason Waxman, it is:
Our highest performing datacenter SoC to date. This SoC was optimized for Facebook’s workload requirements and is enabled in an OCP form factor design.
Waxman also reiterated Intel’s commitment to integrating Xeon CPUs and Altera-derived FPGAs in multi-chip packages to accelerate specific workloads. It’s a feature cloud vendors have long desired: in fact, Microsoft has already been testing FPGAs that nearly double the speed of Bing searches.
As a supplier of core technology, Intel is on the front line of cloud-driven growth, and it estimates that demand from both cloud and telecom service providers will increase at more than 20% annually for the rest of the decade. By 2020, Intel predicts that over three-quarters of all applications will be delivered via cloud infrastructure.
OCP demonstrates that warehouse-scale cloud data center operators are fueling both design innovation and equipment demand, filling a void in both created by stagnant enterprise data center spending. However, cloud builders’ needs are increasingly specialized and distinct from those of internal IT. Much like the dichotomy between military and civilian aircraft, the OCP crowd is pushing the technological envelope in directions that aren’t relevant to smaller deployments. Furthermore, spending by service providers, along with a few very large enterprises and government labs, is reaching a level where their needs are driving product decisions throughout the IT infrastructure ecosystem.
For enterprises that persist in operating their own infrastructure, there will undoubtedly be collateral benefits in equipment price/performance, efficiency and density. The more significant value, however, will flow to those using cloud services, at whatever layer of the XaaS stack, to deliver enterprise applications.
Image credit - Open Compute servers at Facebook's Lulea datacenter, courtesy of Facebook