General Motors' decision to bring its datacenters back in-house after decades of outsourcing to its spin-off EDS (now part of HP) is indicative of a sea-change in the computing strategies of large enterprises triggered by the rise of cloud connectivity and smart devices.
GM's two new mega datacenters will each house 10,000 or more low-cost servers, GigaOm reports, needed to service "the explosion of on-board vehicle electronics coupled with GM’s unique need for reliability" (Pro subscription required for full report).
GM's two new datacenters — the first opened in May, the second set for completion by 2015 — will replace no fewer than 23 separate facilities operated by its various outsourcing partners. Although designed to employ many of the best practices pioneered by cloud giants Google, Amazon and Facebook, GM's datacenters will necessarily run a more hybrid infrastructure than those cloud pureplays: the facilities will also accommodate "significant legacy systems, mainframes, and traditional vendors," GigaOm reports.
For those thousands of low-cost servers, operating economics would argue for GM to turn to custom-built designs from fast-growing OEMs such as Quanta or Tyan rather than traditional server suppliers such as HP or Dell. It may even choose to source designs based on the open-source hardware specifications of the Open Compute Project (OCP).
Rise of Open Compute
Originally developed to meet Facebook's cloud datacenter needs but subsequently opened up to a broad community of cloud datacenter operators, the Open Compute designs are already finding their way into the datacenters of financial giants such as Fidelity Investments and Goldman Sachs. Don Duet, a managing director at Goldman Sachs, is one of the five OCP Foundation board members.
As more and more large enterprises consolidate computing into mega datacenters like GM's to leverage the economies of scale and flexibility of cloud computing, so the penetration of Open Compute designs into those datacenters seems destined to rise.
The demand for OCP designs stems from the lack of a pre-existing industry standard for cloud datacenter servers.
Traditional PC servers are built to satisfy a wide range of operating parameters — they have to be designed to operate reliably at anything from a very light load to an extremely heavy load, with many different combinations of memory, disk and other peripherals.
Operating parameters in a cloud datacenter lie within a far narrower range. Thousands of machines all share the same configuration, and the load is distributed so that each machine is either fully loaded or off; no power is wasted on keeping machines underutilized, which completely changes, for example, the preferred specification for the power supply unit.
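As a rough illustration of why the power supply specification changes, consider that supplies are typically least efficient at light load. The efficiency figures and wattages below are assumptions for the sake of the sketch, not numbers from the article:

```python
# Illustrative (assumed) PSU efficiency at three load points; real curves
# vary by unit, but light loads are typically the least efficient region.
EFFICIENCY = {0.2: 0.80, 0.5: 0.90, 1.0: 0.94}  # load fraction -> efficiency

def wall_power(dc_watts, load_fraction):
    """AC watts drawn from the wall to deliver the needed DC watts."""
    return dc_watts / EFFICIENCY[load_fraction]

# A machine idling at 90 W (20% load on its supply) vs. one fully loaded at 450 W:
idle_draw = wall_power(90, 0.2)
full_draw = wall_power(450, 1.0)
print(f"waste at  20% load: {idle_draw - 90:.1f} W "
      f"({(idle_draw - 90) / 90:.0%} of useful power)")
print(f"waste at 100% load: {full_draw - 450:.1f} W "
      f"({(full_draw - 450) / 450:.0%} of useful power)")
```

Under these assumed figures, a lightly loaded machine wastes roughly a quarter of its useful power in conversion losses, while a fully loaded one wastes only a few percent — which is why a fleet run "fully loaded or off" can justify supplies optimized for a single operating point.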
For several years now, the large-scale cloud operators have bypassed this mismatch by populating their datacenters with built-to-order, custom servers. Facebook's aim in forming the Open Compute Project was to create a more broadly based, collaborative design grounded in open-source principles — one that could go further than these individual proprietary custom designs while also creating a larger market that server makers would be motivated to build for.
The OCP designs extend beyond servers into datacenter infrastructure minutiae as diverse as power distribution, rack design, networking and interconnects.
All of this is in furtherance of three core aims:
- reduce running costs,
- consume less power and
- generate less waste.
Facebook is measuring the impact at its newest datacenter, which went live earlier this year in Luleå, in northern Sweden.
This is Facebook's greenest datacenter yet, relying almost exclusively on hydroelectric power, and using ambient temperature rather than chillers for cooling. Early measurements suggest a power usage effectiveness (PUE) of 1.07, which means the entire operating overhead of the facility amounts to just 7 percent of the power consumed by the computing equipment within it (anything less than a PUE of 1.3 is considered good for a modern datacenter).
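The PUE arithmetic behind those figures is straightforward: PUE is total facility power divided by the power consumed by the IT equipment, so the overhead fraction is simply PUE minus one. A minimal sketch:

```python
def overhead_fraction(pue: float) -> float:
    """Facility overhead (cooling, power distribution, etc.) as a
    fraction of the power consumed by the IT equipment itself.
    PUE = total facility power / IT equipment power, so the
    overhead is everything above 1.0."""
    return pue - 1.0

# Lulea's reported 1.07 versus the 1.3 'good modern datacenter' threshold:
for pue in (1.07, 1.3):
    print(f"PUE {pue}: overhead = {overhead_fraction(pue):.0%} of IT power")
```

At a PUE of 1.07, every 100 watts of computing costs only 7 additional watts to house and cool; at 1.3 it costs 30.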
Luleå is the first datacenter to run 100 percent on Open Compute hardware. Speaking at GigaOm's Structure Europe conference earlier this month, Facebook's VP of infrastructure and OCP president Frank Frankovsky said that the experience had shown the machines are significantly more reliable than traditional designs.
"One of the consequences of going 100% Open Compute is the quality we're seeing," said Frankovsky. Based on the number of requests for a technician to attend a machine, "It's around 3 percent in the US; we're seeing 1 percent in Sweden."
The largest difference was in callouts to database machines, which fell from 5 percent to 1 percent — although that is probably due to Facebook's Open Compute database machine spec moving from traditional hard drives to a new 3.2-terabyte solid-state flash memory card designed for Facebook by flash specialist Fusion-io.
Other contributors to reliability include stripping out unnecessary components such as the decorative bezel that has traditionally carried the maker's branding.
"We have this design tenet in Open Compute that it should be a vanity-free device. If it's an ancillary feature then it should be removed," explained Frankovsky. "You actually have to spin your fans faster to pull air through that silly plastic bezel."
The open-fronted design also speeds accessibility when parts need changing. "We made everything accessible from the front so our technicians don't have to spend time at the back of the server where the hot air exhausts," said Frankovsky.
"We eliminate any of the proprietary management goop. We eliminate anything that leads our system administrators to any sort of management complexity.
"Eliminating a lot of those features helps deliver higher quality. And because Facebook and other adopters within the Open Compute community test for their specific requirements — there's a depth of testing you can do for a bespoke design."
Outside of the box
Not all of Facebook's initiatives will be applicable to other cloud datacenters.
For example, the fact that it operates what is in effect a single large application with hundreds of millions of concurrent users means that it doesn't waste cycles on virtualisation — hence the requirement for a dedicated database machine design.
It is willing to spend extra on waste elimination, even to the extent of designing special packaging for transporting machines to Luleå.
But in other ways, its willingness to think (literally) outside of the box could have significant impacts.
The Open Rack specification has even challenged the familiar 19-inch rack standard, which, the Open Compute Project points out, was defined in the 1950s and owes its genesis to the specifications of railroad signaling relays; Open Rack specifies a wider 21-inch format instead.
Most disruptive of all for traditional servers is this year's initiative to define a more modular, rack-level server architecture that will let a datacenter operator swap out components such as processors — even between arch-rivals AMD and Intel — without replacing the rest of the server.
Frankovsky calls it "a vision for a disaggregated rack." The object is to be able to reconfigure server hardware at the same rate of change as developers today expect in software:
"How can we complete sleds of equipment so that we can modify the hardware closer to the time of need? It's like a just-in-time approach but applied at the hardware level. We're really envisioning a next-generation rack fabric that allows this disaggregation to occur."
These concepts are clearly disruptive for established server makers and outsourcers. But if Frankovsky, who previously worked at Dell and Compaq, has his way, they are on their way to an enterprise datacenter near you:
"If you believe that the trend towards cloud computing is going to continue, you'd assume that those design patterns are going to be more and more applicable to a wider market."
Photo credit: courtesy of Facebook