Intel, IBM and NVIDIA highlight growing stratification of application hardware

Kurt Marko, April 6, 2016
The worlds of hardware and software are colliding and the development of leading edge applications will require greater collaboration between designers in each arena

Major announcements by two giants in the components business, key arms merchants to the entire IT food chain, epitomized the divergent paths that data center system, application and hardware design are taking.

Coming within days of each other, Intel's Cloud Day and Xeon product refresh and NVIDIA's annual GTC developer conference, its premier forum for strategy and product announcements, highlight key trends in software architecture and hardware optimization for emergent applications. Whether for mature categories like business and scientific data analytics or nascent areas like robotics, autonomous vehicles, image/pattern recognition and VR, it's clear that the days of homogeneous server farms with racks and racks of largely identical systems are over.

Indeed, this past week's events highlight a continuation of the trend towards greater hardware and software specialization for specific workloads. Just as the era of mainframe hegemony, where IT was built upon a one-size-fits-all platform, was largely replaced by cookie-cutter x86 systems powering client-server and Web applications, today a new generation of hardware accelerators designed for various categories of applications and workloads is displacing the do-everything, plain-vanilla x86 system.

The exemplar of this move to purpose-built hardware is the GPU, which has been repurposed from its roots in computer graphics into an accelerator for all manner of parallelizable problems. A better acronym might be PPU, for parallel processing unit.

The catalyst for today's GPU renaissance is a class of algorithms known as neural networks, modeled after the brain's synaptic structure, that form the heart of machine or deep learning software and have proven unmatched at solving all manner of pattern matching, recognition and data inference problems, including image classification and tagging, robotics and autonomous driving, and pharmaceutical and genomic analysis. Problems in which the data are easily decomposed into elements that can be analyzed independently and the results recombined, and which are amenable to the technique of recursive algorithmic 'learning', can use GPUs to converge on a solution in much less time than serial, brute force approaches.
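To make the decompose, analyze and recombine pattern concrete, here is a minimal Python sketch. It is an illustration only: the function names (`split`, `analyze`, `parallel_sum_of_squares`) are hypothetical, the per-element work (squaring numbers) is a stand-in for real analysis, and a CPU thread pool stands in for the thousands of cores on a GPU. It shows the structure of a parallelizable problem, not GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Decompose the data into roughly equal, independent chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def analyze(chunk):
    """Per-chunk work that needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Analyze the chunks concurrently, then recombine the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(analyze, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Because no chunk depends on another, the same structure can be spread across as many workers as the hardware offers, which is the property that makes such problems a natural fit for accelerators.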

While neural networks have pioneered the use of GPU acceleration, they're not the only problem domain. Companies like MapD, Blazegraph and GPUdb use GPUs and their very fast local memory to greatly accelerate database queries, analytics and data visualization, while IBM has done work on accelerating Apache Spark for distributed data analytics of the type often done with Hadoop.

IBM and the open source, multi-vendor OpenPOWER organization it founded as a means of restoring the POWER CPU architecture to relevance piggy-backed off of GTC to report on its progress, highlight new products, update its roadmap and recruit new members. A key selling point of the OpenPOWER argument is that Moore's Law has turned into something more like Moore's history lesson as the pace of CPU performance improvements no longer follows the prescribed rate, causing Intel to slow the cadence of semiconductor process improvements and add another tock (microarchitecture improvement) to its traditional tick-tock product cycle.

IBM and its OpenPOWER fellow travelers from places like Canonical, Rackspace, Redis Labs and ... NVIDIA contend that the only way to return to historical rates of progress is through the use of application- and domain-specific hardware accelerators. To this end, OpenPOWER incorporates some unique I/O interface features like CAPI (direct, shared access to CPU memory) and on-chip NVLink (extremely fast GPU-to-GPU and, with OpenPOWER, GPU-to-CPU communication) that it hopes will enable an accelerator ecosystem that includes not only GPUs, but also storage accelerators, in-memory database caches and application-specific devices for problems spanning genomics to oil and gas sample analysis.

An obvious inference is that Intel represents and is defending the past, an era of its x86 hardware dominating 90% of the data center market, while NVIDIA and the OpenPOWER ecosystem are the future. Indeed, in transforming itself from a niche specialist in consumer gaming to the thought- and technology-leader for a new era of application design with ambitious new products like the massive P100 GPU unveiled at GTC, NVIDIA is betting the company on GPU-accelerated computing.

Although superficially true, this oversimplifies the situation. Intel is anything but a static, sclerotic monolith, and it may turn out that NVIDIA has effectively traded one niche (gaming) for another (deep learning and AI), albeit one with much more upside.

As Intel's recent Xeon announcement made clear, the company is not standing still and recognizes the changing needs of hyperscale data center workloads.

Intel is designing for a 2020 world where it expects 85% of applications to be running on some form of cloud service, whether public or private, and continues to add features to its Xeon product line designed to improve the performance of shared, virtualized workloads and the scalability and bandwidth of distributed systems, and to facilitate the automation of software/service deployment and management. Its new Broadwell v4 lineup includes two major product families (E5 and E7) that target scale-out (cloud) and scale-up (large memory databases) workloads and that collectively include a wide variety of product configurations with various speeds, core counts and thermal envelopes. It's all part of Intel's strategy to give customers little reason to look elsewhere by delivering a broad performance range without leaving the familiar x86 architecture.

However, if none of the standard products suit your needs and you're a cloud giant like Facebook, the company is happy to build a custom Xeon variant tailored to your needs. Longer term, Intel has its own workload acceleration strategy using FPGAs (from the Altera acquisition) and its own processor designed for parallelizable workloads, Xeon Phi.

My take

The synchronicity of announcements from three major suppliers of IT hardware provides a good opportunity to reflect on broader trends in system design.

The key insight is that the worlds of hardware and software are colliding and that the development of leading edge applications will require greater collaboration between designers in each arena, with a concomitant increase in cross-domain understanding. Hardware people must better understand the needs of software algorithms and systems and vice versa.

For enterprises, this presents both an opportunity and a threat. Those that persist in riding the safe and familiar performance train provided by Moore's Law technology improvements over the last few decades will increasingly find themselves at a disadvantage to competitors that exploit innovative new system designs providing substantial, measurable and value-producing performance and cost advantages. The major cloud providers already realize this: they are both driving and serving as the near-term targets of many of these platform architectural changes and, due to their size, have the most to gain or lose.

IT organizations in industries with demanding workloads like oil and gas, big pharma, Wall Street and aerospace are close behind, if not leading the transition in certain niches. However, it behooves those with more quotidian needs not to ignore the trend, since even a 20-30% price/performance advantage is significant when you're talking about equipment that forms the foundation of one's business processes and services.

Image credit - Digital hardware closeup. Microchips and condensers assembly on the circuit board macro - @fotolia

Disclosure - The author's travel and expenses for attending related events were covered by the companies mentioned.
