Competition heats up for AI workloads: GPUs aka NVIDIA no longer the only option

Kurt Marko, May 16, 2018
Summary:
Deloitte believes that a slew of new chips may dramatically increase the use of ML. We think it is correct. GPUs, aka NVIDIA, may face problems going forward.

AI applications, namely those built on parallelizable machine learning algorithms that perform simultaneous calculations across thousands or millions of data points, have fueled enormous demand for GPUs to accelerate those calculations.

GPUs became so popular not because they were specifically designed for the highly repetitive computations of machine learning (ML) problems such as deep learning, but because the architecture required to simultaneously process scene data through a set of graphics rendering algorithms is also inherently parallel.

Fortuitously, the characteristics of graphics rendering and machine learning algorithms, namely the need to process thousands or millions of data points through an algorithm that uses simple floating point calculations, are similar enough that GPUs have become the preferred engine for AI research and implementation.

How NVIDIA became the engine of AI

By GPUs, I really mean NVIDIA GPUs, since the company has single-handedly created an AI renaissance by building hardware and software APIs (CUDA) that make whole categories of recursive algorithms feasible to calculate using real-world datasets, not toy samples. Over time, the company has bifurcated its design with one branch optimized for graphics rendering and gaming and another built for deep learning and other data center applications like databases. Every generation of the data center products sold under the Tesla brand incorporates features designed for deep learning, not rendering, such as:

  • Tensor Cores, namely programmable hardware units designed for matrix-multiply-and-accumulate that can deliver up to 125 Tensor TFLOPS for training and inference applications. According to NVIDIA developer documentation, “each Tensor Core provides a 4x4x4 matrix processing array which performs the operation D = A * B + C, where A, B, C and D are 4×4 matrices.” Such operations are the foundation of deep learning algorithms, and implementing them in hardware greatly improves performance, particularly since the latest Tesla V100 includes 640 such units which act in parallel (a minimal code sketch of this operation appears after this list).
  • NVLink, a dedicated, high-speed, bi-directional interconnect with a maximum aggregate bandwidth of 300 GB/sec on the Tesla V100. While NVLink is primarily used for inter-GPU communication, some processors such as the IBM POWER8+ and POWER9 embed NVLink circuits to allow direct access to GPU memory from the CPU complex.
  • Faster data paths using high-bandwidth memory (HBM), along with Unified Memory and address translation that provide a single virtual address space for CPU and GPU memory, which simplifies application code and makes it easier to port applications to GPUs (see the unified-memory sketch after this list).
  • Reduced precision calculations (16-bit floating point, 8-bit integer) that can significantly increase performance on problems that don't require high precision or high dynamic range such as deep learning using sensor data of limited accuracy.
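
To make the Tensor Core bullet concrete, here is a minimal sketch using CUDA's public WMMA API (available since CUDA 9 on Volta-class GPUs). One warp computes D = A * B + C for a single 16×16 tile with half-precision inputs and single-precision accumulation; the kernel name, pointer names and fixed 16×16 tile shape are illustrative assumptions, not taken from NVIDIA's documentation.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes D = A * B + C for a single 16x16 tile on Tensor Cores.
// A and B are FP16; the accumulator C/D is FP32 (illustrative shapes/names).
__global__ void tensor_core_tile(const half *a, const half *b,
                                 const float *c, float *d) {
    // Per-warp fragments that map onto Tensor Core operands.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    // Load the 16x16 tiles from global memory (leading dimension = 16).
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::load_matrix_sync(acc_frag, c, 16, wmma::mem_row_major);

    // The matrix-multiply-and-accumulate step executed on Tensor Cores.
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    // Write the FP32 result tile back to global memory.
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}
```

Launched with a single warp (e.g. tensor_core_tile<<<1, 32>>>(a, b, c, d)) and compiled with nvcc -arch=sm_70 or newer, this is the operation the V100's 640 Tensor Cores perform in parallel; production code would tile a full matrix across many warps.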
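
The Unified Memory point can be sketched just as briefly: cudaMallocManaged hands back one pointer that is valid on both the host and the device, so no explicit cudaMemcpy calls are needed. The kernel and array size below are made up purely for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: scale every element of the array in place on the GPU.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // One allocation in a single virtual address space shared by CPU and GPU.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // initialize on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // compute on the GPU
    cudaDeviceSynchronize();                         // wait before the CPU reads

    printf("data[0] = %f\n", data[0]);               // read the result on the CPU
    cudaFree(data);
    return 0;
}
```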

These features, combined with the highly parallel architecture of GPUs, mean that they are often an order of magnitude or more faster than a conventional CPU when training deep learning models, while providing several times the throughput when executing trained models, i.e. inference.

NVIDIA benefitting from an exploding ML hardware market

By enabling new types of algorithms and making deep learning feasible for a broader range of applications, GPUs have ignited a virtuous cycle of software and hardware innovation. The proximate result has been a robust market for GPU hardware and related cloud services, i.e. GPU instances from AWS, Azure and Google Cloud. Deloitte estimates that GPUs were used in virtually all deep learning applications in 2016, with ML-driven sales of GPUs expected to grow 2.5-times over two years, from 200,000 units in 2016 to 500,000 in 2018.

Look no further than NVIDIA's financial statements for the amazing ramifications: the company has seen revenue almost double and net income grow five-fold in only two years. Over the same period, its stock price has similarly increased almost five-fold, far outpacing the 33 percent gain in the S&P 500 Index. Results like this are bound to stoke competition, but befitting a dynamic, disruptive market like AI, it's not all coming at NVIDIA from traditional sources in the processor market, namely Intel and AMD.

Understanding why requires some background information which is nicely summarized in Deloitte’s report. According to its assessment (emphasis added),

By the end of 2018, over 25 percent of all chips used to accelerate machine learning in the data center will be FPGAs (field programmable gate arrays) and ASICs. … Deloitte Global expects that GPUs and CPUs in 2018 will still be the largest part of the ML chip market, measured by chip units, and will still be growing. But the new kinds of chips may dramatically increase the use of ML, enabling applications to use less power and at the same time become more responsive, flexible and capable, which is likely to expand the addressable market.

Note that the entire market for AI acceleration hardware is expanding, so these new forms are not so much displacing GPUs, at least initially, as supplementing them. However, by being designed for specific types of AI algorithms, such as those encoded using the TensorFlow framework or particular categories of deep learning algorithms, Deloitte contends that the new generation of AI accelerators can significantly expand the market, particularly if they can deliver order-of-magnitude performance improvements comparable to those gained by moving from CPUs to GPUs (emphasis added),

If the various FPGA and ASIC solutions offer similar order-of-magnitude improvements in processing speed, efficiency, price or any combination thereof, a similar explosion in utility and adoption seems probable. That said, there are certain tasks that ML is good at and others where it has its limitations. These new chips are likely to allow companies to perform a given level of ML using less power at less cost. But on their own, they are not likely to give better or more accurate results.

My take

Deloitte is correct: while the overall market for AI acceleration hardware will continue to grow, GPUs can no longer count on capturing all of it as new types of custom chips, both dedicated (ASICs) and reprogrammable (FPGAs), emerge as better options for certain workloads and deployment scenarios.

More in part 2 tomorrow.
