NVIDIA has honed and evangelized a clear strategy positioning AI as the next era of computing, and it has used historical data to argue that GPUs have conquered the limits of a sputtering Moore's Law, yielding what some call Huang's Law. In turn, NVIDIA has been rewarded with revenue that has more than doubled over the past three years while its stock price has multiplied tenfold.
For all practical purposes, Intel stood idly by, ceding the most important new technology market in a generation to an unlikely rival that was once merely a niche component supplier to Intel's PC partners.
The days of acquiescence are over. Intel has signaled that it doesn't intend to repeat its mobile mistake by letting the AI chip market get away. As I discussed in my last column, Intel considers AI a critical growth area for its data center business, with an estimated 30 percent CAGR over the next five years. That translates into an additional $7 billion in projected annual chip sales. Cognizant of the high stakes, Intel mounted its most serious frontal assault on the AI chip market to date at its recent Data-Centric Innovation Summit, including a surprise announcement that AI-related workloads generated $1 billion in revenue last year.
More to the AI chip market than GPUs
NVIDIA's expertly crafted narrative that GPUs are behind the current wave of AI progress is like all tech legends: it contains enough truth to be believable but omits key facts that round out the story.
As Intel explains, the biggest sin of omission is the conflation of AI software with GPU hardware, an oversimplification that glosses over the messy reality that in AI workloads, one size doesn't fit all. As Naveen Rao, head of Intel's AI products group, explains in the blog post accompanying his presentation at the DCI Summit, recent successes in AI rest on three pillars: improved and maturing software tools, better hardware (in many forms, not just GPUs) and a thriving research and development ecosystem, much of it open source. He goes on to write (emphasis added),
Customers are discovering that there is no single “best” piece of hardware to run the wide variety of AI applications, because there’s no single type of AI. The constraints of the application dictate the capabilities of the required hardware from data center to edge to device, and this reiterates the need for a more diverse hardware portfolio. Covering a full variety of applications wherever they occur will deliver the highest return for Intel’s customers.
Increasingly complex deep learning models that require massively parallel processing to train and optimize get the headlines. However, as Rao suggests, that’s only part of the AI software lifecycle.
As this post from an AI executive at Lenovo illustrates, the path from idea to implementation is long, involving as much work in planning, data collection, and validation as it does in model development and implementation. Furthermore, as Rao highlighted in his DCI Summit presentation, the execution environment for AI is diverse, encompassing small devices like IoT sensors and phones, autonomous vehicles, remote appliances on the network edge and distributed systems in cloud data centers.
The cornerstone of Intel’s AI strategy is breadth, with hardware products for each of the runtime environments outlined above and investments in a variety of mostly open source projects intended to improve software performance and developer productivity.
Rao’s presentation at Intel’s AI developer conference reiterated the company’s three-legged AI mantra: tools (software), hardware and community (the developer ecosystem). Intel’s AI product portfolio reflects its strategy of having hardware for every need, whether on endpoints, edge systems or in data centers. The current lineup is a mix of home-grown and acquired (Altera, Mobileye, Movidius, Nervana) technology and includes:
- Low-power Atom processors with Movidius vision processing units (VPU) for IoT and mobile devices
- Mobileye Advanced Driver Assistance Systems (ADAS) and EyeQ SoCs for autonomous vehicles
- Desktop and mobile x86 processors with onboard GPUs, GNA neural net co-processors and Inference Engine acceleration
- Altera-derived FPGAs for both low-power, embedded systems (Arria series) and high-performance data center applications (Stratix series) to accelerate deep learning model execution (inference) particularly for real-time data evaluation.
- Nervana Neural Network Processors, designed to match the performance and flexibility of GPUs on the same model training and inference workloads.
- Xeon Scalable family x86 processors for general purpose workloads, light-duty AI model training and production-scale model inference.
Product news sparse, but telling
Intel's goal at the DCI Summit was to lay out a strategic narrative and gain mindshare, so a dearth of product news was to be expected. Aside from the aforementioned billion-dollar AI revenue tidbit, the company discussed new and updated AI-related technologies, notably:
- DL Boost x86 instructions to accelerate deep learning calculations via support for low-precision 16-bit floating point operations using the bfloat16 format, delivered as extensions to Intel's AVX-512 instruction set.
- The Vector Neural Network Instructions (VNNI), an AVX-512 extension that further accelerates neural network inference with support for 8-bit multiplications accumulated in 32 bits. DL Boost and VNNI will be included in the forthcoming Cooper Lake Xeon processor, the 2019 successor to this year's Cascade Lake iteration.
- The nGraph compiler to optimize AI code written in one of the popular development frameworks for deployment on different hardware platforms. nGraph currently supports models developed in TensorFlow, MXNet, and neon, with secondary support for CNTK, PyTorch, and Caffe2 through the ONNX exchange format. It can target code to Xeon, Nervana, Movidius, Stratix FPGAs and GPUs.
- An update to the MKL-DNN math library to improve matrix multiplication performance.
- A tease for the forthcoming (sometime in 2019) Nervana NNP L-1000 neural processor designed to take on GPUs for data center deep learning workloads.
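The arithmetic behind the first two items above can be sketched in a few lines. The helper names below are illustrative, not Intel APIs: bfloat16 simply truncates a float32 to its top 16 bits, keeping float32's full 8-bit exponent range while shrinking the mantissa, and a VNNI-style dot product multiplies 8-bit integers while accumulating the results in 32 bits so long vectors don't overflow.

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to bfloat16 precision (top 16 bits).

    bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits,
    so dynamic range is preserved while precision is reduced.
    """
    bits = x.astype(np.float32).view(np.uint32)
    # Zero the low 16 bits (the discarded mantissa bits)
    return (bits & 0xFFFF0000).view(np.float32)

def vnni_dot(a_int8: np.ndarray, b_int8: np.ndarray) -> np.int32:
    """VNNI-style dot product: 8-bit multiplies, 32-bit accumulation.

    Each int8*int8 product fits in 16 bits; summing the products in a
    32-bit accumulator prevents overflow.
    """
    return np.sum(a_int8.astype(np.int32) * b_int8.astype(np.int32),
                  dtype=np.int32)

# bfloat16 preserves magnitude but drops fine precision
x = np.array([3.14159265], dtype=np.float32)
print(to_bfloat16(x))  # [3.140625]

# int8 inputs, exact 32-bit accumulation
a = np.array([127, -128, 50], dtype=np.int8)
b = np.array([2, 3, -4], dtype=np.int8)
print(vnni_dot(a, b))  # 127*2 + (-128)*3 + 50*(-4) = -330
```

The hardware versions do this in wide SIMD registers rather than one element at a time, but the numerics are the same: halve (or quarter) the storage and memory bandwidth per value, and keep the accumulator wide enough that results stay exact.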
Intel's head data center CPU architect, Sailesh Kottapalli, also shared insights into current and future design considerations to boost processor performance. Aside from the improvements to matrix operations and support for lower and mixed precision operations mentioned above, he noted that Xeon processors would continue to increase caches at all levels, expand memory bandwidth and reduce latency in succeeding generations. Each of these will improve performance for AI and allow general purpose CPUs to be a viable alternative to GPUs for many workloads.
Intel’s AI-related acquisitions, notably Nervana (co-founded by Rao), Movidius, Mobileye and, most recently, Vertex.ai, are gelling into a coherent strategy that addresses the gamut of AI execution environments.
Such an expansive portfolio is necessary given the diversity of AI workloads and the resource limitations of devices running AI software. However, Intel’s strategy is spread across several processor architectures and instruction sets, making its work on software libraries and optimizations like the nGraph compiler all the more critical.
Such programmatic heterogeneity is a complexity NVIDIA doesn’t share, with its lineup of large (Volta), small (Jetson) and vehicular (Xavier) GPUs sharing the CUDA programming platform and associated libraries. Despite its early-mover advantage, I suspect that CUDA is a shrinking moat as developers adopt higher-level AI frameworks such as TensorFlow that can target other platforms, whether via an intermediary compiler like nGraph or directly as in Google’s TPU processors.
Similarly, NVIDIA’s performance advantage with Tensor Core-equipped, neural network-optimized GPUs could be narrowed enough by the next-generation Xeon's new features to make x86 systems good enough for many, perhaps most, data center (if not R&D) workloads. You can bet that Intel's Xeon customer base will expect pricing structured in line with the company's tried-and-tested commoditization strategy.
It will be an exciting battle to watch as the CPU empire (Intel) strikes back at the GPU rebels (NVIDIA). If Intel’s claim of a billion dollars in AI-derived revenue last year is remotely accurate, the company is already winning as much business in this nascent but mushrooming market as its chief rival, or more.
If nothing else, the DCI Summit demonstrated that Intel has the slideware to depict a cohesive AI strategy. Now Intel needs to complete the critical work of hardware and software engineering that stitches the pieces tightly together and presents the market with an irresistible smorgasbord from which to dine.
Regardless of the market's fascination with NVIDIA, only a fool would write off Intel anytime soon.