Although Nvidia derives 60% of its revenue from computer graphics, its future belongs to AI and autonomous machines.
The centrality of graphics to Nvidia's strategy, as presented at its annual GTC conference, disappeared about 6,000 attendees ago.
Instead, the reason the event has outgrown the San Jose Convention Center is AI, and the role Nvidia's GPUs, software and now systems have played in fueling an AI research renaissance and its accompanying ecosystem of products.
As Nvidia CEO Jensen Huang proudly highlighted, GTC has become the world's premier AI conference, where academics, scientists, business executives, entrepreneurs and venture capitalists mingle to see the latest magic Huang and his multi-billion-dollar research budget can deliver.
With no new GPU platform to announce, Nvidia undoubtedly disappointed hardware geeks and chip analysts. But the breadth of a record-setting two-and-a-half-hour keynote that hit every layer of the AI ecosystem, with diverse examples ranging from medical imaging and product design to robotics and autonomous vehicles, was breathtaking.
Feeding the entire AI food chain
GTC is known for being the event where Nvidia unveils its next-generation GPU products featuring significant improvements to functional architecture, chip density and performance. The trend was broken this year because, with 2017's release of the Volta platform, Nvidia had already pushed the technology to its limits, building the largest chip physically possible with the current 12 nm process.
Slowing progress in shrinking semiconductor fabrication processes left it no headroom for dramatic improvements this year. Instead, Nvidia doubled the on-chip high-bandwidth memory to allow larger data sets to reside in local cache, providing a performance bump for huge deep learning models.
Ironically, doubling the on-chip memory is barely enough to satisfy the mushrooming complexity of AI models, meaning there's still a class of problems whose working sets far exceed the 32 GB of local memory.
Nvidia's solution was to borrow a technique from supercomputers by building a system with a unified memory space that spans processor modules. The approach requires a high-speed, low-latency interconnect, which Nvidia had in the NVLink interface built into its recent GPUs, and a sophisticated switch to connect multiple chips. Called NVSwitch, the switch silicon comprises a non-blocking crossbar design delivering an aggregate of 900 GB/sec of bidirectional bandwidth. It is a significant technical achievement that allows eight GPUs to be connected in a full mesh.
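As a back-of-the-envelope check, the quoted aggregate follows from Nvidia's public NVLink 2.0 figures, assuming 18 ports per switch at 25 GB/sec in each direction (numbers not stated in this article):

```python
# Back-of-the-envelope check of the NVSwitch aggregate bandwidth figure.
# Assumes Nvidia's public NVLink 2.0 numbers: 18 ports per switch,
# 25 GB/sec in each direction (50 GB/sec bidirectional) per port.
ports = 18
bidirectional_gb_per_port = 2 * 25  # GB/sec

aggregate = ports * bidirectional_gb_per_port
print(aggregate)  # 900 GB/sec, matching the quoted figure
```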
Nvidia uses two of the switch-GPU complexes in its new DGX-2, a behemoth that includes a pair of Intel Xeon Platinum (Skylake architecture) CPUs, 1.5 TB of system memory, 30-60 TB of fast NVMe SSDs and eight combination EDR InfiniBand/100 Gbit Ethernet network interfaces.
The beefy 10 kW box can replace several racks' worth of equipment for compute-intensive workloads like deep learning model training and physical process simulations in industries like aerospace, drug development, oil and gas exploration, and finance. Even with its $400K list price, it can end up providing a healthy ROI.
Hardware is only half the story
The DGX-2 is targeted at businesses and researchers. However, Nvidia has also been busy working on software that is relevant across platforms and applications.
First up is an update to its TensorRT code optimizer for deep learning inference workloads, namely running trained models on production data both in the cloud and on edge devices. TensorRT 4 improves performance for low-precision 8- and 16-bit operations by up to 3x and now supports the most popular AI development frameworks, including PyTorch, Caffe2, MXNet, CNTK and Chainer.
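To see why dropping to 8-bit arithmetic pays off, consider symmetric linear quantization, a simplified sketch of the idea behind low-precision inference (TensorRT's actual calibration is far more sophisticated than this):

```python
# Simplified sketch of symmetric linear quantization, the basic idea
# behind 8-bit inference. Not TensorRT's calibration algorithm.
def quantize_int8(values):
    """Map float values to int8 codes using a single scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(codes, scale):
    return [q * scale for q in codes]

weights = [0.31, -1.27, 0.04, 0.98]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# Each weight now fits in one byte instead of four, with a small
# precision loss bounded by half the scale factor.
```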
Nvidia next unveiled support for GPU nodes in Kubernetes clusters. When combined with its GPU Cloud container library of the most widely used ML, deep learning and data science frameworks, Kubernetes support dramatically simplifies the deployment of AI models in a hybrid cloud environment. That's regardless of whether the system uses Kubernetes on-premises or via a managed container service like AWS EKS or Google Cloud GKE.
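Under the hood, scheduling a container onto a GPU node uses the standard Kubernetes resource mechanism: Nvidia's device plugin exposes GPUs as a `nvidia.com/gpu` resource that pods request just like CPU or memory. A minimal sketch, expressed here as the manifest's data structure (the container image tag is illustrative, not a specific NGC release):

```python
import json

# Minimal sketch of a Kubernetes pod spec requesting one GPU via the
# Nvidia device plugin's "nvidia.com/gpu" resource. The image tag is
# illustrative, not a specific NGC release.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-inference"},
    "spec": {
        "containers": [{
            "name": "inference",
            "image": "nvcr.io/nvidia/tensorrt:latest",  # illustrative tag
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }]
    },
}
print(json.dumps(pod, indent=2))
```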
Highlighting the opportunities, Nvidia showed a compelling demo at the GTC keynote in which on-premises nodes were disconnected from a cluster under load, which caused the Kubernetes controller to spin up four new nodes on AWS and then load balance across them, all in a matter of seconds.
Containers are a perfect fit for inherently distributed ML/DL inference workloads and many model training situations, and should become the preferred environment for anything that doesn't need to execute locally on a remote device.
Indeed, for ML/DL at the edge, TensorRT 4 provides the performance kick needed to accommodate embedded devices, with one caveat: it requires hardware support in the form of a GPU or GPU-like accelerator.
Nvidia addressed this shortcoming by providing details regarding a previously announced partnership with Arm to integrate the open-source Nvidia Deep Learning Accelerator (NVDLA) architecture into Arm's Project Trillium platform for machine learning.
The Arm-Nvidia combination will enable AI-accelerated SoCs with hardware acceleration features similar to those of the Neural Engine on Apple's A11 chip, which powers the Face ID facial recognition feature in the iPhone X.
Open sourcing NVDLA allows designers to take the base IP provided by Arm and customize it to suit a wide variety of devices and applications, whether small, battery-powered home appliances or compute-intensive industrial robots and drones.
This year's Nvidia Inception Award finalists included several startups employing deep learning to power autonomous machines, among them Ghost Robotics with its pint-sized quadruped robot. Indeed, I expect to see a future version of the Raspberry Pi or a similar product targeted at hobbyists and educators needing a cheap, easy-to-use platform ready for tinkering, experimentation and learning.
The next frontier: closed-loop simulation and AI-building-AI
A significant problem with developing autonomous machines that interact with humans is accumulating enough time in real-world conditions to adequately train the AI models.
As we recently saw with the Uber incident, essentially running experiments over millions of miles and thousands of hours can have tragic consequences when systems fail.
In his keynote, Huang noted that autonomous vehicle control systems will require billions of miles before they have enough data to handle the wide variety of conditions and situations one encounters on the road.
Such massive experimentation, though it would still amount to less than 1 percent of the miles driven each month in the U.S., is impossible to complete in a reasonable amount of time, even with a vast test fleet.
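A rough calculation shows the scale of the problem; the fleet size and daily utilization below are illustrative guesses, not figures from the keynote:

```python
# How long would a large test fleet take to log its first billion miles?
# Fleet size and daily mileage are illustrative assumptions.
fleet_size = 1_000            # hypothetical test fleet
miles_per_car_per_day = 400   # hypothetical heavy utilization
target_miles = 1_000_000_000  # "billions of miles" only starts here

days = target_miles / (fleet_size * miles_per_car_per_day)
print(days / 365)  # roughly 7 years for just the first billion miles
```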
To address this gap, Nvidia is developing a closed-loop simulation system, DRIVE Constellation, in which one system, DRIVE Sim, uses models to simulate input to a vehicle's many sensors, including cameras, radar and lidar. That simulated input then feeds the vehicle control system, which reacts and adapts as if it were encountering the conditions in real life.
The simulated data is used to refine the vehicle’s deep learning models, which are then used with another round of simulated conditions.
Such a closed-loop system not only dramatically accelerates the AI training process, but also allows for incorporating corner cases, such as driving into a glaring sun, on wet roads with degraded lane markings, or around unpredictable obstructions like potholes and darting pedestrians, that are rarely encountered in daily driving.
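The loop described above can be sketched abstractly; every name here is a hypothetical stand-in, not the DRIVE Constellation or DRIVE Sim API:

```python
# Toy sketch of a closed simulate-drive-refine loop. All names are
# hypothetical stand-ins, not Nvidia's actual APIs.
def simulate_sensors(scenario):
    # DRIVE Sim's role: render camera/radar/lidar input for a scenario
    return {"glare": scenario == "low_sun", "wet_road": scenario == "rain"}

class ToyDriver:
    """Stand-in for the vehicle control stack under test."""
    def __init__(self):
        self.trained_on = set()

    def drive(self, frames):
        return "slow_down" if any(frames.values()) else "cruise"

    def update(self, scenario):
        # Stand-in for retraining the deep learning model on the episode
        self.trained_on.add(scenario)

def run_closed_loop(model, scenarios, rounds=2):
    for _ in range(rounds):
        for scenario in scenarios:
            frames = simulate_sensors(scenario)   # simulated sensor input
            model.drive(frames)                   # control system reacts
            model.update(scenario)                # refine, then re-test
    return model

driver = run_closed_loop(ToyDriver(), ["low_sun", "rain", "clear"])
```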
Just as chip simulations enable more powerful processors, which then allow more sophisticated simulations, adding simulation to the autonomous vehicle development process creates a virtuous cycle that should dramatically accelerate technology development.
It's hard not to be captivated and inspired by Huang's enthusiasm and optimism as the company tirelessly works to advance the state of AI technology and push it into every aspect of business and society.
However, as the Uber accident dramatically illustrates, there are real and significant dangers of pushing AI too rapidly into areas where machines interact with or replace humans. One event doesn't make a trend, but it's a cautionary signal when AI is being used for such life-critical tasks as driving, medicine and mobile, self-directed robots.
Indeed, whether it's AI-powered robots, cars and business processes, or Internet-enabled data collection, aggregation and monetization, recent events underscore how technology continues to move much faster than our individual or institutional abilities to adapt.
GTC was awash in techno-utopianism, and I was encouraged by Huang's support for greater testing and safety protocols (including Nvidia's suspension of its self-driving test program until details from the Uber accident are understood) along with the sober, nuanced discussion I heard at a panel on the clinical integration of AI in medicine.
Unfortunately, there's still too little attention paid to the integration of AI into society and how to mitigate the inevitable collateral damage to individuals and affected workforces. Jon Reed’s column on AI and humanity illustrates a far too common occurrence: AI optimizing humans into smart machines, instead of humans imbuing smart machines with human traits.
After viewing the panoply of products and rapid pace of progress on display at GTC, it's clear that AI and robots promise to address our every material need, but the more profound question is whether they will erode and sap our humanity, namely the things that give meaning, purpose, hope and inspiration.
It's a question I haven't fully explored, much less answered, but surely something that I and others will revisit in the coming years as autonomous machines take over more physical and intellectual tasks.