What exactly is a supercomputer? There is a simple and inadequate answer that goes like this: Supercomputers are designed to work on types of problems whose primary constraint is calculation speed.
Commercial computers, mainframes, and large clusters of servers, on the other hand, deal with problems constrained by input/output and which demand reliability above all else.
So while supercomputers are ideal for performing complex calculations on a large data set, servers are well suited for performing thousands upon thousands of concurrent transactions.
I've written about the main supercomputers in the world today and what they are good for. One of the most famous, Titan, was decommissioned in August 2019, disassembled, and sold for scrap. Titan represented a new generation of supercomputer, with a revolutionary architecture that combined 16-core AMD Opteron CPUs with NVIDIA Kepler accelerators (GPUs): the GPUs tackled the computationally intensive math while the CPUs efficiently directed tasks.
When the system debuted at No. 1 in 2012, Titan delivered ten times the performance of Jaguar, reaching a peak performance of 27 petaflops. But supercomputers wait for no one. OLCF had to make room for the 2021 exascale system, Frontier. Quadrillion is a big word, but exascale is a bigger one - it means passing 1,000 petaflops.
Let's review. A flop, in these rankings, is one double-precision floating-point operation per second.
- 1970 Megaflop - 1,000,000 flops. (Gigaflop - 1,000,000,000 flops)
- 1996 Teraflop - 1,000,000,000,000 flops
- 2007 Petaflop - 1,000,000,000,000,000 flops
- 2021(?) Exaflop - 1,000,000,000,000,000,000 flops
Supercomputers are fast, but why should you care?
So to put this in perspective, Titan was junked at 27 petaflops to make room for a new computer that will run, theoretically, at 1.5 exaflops - roughly 55 times faster in under a decade. But that isn't the whole story. Speed is one measure of a supercomputer, but there are other considerations: reliability, uptime, energy cost, cooling infrastructure, and more.
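The arithmetic is worth sanity-checking. A few lines of Python, using the ladder of scales listed above:

```python
# The flops ladder from the list above, expressed as plain numbers.
SCALES = {
    "megaflop": 1e6,
    "gigaflop": 1e9,
    "teraflop": 1e12,
    "petaflop": 1e15,
    "exaflop": 1e18,
}

titan_flops = 27 * SCALES["petaflop"]      # Titan's 2012 peak
frontier_flops = 1.5 * SCALES["exaflop"]   # Frontier's theoretical target

# 1.5 exaflops is 1,500 petaflops, so the jump from Titan is about 55x.
speedup = frontier_flops / titan_flops
```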
Why should you care about supercomputers (or, as it is now fashionable to call them, HPC: High-Performance Computing)? Their reason for existence has primarily been to support nuclear weapons research. The whole point of ASCI Red was to deal with the U.S. nuclear testing moratorium of 1992. If you can't blow them up, maybe you can simulate them? That was - and still is - the main reason.
However, some of the brightest people in the world work on these machines, and governments (the U.S. is not the only supercomputer power) realized that the massive capability of these machines could be put to use to solve problems in science, medicine, climate research, and a host of other not-fully-understood domains.
So a few years ago, the (U.S.) government commissioned a pair of these, Summit and Sierra: one for classified use, the other open to researchers. In fact, in 2018, Summit, the open computer, was rated the fastest in the world.
Can High Performance Computing aid the fight against COVID?
Biology, genomics, climate change, astronomy - researchers in all of these fields have been given time on HPC systems to attack problems at a scale and speed they could not previously imagine. Now, in 2020, virology, and specifically COVID-19, is getting the scrutiny of these big data crunchers.
But how exactly do these machines offer advantages over just lashing together a few hundred servers in the cloud? The answer is that they are a different kind of beast from a commodity Linux server.
One thing to notice in the list of petaflop computers is an industry shift from predominantly IBM to HPE (Hewlett Packard Enterprise). Part of the reason is HPE's acquisition of Cray in 2019, and the integration of Cray technology with HPE's abundant manufacturing capacity, its Apollo data server line, and hundreds of other specialties. Let's focus on what HPE is doing with COVID-19, and then we can wrap up with the nitty-gritty about just why HPC is the superior platform for this kind of research.
Urgent scientific problems
The COVID-19 High Performance Computing Consortium (HPCC) is a White House-endorsed network of academic and industry leaders, federal agencies, and national laboratories - all persevering for the same cause: to stop the virus and save lives.
The National Labs (Lawrence Livermore, Oak Ridge, Argonne, Los Alamos, and others), the Department of Energy, the NIH, and others are opening their supercomputers and staff, providing supercomputing software and applications expertise free of charge. A sampling of what's underway:
- COVID-19 Open Research Dataset and the White House "Kaggle" challenge
- HPE AI experts collaborating to mine the new machine-readable COVID-19 open research dataset
- Cryo-EM research: high-resolution structural biology - imaging and analysis of proteins and macromolecular complexes at resolutions that allow structural mechanisms to be inferred. This has long been the province of techniques like X-ray crystallography and NMR. It is now possible to achieve these resolutions with cryo-electron microscopy, which does not require crystallization and can be applied to complexes large and small alike.
- HPE is providing tightly-integrated, scalable, high-capacity storage and compute infrastructure through HPE Apollo to help deliver on the promise of Cryo-EM.
- Enabling researchers worldwide to accelerate drug discovery for COVID-19 with the open-source release of PharML.Bind, an A.I. program that can run on supercomputers and laptops alike. Utilizing deep neural networks and big data, PharML.Bind correlates experimentally-derived drug affinities with protein-ligand X-ray structures to create novel predictions.
- Many HPE HPC systems are being used for COVID-19 research around the globe, at customer sites and in the cloud. For example, researchers are working on computer models and simulations to understand the virus's structure better. They need a vast system that can handle that type of research.
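To make the PharML.Bind item concrete: at its core, this kind of tool learns a mapping from structural descriptors of a protein-ligand pair to a measured binding affinity, then uses that mapping to score new candidates. Here is a deliberately toy sketch of the idea - the numbers are invented, the single descriptor is made up, and an ordinary least-squares fit stands in for the deep neural network:

```python
# Toy (descriptor, measured_affinity) pairs - fabricated for illustration,
# not real chemistry. Real tools use rich structural features and deep nets.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def fit_linear(pairs):
    """Closed-form least-squares fit: affinity ~ w * descriptor + b."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    w = sxy / sxx
    return w, my - w * mx

w, b = fit_linear(data)

# Score an unseen candidate by its (hypothetical) descriptor value.
predicted = w * 2.5 + b
```

The point is the workflow, not the model: learn from experimentally-derived affinities, then predict affinities for compounds no one has assayed yet.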
Applying HPC to COVID-19
I had the opportunity to speak with Jim Brase of Lawrence Livermore Labs, who is spearheading the effort to apply their HPC technology to discover therapeutics and vaccines for COVID-19. He said something that resonated. I asked him why supercomputers were needed to find these cures as opposed to COTS (Commercial Off the Shelf) computers, such as 100 or 200 clusters of servers on AWS.
I got a two-part answer. Different computers are built to do different things, and servers (or even mainframes; there are still lots of them around) are designed to do as many things as possible in parallel. HPC is built to do a single massive job as fast as possible. Trillions of trillions of calculations.
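That distinction can be sketched in a few lines. The HPC pattern is domain decomposition: one big computation carved into pieces that workers attack cooperatively, rather than many unrelated transactions served independently. This sketch uses Python threads for the workers; a real machine would use MPI ranks spread across thousands of nodes:

```python
# Sketch of the "one massive job" pattern: split a single computation
# into chunks and combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """Worker: handle one slice of the overall problem domain."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def one_big_job(n, workers=4):
    """Decompose [0, n) into chunks, fan out, then reduce."""
    step = n // workers
    chunks = [(w * step, n if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

result = one_big_job(1_000_000)
```

A transaction server inverts this picture: each worker handles a complete, independent request, and nothing needs to be stitched back together at the end.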
But this got me thinking that, for the kinds of machine learning most commercial organizations are doing today, HPC isn't needed. First of all, a large part of the process is data preparation and supervised learning. But when I think about transfer learning and autotuning at exascale, that's a different kettle of fish. I wonder, though, if exascale can crack the problem of explainability. The black-box problem connects to autotuning: autotuning is essentially an optimization problem, finding the right recipe of parameters for a model to solve a given problem optimally.
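Autotuning-as-optimization is easy to illustrate. The objective below is an invented stand-in for "train the model with these parameters and measure the error," and random search is the simplest possible tuner; at exascale the same loop runs over vastly more expensive objectives and vastly larger parameter spaces:

```python
import random

def objective(params):
    # Hypothetical stand-in for a real training-and-evaluation run:
    # lowest error at learning rate 0.1 and depth 6 (made-up optimum).
    lr, depth = params
    return (lr - 0.1) ** 2 + (depth - 6) ** 2 / 100.0

def random_search(trials=500, seed=42):
    """Try random parameter recipes, keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        params = (rng.uniform(0.001, 1.0), rng.randint(1, 12))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

params, score = random_search()
```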
My real question is: for enterprise customers, and that is our main audience at diginomica, where does HPC come in for A.I. beyond special cases like designing aircraft, predicting weather, or -omics? To be honest, most A.I. we see is used for selling stuff. If it gets some things wrong, it's still a success. Other applications, like autonomous cars, have to be right.
Jim corrected my thinking. Modern HPC machines typically have both CPUs and GPUs. While the GPUs fly through the floating-point calculations, the CPUs take care of workload management, optimization of resources, and even data management. Just because you have 200 petaflops of power, or 1,000 or 2,000, doesn't mean you don't have to organize your data for processing.
His second answer was what I like to call a brilliant flash of the obvious - something you just never thought about but should have. Historically, scientists in laboratories tried to figure out which molecule reacted with which molecule, a laborious process that could take years, if it succeeded at all. The HPC machines available now for scientific study can thrash through zillions of combinations and, potentially, find candidates that look promising - in a day, or a week, or a month. Then the model can be given to a laboratory for its usual bench testing. He referred to this as computer first, experiment second.
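The computer-first, experiment-second workflow boils down to: score everything cheaply in silico, rank, and send only the top of the list to the bench. A toy sketch, where the scoring function is a made-up surrogate (not chemistry) and the compound IDs are hypothetical:

```python
def score(compound_id):
    # Invented surrogate for a docking/affinity model; pretend lower
    # means better predicted binding. Real pipelines run physics or ML here.
    return sum(ord(c) for c in compound_id) % 97

# Hypothetical library of ten thousand candidate compounds.
library = [f"CMPD-{i:05d}" for i in range(10_000)]

# Rank the whole library computationally...
ranked = sorted(library, key=score)

# ...and hand only the most promising handful to the laboratory.
shortlist = ranked[:10]
```

The economics are the point: the machine burns hours where the bench would burn years, and the laboratory's scarce time is spent only on candidates that already look promising.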
I probed a little about the type of "A.I." they were using. Two types he mentioned: supervised learning models on existing data sets (presumably to get a starting point for feature engineering), and generative neural nets. We ran out of time, but maybe I'll get the chance to dig into that a little more. For now, this quote from the University of Texas resonates:
At a smaller scale, but no less significant, is last week's announcement that TACC (Texas Advanced Computing Center) would also join the HPCC, enabling U.S. scientists working with TACC to create a massive computer model of the virus - 200 million atoms - that they expect will give insight into exactly how it infects the human body. This work builds on other models also prepared with TACC, including atomic-scale influenza virus simulations.
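A 200-million-atom model is big even before any physics runs. A back-of-envelope estimate of just the raw state, assuming double-precision positions and velocities per atom (real molecular dynamics codes store considerably more: forces, atom types, neighbor lists, and so on):

```python
# Rough memory footprint of the atomic state alone.
atoms = 200_000_000

# 3 position + 3 velocity components per atom, 8 bytes each (doubles).
bytes_per_atom = (3 + 3) * 8

total_bytes = atoms * bytes_per_atom
gigabytes = total_bytes / 1e9  # ~9.6 GB before any bookkeeping or physics
```

And that is the cost of holding the system still; simulating it means recomputing forces on all 200 million atoms every femtosecond-scale timestep, which is where the petaflops go.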
For those of us who build predictive models and deep learning models to figure out who will switch phone carriers or who might be interested in prosthetic feet, simulations at these levels are incredibly complex. They are massive in size, meaning they are simply not feasible without high-power, high-speed systems.
Supercomputing resources such as those TACC provides enable researchers to run more detailed calculations and experiments at even faster rates. As reported by Medical Xpress:
"It's a brilliant test of our methods and our abilities to adapt to new data and to get this up and running right off the fly," said Rommie Amaro, professor at the University of California, San Diego and the researcher leading efforts to create the full model of COVID-19.
The SARS-CoV-2 spike protein of the coronavirus was simulated by the Amaro Lab of U.C. San Diego on the NSF-funded Frontera supercomputer at TACC at UT Austin. The visualization shows the main viral protein involved in host-cell infection.
Amaro says that developing this full model will enable researchers to understand how the virus interacts with other important cellular components and functions, which she hopes will expose its vulnerabilities. Hopefully, this new information will aid in the design of novel drugs and vaccines. Long-term, researchers will better understand how the coronavirus family of viruses (which includes SARS and MERS) infects in general. This assumes, of course, that their underlying understanding of how things work is sound, about which I still have my doubts.
The analysts were not thrilled with the HPE acquisition of Cray, but it seems to be paying off in the HPC world. Whether supercomputers lead to a COVID-19 breakthrough remains to be seen, but it's clear they have the potential to play a unique role.