The [Intel] empire strikes back to reclaim the performance crown - and impressed Oracle is

By Kurt Marko, April 7, 2021
Summary:
Intel's announcement of its next-generation Xeon included some previously unpublished benchmarks and endorsements from every major cloud provider, including Oracle.

Lack of competition breeds complacency. Intel, of all companies, whose former CEO penned a business book entitled Only the Paranoid Survive, should know this. Unfortunately for Andy Grove's successors, banishing AMD to years in the wilderness, struggling to survive, didn't keep Intel from getting sloppy. Consequently, the company faces multiple threats from both its traditional rivals and new CPU architectures. Lisa Su resurrected AMD by sticking to a long-term plan that yielded an arguably superior x86 architecture that has been adopted by all the hyperscale cloud operators, major PC vendors and server manufacturers.

AMD is now back with a vengeance and capturing more than 20% of the x86 market. Meanwhile, Arm has expanded from its mobile base, with Apple's Arm-based M1 SoC outclassing Intel in PC performance and AWS, Ampere and possibly Microsoft building Arm chips for cloud infrastructure. Nonetheless, things are looking up at Intel with old hand and ex-CPU designer Pat Gelsinger back to his semiconductor roots as CEO, announcing a major fab expansion and a new generation of the company's cash cow Xeon server CPUs in quick succession.

The announcement comes at a pivotal moment for Intel, with revenue in its Data Center Group (DCG) coming off an explosion in cloud spending earlier in 2020 as spending by enterprises, governments and telecom operators continues a long-term slide. Indeed, Intel warned in its Q4 2020 earnings announcement that data center revenue would slide 25% in Q1 from a year earlier as cloud operators remain in a "digestion cycle."

When evolutionary is good enough

Intel first detailed Ice Lake's internals and new Sunny Cove microarchitecture at last summer's Hot Chips event, where it showed 1- and 2-socket versions of the chip built using a version of its 10nm FinFET (but presumably not SuperFin) process. Aside from the 10nm process node, Ice Lake is notable for microarchitecture upgrades that include:

  • Improved branch prediction with higher capacity instruction windows.
  • Wider allocation and execution resources.
  • Enhancements in single-thread execution and buffer size.
  • Larger caches.
  • Four new cryptography instructions. 
  • Two new compression-decompression instructions.

[Figure: New 3rd Gen Intel Xeon Scalable Processor (Codename: Ice Lake-SP), via Irma Esmer Papazian]

Collectively, the architectural improvements boost raw integer performance (IPC) by 18% and the speed of various cryptographic algorithms by 1.5 to 8 times. The Ice Lake SoC also includes a new I/O and memory design with four memory controllers that improves memory bandwidth by 40% or more, along with Intel's Total Memory Encryption (TME). Improvements to its Speed Select technology respond faster to changing demands and allow customizing power use and frequencies for each core to better match CPU performance and efficiency to each workload.

As has been typical of recent Xeon launches, generation 3 is available in an enormous number of SKUs:

[Figure: table of 3rd Gen Intel Xeon Scalable SKUs]

That's more than 50 stock models plus an undisclosed number of variants customized for the top-10 hyperscalers eligible for bespoke products built to spec.

Although Intel is only now releasing Ice Lake products for general consumption, its preferred cloud customers (which, recall, are its only significant source of growth) have been receiving chips for months. According to its SVP of the Xeon and Memory Group, as of March, Intel had shipped 115,000 Ice Lake units to 30 customers.

Oracle sees advantages for HPC workloads

Although AMD has successfully won some cloud business based on the Epyc processor's superior price-performance, Intel hopes to retain its advantage with high-end database and HPC workloads. As expected for a major product, Intel's unveiling featured a host of customers announcing their support for Ice Lake in new servers and cloud services while praising its performance. Most of these are familiar with the drill, having just gone through it last month when AMD introduced its third-generation Epyc Milan processors.

Oracle is one of the Ice Lake fans, saying its new instances are up to 42% faster on HPC workloads than the prior-generation X7 Skylake-based instances. According to Taylor Newill, head of HPC at Oracle, Ice Lake offers the best performance available for several popular HPC applications, including video encoding, EDA, CFD (computational fluid dynamics), crash simulation, data science and ML inference. Better still for Oracle customers, the boost is free since the company is keeping its price for HPC-optimized bare metal instances at $0.075/core-hour. When asked why Oracle stuck with Intel instead of shifting to AMD's third-generation Epyc for its HPC bare metal servers, Newill said that after two months of benchmarking, it found Ice Lake beat Milan for almost all workloads. The only area where AMD's 64-core parts with integrated (using a separate in-package chiplet) memory and I/O controllers came out on top was a few workloads requiring significant data bandwidth, like combustion simulation.
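To make the "free boost" concrete, here is a back-of-the-envelope sketch of what an unchanged core-hour price combined with a faster chip could mean for the cost of a single job. The price and the 42% figure come from Oracle's claims above. The job size, the baseline runtime and the reading of "42% faster" as a runtime divided by 1.42 on the same number of cores are illustrative assumptions, not Oracle data.

```python
def cost_per_job(price_per_core_hour: float, cores: int, hours: float) -> float:
    """Cost of one job that occupies `cores` cores for `hours` hours."""
    return price_per_core_hour * cores * hours


PRICE = 0.075          # $/core-hour, as quoted by Oracle
SPEEDUP = 1.42         # best case: "up to 42% faster"
CORES = 36             # hypothetical job size (assumption)
BASELINE_HOURS = 10.0  # hypothetical runtime on the prior generation (assumption)

old_cost = cost_per_job(PRICE, CORES, BASELINE_HOURS)
# Assumption: "42% faster" shortens the runtime to 1/1.42 of the baseline.
new_cost = cost_per_job(PRICE, CORES, BASELINE_HOURS / SPEEDUP)
saving = 1 - new_cost / old_cost

print(f"old ${old_cost:.2f}, new ${new_cost:.2f}, saving {saving:.1%}")
```

Under these assumptions the saving depends only on the speedup (1 - 1/1.42, or roughly 30%), so the job size and baseline runtime drop out. Workloads that see less than the best-case 42% gain save proportionally less.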

Oracle's HPC infrastructure uses a dedicated 100 GbE backend network running RDMA over Converged Ethernet (RoCE) to connect up to 20,000 cores in a cluster. Newill says customers are migrating HPC workloads to OCI to exploit its high-performance RoCE-based cluster network with sub-10 microsecond intra-node latency under most conditions. He notes that eliminating networking as a bottleneck allows HPC users to scale performance by distributing the load across thousands of machines using infrastructure that would be extremely expensive to reproduce in a private data center.

As Intel stressed during the unveiling, a design focus of Ice Lake was speeding up AI and HPC workloads. It claims the new chips are about 50% faster than an equivalent second-generation part across a range of HPC workloads. AI inferencing and ML workloads see a similar 50% speedup due to improvements in the DL Boost instructions. Intel's benchmark comparisons for traditional workloads like server virtualization, container microservices, database analytics and Java code show similar inter-generational gains of 50-70%.

With the new hardware, Oracle is also taking a page from Google Cloud's custom machine sizes by introducing the ability to customize the size of bare metal instances to suit the workload, for example, an odd number of cores with an arbitrary amount of memory. Indeed, Google, along with AWS, Azure, Alibaba, Baidu, IBM and every other significant cloud provider, plus several major telecoms, have joined Oracle in adopting the new Ice Lake CPUs.

My take

Intel's third-generation Xeon is a worthy successor, providing a decent performance boost over the prior generation, security and acceleration features designed for today's cloud-based workloads, and the balanced performance and broad application support Intel used to displace proprietary RISC processors as the server standard. With Ice Lake, Intel addresses many shortcomings that attracted cloud operators, online service providers and HPC users to AMD, NVIDIA GPUs and FPGAs. Its technical briefing illustrates that the company is intent on presenting use cases and benchmarks that highlight its superiority for deep learning, ML, HPC and database workloads.

Ice Lake lags AMD Milan and Arm (AWS Graviton2 and Ampere Altra) in core count, primarily due to Intel still being stuck on a 10nm process node versus the TSMC 7nm technology used by its competitors. Nevertheless, it makes up for it with a host of workload-specific acceleration and security features while staying within a similar power budget. The resulting competition provides a broad range of processing options when building cloud, edge or carrier network infrastructure, but requires technically sophisticated due diligence to assess the best price-performance fit for each need. As Oracle demonstrated, there are many application categories where Intel remains the performance champ, but server and cloud buyers can no longer assume that Intel Inside is the best option.

While Ice Lake doesn't unequivocally vanquish the competition, its architectural additions keep Intel from sliding into irrelevance and buy time for Gelsinger to clean up Intel's manufacturing mess. Given Gelsinger's background, he well understands that making up for substandard process technology through clever design tricks isn't a sustainable strategy.