AMD Radeon Instinct MI100 ‘CDNA GPU’ Alleged Performance Numbers Show It Faster Than NVIDIA’s A100 in FP32 Compute With Impressive Performance/Value, But Beaten by Ampere in AI & HPC Workloads
Alleged performance numbers and details of AMD’s next-generation CDNA GPU-based Radeon Instinct MI100 accelerator have been leaked by AdoredTV. In an exclusive post, AdoredTV covers performance benchmarks of the upcoming HPC GPU against NVIDIA’s Volta and Ampere GPUs.
AMD Radeon Instinct MI100 ‘CDNA’ GPU performance benchmarks leak, reportedly faster than NVIDIA’s Ampere A100 in FP32 compute with better performance/value
AdoredTV claims that the slides they received are from an official AMD Radeon Instinct MI100 presentation. The versions posted at the source appear to have been modified from the originals, but the details remain intact. In our previous post, we confirmed that the Radeon Instinct MI100 GPU was on track for launch in 2H 2020. The AdoredTV slides shed some more light on the launch plans and server configurations we can expect from AMD and its partners in 2020 and beyond.
AMD Radeon Instinct MI100 1U server specifications
First, AMD plans to unveil an HPC-specific server in a 2P design with dual AMD EPYC CPUs, based on either the Rome or Milan generation. Each EPYC CPU connects to two Radeon Instinct MI100 accelerators over the 2nd-generation Infinity Fabric interconnect. The four GPUs deliver a sustained 136 TFLOPs of FP32 (SGEMM) output, which works out to approximately 34 TFLOPs of FP32 compute per GPU. Each Radeon Instinct MI100 GPU has a TDP of 300W.
Additional specifications include a total GPU PCIe bandwidth of 256 GB/s, made possible by the Gen 4 protocol. The combined memory bandwidth of the four GPUs is 4.9 TB/s, which suggests AMD is using HBM2e DRAM dies (each GPU delivers 1.225 TB/s of bandwidth). The combined memory pool is 128 GB, or 32 GB per GPU. This suggests that AMD is sticking with four HBM2 stacks per GPU, with each stack comprising 8-Hi DRAM dies. It appears that XGMI is not offered on standard configurations and is limited to specialized 1U racks.
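As a quick sanity check, the per-GPU figures fall straight out of the leaked quad totals by division (a minimal sketch; the per-pin data-rate estimate at the end is our own back-of-the-envelope inference, not a figure from the slides):

```python
# Sanity-check the leaked 1U quad-GPU figures (totals taken from the slides).
GPUS = 4

quad_fp32_tflops = 136   # sustained FP32 (SGEMM) across the quad
quad_bw_tbps = 4.9       # combined HBM bandwidth, TB/s
quad_mem_gb = 128        # combined HBM capacity, GB

per_gpu_fp32 = quad_fp32_tflops / GPUS   # 34.0 TFLOPs per GPU
per_gpu_bw = quad_bw_tbps / GPUS         # 1.225 TB/s per GPU
per_gpu_mem = quad_mem_gb / GPUS         # 32 GB per GPU

# 1.225 TB/s over a 4096-bit bus (4 x 1024-bit HBM2 stacks) implies a
# data rate of roughly 2.4 Gbps per pin, consistent with fast HBM2/HBM2e.
bus_bits = 4096
gbps_per_pin = per_gpu_bw * 8000 / bus_bits  # TB/s -> Gb/s, then per pin

print(per_gpu_fp32, per_gpu_bw, per_gpu_mem, round(gbps_per_pin, 2))
```

The ~2.4 Gbps/pin result sits between standard HBM2 (2.0 Gbps) and HBM2e (up to 3.2 Gbps), which is why the bandwidth figure points toward HBM2e-class memory.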
In terms of availability, the 1U server with AMD EPYC (Rome / Milan) HPC CPUs would launch by December 2020, while an Intel Xeon variant is expected to launch in February 2021.
AMD Radeon Instinct MI100 3U server specifications
The second 3U server is expected to launch in March 2021 and will offer even more powerful specifications, with 8 Radeon Instinct MI100 GPUs connected to two EPYC CPUs. Each group of four Instinct MI100s is interconnected via XGMI (100 GB/s bidirectional), for a total bandwidth of 1.2 TB/s per quad. The eight Instinct accelerators deliver a combined 272 TFLOPs of FP32 compute, 512 GB/s of PCIe bandwidth, 9.8 TB/s of HBM bandwidth, and 256 GB of DRAM capacity. The rack has a nominal power consumption of 3 kW.
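Those 3U totals are exactly double the 1U quad's numbers, consistent with eight of the same 300W GPUs (a rough check; the per-GPU PCIe figure and the power split are our own inferences from the leaked totals):

```python
GPUS = 8

# Per-GPU figures derived from the 1U server slide (quad totals / 4).
per_gpu = {
    "fp32_tflops": 34,   # FP32 (SGEMM)
    "pcie_gb_s": 64,     # PCIe Gen 4 bandwidth
    "hbm_tb_s": 1.225,   # HBM bandwidth
    "mem_gb": 32,        # HBM capacity
    "tdp_w": 300,        # board power
}

totals = {key: value * GPUS for key, value in per_gpu.items()}
# Matches the leaked 3U figures: 272 TFLOPs FP32, 512 GB/s PCIe,
# 9.8 TB/s HBM bandwidth, 256 GB DRAM capacity.

# Eight 300W GPUs account for 2.4 kW of the rack's nominal 3 kW budget,
# leaving roughly 600W for the two EPYC CPUs and the rest of the node.
headroom_w = 3000 - totals["tdp_w"]
print(totals, headroom_w)
```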
AMD Radeon Instinct Accelerators 2020
| Accelerator name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI100 |
| --- | --- | --- | --- | --- | --- | --- |
| GPU architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 | Arcturus |
| GPU process node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET | 7nm FinFET |
| GPU clock speed | 1237 MHz | 1000 MHz | 1500 MHz | 1725 MHz | 1800 MHz | 1334 MHz? |
| FP16 compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.5 TFLOPs | 29.5 TFLOPs | ~50 TFLOPs |
| FP32 compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.3 TFLOPs | 14.7 TFLOPs | ~25 TFLOPs |
| FP64 compute | 384 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.6 TFLOPs | 7.4 TFLOPs | ~12.5 TFLOPs |
| VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 |
| Memory clock | 1750 MHz | 500 MHz | 472 MHz | 500 MHz | 500 MHz | TBD |
| Memory bus | 256-bit | 4096-bit | 2048-bit | 4096-bit | 4096-bit | 4096-bit |
| Memory bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s | TBD |
| Form factor | Single slot, full length | Double slot, half length | Double slot, full length | Double slot, full length | Double slot, full length | Double slot, full length |
| Cooling | Passive | Passive | Passive | Passive | Passive | Passive? |
| TDP | 150W | 175W | 300W | 300W | 300W | ~200W (test board) |
AMD’s Radeon Instinct MI100 ‘CDNA GPU’ performance numbers, an FP32 powerhouse in the making?
In terms of performance, the AMD Radeon Instinct MI100 was compared against the NVIDIA Volta V100 and NVIDIA Ampere A100 GPU accelerators. Interestingly, the slides list a 300W Ampere A100 accelerator, although no such configuration exists. The A100 actually comes in two flavors, a 400W configuration in the SXM form factor and a 250W configuration in the PCIe form factor, so the slides appear to be based on an assumed A100 configuration rather than an actual variant.
According to the benchmarks, the Radeon Instinct MI100 delivers about 13% better FP32 performance than the Ampere A100 and more than a 2x improvement over the Volta V100 GPUs. In perf/value terms, the MI100 reportedly offers approximately 2.4x better value than the V100S and 50% better value than the Ampere A100. Performance scaling is also shown to be almost linear in ResNet, even with up to 32-GPU configurations, which is quite impressive.
AMD Radeon Instinct MI100 vs NVIDIA’s Ampere A100 HPC Accelerator (Image Credits: AdoredTV):
That said, the slides also mention that AMD will provide much better performance and value in three specific segments: Oil & Gas, Academia, and HPC & Machine Learning. In the rest of the HPC workloads, such as FP64 compute, AI, and Data Analytics, NVIDIA will provide far superior performance with its A100 accelerator. NVIDIA also holds the advantage of its Multi-Instance GPU architecture over AMD. The performance figures show the Ampere A100 delivering 2.5x better FP64 performance, 2x better FP16 performance, and twice the tensor throughput thanks to its latest-generation Tensor cores.
One thing to emphasize is that AMD has not mentioned NVIDIA's sparsity numbers anywhere in the benchmarks. With sparsity, NVIDIA's Ampere A100 offers up to 156 TFLOPs of horsepower, so it seems AMD wanted to keep the comparison against the Ampere A100 to specific benchmarks. Still, the Radeon Instinct MI100 looks to be a decent HPC offering if the performance and value figures hold up at launch.