NVIDIA has just posted the first real performance figures for its Ampere A100 GPU, and the results are insane. The company has broken a total of 16 performance records in AI-specific benchmarks and also beat its main competitors in machine learning performance by a huge margin.
NVIDIA Ampere A100 GPU breaks 16 AI world records, up to 4.2x faster than Volta V100
The results come from MLPerf, an industry benchmarking group founded in 2018 with a focus purely on machine learning performance. The benchmark suite consists of eight tests, and NVIDIA has posted record-setting training rates.
This is the third consecutive and strongest showing for NVIDIA in the MLPerf training tests since the group was formed in May 2018. NVIDIA set six records in the first MLPerf training benchmarks in December 2018 and eight in July 2019.
NVIDIA was the only company to field commercially available products in all tests. Most other submissions used the preview category, for products that may not be available for several months, or the research category, for products not expected to be available for some time.
NVIDIA also reported eight additional records with its DGX SuperPOD system, a huge cluster of DGX A100 HPC systems connected via HDR InfiniBand. The DGX SuperPOD consists of 140 DGX A100 systems with a total of 1,120 NVIDIA Ampere A100 GPUs, 170 Mellanox Quantum 200G InfiniBand switches, 4 PB of storage and 15 km of optical cable.
That’s about 7.7 million Ampere CUDA cores in the DGX SuperPOD system, which is amazing. The system is part of the DGX V expansion plan and adds nearly 700 Petaflops of computing power to the system currently deployed at NVIDIA’s headquarters in Santa Clara, California.
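The 1,120 GPU and 7.7 million core figures can be checked with a few lines of arithmetic. The sketch below assumes 8 GPUs per DGX A100 system and 6,912 CUDA cores per A100, both taken from NVIDIA's published specifications rather than from this article; the function name is only illustrative.

```python
# Assumed per-unit figures (from NVIDIA's public specs, not from this article).
GPUS_PER_DGX_A100 = 8
CUDA_CORES_PER_A100 = 6912


def superpod_totals(num_systems):
    """Return (total GPUs, total CUDA cores) for a cluster of DGX A100 systems."""
    total_gpus = num_systems * GPUS_PER_DGX_A100
    total_cores = total_gpus * CUDA_CORES_PER_A100
    return total_gpus, total_cores


gpus, cores = superpod_totals(140)
print(f"{gpus:,} GPUs, {cores:,} CUDA cores (~{cores / 1e6:.1f} million)")
```

With 140 systems this gives 1,120 GPUs and 7,741,440 CUDA cores, which matches the roughly 7.7 million quoted above.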
The AI performance benchmarks – Ampere vs Volta & more
NVIDIA has compared its Ampere A100 Tensor Core GPU accelerator to its predecessor, the Volta V100. The comparison also includes Google’s 3rd generation TPU and Huawei’s Ascend HPC chips. MLPerf itself lists more detailed benchmarks, along with a preview of upcoming AI accelerators such as Intel’s Cooper Lake-SP Xeon CPUs and Google’s 4th generation TPU. With that said, let’s take a look at the benchmarks.
According to MLPerf, its benchmark suite includes tests that focus on the workloads most relevant to machine learning and AI. The NVIDIA Ampere A100 simply destroys the Volta V100 with a performance improvement of up to 2.5x. Even where its lead is smallest, the Ampere A100 delivers a 50% boost over the Volta V100 GPU, which is impressive. The results here were normalized to a single chip to provide a fair comparison between Ampere and Volta.
The Huawei Ascend chip was only able to complete one test on time, and even then with lower performance than the Volta V100, while Google’s TPU V3 managed to complete only two tests on time. In one test, the TPU gained a 20% lead over the NVIDIA Volta V100, while in the second test, it was 10% slower than the V100.
Compared to the Cooper Lake-SP 8 socket configuration that completes the image classification test in 1104.53 minutes, a dual NVIDIA A100 system can complete the same test in just 33.37 minutes. NVIDIA also goes on to compare the performance of its Ampere A100 with the unreleased Google TPU V4, which is still in the research phase and is at least a year away from availability.
NVIDIA also shows how the performance of its GPU accelerators has improved over time with the latest full-stack innovations for AI. Compared to MLPerf 0.5 on Volta V100, the MLPerf 0.7 suite with Ampere A100 offers a stunning 4.2x performance improvement.
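To put the Cooper Lake-SP comparison into a single number, the two training times can be turned into a speedup ratio. This is only a rough sketch using the two times quoted above; it compares whole systems and does not normalize for chip count, and the function name is illustrative.

```python
def speedup(baseline_minutes, candidate_minutes):
    """Return how many times faster the candidate finishes than the baseline."""
    return baseline_minutes / candidate_minutes


cooper_lake_minutes = 1104.53  # 8 socket Cooper Lake-SP, as quoted above
dual_a100_minutes = 33.37      # dual NVIDIA A100 system, as quoted above

ratio = speedup(cooper_lake_minutes, dual_a100_minutes)
print(f"The dual A100 system is about {ratio:.1f}x faster")
```

That works out to roughly a 33x shorter time to train for the dual A100 system on this test.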
This shows just how impressive the NVIDIA Ampere A100 GPU is in real benchmarks, within a suite recognized by all the major players in the AI community. The Ampere A100 was also found to be the fastest GPU ever recorded in another benchmark, ahead even of the Turing GPU, whose hardware-accelerated techniques deliver better performance there but still couldn’t match the Ampere A100 and its huge performance output. All of these benchmark results make us all the more excited to see Ampere in consumer form, which should definitely happen in a few months.