AMD miffed at Nvidia’s performance comparison

20 December 2023


Handbags at dawn

AMD has come out with a benchmark of its own between its Instinct MI300X and Nvidia’s H100 (Hopper) GPUs.

It appears that AMD was mightily miffed when Nvidia released benchmarks claiming its H100 was far superior. But like most benchmark wars, it all depends on whether you are looking at the quality or the width.

AMD recreated Nvidia's test scenarios using TensorRT-LLM and factored in server workload latency, an important consideration in server environments. It underscored its use of FP16 with vLLM, in contrast to Nvidia's exclusive use of FP8 with TensorRT-LLM. AMD introduced the MI300X accelerator in early December and claimed a performance advantage of up to 1.6 times over Nvidia's H100.

However, Nvidia argued that AMD's comparison did not factor in the optimisations TensorRT-LLM brings to the H100. In its counter-benchmark, Nvidia ran the Llama 2 70B chat model, comparing a single H100 against an eight-way configuration of H100 GPUs.

AMD countered that Nvidia's benchmarks selectively used inferencing workloads with Nvidia's own proprietary TensorRT-LLM on the H100, rather than the open-source and more common vLLM.

Additionally, AMD claimed that Nvidia pitted the MI300X running vLLM with the FP16 datatype against the DGX H100 running TensorRT-LLM with FP8. AMD said it chose vLLM with FP16 because of its widespread use and because vLLM does not support FP8.
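For readers who want to see what that setup looks like, here is a minimal vLLM sketch running Llama 2 70B in FP16. The model ID, prompt, and eight-way tensor-parallel setting are our own illustrative assumptions, not AMD's actual test harness.

```python
# Minimal vLLM inference sketch in FP16 (illustrative, not AMD's harness).
# Assumes vLLM is installed and eight GPUs are available for the 70B model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # Llama 2 70B chat, as in the benchmarks
    dtype="float16",                         # FP16, the widely used vLLM path; no FP8 here
    tensor_parallel_size=8,                  # split the 70B model across eight GPUs
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is the capital of France?"], params)
print(outputs[0].outputs[0].text)
```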

AMD also slammed Nvidia for ignoring latency in its benchmarks and focusing on throughput alone which, according to AMD, does not reflect real-world server conditions.
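The distinction is easy to show. In the toy Python sketch below (`run_request` is a hypothetical stand-in for a real inference call), a server posts healthy aggregate throughput even though every individual request waits half a second:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_request(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM inference call."""
    time.sleep(0.5)  # pretend each request takes 500 ms of model time
    return f"response to {prompt!r}"

def timed(prompt: str) -> float:
    """Wall-clock latency of a single request."""
    t0 = time.perf_counter()
    run_request(prompt)
    return time.perf_counter() - t0

prompts = [f"prompt {i}" for i in range(32)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(timed, prompts))
elapsed = time.perf_counter() - start

# Throughput judges the batch as a whole; latency judges each request.
print(f"throughput:   {len(prompts) / elapsed:.1f} requests/s")
print(f"mean latency: {sum(latencies) / len(latencies) * 1000:.0f} ms")
```

Batching more requests pushes the throughput number up while each request's latency stays flat or worsens, which is why AMD argues that throughput alone flatters a server benchmark.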

AMD then ran two tests of its own, replicating Nvidia's scenarios. The first compared both GPUs using vLLM with FP16; the second pitted the MI300X running vLLM against the H100 running Nvidia's TensorRT-LLM to measure the latency difference. Both showed improved performance and reduced latency for AMD, and with vLLM on both GPUs the MI300X achieved a 2.1x performance increase.

After that trash talk, we expect Nvidia to come back with comments about AMD's mother. It will probably have more to say about FP16 versus FP8 in TensorRT-LLM's closed system, and it might mention the moves away from vLLM.


Last modified on 20 December 2023