
Nvidia Explodes MLPerf V6.0, But at What Infrastructure Cost to Follow?


Nvidia is once again dominating the MLPerf Inference benchmark v6.0. According to results published on April 1st and reported by ITHome, the Blackwell Ultra platform in its GB300 NVL72 configuration leads every scenario, with nine times more benchmark "wins" than its closest competitor. A key highlight for interactive LLMs: on DeepSeek-R1 in server mode, the company reports 8064 tokens per second per GPU, 2.77 times better than under MLPerf v5.1.
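As a quick sanity check on the claimed 2.77x speedup, the implied per-GPU throughput under MLPerf v5.1 can be back-computed from the two reported figures. This is a rough derived estimate, not a published number; the speedup factor is likely rounded:

```python
# Back-of-envelope check of the claimed 2.77x DeepSeek-R1 speedup.
V6_TOKENS_PER_SEC_PER_GPU = 8064   # reported under MLPerf v6.0 (server mode)
SPEEDUP_VS_V51 = 2.77              # reported improvement over MLPerf v5.1

# Implied v5.1 baseline (approximate, since the speedup is rounded).
implied_v51 = V6_TOKENS_PER_SEC_PER_GPU / SPEEDUP_VS_V51
print(f"Implied v5.1 throughput: ~{implied_v51:.0f} tokens/s per GPU")
```

In other words, the v5.1 result works out to roughly 2900 tokens per second per GPU.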

The MLPerf v6.0 suite revamps a range of models. For LLMs, it adds GPT-OSS-120B, focused on math/science reasoning and coding, and extends DeepSeek-R1 with an interactive scenario that tightens TTFT and token-throughput requirements to model a real-time chatbot. Qwen3-VL-235B introduces a VLM for converting unstructured data into metadata in multimodal scenarios.

Another notable addition is WAN-2.2 for text-to-video, which moves from server mode to SingleStream, a better fit for the latency profile of video generation. The recommendation benchmark switches to DLRMv3 (a Transformer-based architecture from Meta), larger and more expensive than DCNv2. For edge computing, the detection benchmark moves to Ultralytics' YOLOv11 Large.

On DeepSeek-R1 in server mode, Nvidia reaches a throughput of 8064 tokens per second per GPU, a reference figure for this edition. For Llama 3.1 405B, the announced gains are 1.52x in server mode and 1.21x in offline mode. The highlighted hardware stack, GB300 NVL72, sits among the high-density multi-GPU systems optimized for large-scale inference.
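To gauge what the per-GPU figure means at rack scale, it can be multiplied out across the 72 GPUs of a GB300 NVL72. This sketch assumes perfectly linear scaling, which real serving stacks only approximate, so the result is an idealized upper bound rather than a measured number:

```python
# Hypothetical rack-level throughput from the reported per-GPU figure.
TOKENS_PER_SEC_PER_GPU = 8064  # DeepSeek-R1, server mode, MLPerf v6.0
GPUS_PER_NVL72_RACK = 72       # a GB300 NVL72 rack integrates 72 GPUs

# Idealized aggregate under the linear-scaling assumption.
rack_throughput = TOKENS_PER_SEC_PER_GPU * GPUS_PER_NVL72_RACK
print(f"Idealized NVL72 throughput: {rack_throughput:,} tokens/s")
```

Under that assumption, a single rack would serve on the order of 580,000 tokens per second on this workload.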

These numbers reflect the evolution of kernels and memory paths as well as graph optimization for interactive scenarios, where TTFT and token throughput become decisive. Expanding MLPerf to include VLM and text-to-video validates Nvidia’s strategy of optimizing beyond just LLMs, on increasingly heterogeneous pipelines.

The push towards interactive and giant models will naturally enhance the appeal of NVL configurations and high-throughput intra-node networks. For hyperscalers, the advantage in tokens per second per GPU on DeepSeek-R1 and the leap on Llama 3.1 405B guide CAPEX decisions towards Blackwell for dialogue/agent workloads, at least in the short term, pending consolidated responses from competitors on these new scenarios.

Source: ITHome