Nvidia once again dominates MLPerf inference, this time in the v6.0 round. According to results published on April 1 and reported by ITHome, the Blackwell Ultra platform in its GB300 NVL72 configuration achieves the best performance across all scenarios, with Nvidia claiming nine times more benchmark wins than its closest competitor. The headline figure for interactive LLMs: on DeepSeek-R1 in server mode, Nvidia reports 8,064 tokens per second per GPU, 2.77 times better than under MLPerf v5.1.

An updated roster of models
The MLPerf v6.0 suite significantly broadens its range of workloads. On the LLM side, it adds GPT-OSS-120B, focused on math/science reasoning and code, and updates DeepSeek-R1 with an interactive scenario that tightens the requirements on TTFT (time to first token) and per-token throughput, making it more representative of a real-time chatbot. On the multimodal side, Qwen3-VL-235B marks the arrival of a VLM for converting unstructured data into metadata.
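To make the two interactive metrics concrete, here is a minimal sketch computing TTFT and per-token throughput from token arrival timestamps. This is a hypothetical illustration, not MLPerf harness code; the function name and timing values are invented for the example.

```python
# Illustrative only: the two metrics an interactive LLM scenario
# constrains are time to first token (TTFT) and per-token throughput.

def interactive_metrics(request_time: float, token_times: list[float]):
    """Compute TTFT and average tokens/s from arrival timestamps (seconds)."""
    ttft = token_times[0] - request_time          # latency to first token
    duration = token_times[-1] - request_time     # total generation time
    tokens_per_s = len(token_times) / duration    # average decode throughput
    return ttft, tokens_per_s

# Example: request at t=0, first token after 0.25 s, then one token
# every 20 ms for 100 tokens total.
times = [0.25 + 0.02 * i for i in range(100)]
ttft, tps = interactive_metrics(0.0, times)
print(f"TTFT = {ttft:.2f} s, throughput = {tps:.1f} tokens/s")
```

A server-mode interactive scenario typically caps both numbers per request, so an accelerator must keep TTFT low even while sustaining high aggregate decode throughput.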

Another notable addition, WAN-2.2 for text-to-video, abandons server mode in favor of SingleStream, which is better suited to the latency profile of video generators. The recommendation section switches to DLRMv3 (a Transformer architecture contributed by Meta), larger and more expensive than DCNv2. For the edge, the detection benchmark moves to YOLOv11 Large from Ultralytics.
Blackwell Ultra stacks up the records
On DeepSeek-R1 in server mode, Nvidia claims a throughput of 8,064 tokens/s/GPU, the reference figure for this edition. On Llama 3.1 405B, the announced gains reach 1.52× in server mode and 1.21× in offline mode. The highlighted hardware stack, GB300 NVL72, belongs to Nvidia's line of high-density multi-GPU systems optimized for large-scale inference.
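Some back-of-the-envelope arithmetic on the reported figures puts them in perspective. The implied v5.1 baseline and the rack-level aggregate below are derived for illustration, not published numbers, and the aggregate assumes ideal linear scaling across the 72 GPUs of an NVL72 rack.

```python
# Derived arithmetic on Nvidia's reported DeepSeek-R1 figures
# (illustrative; not published benchmark results).

v60_per_gpu = 8064   # reported tokens/s/GPU, server mode, MLPerf v6.0
speedup = 2.77       # claimed gain over MLPerf v5.1

implied_v51 = v60_per_gpu / speedup   # baseline implied by the claim
rack_aggregate = v60_per_gpu * 72     # GB300 NVL72: 72 GPUs, ideal scaling

print(f"Implied v5.1 throughput: ~{implied_v51:.0f} tokens/s/GPU")
print(f"Ideal NVL72 aggregate:   ~{rack_aggregate:,} tokens/s")
```

In other words, the claim implies a jump from roughly 2,900 to 8,064 tokens/s/GPU between the two benchmark rounds.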
These figures reflect both the evolution of kernels and memory paths and the optimization of execution graphs for interactive scenarios, where TTFT and per-token throughput become decisive. The expansion of MLPerf to VLMs and text-to-video also validates Nvidia's strategy of optimizing beyond LLMs alone, across increasingly heterogeneous pipelines.
The push toward interactive and giant models will mechanically strengthen the appeal of NVL configurations and very high-bandwidth intra-node networks. For hyperscalers, the advantage in tokens/s/GPU on DeepSeek-R1 and the jump on Llama 3.1 405B steer CAPEX decisions toward Blackwell for dialogue and agent workloads, at least in the short term, while awaiting consolidated responses from competitors on these new scenarios.
Source: ITHome