MLPerf 6.0: MI355X Crosses 1M Tokens/s and Trails B200/B300 Closely, Including in Multi-node

2 April 2026

The one-million-tokens-per-second milestone has been crossed, and multi-node scaling is almost linear. With the MI355X, AMD goes beyond a simple throughput record: its partners reproduce the scores.

AMD Instinct MI355X and MLPerf 6.0: key figures and scope

Built on a 3 nm process, AMD Instinct MI355X GPUs (CDNA 4 architecture) pack 185 billion transistors, support FP4/FP6 and carry up to 288 GB of HBM3E. With up to 10 PFLOPS in FP4/FP6, capacity for models of up to 520 billion parameters on a single GPU, and UBB8 nodes available with air cooling or direct liquid cooling (DLC), the platform is designed for large-scale inference.

In MLPerf Inference 6.0, AMD exceeds 1 million tokens/s on Llama 2 70B (Server and Offline) and GPT-OSS-120B (Offline) with multi-node MI355X deployments. Partners reproduce the scores to within ±4% (sometimes ±1%), covering four Instinct generations: MI300X, MI325X, MI350X and MI355X.

Performance, scale and model coverage

Generation over generation: on Llama 2 70B Server, the MI355X reaches 100,282 tokens/s, or 3.1x the throughput previously submitted on MI325X. The gains come from the pairing of CDNA 4 and ROCm, higher compute density, the FP4/FP6 formats and HBM3E.

Single-node competitiveness on Llama 2 70B: compared with the NVIDIA B200, the MI355X platform matches it in Offline, delivers 97% in Server and 119% in Interactive. Against the B300: 93% in Server, 92% in Offline and 104% in Interactive.

GPT-OSS-120B (first inclusion in MLPerf): 111% of the B200 in Offline and 115% in Server on one MI355X node. Against the B300, 91% in Offline and 82% in Server.

Text-to-video Wan-2.2-t2v (Single Stream, an Open submission that nonetheless complies with the Closed rules): 93% of the B200 and 87% of the B300 in the official results. Post-deadline (not verified by MLCommons): 108% of the B200 and parity with the B300 in Single Stream, 111%/88% in Offline.

Multi-node scalability on Llama 2 70B: from 1 to 11 nodes, scaling is close to linear. At 11 nodes/87 MI355X: 1,042,110 tokens/s (Offline), 1,016,380 tokens/s (Server) and 785,522 tokens/s (Interactive). Efficiency: 93% (Offline), 93% (Server), 98% (Interactive).

Multi-node scalability on GPT-OSS-120B: at 12 nodes/94 MI355X, 1,031,070 tokens/s (Offline) and 900,054 tokens/s (Server), with 92% and 93% efficiency respectively. It is the second model to exceed 1 million tokens/s in multi-node.
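The article does not say how the efficiency percentages are calculated. A minimal sketch, assuming efficiency means per-GPU throughput of the cluster divided by per-GPU throughput of a single node, and assuming the 100,282 tokens/s single-node Server result was obtained on an 8-GPU node (neither point is stated in the source; the function names are illustrative):

```python
def per_gpu_throughput(tokens_per_s: float, gpus: int) -> float:
    """Average tokens/s delivered by each GPU in a deployment."""
    return tokens_per_s / gpus


def scaling_efficiency(cluster_tps: float, cluster_gpus: int,
                       baseline_tps: float, baseline_gpus: int) -> float:
    """Cluster per-GPU throughput relative to the baseline (1.0 = perfectly linear)."""
    return (per_gpu_throughput(cluster_tps, cluster_gpus)
            / per_gpu_throughput(baseline_tps, baseline_gpus))


# Llama 2 70B Server figures quoted in the article.
# Assumption: the 100,282 tokens/s single-node result comes from an 8-GPU node.
server_eff = scaling_efficiency(1_016_380, 87, 100_282, 8)
print(f"Llama 2 70B Server scaling efficiency: {server_eff:.1%}")

# GPT-OSS-120B Offline: only per-GPU throughput can be derived, because the
# article quotes no absolute single-node baseline for this model.
gpt_oss_per_gpu = per_gpu_throughput(1_031_070, 94)
print(f"GPT-OSS-120B Offline per GPU: {gpt_oss_per_gpu:,.0f} tokens/s")
```

Under these assumptions the Llama 2 70B Server efficiency comes out at roughly 93%, in line with the figure quoted above. Normalizing per GPU rather than per node matters here because the GPU counts quoted (87 and 94) are not multiples of 8.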

Ecosystem, heterogeneity and ROCm

Nine partners submitted results on Instinct: Cisco, Dell, Giga Computing, HPE, MangoBoost, MiTAC, Oracle, Supermicro and Red Hat. MI355X partner results stay within ±4% of AMD's own figures, including on the new workloads, which indicates that the scores are reproducible in the field.
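To make the ±4% window concrete, here is a small sketch that computes the band around a reference score. It assumes the 100,282 tokens/s Llama 2 70B Server result is the AMD reference that partner scores are compared against, which the article implies but does not state; the helper names are illustrative.

```python
def tolerance_band(reference: float, tolerance: float) -> tuple[float, float]:
    """Lower and upper bounds of a symmetric relative tolerance window."""
    return reference * (1 - tolerance), reference * (1 + tolerance)


def within_tolerance(score: float, reference: float, tolerance: float) -> bool:
    """True if score lies inside the window around reference."""
    low, high = tolerance_band(reference, tolerance)
    return low <= score <= high


# Assumption: AMD's own Llama 2 70B Server figure is the reference.
REFERENCE_TPS = 100_282

for tol in (0.04, 0.01):
    low, high = tolerance_band(REFERENCE_TPS, tol)
    print(f"±{tol:.0%}: {low:,.0f} to {high:,.0f} tokens/s")
```

At ±4% the window spans about 8,000 tokens/s around the reference; at ±1% it narrows to about 2,000.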

First heterogeneous submission combining three GPU types (MI300X + MI325X + MI355X, Dell + MangoBoost): 141,521 tokens/s (Llama 2 70B Server) and 151,843 tokens/s (Offline). Notably, the MI355X GPUs were located in the USA (Dell) and the MI300X/MI325X GPUs in Korea, demonstrating orchestration across geographies.

ROCm handles FP4 execution, GPU communications for multi-node scaling, dynamic work distribution in heterogeneous environments and rapid enablement of models (Llama, Wan, GPT-OSS). The result: performance, scalability and flexibility across the entire Instinct portfolio.

Roadmap: MI300X (2023) laid the GenAI foundation, MI325X (2024) increased compute and HBM3E, and the MI350 series including MI355X (2025) adds FP4/FP6 and more model capacity for inference. In 2026, AMD plans the MI400 under CDNA 5 and the Helios rack-scale solution.

The approach combines a tangible jump in throughput with growing software maturity. Parity or near parity on a single node, near-linear scaling in clusters and reproducibility across multiple OEMs strengthen AMD's credibility as a serious alternative for high-volume LLM and multimodal deployments, with particular appeal for reducing the cost per token through efficiency at scale.

Source: TechPowerUp
