Benchmarks

Updated on 12 March, 2026

Detailed benchmarking of DeepSeek, Llama, Qwen3-VL, and Kimi models on AMD Instinct MI325X GPUs with stress and validation testing.

This guide explains the methodology behind every benchmark result in this documentation and provides the scripts needed to reproduce them.

All results below are aggregated from 5 independent benchmark runs per model on 8x AMD Instinct MI325X GPUs. Each run issued 100 requests per concurrency level, with 2,048 input tokens and 512 output tokens per request.

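Aggregating across runs can be as simple as reporting the mean and standard deviation of each run's throughput. The sketch below illustrates this; the throughput numbers are placeholders, not measured results.

```python
import statistics

def aggregate_runs(throughputs):
    """Collapse per-run throughput samples (tokens/s) into mean and stdev."""
    return {
        "mean": statistics.mean(throughputs),
        "stdev": statistics.stdev(throughputs) if len(throughputs) > 1 else 0.0,
    }

# Five hypothetical runs at one concurrency level (illustrative numbers only).
runs = [10412.0, 10388.5, 10450.2, 10401.8, 10397.5]
agg = aggregate_runs(runs)
print(f"throughput: {agg['mean']:.1f} +/- {agg['stdev']:.1f} tok/s")
```

Reporting the spread alongside the mean makes it obvious when a single noisy run is skewing a result.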
AITER (AMD's AI Tensor Engine for ROCm) provides optimized attention kernels for AMD GPUs. This study measures its impact on inference throughput across model architectures.

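An on/off comparison like this reduces to paired throughput measurements per model (in vLLM, AITER is typically toggled via the `VLLM_ROCM_USE_AITER` environment variable, though this may vary by version). A minimal sketch of the comparison arithmetic, with hypothetical model names and numbers:

```python
def speedup(baseline_tps, aiter_tps):
    """Fractional throughput gain from enabling AITER kernels."""
    return aiter_tps / baseline_tps - 1.0

# Hypothetical paired measurements in tokens/s: (AITER off, AITER on).
results = {
    "model-a": (9800.0, 11270.0),
    "model-b": (7450.0, 7900.0),
}
for name, (base, aiter) in results.items():
    print(f"{name}: {speedup(base, aiter):+.1%}")
```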
Detailed GPU memory measurements for all four models (DeepSeek, Llama, Qwen3-VL, and Kimi) running on AMD Instinct MI325X GPUs (256 GB HBM3e per GPU). Measurements were taken via `rocm-smi` after model loading and warmup completed.

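In practice this means parsing the text output of `rocm-smi --showmeminfo vram`. The line format below is an assumption (it varies across ROCm versions), so treat this parser as a sketch to adapt, not a drop-in tool.

```python
import re

# Assumed line format from `rocm-smi --showmeminfo vram` (varies by ROCm version):
#   GPU[0]  : VRAM Total Used Memory (B): 123456789
USED_RE = re.compile(r"GPU\[(\d+)\].*VRAM Total Used Memory \(B\):\s*(\d+)")

def parse_vram_used(text):
    """Return {gpu_index: used_bytes} parsed from rocm-smi output."""
    return {int(m.group(1)): int(m.group(2)) for m in USED_RE.finditer(text)}

sample = """\
GPU[0]  : VRAM Total Memory (B): 274877906944
GPU[0]  : VRAM Total Used Memory (B): 198642237440
GPU[1]  : VRAM Total Memory (B): 274877906944
GPU[1]  : VRAM Total Used Memory (B): 197568495616
"""
for gpu, used in parse_vram_used(sample).items():
    print(f"GPU {gpu}: {used / 2**30:.1f} GiB used")
```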
Fine-grained concurrency sweep from 500 to 1,000 concurrent requests, in steps of 50, to identify the exact saturation knee for each model. Each concurrency level was tested across 3 independent runs with 200 requests per level.

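One simple way to locate the knee in such a sweep is a marginal-gain heuristic: the knee is the concurrency level after which each additional step of 50 buys less than some small relative throughput gain. This is an illustrative sketch with made-up sweep data, not the cookbook's actual detection method.

```python
def find_knee(points, min_gain=0.01):
    """Return the concurrency level after which the relative throughput
    gain per step first drops below `min_gain` (marginal-gain heuristic)."""
    points = sorted(points)  # (concurrency, throughput) pairs
    for (c0, t0), (c1, t1) in zip(points, points[1:]):
        if (t1 - t0) / t0 < min_gain:
            return c0
    return points[-1][0]

# Hypothetical sweep: throughput flattens past 750 concurrent requests.
sweep = [(500, 9000), (550, 9600), (600, 10100), (650, 10500),
         (700, 10800), (750, 11000), (800, 11050), (850, 11060)]
print(find_knee(sweep))  # → 750
```

Averaging the 3 runs per level before applying the heuristic keeps a single noisy level from producing a false knee.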
Real-time GPU monitoring data collected via `rocm-smi` during Kimi-K2.5 benchmark runs on 8x AMD Instinct MI325X GPUs. Data was sampled at 1-second intervals across 3 independent runs.

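A 1-second sampling loop of this kind can be sketched as a generic poller that timestamps the result of a probe callable. The probe here is a stand-in; a real one would shell out to `rocm-smi` and parse its output.

```python
import time

def sample(probe, interval_s=1.0, n_samples=5):
    """Poll `probe` every `interval_s` seconds, timestamping each reading."""
    readings = []
    for _ in range(n_samples):
        readings.append((time.monotonic(), probe()))
        time.sleep(interval_s)
    return readings

def fake_probe():
    # Stand-in for a rocm-smi query; returns a fixed utilization reading.
    return {"gpu_util_pct": 97}

data = sample(fake_probe, interval_s=0.01, n_samples=3)
print(len(data))  # → 3
```

A fixed-interval `sleep` drifts slightly under load; for long runs, scheduling each sample against a start timestamp keeps the 1-second cadence exact.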
Complete documentation of the benchmark methodology, test environment, and tooling validation used for all results in this cookbook.