Validation benchmarks for Qwen3-VL-235B-A22B-Instruct (Vision-Language model, 235B parameters, 22B active) on 8x AMD Instinct MI325X GPUs.

| Concurrent | Throughput | Output tok/s | p99 Latency | Status |
|---|---|---|---|---|
| 5 | 1,915 tok/s | 286 | 3.50s | DEGRADED |
| 10 | 3,577 tok/s | 534 | 3.74s | DEGRADED |
| 25 | 7,824 tok/s | 1,167 | 4.28s | DEGRADED |
| 50 | 13,136 tok/s | 1,959 | 5.08s | DEGRADED |
| 75 | 13,677 tok/s | 2,040 | 4.88s | DEGRADED |
| 100 | 13,132 tok/s | 1,958 | 5.07s | DEGRADED |
Observations:
| Test Type | Mode | Throughput | Output tok/s | p99 Latency | Status |
|---|---|---|---|---|---|
| long_output | text | 995 tok/s | 839 | 8.93s | OK |
| long_context | text | 5,249 tok/s | 403 | 3.47s | OK |
| multi_image_3 | multi-image | 4,270 tok/s | 354 | 5.93s | OK |
| high_conc_vision | vision | 9,546 tok/s | 1,987 | 7.52s | OK |
Key findings:

| Concurrent | Throughput | Success Rate | p99 Latency | Status |
|---|---|---|---|---|
| 150 | 17,810 tok/s | 100% | 5.61s | OK |
| 200 | 17,553 tok/s | 100% | 5.70s | SATURATED |
| 300 | 17,569 tok/s | 100% | 5.69s | SATURATED |
| 500 | 17,707 tok/s | 100% | 5.66s | SATURATED |
Observations:
| Use Case | Concurrency | Expected Throughput |
|---|---|---|
| Low latency | 5–10 | 1,900–3,500 tok/s |
| Balanced | 25–50 | 7,800–13,100 tok/s |
| High throughput | 75–150 | 13,600–17,800 tok/s |
| Parameter | Value |
|---|---|
| Model | Qwen/Qwen3-VL-235B-A22B-Instruct |
| Test Mode | quick |
| Timestamp | 20260128_190627 |
| Vision Model | Yes |
docker run --rm \
--group-add=video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
--env "VLLM_USE_TRITON_FLASH_ATTN=0" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai-rocm:latest \
--model Qwen/Qwen3-VL-235B-A22B-Instruct \
--tensor-parallel-size 8 \
--max-model-len 32768 \
--kv-offloading-backend native \
--kv-offloading-size 64 \
--disable-hybrid-kv-cache-manager
| Specification | Value |
|---|---|
| GPU | 8x AMD Instinct MI325X |
| VRAM | 256 GB HBM3E per GPU (2 TB total) |
| Architecture | CDNA 3 (gfx942) |
| ROCm | 6.4.2-120 |
| vLLM | 0.14.1 |