Validation benchmarks for Kimi-K2.5 (1 trillion parameters, 32B active) on 8x AMD Instinct MI325X GPUs.

| Concurrent | Throughput | Output tok/s | p99 Latency | Status |
|---|---|---|---|---|
| 5 | 446 tok/s | 53 | 18.94s | DEGRADED |
| 10 | 599 tok/s | 71 | 28.19s | DEGRADED |
| 25 | 957 tok/s | 113 | 44.15s | DEGRADED |
| 50 | 1,777 tok/s | 210 | 47.54s | DEGRADED |
| 75 | 1,767 tok/s | 209 | 47.79s | DEGRADED |
| 100 | 1,749 tok/s | 207 | 48.30s | DEGRADED |
Observations:
| Test Type | Mode | Throughput | Output tok/s | p99 Latency | Status |
|---|---|---|---|---|---|
| long_output | text | 107 tok/s | 90 | 83.26s | OK |
| long_context | text | 454 tok/s | 35 | 40.11s | OK |
| multi_image_3 | multi-image | 491 tok/s | 32 | 66.53s | OK |
| high_conc_vision | vision | 887 tok/s | 149 | 100.83s | OK |
Key findings:

| Concurrent | Throughput | Success Rate | p99 Latency | Status |
|---|---|---|---|---|
| 150 | 1,827 tok/s | 100% | 69.34s | OK |
| 200 | 1,967 tok/s | 100% | 64.36s | OK |
| 300 | 1,920 tok/s | 100% | 65.96s | SATURATED |
| 500 | 2,053 tok/s | 100% | 61.67s | OK |
Observations:
| Use Case | Concurrency | Expected Throughput |
|---|---|---|
| Low latency | 1–5 | 450–600 tok/s |
| Balanced | 25–50 | 950–1,800 tok/s |
| High throughput | 100–200 | 1,750–2,000 tok/s |
| Parameter | Value |
|---|---|
| Model | moonshotai/Kimi-K2.5 |
| Test Mode | quick (0.5x multiplier) |
| Timestamp | 20260203_153639 |
| Vision Model | Yes (MoonViT) |
docker run --rm \
--name vllm-kimi-k25 \
--ipc=host \
--network=host \
--group-add video \
--group-add render \
--cap-add=SYS_PTRACE \
--cap-add=CAP_SYS_ADMIN \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "VLLM_ROCM_USE_AITER=0" \
--env "VLLM_USE_TRITON_FLASH_ATTN=0" \
rocm/vllm-dev:nightly \
vllm serve moonshotai/Kimi-K2.5 \
--tensor-parallel-size 4 \
--max-model-len 32768 \
--trust-remote-code \
--block-size 1 \
--mm-encoder-tp-mode data
| Specification | Value |
|---|---|
| GPU | 8x AMD Instinct MI325X |
| VRAM | 256 GB HBM3E per GPU (2 TB total) |
| Architecture | CDNA 3 (gfx942) |
| ROCm | 6.4.2-120 |
| vLLM | nightly (rocm/vllm-dev:nightly) |
| Tensor Parallel | 4 (required for AITER MLA) |