Real-time GPU telemetry was collected with `rocm-smi` during the Kimi-K2.5 benchmark runs on 8x AMD Instinct MI325X GPUs, sampled at 1-second intervals across 3 independent runs.

GPUs 0-3 (the four TP=4 inference ranks) draw far more power than GPUs 4-7, which sit idle. The active GPUs average 640-680 W each, comfortably below the MI325X's 1000 W rated board power; even the highest single reading (798 W) stays under that limit.
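
A minimal sketch of the active/idle split, assuming each telemetry reading has already been parsed into a hypothetical `(gpu_id, power_w)` pair. The `summarize_power` helper and that record layout are illustrative, not part of the original tooling, and the readings below are toy values.

```python
from collections import defaultdict
from statistics import mean


def summarize_power(samples, active_gpus=frozenset({0, 1, 2, 3})):
    """Average power per GPU, then per group (active TP=4 ranks vs. idle).

    `samples` is an iterable of (gpu_id, power_w) pairs; this layout is a
    hypothetical stand-in for whatever the parsed rocm-smi output looks like.
    """
    per_gpu = defaultdict(list)
    for gpu_id, power_w in samples:
        per_gpu[gpu_id].append(power_w)

    per_gpu_mean = {gpu: mean(values) for gpu, values in per_gpu.items()}
    active = [m for gpu, m in per_gpu_mean.items() if gpu in active_gpus]
    idle = [m for gpu, m in per_gpu_mean.items() if gpu not in active_gpus]
    return {
        "per_gpu_mean_w": per_gpu_mean,
        # mean() raises StatisticsError on an empty list, so guard both groups.
        "active_mean_w": mean(active) if active else None,
        "idle_mean_w": mean(idle) if idle else None,
    }


readings = [(0, 650.0), (0, 670.0), (4, 118.0), (4, 122.0)]  # toy values
summary = summarize_power(readings)
print(summary["active_mean_w"], summary["idle_mean_w"])
```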

Junction and HBM3e memory temperatures track closely, with memory running a few degrees cooler (42.8°C vs. 47.3°C mean; 66°C vs. 76°C max). Active GPUs (0-3) reach 60-76°C at the junction, while idle GPUs stay below 40°C.

Total GPU power (summed across all eight GPUs) fluctuates as the benchmark steps between concurrency levels, and its peaks coincide with the high-concurrency stress phases.
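
Because `rocm-smi` reports each GPU separately, a total like this has to be assembled by summing the per-GPU readings at each sampling instant. A minimal sketch, assuming hypothetical `(timestamp_s, gpu_id, power_w)` records; tying peaks back to specific concurrency levels would also need the benchmark's phase timestamps, which are not part of this sketch.

```python
from collections import defaultdict


def total_power_trace(samples):
    """Sum per-GPU power at each sampling instant.

    `samples` is an iterable of (timestamp_s, gpu_id, power_w) tuples, a
    hypothetical layout in which every GPU read in one rocm-smi call shares
    the same timestamp. Returns a time-sorted list of (timestamp_s, total_w).
    """
    totals = defaultdict(float)
    for timestamp_s, _gpu_id, power_w in samples:
        totals[timestamp_s] += power_w
    return sorted(totals.items())


def peak_total(trace):
    """Return the (timestamp_s, total_w) point with the highest total power."""
    return max(trace, key=lambda point: point[1])


# Toy data: two GPUs, two sampling instants.
trace = total_power_trace([(0, 0, 100.0), (0, 1, 120.0), (1, 0, 600.0), (1, 1, 650.0)])
print(trace, peak_total(trace))
```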

Temperature ramps coincide with sustained compute phases. Even at the observed maximum of 76°C, junction temperatures stay well below the MI325X's throttling threshold (100°C+).
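
Ramps like these can be located in a 1 Hz junction-temperature trace by comparing each sample with the one a fixed window later. This is a sketch with hypothetical window and rise thresholds; lining the resulting ranges up against the benchmark's phase log is what would actually confirm the correlation with compute phases.

```python
def find_ramps(temps_c, window_s=60.0, min_rise_c=5.0, interval_s=1.0):
    """Return (first, last) ranges of window-start indices where temperature
    climbs by at least `min_rise_c` over `window_s` seconds.

    Assumes evenly spaced samples `interval_s` apart (1 s in this setup).
    `window_s` and `min_rise_c` are illustrative defaults, not tuned values.
    """
    step = max(1, round(window_s / interval_s))
    hits = [
        i for i in range(len(temps_c) - step)
        if temps_c[i + step] - temps_c[i] >= min_rise_c
    ]
    # Merge consecutive window starts into contiguous ranges.
    ranges = []
    for i in hits:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i
        else:
            ranges.append([i, i])
    return [tuple(r) for r in ranges]


print(find_ramps([40, 40, 40, 45, 50, 50], window_s=2, min_rise_c=5))
```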

| Metric | Value |
|---|---|
| Mean power per GPU, all 8 GPUs (W) | 389.8 |
| Max power, any single GPU (W) | 798.0 |
| Mean junction temperature (°C) | 47.3 |
| Max junction temperature (°C) | 76.0 |
| Mean HBM memory temperature (°C) | 42.8 |
| Max HBM memory temperature (°C) | 66.0 |
| Total samples | 54,528 |
| Run IDs | 3, 4, 5 |
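
Kimi-K2.5 reaches ~952 tok/s peak throughput on 4 active GPUs drawing ~660 W each (~2,640 W combined), which works out to roughly 0.36 tok/s per watt for the active GPUs. This pairs a peak throughput figure with an average power draw, and power also peaks during high-concurrency phases, so the ratio is an approximation that likely flatters efficiency somewhat. The 4 idle GPUs still draw baseline power while contributing no inference throughput, so efficiency across the whole 8-GPU node is lower; this is worth weighing when running TP=4 models on 8-GPU systems.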
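
The efficiency arithmetic can be reproduced, and extended to the whole node, as below. The idle-GPU draw is not reported directly; the sketch estimates it from the table's 389.8 W all-GPU mean under the assumption that every GPU contributes equally many samples and the active GPUs average exactly 660 W, so the node-level figure is a rough estimate rather than a measurement.

```python
NUM_GPUS = 8
NUM_ACTIVE = 4
PEAK_THROUGHPUT_TOK_S = 952.0   # from the text (~952 tok/s)
ACTIVE_POWER_W = 660.0          # per active GPU, from the text (~660 W)
MEAN_POWER_ALL_GPUS_W = 389.8   # from the table

# Assumption: the all-GPU mean weights every GPU equally, so
# 8 * 389.8 = 4 * 660 + 4 * idle  ->  idle ~= 119.6 W.
idle_power_w = (NUM_GPUS * MEAN_POWER_ALL_GPUS_W - NUM_ACTIVE * ACTIVE_POWER_W) / (
    NUM_GPUS - NUM_ACTIVE
)

active_total_w = NUM_ACTIVE * ACTIVE_POWER_W  # 2,640 W
node_total_w = active_total_w + (NUM_GPUS - NUM_ACTIVE) * idle_power_w

print(f"estimated idle draw: {idle_power_w:.1f} W/GPU")
print(f"active GPUs only:    {PEAK_THROUGHPUT_TOK_S / active_total_w:.3f} tok/s per W")
print(f"all 8 GPUs:          {PEAK_THROUGHPUT_TOK_S / node_total_w:.3f} tok/s per W")
```

On these inputs the whole-node figure comes out around 0.31 tok/s per watt, versus ~0.36 for the active GPUs alone.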

All monitoring data was collected with:

```bash
rocm-smi --showpower --showtemp --showuse
```

The command ran at 1-second intervals as a background process during benchmark execution, with negligible overhead (<0.1% CPU).
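
As written, the command prints a single snapshot and exits, and the text does not say how the 1-second cadence was driven. A sketch of one way to do it from Python, assuming a background thread that reruns the command on a fixed schedule. The function names are hypothetical, and the `--json` flag (present in recent `rocm-smi` releases, but worth confirming with `rocm-smi --help`) is an addition to make the output easier to parse.

```python
import subprocess
import threading
import time

# Command from the text, plus --json (an assumption; verify on your ROCm version).
ROCM_SMI_CMD = ["rocm-smi", "--showpower", "--showtemp", "--showuse", "--json"]


def sample_periodically(cmd, interval_s, stop_event, out):
    """Run `cmd` every `interval_s` seconds until `stop_event` is set.

    Appends (unix_timestamp, stdout_text) tuples to `out`. Deadlines come from
    a monotonic clock so slow invocations do not accumulate drift. If `cmd` is
    not installed, subprocess.run raises FileNotFoundError and the thread exits.
    """
    next_deadline = time.monotonic()
    while not stop_event.is_set():
        timestamp = time.time()
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        out.append((timestamp, result.stdout))
        next_deadline += interval_s
        # Event.wait doubles as an interruptible sleep until the next deadline.
        stop_event.wait(max(0.0, next_deadline - time.monotonic()))


def start_background_sampler(cmd=ROCM_SMI_CMD, interval_s=1.0):
    """Start the sampler in a daemon thread; returns (thread, stop_event, samples)."""
    stop_event = threading.Event()
    samples = []
    thread = threading.Thread(
        target=sample_periodically,
        args=(cmd, interval_s, stop_event, samples),
        daemon=True,
    )
    thread.start()
    return thread, stop_event, samples
```

In use, the benchmark would run between `start_background_sampler()` and `stop_event.set()` followed by `thread.join()`; each captured JSON string in `samples` can then be decoded with `json.loads`.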