Concurrency Saturation Analysis

Updated on 11 March, 2026

This analysis is a fine-grained concurrency sweep from 500 to 1,000 concurrent requests (step 50), run to locate the saturation knee for each model. Each concurrency level was tested across 3 independent runs with 200 requests per level.
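The sweep levels and the "mean ± CI95" figures reported further down can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: it assumes the interval is a Student's t interval over the 3 runs, which the report does not state, and the `runs` values are hypothetical placeholders rather than measurements.

```python
import math
import statistics

# Concurrency levels described above: 500 to 1,000 inclusive, step 50.
LEVELS = list(range(500, 1001, 50))

# Two-sided 95% Student's t critical value for 3 runs (2 degrees of freedom).
# Assumption: the report may have used a different interval construction.
T_CRIT_DF2 = 4.303


def mean_ci95(samples):
    """Return (mean, half_width) of a 95% t interval for exactly 3 runs."""
    if len(samples) != 3:
        raise ValueError("this sketch hard-codes the t value for exactly 3 runs")
    mean = statistics.mean(samples)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, T_CRIT_DF2 * std_err


# Hypothetical per-run throughputs (tok/s) for one level; not measured values.
runs = [1000.0, 1010.0, 990.0]
mean, half_width = mean_ci95(runs)
print(f"{len(LEVELS)} levels; {mean:,.0f} ± {half_width:,.0f} tok/s")
```

With only 3 runs the t critical value is large, so a single outlier run widens the interval considerably. That is worth keeping in mind when reading the wider intervals in the per-model tables.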


Throughput Saturation Curves

Saturation Curve Chart

Latency at High Concurrency

Saturation Latency Chart

Summary

Model | Concurrency at Peak | Peak Throughput (tok/s) | p99 Latency at Peak (s) | Throughput Range, min–max (tok/s)
Qwen3-VL-235B-A22B | 700 | 60,131 | 4.27 | 59,742 – 60,131
Llama-3.1-405B | 500 | 34,050 | 7.80 | 33,376 – 34,050
DeepSeek V3.2 | 500 | 37,413 | 6.49 | 36,693 – 37,413
Kimi-K2.5 | 650 | 7,340 | 35.97 | 7,250 – 7,340
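Each summary row can be derived from the per-level means in the Per-Model Detail section. The sketch below does this for Qwen3-VL-235B using the means copied from that table. The `summarize` helper and the `spread_pct` field are illustrative names, not part of the benchmark tooling.

```python
# Per-level mean throughput (tok/s) for Qwen3-VL-235B, copied from the
# Per-Model Detail table in this report.
qwen_means = {
    500: 60085, 550: 59895, 600: 60014, 650: 59918, 700: 60131, 750: 59811,
    800: 60062, 850: 59809, 900: 59754, 950: 60002, 1000: 59742,
}


def summarize(means):
    """Derive peak concurrency, peak throughput, and min-max range."""
    peak_level = max(means, key=means.get)  # level with the highest mean
    lo, hi = min(means.values()), max(means.values())
    return {
        "peak_concurrency": peak_level,
        "peak": hi,
        "min": lo,
        "max": hi,
        # Relative spread of the means across the sweep, in percent of peak.
        "spread_pct": 100.0 * (hi - lo) / hi,
    }


summary = summarize(qwen_means)
print(summary)
```

The spread is well under 1% of peak, so the "peak" at 700 is only marginally higher than the other levels and is smaller than several of the reported confidence intervals.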

Key Findings

Saturation Behavior by Architecture

  • Qwen3-VL-235B holds 59,700-60,100 tok/s across the entire 500-1,000 range, which suggests that its small active-parameter footprint (22B, the smallest of the four) translates into the highest throughput ceiling.
  • DeepSeek V3.2 plateaus at 36,700-37,400 tok/s. The per-level means vary by under 2% across the sweep, although the CI95 widens to more than ±1,300 tok/s at 600 and 1,000. The MoE routing overhead appears to stabilize at high concurrency.
  • Llama-3.1-405B sustains 33,400-34,100 tok/s. As a dense model with FP8 quantization, it achieves high total throughput despite its 405B parameter count.
  • Kimi-K2.5 stabilizes at 7,250-7,340 tok/s. It is the largest model in the sweep (1T params), and TP=4 limits the parallel bandwidth available to it.

Practical Implications

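All four models are already saturated at 500 concurrent requests, the lowest level in this sweep: mean throughput varies by about 2% or less between 500 and 1,000 for every model. Operating beyond 500 concurrent requests therefore provides no throughput benefit, while p99 latency stays flat or rises slightly (by under 3% relative to the 500 level). For production deployments, target the 200-500 range for the best throughput-to-latency tradeoff. That range lies below the levels tested here, so this sweep does not resolve the exact knee.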
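One way to check the saturation claim above is to find, for each model, the lowest tested concurrency whose mean throughput falls within a tolerance of that model's peak. The sketch below uses the per-level means from the Per-Model Detail tables. The 2% tolerance and the `saturation_knee` helper are assumptions chosen for illustration, not the report's own definition of the knee.

```python
LEVELS = list(range(500, 1001, 50))

# Mean throughput (tok/s) per level, copied from the Per-Model Detail tables.
MEANS = {
    "DeepSeek V3.2": [37413, 37106, 36718, 37157, 36817, 36879,
                      36825, 36693, 36859, 36911, 36697],
    "Llama-3.1-405B": [34050, 33736, 33393, 33959, 33398, 33442,
                       33406, 33376, 33385, 33440, 33398],
    "Qwen3-VL-235B": [60085, 59895, 60014, 59918, 60131, 59811,
                      60062, 59809, 59754, 60002, 59742],
    "Kimi-K2.5": [7316, 7331, 7268, 7340, 7282, 7256,
                  7272, 7289, 7250, 7292, 7260],
}


def saturation_knee(levels, means, tolerance=0.02):
    """Lowest level whose mean is within `tolerance` of the peak mean."""
    peak = max(means)
    for level, mean in zip(levels, means):
        if mean >= peak * (1.0 - tolerance):
            return level
    # Unreachable: the peak level itself always satisfies the condition.
    raise ValueError("no level within tolerance of peak")


knees = {model: saturation_knee(LEVELS, m) for model, m in MEANS.items()}
for model, knee in knees.items():
    print(f"{model}: within 2% of peak from {knee} concurrent requests")
```

Under this definition every model reaches its plateau at the first tested level, so the true knee can only be bounded from above by this sweep.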

Per-Model Detail

DeepSeek V3.2

Concurrency | Throughput Mean ± CI95 (tok/s) | Output Throughput (tok/s) | p99 Latency (s) | p50 Latency (s)
500 | 37,413 ± 82 | 745 | 6.4915 | 0.3935
550 | 37,106 ± 102 | 744 | 6.5826 | 0.4376
600 | 36,718 ± 1,324 | 812 | 6.6742 | 0.4216
650 | 37,157 ± 248 | 755 | 6.5508 | 0.4080
700 | 36,817 ± 709 | 780 | 6.6233 | 0.4079
750 | 36,879 ± 854 | 771 | 6.5957 | 0.4150
800 | 36,825 ± 640 | 732 | 6.6307 | 0.4517
850 | 36,693 ± 512 | 753 | 6.6265 | 0.4142
900 | 36,859 ± 662 | 665 | 6.5952 | 0.4241
950 | 36,911 ± 184 | 742 | 6.6158 | 0.4511
1000 | 36,697 ± 1,476 | 897 | 6.6644 | 0.4148

Llama-3.1-405B

Concurrency | Throughput Mean ± CI95 (tok/s) | Output Throughput (tok/s) | p99 Latency (s) | p50 Latency (s)
500 | 34,050 ± 1,149 | 2,457 | 7.7987 | 7.7473
550 | 33,736 ± 642 | 2,369 | 7.8475 | 7.8003
600 | 33,393 ± 26 | 2,464 | 7.9579 | 7.9149
650 | 33,959 ± 1,291 | 2,452 | 7.8168 | 7.7641
700 | 33,398 ± 76 | 2,457 | 7.9629 | 7.9076
750 | 33,442 ± 25 | 2,464 | 7.9392 | 7.8950
800 | 33,406 ± 44 | 2,469 | 7.9690 | 7.9060
850 | 33,376 ± 105 | 2,437 | 7.9624 | 7.9059
900 | 33,385 ± 72 | 2,464 | 7.9654 | 7.9142
950 | 33,440 ± 41 | 2,457 | 7.9508 | 7.8954
1000 | 33,398 ± 15 | 2,461 | 7.9597 | 7.9067

Qwen3-VL-235B

Concurrency | Throughput Mean ± CI95 (tok/s) | Output Throughput (tok/s) | p99 Latency (s) | p50 Latency (s)
500 | 60,085 ± 135 | 4,528 | 4.2601 | 4.2342
550 | 59,895 ± 163 | 4,514 | 4.2810 | 4.2506
600 | 60,014 ± 109 | 4,523 | 4.2684 | 4.2392
650 | 59,918 ± 336 | 4,515 | 4.2776 | 4.2527
700 | 60,131 ± 294 | 4,531 | 4.2693 | 4.2358
750 | 59,811 ± 132 | 4,507 | 4.2834 | 4.2535
800 | 60,062 ± 59 | 4,526 | 4.2637 | 4.2378
850 | 59,809 ± 162 | 4,507 | 4.2826 | 4.2522
900 | 59,754 ± 281 | 4,503 | 4.2862 | 4.2552
950 | 60,002 ± 284 | 4,522 | 4.2639 | 4.2406
1000 | 59,742 ± 77 | 4,502 | 4.2897 | 4.2595

Kimi-K2.5

Concurrency | Throughput Mean ± CI95 (tok/s) | Output Throughput (tok/s) | p99 Latency (s) | p50 Latency (s)
500 | 7,316 ± 49 | 552 | 36.0929 | 35.7474
550 | 7,331 ± 45 | 553 | 35.9924 | 35.6532
600 | 7,268 ± 51 | 548 | 36.3368 | 36.0086
650 | 7,340 ± 57 | 554 | 35.9741 | 35.5474
700 | 7,282 ± 80 | 549 | 36.2510 | 35.8314
750 | 7,256 ± 92 | 547 | 36.3904 | 36.0571
800 | 7,272 ± 17 | 548 | 36.3158 | 36.0500
850 | 7,289 ± 3 | 550 | 36.2254 | 35.7805
900 | 7,250 ± 16 | 547 | 36.4129 | 36.1525
950 | 7,292 ± 56 | 550 | 36.2153 | 35.8572
1000 | 7,260 ± 94 | 547 | 36.3648 | 35.9700
