Dynamo

Updated on 12 March, 2026

Explore NVIDIA Dynamo’s architecture for disaggregated LLM inference, including routing, KV cache tiering, and optimized deployment with vLLM.


NVIDIA Dynamo is an open-source inference framework that adds disaggregated serving, intelligent routing, and tiered KV caching on top of vLLM. Version 0.9.1 supports single-node deployment with in-memory service discovery (--store-kv mem), so no external infrastructure such as etcd or NATS is required.
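As a rough sketch of what a single-node launch could look like: only the --store-kv mem option comes from the text above; the command name, subcommand, and model flag here are illustrative assumptions, not Dynamo's documented CLI.

```shell
# Hypothetical single-node launch sketch (flags other than
# --store-kv mem are assumptions for illustration).
# In-memory service discovery means no etcd or NATS process
# needs to be running before the server starts.
dynamo serve \
  --store-kv mem \
  --model meta-llama/Llama-3.1-8B-Instruct   # example model, assumption
```

With an external store, the same deployment would instead need etcd and NATS reachable before startup; the in-memory mode trades that infrastructure away in exchange for being limited to a single node.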
This article is an executive summary of tested configurations, benchmarks, and optimization strategies for LLM inference on NVIDIA HGX B200 GPUs with vLLM.