Dynamo

Updated on 12 March, 2026

Explore NVIDIA Dynamo’s architecture for disaggregated LLM inference, including routing, KV cache tiering, and optimized deployment with vLLM.

NVIDIA Dynamo

NVIDIA Dynamo is an open-source inference framework that adds disaggregated serving, intelligent routing, and tiered KV caching on top of vLLM. Version 0.9.1 supports single-node deployment with in-memory service discovery (--store-kv mem), so no external infrastructure (etcd, NATS) is required.

Executive Summary

Summary of the tested configurations, benchmarks, and optimization strategies for LLM inference on NVIDIA HGX B200 GPUs with vLLM.
