
MiMo V2.5

NVIDIA
Xiaomi MiMo V2.5 is a native omnimodal Mixture-of-Experts model designed for unified multimodal reasoning across text, image, video, and audio. It features 310B total parameters with 15B active, using a 48-layer architecture with 4,096 hidden size and hybrid Sliding Window and Global Attention for efficient long-context processing. The model integrates a 729M-parameter vision encoder and a dedicated audio encoder, enabling rich perception capabilities. Supporting up to a 1M token context, it excels in long-horizon reasoning and advanced agentic workflows.
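The headline numbers above can be collected into a small spec table for quick reference. A minimal sketch: the dictionary keys are illustrative labels chosen here, not the model's actual configuration schema.

```python
# Key figures from the model description, arranged as a plain dict.
# Field names are assumptions for illustration, not MiMo V2.5's config keys.
MIMO_V25_SPECS = {
    "total_params": 310_000_000_000,    # 310B total (Mixture-of-Experts)
    "active_params": 15_000_000_000,    # 15B active per token
    "num_layers": 48,
    "hidden_size": 4096,
    "attention": ("sliding_window", "global"),  # hybrid attention pattern
    "vision_encoder_params": 729_000_000,       # 729M vision encoder
    "max_context_tokens": 1_000_000,            # up to 1M-token context
}

# Fraction of parameters active per forward pass (MoE sparsity).
sparsity = MIMO_V25_SPECS["active_params"] / MIMO_V25_SPECS["total_params"]
```

This sparsity (roughly 4.8% of weights active per token) is what lets a 310B-parameter model serve with the compute profile of a much smaller dense model.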
Type: Omni Model
Capabilities: Text Generation, Instruction Following, Reasoning, Mathematical Reasoning, +7 more
Release Date: 28 April, 2026
License: MIT

Inference Instructions

Deploy and run this model on NVIDIA B200 GPUs using the command below. Copy the command to get started with inference.

CONSOLE
docker run -it --rm \
  --runtime=nvidia \
  --gpus all \
  --ipc=host \
  --shm-size=128g \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN='YOUR_HF_TOKEN' \
  vllm/vllm-openai:mimov25-cu130 \
  XiaomiMiMo/MiMo-V2.5 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-model-len auto \
  --max-num-batched-tokens 65536 \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 1024 \
  --enable-auto-tool-choice \
  --reasoning-parser mimo \
  --tool-call-parser mimo \
  --generation-config vllm \
  --trust-remote-code
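Once the container is serving, vLLM exposes an OpenAI-compatible API on port 8000. A minimal sketch of a chat completion request, assuming the server is running locally; the helper name and prompt are illustrative:

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(base_url: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-compatible /v1/chat/completions request for the
    vLLM server started above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(
    "http://localhost:8000",
    "XiaomiMiMo/MiMo-V2.5",
    "Explain mixture-of-experts in one sentence.",
)
# Uncomment once the container is up:
# print(json.load(urlopen(req))["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python client by pointing its `base_url` at `http://localhost:8000/v1`.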
Note

For MiMo V2.5 support, use vllm/vllm-openai:mimov25-cu130 for CUDA 13, vllm/vllm-openai:mimov25-cu129 for CUDA 12.9, or any subsequent official vLLM release.

Model Benchmarks

Each model was tested with a fixed input size and total token volume while increasing concurrency to measure serving performance under load.
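The latency metrics in these charts, time to first token (TTFT) and inter-token latency (ITL), can be derived from per-token arrival times recorded during a streaming request. A minimal sketch; the helper name and sample timestamps are illustrative:

```python
def latency_metrics(arrival_times):
    """Compute time-to-first-token (TTFT) and mean inter-token latency (ITL)
    from token arrival times, in seconds since the request was sent."""
    if not arrival_times:
        raise ValueError("need at least one token arrival time")
    ttft = arrival_times[0]
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, mean_itl

# Example: first token arrives after 0.50 s, then one token every 0.02 s.
ttft, itl = latency_metrics([0.50, 0.52, 0.54, 0.56])  # ttft ≈ 0.50 s, itl ≈ 0.02 s
```

Under rising concurrency, TTFT typically grows as requests queue for prefill, while ITL reflects how decode throughput is shared across in-flight sequences.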

Benchmark charts: ITL vs Concurrency · Time to First Token · Throughput Scaling · Total Tokens/sec vs Avg TTFT

Deploy NVIDIA HGX B200 on Vultr Cloud GPU