Latest ContentInference Cookbook Model Library

DeepSeek V4 Pro

DeepSeek V4 Pro is an ultra-large Mixture-of-Experts model designed for high-performance long-context reasoning and large-scale deployment. It features 1.6T total parameters with approximately 49B activated, using 384 routed experts with 6 selected per token across a 61-layer architecture with 7,168 hidden size and 128 attention heads. Built with hybrid compressed attention mechanisms and manifold-constrained hyper-connections, it supports up to a 1M token context window. With FP4 and FP8 mixed precision and advanced optimization techniques, it delivers strong efficiency, stability, and agentic reasoning performance across complex workloads.

Type	MoE LLM
Capabilities	Text Generation, Instruction Following, Reasoning, Mathematical Reasoning+5 more
Release Date	24 April, 2026
Links	Blog\|HF Model Card
License	MIT

Inference Instructions

Deploy and run this model on NVIDIA B200 GPUs using the command below. Copy the command to get started with inference.

CONSOLE

docker run -it --rm 
 --runtime=nvidia 
 --gpus all 
 --ipc=host 
 --shm-size=128g 
 -p 8000:8000 
 -v ~/.cache/huggingface:/root/.cache/huggingface 
 -e HF_TOKEN='YOUR_HF_TOKEN' 
 vllm/vllm-openai:v0.20.0 
 deepseek-ai/DeepSeek-V4-Pro 
 --attention_config.use_fp4_indexer_cache=True 
 --kv-cache-dtype fp8 
 --block-size 256 
 --tensor-parallel-size 8 
 --enable-expert-parallel 
 --max-model-len auto 
  --max-num-batched-tokens 65536 
 --gpu-memory-utilization 0.90 
 --tool-call-parser deepseek_v4 
 --reasoning-parser deepseek_v4 
 --tokenizer-mode deepseek_v4 
 --enable-auto-tool-choice 
 --max-num-seqs 1024 
 --trust-remote-code

Model Benchmarks

Each model was tested with a fixed input size and total token volume while increasing concurrency to measure serving performance under load.

DeepSeek V4 Pro

Inference Instructions

Model Benchmarks

ITL vs Concurrency

Time to First Token

Throughput Scaling

Total Tokens/sec vs Avg TTFT

NVIDIA HGX B200

Products

Features

Solutions

Marketplace

Resources

Company

Tech Talks

Vultr Blogs