Mistral Large 3 675B Instruct 2512

Mistral-Large-3-675B-Instruct-2512 is a large-scale multimodal Mixture-of-Experts (MoE) model designed for instruction following, agentic workflows, and long-context understanding. It has 675B total parameters, of which roughly 41B are active per token, using a 61-layer transformer with 128 attention heads and a hidden size of 7,168. The architecture comprises 128 experts with 4 selected per token (see the routing sketch below) and supports a context window of roughly 294K tokens. It also integrates a 2.5B-parameter vision encoder and ships with FP8 quantization, enabling efficient deployment and strong performance across text and vision tasks.
Type: Vision-Language Model
Capabilities: Text Generation, Instruction Following, Reasoning, Mathematical Reasoning, and 6 more
Release Date: December 1, 2025
License: Apache 2.0
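
To make the expert-routing figures concrete, here is a minimal PyTorch sketch of top-4-of-128 token routing. It mirrors the layer dimensions quoted above, but it is a generic illustration of MoE gating, not Mistral's actual router; the linear-router design and all names are assumptions.

PYTHON
import torch

HIDDEN = 7168     # hidden size quoted above
N_EXPERTS = 128   # total experts per MoE layer
TOP_K = 4         # experts activated per token

# Illustrative router: a linear map from hidden states to per-expert scores.
router = torch.nn.Linear(HIDDEN, N_EXPERTS, bias=False)

def route(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Return the top-4 expert indices per token and their mixing weights."""
    scores = router(tokens)                               # [n_tokens, 128]
    weights, experts = torch.topk(scores, TOP_K, dim=-1)  # [n_tokens, 4] each
    weights = torch.softmax(weights, dim=-1)              # normalize over the chosen 4
    return experts, weights

x = torch.randn(3, HIDDEN)           # three example token states
experts, weights = route(x)
print(experts.shape, weights.shape)  # torch.Size([3, 4]) torch.Size([3, 4])

Only the selected 4 experts run for each token, which is why only about 41B of the 675B parameters are active on any forward pass.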

Inference Instructions

Deploy and run this model on NVIDIA B200 GPUs with vLLM using the command below. Copy the command to get started with inference.

CONSOLE
# Serve Mistral-Large-3-675B-Instruct-2512 via vLLM's OpenAI-compatible API on port 8000.
docker run -it --rm \
  --runtime=nvidia \
  --gpus all \
  --ipc=host \
  --shm-size=128g \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN='YOUR_HF_TOKEN' \
  vllm/vllm-openai:v0.18.0 \
  mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --max-model-len auto \
  --reasoning-parser mistral \
  --tool-call-parser mistral \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9 \
  --max-num-seqs 1024 \
  --trust-remote-code
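
Once the container is up, vLLM serves the model through an OpenAI-compatible API on port 8000. The following minimal sketch sends a chat request with the official openai Python client; the prompt and token budget are illustrative.

PYTHON
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is not checked for local serving.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Mistral-Large-3-675B-Instruct-2512",
    messages=[{"role": "user", "content": "Summarize FP8 inference in two sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)

Because the server was launched with --enable-auto-tool-choice and --tool-call-parser mistral, the same endpoint also accepts the standard tools and tool_choice parameters for agentic workflows.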

Model Benchmarks

Each model was tested with a fixed input size and total token volume while increasing concurrency to measure serving performance under load.

Benchmark charts: ITL (inter-token latency) vs Concurrency; Time to First Token; Throughput Scaling; Total Tokens/sec vs Avg TTFT.
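
These metrics can be reproduced against the deployment above with a small async client. Below is a minimal sketch that measures average TTFT (time to first token) and ITL at a few concurrency levels, assuming the vLLM endpoint from the deployment step is reachable on localhost:8000; the prompt, token budget, and concurrency levels are illustrative.

PYTHON
import asyncio
import time

from openai import AsyncOpenAI

MODEL = "mistralai/Mistral-Large-3-675B-Instruct-2512"
PROMPT = "Explain KV caching in one paragraph."

async def one_request(client: AsyncOpenAI) -> tuple[float, float]:
    """Stream one completion; return (TTFT, mean ITL) in seconds."""
    start = time.perf_counter()
    arrivals: list[float] = []
    stream = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=128,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            arrivals.append(time.perf_counter())  # one timestamp per streamed chunk
    if not arrivals:
        return 0.0, 0.0
    ttft = arrivals[0] - start
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return ttft, sum(gaps) / len(gaps) if gaps else 0.0

async def benchmark(concurrency: int) -> None:
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    results = await asyncio.gather(*(one_request(client) for _ in range(concurrency)))
    avg_ttft = sum(r[0] for r in results) / len(results)
    avg_itl = sum(r[1] for r in results) / len(results)
    print(f"concurrency={concurrency}  avg TTFT={avg_ttft:.3f}s  avg ITL={avg_itl * 1000:.1f}ms")

if __name__ == "__main__":
    for c in (1, 8, 32):
        asyncio.run(benchmark(c))

Throughput (total tokens/sec) follows from the same run by dividing the number of generated tokens by wall-clock time.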
