Model Guides

Updated on 12 March 2026

Deploy leading AI models including Nemotron, DeepSeek, GLM, and MiniMax on NVIDIA HGX B200 GPUs with optimized inference configurations.


Deploy NVIDIA's Nemotron 3 Nano on NVIDIA HGX B200 GPUs. This hybrid Mamba-Transformer model delivers high throughput with only 3B active parameters per token.
Deploy NVIDIA's Nemotron Super 49B on NVIDIA HGX B200 GPUs. This dense transformer, derived from Llama 3.3 via neural architecture search (NAS), delivers strong reasoning performance.
Deploy DeepSeek's V3.2 on NVIDIA HGX B200 GPUs. This MoE model uses Multi-head Latent Attention (MLA) to compress the KV cache, delivering strong reasoning performance at 685B parameters.
Deploy THUDM/Zhipu's GLM-5 on NVIDIA HGX B200 GPUs. This large MoE model introduces Differential Sparse Attention for efficient inference at 744B total parameters.
Deploy MiniMax's M2.5 on NVIDIA HGX B200 GPUs. This MoE model combines Lightning Attention with traditional SoftMax attention for efficient long-context inference.
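Each guide pairs a model with an inference configuration for an 8-GPU HGX B200 node. As a rough sketch of what such a launch can look like with vLLM (the model ID and flag values below are illustrative assumptions, not the exact configurations from any guide):

```shell
# Illustrative sketch only: model ID and flag values are assumptions,
# not the exact configurations from the guides.
#   --tensor-parallel-size 8    shard the weights across the node's 8 GPUs
#   --max-model-len 32768       cap context length to bound KV-cache memory
#   --gpu-memory-utilization    fraction of each GPU's memory vLLM may use
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90
```

Each model guide specifies its own parallelism layout and memory settings; large MoE models in particular may call for different sharding than this dense-style example.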