The NVIDIA B200 is a Blackwell-architecture data center GPU designed for large-scale AI training and inference. NVIDIA HGX B200 servers provide 8 B200 GPUs per node, connected via NVSwitch 5.0.
| Specification | Value |
|---|---|
| Architecture | Blackwell (sm_100) |
| VRAM | ~179 GiB (183,359 MiB) HBM3e |
| Memory Bandwidth | 8.0 TB/s |
| FP8 Compute | ~4.5 PFLOPS |
| FP4 Compute | ~9.0 PFLOPS (NVFP4) |
| TDP | 1000W |
| Interconnect | NVLink 5.0 / NVSwitch 5.0 |
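One way to read the compute and bandwidth figures in the table together is the compute-to-bandwidth ratio: the arithmetic intensity (FLOPs per byte of HBM traffic) below which a kernel is memory-bandwidth-bound rather than compute-bound. A quick sketch from the table's numbers:

```python
# Balance point implied by the spec table: FLOPs the GPU can perform per byte
# of HBM traffic. Kernels with lower arithmetic intensity than this are
# memory-bandwidth-bound.

FP8_FLOPS = 4.5e15     # ~4.5 PFLOPS
FP4_FLOPS = 9.0e15     # ~9.0 PFLOPS (NVFP4)
HBM_BW = 8.0e12        # 8.0 TB/s, in bytes/s

print(f"FP8 balance point: {FP8_FLOPS / HBM_BW:.1f} FLOPs/byte")
print(f"FP4 balance point: {FP4_FLOPS / HBM_BW:.1f} FLOPs/byte")
```

Decode-phase GEMV-like operations sit far below these balance points, which is why the next section focuses on memory bandwidth.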
LLM inference is primarily memory-bandwidth-bound during the decode phase (generating tokens one at a time). The NVIDIA HGX B200's 8.0 TB/s memory bandwidth is among the highest available, directly translating to faster token generation for memory-bound workloads.
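A back-of-the-envelope sketch of what this bandwidth means for decode throughput. The 70B parameter count and FP8 weight precision below are illustrative assumptions, not a specific model from this cookbook:

```python
# Decode-phase roofline bound: at batch size 1, every generated token must
# stream the full set of model weights from HBM, so
#   token rate <= memory bandwidth / weight bytes.
# The 70B model size and FP8 (1 byte/param) weights are illustrative.

HBM_BW = 8.0e12        # B200 memory bandwidth, bytes/s
PARAMS = 70e9          # hypothetical 70B-parameter model
BYTES_PER_PARAM = 1    # FP8 weights

weight_bytes = PARAMS * BYTES_PER_PARAM
single_gpu_bound = HBM_BW / weight_bytes      # tok/s, batch=1, one GPU

# With tensor parallelism over 8 GPUs, each GPU streams 1/8 of the weights,
# so aggregate bandwidth raises the bound (ignoring NVLink overhead).
tp8_bound = single_gpu_bound * 8

print(f"batch=1 bound, 1 GPU: {single_gpu_bound:.0f} tok/s")
print(f"batch=1 bound, TP=8:  {tp8_bound:.0f} tok/s")
```

Real deployments batch many requests, so achieved per-request rates are lower and aggregate rates higher; the point is that the ceiling scales directly with memory bandwidth.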
Key NVIDIA HGX B200 advantages for inference:

- 8.0 TB/s of HBM3e bandwidth per GPU, which bounds decode-phase token generation
- ~9.0 PFLOPS of FP4 (NVFP4) compute for quantized models and compute-bound prefill
- ~179 GiB of VRAM per GPU (~1.4 TiB per node), keeping large models and KV caches resident
- NVLink 5.0 / NVSwitch 5.0 all-to-all connectivity for tensor parallelism and KV cache transfer
The NVIDIA HGX B200 uses an 8-GPU configuration with all-to-all NVSwitch connectivity:
```
# Verify topology
$ nvidia-smi topo -m
```
All GPU pairs connect through NVSwitch 5.0, providing 1.8 TB/s bidirectional bandwidth between any two GPUs in the node. This is critical for tensor parallelism (weight sharding across GPUs) and for NVIDIA Dynamo's disaggregated serving (KV cache transfer between prefill and decode pools).
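A rough estimate of what that interconnect bandwidth means for disaggregated serving: the cost of shipping one request's KV cache from a prefill GPU to a decode GPU. The model shape below (80 layers, GQA with 8 KV heads, head dimension 128) is an illustrative assumption:

```python
# Rough KV-cache transfer cost for prefill -> decode handoff over NVSwitch.
# Model dimensions are illustrative (a Llama-70B-like layout with GQA).

NVLINK_BW = 1.8e12     # bytes/s; this is the bidirectional figure, so a
                       # one-way copy may sustain roughly half in practice
LAYERS = 80
KV_HEADS = 8           # grouped-query attention
HEAD_DIM = 128
DTYPE_BYTES = 2        # FP16/BF16 KV cache
CONTEXT = 8192         # tokens of prompt context

# K and V tensors per layer
kv_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * DTYPE_BYTES * CONTEXT
ms = kv_bytes / NVLINK_BW * 1e3

print(f"KV cache: {kv_bytes / 1e9:.2f} GB, transfer: {ms:.2f} ms")
```

Even multi-gigabyte KV caches move in a few milliseconds within the node, which is what makes prefill/decode disaggregation practical.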
Verified topology (abbreviated):
```
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0    X    NV18  NV18  NV18  NV18  NV18  NV18  NV18
GPU1   NV18   X    NV18  NV18  NV18  NV18  NV18  NV18
GPU2   NV18  NV18   X    NV18  NV18  NV18  NV18  NV18
GPU3   NV18  NV18  NV18   X    NV18  NV18  NV18  NV18
GPU4   NV18  NV18  NV18  NV18   X    NV18  NV18  NV18
GPU5   NV18  NV18  NV18  NV18  NV18   X    NV18  NV18
GPU6   NV18  NV18  NV18  NV18  NV18  NV18   X    NV18
GPU7   NV18  NV18  NV18  NV18  NV18  NV18  NV18   X
```

Every GPU pair shows NV18: 18 bonded NVLinks via NVSwitch 5.0. GPUs 0-3 are on NUMA node 0 (CPUs 0-63, 128-191) and GPUs 4-7 are on NUMA node 1 (CPUs 64-127, 192-255). The node has 14x Mellanox ConnectX NICs (mlx5_0 through mlx5_13) for network connectivity.
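The NUMA split matters for host-side work (tokenization, sampling, data loading): pinning a GPU's worker process to CPUs on the same NUMA node avoids cross-socket memory traffic. A minimal sketch encoding this node's reported topology; the CPU ranges are specific to this machine, so other systems should derive them from `nvidia-smi topo -m` and `lscpu`:

```python
# Map each GPU index to the CPU set of its local NUMA node, per the topology
# reported above. These ranges are specific to this node, not general.

def cpus_for_gpu(gpu: int) -> set[int]:
    if gpu in range(0, 4):    # NUMA node 0
        return set(range(0, 64)) | set(range(128, 192))
    if gpu in range(4, 8):    # NUMA node 1
        return set(range(64, 128)) | set(range(192, 256))
    raise ValueError(f"unknown GPU index {gpu}")

import os
# Pin the current process to GPU 5's local cores (Linux-only API):
# os.sched_setaffinity(0, cpus_for_gpu(5))
print(sorted(cpus_for_gpu(5))[:4], "...")   # [64, 65, 66, 67] ...
```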
These specifications were verified on the NVIDIA HGX B200 instance used for this cookbook:
```
$ nvidia-smi --query-gpu=index,name,memory.total,driver_version --format=csv
index, name, memory.total, driver_version
0, NVIDIA B200, 183359 MiB, 580.105.08
1, NVIDIA B200, 183359 MiB, 580.105.08
2, NVIDIA B200, 183359 MiB, 580.105.08
3, NVIDIA B200, 183359 MiB, 580.105.08
4, NVIDIA B200, 183359 MiB, 580.105.08
5, NVIDIA B200, 183359 MiB, 580.105.08
6, NVIDIA B200, 183359 MiB, 580.105.08
7, NVIDIA B200, 183359 MiB, 580.105.08
```
| Property | Value |
|---|---|
| GPUs | 8x NVIDIA B200 |
| VRAM per GPU | 183,359 MiB (~179 GiB) |
| Total VRAM | ~1,432 GiB (~1.54 TB) |
| Driver | 580.105.08 |
| CUDA | 13.0 |
| Persistence Mode | Enabled |
| Compute Capability | sm_100 (10.0) |
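MiB-to-GB conversions are an easy place to pick up a ~7% error, so it is worth checking the table's totals against the raw `nvidia-smi` figure:

```python
# Sanity-check the VRAM totals from the raw nvidia-smi value (reported in MiB).

per_gpu_mib = 183_359

per_gpu_gib = per_gpu_mib / 1024                 # binary gibibytes
per_gpu_gb = per_gpu_mib * 2**20 / 1e9           # decimal gigabytes
total_tb = per_gpu_mib * 8 * 2**20 / 1e12        # decimal terabytes, 8 GPUs

print(f"Per GPU: {per_gpu_gib:.1f} GiB ({per_gpu_gb:.1f} GB)")
print(f"Total:   {per_gpu_gib * 8:.0f} GiB ({total_tb:.2f} TB)")
```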
Each B200 GPU draws up to 1000W under load and approximately 140W at idle. During inference benchmarks, power consumption varies with load:
| State | Power per GPU | Total (8 GPUs) |
|---|---|---|
| Idle | ~140W | ~1,120W |
| Light load | ~200W | ~1,600W |
| Full load | ~700-1000W | ~5,600-8,000W |
Power efficiency (tok/s per watt) is a useful metric for production deployments but is not the primary focus of this cookbook.
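For reference, the metric is straightforward to compute. The throughput figure below is a hypothetical example, not a measured result from this cookbook:

```python
# Illustrative tokens-per-watt calculation; the throughput is hypothetical.

THROUGHPUT_TOK_S = 5000   # hypothetical aggregate node throughput
NODE_POWER_W = 7000       # within the full-load range above (~5,600-8,000W)

eff = THROUGHPUT_TOK_S / NODE_POWER_W
print(f"Efficiency: {eff:.3f} tok/s per watt")

# Idle power is a fixed cost: at ~1,120W idle, an underutilized node spends
# most of its power budget generating nothing, so batching to higher
# utilization improves tok/s/W even though per-GPU draw rises.
```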