How to Run PyTorch on Vultr Cloud GPU Powered by NVIDIA GH200 using Docker
Introduction
The NVIDIA GH200 Grace Hopper™ Superchip architecture brings together the groundbreaking performance of the NVIDIA Hopper™ GPU with the versatility of the NVIDIA Grace™ CPU in a single superchip, connected with the high-bandwidth, memory-coherent NVIDIA® NVLink® Chip-2-Chip (C2C) interconnect.
This article demonstrates the step-by-step process needed to run sample PyTorch code inside a Docker container on Vultr Cloud GPU instances powered by NVIDIA GH200.
Prerequisites
Before you begin, you must:
- Have access to a Vultr Cloud GPU instance powered by NVIDIA GH200
Verify the GPU Driver
Log in to your Vultr Cloud GPU instance via SSH.
Run the following command to verify the GPU driver.
$ nvidia-smi
Output.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GH200 480GB             Off |   00000009:01:00.0 Off |                    0 |
| N/A   29C    P0            110W /  900W |     557MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
Here, the output shows that NVIDIA driver version 550.54.15 is installed with CUDA version 12.4.
PyTorch with Docker
Pull the PyTorch Docker image from the NVIDIA NGC Catalog.
$ docker pull nvcr.io/nvidia/pytorch:24.02-py3
Start a new Docker container with the PyTorch image.
$ docker run \
    --gpus all \
    -it \
    --rm \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e __NV_PRIME_RENDER_OFFLOAD=1 \
    -e __GLX_VENDOR_LIBRARY_NAME=nvidia \
    nvcr.io/nvidia/pytorch:24.02-py3
Here, the --gpus all flag enables GPU support in the Docker container. The environment variables used in the command are:
- NVIDIA_DRIVER_CAPABILITIES: Controls which driver capabilities (such as compute, utility, and graphics) are exposed inside the container; all exposes every capability.
- NVIDIA_VISIBLE_DEVICES: Controls which GPUs are visible inside the container; all makes every GPU on the host available.
- __NV_PRIME_RENDER_OFFLOAD: Enables PRIME render offload, directing rendering work to the NVIDIA GPU.
- __GLX_VENDOR_LIBRARY_NAME: Tells the GLX loader to use the NVIDIA vendor library.
You can confirm these variables are set inside the container, as shown after entering the Python console below.
After running the above command, you are placed in an interactive shell inside the Docker container running the PyTorch image.
Once inside the Docker container, enter the Python console.
# python
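Optionally, confirm that the environment variables from the docker run command were passed into the container. This is a minimal sketch using only the Python standard library:

import os

# Each variable should echo the value passed with the `docker run -e` flags.
for name in ("NVIDIA_DRIVER_CAPABILITIES", "NVIDIA_VISIBLE_DEVICES",
             "__NV_PRIME_RENDER_OFFLOAD", "__GLX_VENDOR_LIBRARY_NAME"):
    print(name, "=", os.environ.get(name))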
Fetch the PyTorch version and CUDA version.
import torch

print("PyTorch Version:", torch.__version__)
print("CUDA Version:", torch.version.cuda)
Output.
PyTorch Version: 2.3.0a0+ebedce2
CUDA Version: 12.3
Here, the output shows that PyTorch version 2.3.0a0+ebedce2 is installed with CUDA version 12.3.
Test CUDA availability.
print(torch.cuda.is_available())
Output.
True
Here, the output shows that CUDA is available, which means PyTorch can use the GPU.
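You can also inspect the GPU that PyTorch detected. The following is a minimal sketch using standard torch.cuda calls; the reported name, compute capability, and memory depend on your instance:

# Query the device PyTorch uses; on a GH200 instance this should report
# the Hopper GPU (compute capability 9.0) and its memory capacity.
print("Device name:", torch.cuda.get_device_name(0))
props = torch.cuda.get_device_properties(0)
print("Compute capability:", f"{props.major}.{props.minor}")
print("Total memory (GiB):", round(props.total_memory / 1024**3, 1))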
Test tensor operations to verify GPU acceleration.
x = torch.rand(5, 3)
y = torch.rand(5, 3)

if torch.cuda.is_available():
    x = x.to('cuda')
    y = y.to('cuda')

print(x + y)
Output.
tensor([[0.8447, 1.4692, 0.6879],
        [1.3824, 0.2579, 1.4064],
        [0.4440, 0.8597, 0.9637],
        [0.5748, 1.7547, 1.5362],
        [0.6347, 0.9307, 1.4364]], device='cuda:0')
Here, device='cuda:0' in the output confirms that the tensors reside on the GPU, so the addition ran with GPU acceleration.
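To see the speedup rather than only confirm device placement, you can time a larger operation on the CPU and then on the GPU. This is an illustrative sketch, not a rigorous benchmark; the matrix size of 4096 is an arbitrary choice, and timings vary by instance:

import time

import torch

n = 4096
a_cpu = torch.rand(n, n)
b_cpu = torch.rand(n, n)

# Time the matrix multiply on the CPU.
start = time.time()
a_cpu @ b_cpu
print("CPU time: %.4f s" % (time.time() - start))

# Move the operands to the GPU.
a_gpu = a_cpu.to('cuda')
b_gpu = b_cpu.to('cuda')

# Warm up once so one-time CUDA initialization is not measured.
_ = a_gpu @ b_gpu
torch.cuda.synchronize()

# Time the same multiply on the GPU; synchronize because CUDA kernels
# launch asynchronously and return before the computation finishes.
start = time.time()
a_gpu @ b_gpu
torch.cuda.synchronize()
print("GPU time: %.4f s" % (time.time() - start))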
Conclusion
In this article, you learned how to run PyTorch using Docker on Vultr Cloud GPU instances powered by NVIDIA GH200. You also verified the GPU driver, pulled the PyTorch Docker image, and tested PyTorch with GPU support.