How to Run PyTorch on Vultr Cloud GPU Powered by NVIDIA GH200 using Docker

Updated on March 20, 2024

Introduction

The NVIDIA GH200 Grace Hopper™ Superchip architecture brings together the groundbreaking performance of the NVIDIA Hopper™ GPU with the versatility of the NVIDIA Grace™ CPU in a single superchip, connected with the high-bandwidth, memory-coherent NVIDIA® NVLink® Chip-2-Chip (C2C) interconnect.

This article demonstrates the step-by-step process needed to run sample PyTorch code inside a Docker container on Vultr Cloud GPU instances powered by NVIDIA GH200.

Prerequisites

Before you begin, you must:

  • Have access to a Vultr Cloud GPU instance powered by NVIDIA GH200

Verify the GPU Driver

  1. Log in to your Vultr Cloud GPU instance via SSH.

  2. Run the following command to verify the GPU driver.

    console
    $ nvidia-smi
    

    Output.

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             Off |   00000009:01:00.0 Off |                    0 |
    | N/A   29C    P0            110W /  900W |     557MiB /  97871MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+

    Here, the output shows that NVIDIA driver version 550.54.15 is installed with CUDA version 12.4.
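
    As an optional check, you can query specific fields instead of reading the full table. The command below uses standard nvidia-smi query options and prints the driver version, GPU name, and total GPU memory in CSV form.

    console
    $ nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv
    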

PyTorch with Docker

  1. Pull the PyTorch Docker image from the NVIDIA NGC Catalog.

    console
    $ docker pull nvcr.io/nvidia/pytorch:24.02-py3
    
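    Optionally, confirm that the image is now available locally before starting a container.

    console
    $ docker images nvcr.io/nvidia/pytorch
    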
  2. Start a new Docker container with the PyTorch image.

    console
    $ docker run \
      --gpus all \
      -it \
      --rm \
      -e NVIDIA_DRIVER_CAPABILITIES=all \
      -e NVIDIA_VISIBLE_DEVICES=all \
      -e __NV_PRIME_RENDER_OFFLOAD=1 \
      -e __GLX_VENDOR_LIBRARY_NAME=nvidia \
      nvcr.io/nvidia/pytorch:24.02-py3
    

    Here, the --gpus all flag gives the container access to all GPUs on the host.

    Some of the environment variables used in the command are:

    • NVIDIA_DRIVER_CAPABILITIES: Set to all to expose every driver capability (compute, utility, graphics, video) inside the container.
    • NVIDIA_VISIBLE_DEVICES: Set to all to make every GPU on the host visible to the container.
    • __NV_PRIME_RENDER_OFFLOAD: Set to 1 to enable PRIME render offload to the NVIDIA GPU.
    • __GLX_VENDOR_LIBRARY_NAME: Set to nvidia so that GLX loads the NVIDIA vendor library.

    Running the above command opens an interactive shell inside the PyTorch container.
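
    If you prefer not to work interactively, you can also pass a command directly to docker run. The following one-off example is a sketch that uses the same image tag and simply prints the name of the detected GPU; add a -v bind mount if you need scripts from the host inside the container.

    console
    $ docker run --rm --gpus all nvcr.io/nvidia/pytorch:24.02-py3 \
      python -c "import torch; print(torch.cuda.get_device_name(0))"
    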

  3. Once inside the Docker container, enter the Python console.

    console
    # python
    
  4. Fetch the PyTorch version and CUDA version.

    python
    import torch
    print("PyTorch Version:", torch.__version__)
    print("CUDA Version:", torch.version.cuda)
    

    Output.

    PyTorch Version: 2.3.0a0+ebedce2
    CUDA Version: 12.3

    Here, the output shows that PyTorch version 2.3.0a0+ebedce2 is installed with CUDA version 12.3. This is the CUDA toolkit version bundled inside the container, so it can differ from the CUDA version reported by the host driver (12.4 above).
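
    The container also bundles its own cuDNN build. If you want to record it alongside the versions above, PyTorch exposes it through torch.backends.cudnn.

    python
    print("cuDNN Version:", torch.backends.cudnn.version())
    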

  5. Test CUDA availability.

    python
    print(torch.cuda.is_available())
    

    Output.

    True

    Here, the output shows that CUDA is available. That means PyTorch is running with GPU support.
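
    Since CUDA is available, you can also inspect the detected device from the same Python session using standard torch.cuda helpers.

    python
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    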

  6. Test Tensor operations to verify GPU acceleration.

    python
    x = torch.rand(5, 3)
    y = torch.rand(5, 3)
    if torch.cuda.is_available():
        x = x.to('cuda')
        y = y.to('cuda')
        print(x + y)
    

    Output.

    tensor([[0.8447, 1.4692, 0.6879],
            [1.3824, 0.2579, 1.4064],
            [0.4440, 0.8597, 0.9637],
            [0.5748, 1.7547, 1.5362],
            [0.6347, 0.9307, 1.4364]], device='cuda:0')

    Here, the device='cuda:0' attribute in the output confirms that the tensors were moved to the GPU and the addition ran there.
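
    As a further sanity check, you can time a larger operation on the GPU from the same session. The sketch below multiplies two large random matrices; torch.cuda.synchronize() is called because CUDA kernels launch asynchronously, so timing without it would only measure the kernel launch.

    python
    import time

    # Multiply two large matrices on the GPU and time the operation.
    a = torch.rand(4096, 4096, device='cuda')
    b = torch.rand(4096, 4096, device='cuda')
    torch.cuda.synchronize()  # wait for allocation and initialization to finish
    start = time.time()
    c = torch.matmul(a, b)
    torch.cuda.synchronize()  # wait for the multiplication to complete
    print("Matmul time:", round(time.time() - start, 4), "seconds")
    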

Conclusion

In this article, you learned how to run PyTorch using Docker on Vultr Cloud GPU instances powered by NVIDIA GH200. You also verified the GPU driver, pulled the PyTorch Docker image, and tested PyTorch with GPU support.