How to Run PyTorch on Vultr Cloud GPU Powered by NVIDIA GH200 using Docker

Updated on March 20, 2024

Introduction

The NVIDIA GH200 Grace Hopper™ Superchip architecture brings together the groundbreaking performance of the NVIDIA Hopper™ GPU with the versatility of the NVIDIA Grace™ CPU in a single superchip, connected with the high-bandwidth, memory-coherent NVIDIA® NVLink® Chip-2-Chip (C2C) interconnect.

This article demonstrates the step-by-step process needed to run sample PyTorch code inside a Docker container on Vultr Cloud GPU instances powered by NVIDIA GH200.

Prerequisites

Before you begin, you must:

  • Have access to a Vultr Cloud GPU instance powered by NVIDIA GH200

Verify the GPU Driver

  1. Log in to your Vultr Cloud GPU instance via SSH.

  2. Run the following command to verify the GPU driver.

    console
    $ nvidia-smi
    

    Output.

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GH200 480GB             Off |   00000009:01:00.0 Off |                    0 |
    | N/A   29C    P0            110W /  900W |     557MiB /  97871MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+

    Here, the output shows that NVIDIA driver version 550.54.15 is installed with CUDA version 12.4.
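
    As an optional check, you can query specific fields instead of reading the full table. The command below uses standard nvidia-smi query options and prints the driver version, GPU name, and total GPU memory in CSV form.

    console
    $ nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv
    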

PyTorch with Docker

  1. Pull the PyTorch Docker image from the NVIDIA NGC Catalog.

    console
    $ docker pull nvcr.io/nvidia/pytorch:24.02-py3
    
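    Optionally, confirm that the image is now available locally before starting a container.

    console
    $ docker images nvcr.io/nvidia/pytorch
    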
  2. Start a new Docker container with the PyTorch image.

    console
    $ docker run \
      --gpus all \
      -it \
      --rm \
      -e NVIDIA_DRIVER_CAPABILITIES=all \
      -e NVIDIA_VISIBLE_DEVICES=all \
      -e __NV_PRIME_RENDER_OFFLOAD=1 \
      -e __GLX_VENDOR_LIBRARY_NAME=nvidia \
      nvcr.io/nvidia/pytorch:24.02-py3
    

    Here, the --gpus all flag gives the container access to all GPUs on the host.

    Some of the environment variables used in the command are:

    • NVIDIA_DRIVER_CAPABILITIES: Set to all to expose every driver capability (compute, utility, graphics, video) inside the container.
    • NVIDIA_VISIBLE_DEVICES: Set to all to make every GPU on the host visible to the container.
    • __NV_PRIME_RENDER_OFFLOAD: Set to 1 to enable PRIME render offload to the NVIDIA GPU.
    • __GLX_VENDOR_LIBRARY_NAME: Set to nvidia so that GLX loads the NVIDIA vendor library.

    Running the above command opens an interactive shell inside the PyTorch container.
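
    If you prefer not to work interactively, you can also pass a command directly to docker run. The following one-off example is a sketch that uses the same image tag and simply prints the name of the detected GPU; add a -v bind mount if you need scripts from the host inside the container.

    console
    $ docker run --rm --gpus all nvcr.io/nvidia/pytorch:24.02-py3 \
      python -c "import torch; print(torch.cuda.get_device_name(0))"
    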

  3. Once inside the Docker container, enter the Python console.

    console
    # python
    
  4. Fetch the PyTorch version and CUDA version.

    python
    import torch
    print("PyTorch Version:", torch.__version__)
    print("CUDA Version:", torch.version.cuda)
    

    Output.

    PyTorch Version: 2.3.0a0+ebedce2
    CUDA Version: 12.3

    Here, the output shows that PyTorch version 2.3.0a0+ebedce2 is installed with CUDA version 12.3. This is the CUDA toolkit version bundled inside the container, so it can differ from the CUDA version reported by the host driver (12.4 above).
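
    The container also bundles its own cuDNN build. If you want to record it alongside the versions above, PyTorch exposes it through torch.backends.cudnn.

    python
    print("cuDNN Version:", torch.backends.cudnn.version())
    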

  5. Test CUDA availability.

    python
    print(torch.cuda.is_available())
    

    Output.

    True

    Here, the output shows that CUDA is available. That means PyTorch is running with GPU support.
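
    Since CUDA is available, you can also inspect the detected device from the same Python session using standard torch.cuda helpers.

    python
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    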

  6. Test Tensor operations to verify GPU acceleration.

    python
    x = torch.rand(5, 3)
    y = torch.rand(5, 3)
    if torch.cuda.is_available():
        x = x.to('cuda')
        y = y.to('cuda')
        print(x + y)
    

    Output.

    tensor([[0.8447, 1.4692, 0.6879],
            [1.3824, 0.2579, 1.4064],
            [0.4440, 0.8597, 0.9637],
            [0.5748, 1.7547, 1.5362],
            [0.6347, 0.9307, 1.4364]], device='cuda:0')

    Here, the device='cuda:0' attribute in the output confirms that the tensors were moved to the GPU and the addition ran there.
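
    As a further sanity check, you can time a larger operation on the GPU from the same session. The sketch below multiplies two large random matrices; torch.cuda.synchronize() is called because CUDA kernels launch asynchronously, so timing without it would only measure the kernel launch.

    python
    import time

    # Multiply two large matrices on the GPU and time the operation.
    a = torch.rand(4096, 4096, device='cuda')
    b = torch.rand(4096, 4096, device='cuda')
    torch.cuda.synchronize()  # wait for allocation and initialization to finish
    start = time.time()
    c = torch.matmul(a, b)
    torch.cuda.synchronize()  # wait for the multiplication to complete
    print("Matmul time:", round(time.time() - start, 4), "seconds")
    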

Conclusion

In this article, you learned how to run PyTorch using Docker on Vultr Cloud GPU instances powered by NVIDIA GH200. You also verified the GPU driver, pulled the PyTorch Docker image, and tested PyTorch with GPU support.