Deploy a PyTorch Workspace on a Vultr Cloud GPU Server
Introduction
PyTorch is an open-source deep learning framework for natural language processing and computer vision applications. It offers ease of use and flexibility, allowing for fast and seamless integration of deep learning models into a wide range of applications.
Deploying a PyTorch workspace on Vultr enables you to leverage the power of the Cloud GPU servers that feature the NVIDIA A100 and A40 GPUs to perform resource-hungry tasks using the torch module. Combining JupyterLab and the PyTorch container image provides an efficient remote development environment, allowing you to work with others on a machine-learning project.
This article demonstrates the steps to inherit the PyTorch container image and install JupyterLab, creating a new container image. It also walks you through the deployment using Docker and Docker Compose on Vultr Cloud GPU servers using the NVIDIA Docker Toolkit.
Prerequisites
Before you begin, you should:
- Deploy a Cloud GPU server with the NVIDIA NGC marketplace application.
- Point a subdomain to the server using an A record. This article uses pytorch.example.com for demonstration.
Verify the GPU Availability
The Vultr Cloud GPU servers feature NVIDIA GPUs for machine learning, artificial intelligence, and so on. They come with licensed NVIDIA drivers and the CUDA Toolkit, which are essential for the proper functioning of the GPUs. This section provides an overview of the PyTorch container image and demonstrates the steps to verify the GPU availability on the server and inside a container.
Execute the nvidia-smi command on the server.
# nvidia-smi
The above command outputs the information about the connected GPU. It includes information such as the driver version, CUDA version, GPU model, available memory, GPU usage, and so on.
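If you prefer to inspect the GPU details programmatically rather than reading the full nvidia-smi table, the CLI's --query-gpu and --format=csv,noheader options emit one machine-readable line per GPU. The sketch below parses such a line; the sample values are illustrative, not captured from a real server.

```python
import subprocess

def parse_gpu_line(line):
    """Parse one CSV line from nvidia-smi --query-gpu output."""
    name, memory, driver = [field.strip() for field in line.split(", ")]
    return {"name": name, "memory": memory, "driver": driver}

def query_gpus():
    """Query name, total memory, and driver version for every GPU.

    Requires nvidia-smi on the PATH, so call this only on the server.
    """
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader"],
        text=True,
    )
    return [parse_gpu_line(line) for line in out.strip().splitlines()]

# Illustrative sample line (not real server output):
sample = "NVIDIA A100-PCIE-40GB, 40960 MiB, 535.54.03"
print(parse_gpu_line(sample))
```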
Run a temporary container using the pytorch/pytorch:latest image.
# docker run --rm -it --gpus all pytorch/pytorch:latest
The above command uses the PyTorch container image to verify the GPU access inside a container. The NVIDIA Docker Toolkit enables you to use the GPU inside the containers using the --gpus option. The -it option provides access to an interactive terminal session of the container. The --rm option removes the container from the disk once the container ends.
Enter the Python console.
root@f28fee5c54e5:/workspace# python
The above command enters a new Python console inside the container.
Test the GPU availability using the torch module.
Python 3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
The above commands import the torch module and use the torch.cuda.is_available() function to verify that the CUDA features are available to PyTorch. If the output is False, make sure you used the --gpus all option while creating the container.
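In application code, it is common to fall back to the CPU instead of failing outright when CUDA is unavailable. A minimal sketch of that pattern (the pick_device helper is illustrative, not part of the article's setup, and also runs on machines without PyTorch installed):

```python
def pick_device():
    """Return "cuda" when torch reports a usable GPU, else "cpu".

    Falls back to "cpu" when the torch module is not installed, so the
    sketch runs anywhere; on the GPU server it should return "cuda".
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

print(pick_device())
```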
Quit the Python console with the quit() command and exit the container terminal using the exit command.
Build the Docker Image
The PyTorch container image consists of the runtime and the dependencies to use the PyTorch module. This section demonstrates the steps to create a configuration for the JupyterLab server and create a new container image that combines PyTorch and JupyterLab.
The NVIDIA NGC marketplace application installs the JupyterLab server alongside the NVIDIA Docker Toolkit on the server. You can use the pre-installed JupyterLab server to generate a configuration file to set up default options in the container image.
Create a new file named config.py.
# nano config.py
Add the following contents to the file.
c.ServerApp.ip = '0.0.0.0'
c.ServerApp.allow_root = True
c.ServerApp.allow_remote_access = True
c.ServerApp.password = ''
The above configuration instructs the JupyterLab server to allow remote connections and listen on the 0.0.0.0 IP address so that you can access the workspace from the public IP address. You can add any other options in this file to pre-configure the container image.
Create the password hash using the passwd function.
# python3 -c "from jupyter_server.auth import passwd; print(passwd('YOUR_PASSWORD'))"
The above command uses the pre-installed Jupyter module and the passwd() function to create a new password hash to protect the workspace. Replace the c.ServerApp.password value in the config.py file with the output.
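For reference, here is a sketch of the salted-SHA-1 scheme that older Jupyter releases used to produce hashes of the form sha1:&lt;salt&gt;:&lt;digest&gt;. Current jupyter_server versions default to a stronger argon2 hash, so on a real server prefer the passwd() helper shown above; this only illustrates how a salted password hash is constructed.

```python
import hashlib
import secrets

def legacy_passwd(passphrase, salt_len=12):
    """Sketch of the legacy Jupyter salted-SHA-1 password hash.

    Illustrative only: current jupyter_server defaults to argon2,
    so use the passwd() helper from jupyter_server.auth instead.
    """
    salt = secrets.token_hex(salt_len // 2)  # 12 hex characters
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return f"sha1:{salt}:{digest}"

print(legacy_passwd("YOUR_PASSWORD"))
```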
Create a new file named Dockerfile.
# nano Dockerfile
The Dockerfile declares the steps to build the container image.
Add the following contents to the file.
FROM pytorch/pytorch:latest
RUN pip install jupyterlab
RUN pip install -U ipywidgets ipykernel
COPY config.py /root/.jupyter/jupyter_lab_config.py
EXPOSE 8888
CMD ["bash", "-c", "jupyter lab"]
The above instructions inherit the official PyTorch container image as the base image, install the JupyterLab library using pip, copy the configuration file you created in the previous steps, expose port 8888, and use the bash -c "jupyter lab" command to spawn the JupyterLab server.
Build the Docker image.
# docker build -t pytorch-jupyter .
The above command builds a new container image named pytorch-jupyter. You can also push this container image to a private repository on your DockerHub account so that it is ready to use for deploying a temporary PyTorch workspace for resource-hungry tasks.
Deploy the Workspace using Docker
You created the pytorch-jupyter image in the previous section by combining JupyterLab and the PyTorch container image. This section demonstrates the steps to deploy the workspace using Docker for temporary tasks or testing the functionality.
Disable the firewall.
# ufw disable
The above command disables the firewall to allow incoming connections on port 8888.
Run a temporary container using the pytorch-jupyter image.
# docker run --rm -it --gpus all -p 8888:8888 pytorch-jupyter
The above command creates a new container using the defined image. The --gpus all option provides access inside the container to all the GPUs connected to the host machine. The -it option provides access to an interactive session of the container. The --rm option removes the container from the disk once the container ends.
You can confirm the deployment by opening http://PUBLIC_IP:8888 in your web browser. To log in to the JupyterLab interface, use the password you used for creating the password hash in the previous sections. You can use the torch.cuda.is_available() function in a new notebook to verify the GPU availability.
Exit the container using Ctrl + C.
Deploy the Workspace using Docker Compose
Deploying the PyTorch Workspace on a Vultr Cloud GPU server provides more than just access to high-end GPUs. The JupyterLab interface allows you to work with others on a machine-learning project, offering more flexibility and scalability than a local setup. It also allows you to access and manage your machine learning resources from anywhere with an internet connection. This section demonstrates the steps to deploy a persistent PyTorch workspace on a Vultr Cloud GPU server using Docker Compose.
Create and enter a new directory named pytorch-environment.
# mkdir ~/pytorch-environment
# cd ~/pytorch-environment
The above commands create and enter a new directory named pytorch-environment in the /root directory. You use this directory to store all the configuration files related to the PyTorch workspace, such as the Nginx configuration, SSL certificate, and so on.
Create a new file named docker-compose.yaml.
# nano docker-compose.yaml
The docker-compose.yaml file allows you to run multi-container Docker applications using the docker-compose command.
Add the following contents to the file.
services:
  jupyter:
    image: pytorch-jupyter
    restart: unless-stopped
    volumes:
      - "/root/workspace:/workspace"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
  nginx:
    image: nginx
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./certbot/conf:/etc/letsencrypt
      - ./certbot/www:/var/www/certbot
  certbot:
    image: certbot/certbot
    container_name: certbot
    volumes:
      - ./certbot/conf:/etc/letsencrypt
      - ./certbot/www:/var/www/certbot
    command: certonly --webroot -w /var/www/certbot --force-renewal --email YOUR_EMAIL -d pytorch.example.com --agree-tos
The above configuration defines three services. The jupyter service runs the container that contains the GPU-accelerated PyTorch workspace, and it uses the volumes attribute to store all the workspace files in the /root/workspace directory. The nginx service runs a container using the official Nginx image that acts as a reverse proxy server between clients and the jupyter service. The certbot service runs a container using the official Certbot image that issues a Let's Encrypt SSL certificate for the specified domain name. Replace YOUR_EMAIL with your email address.
Create a new directory named nginx.
# mkdir nginx
Create a new file named nginx/nginx.conf inside the directory.
# nano nginx/nginx.conf
Add the following contents to the file.
events {}

http {
    server_tokens off;
    charset utf-8;

    server {
        listen 80 default_server;
        server_name _;

        location ~ /.well-known/acme-challenge/ {
            root /var/www/certbot;
        }
    }
}
The above configuration instructs the Nginx server to serve the ACME challenge generated by Certbot. You must perform this step for the Certbot container to verify the ownership of the domain name and issue an SSL certificate for it. You replace this configuration in a later step to set up the reverse proxy server.
Start the PyTorch workspace.
# docker-compose up -d
The above command starts the services defined in the docker-compose.yaml file in detached mode. This means that the services start in the background, and you can use your terminal for other commands.
Verify the SSL issuance.
# ls certbot/conf/live/pytorch.example.com
The above command outputs the list of contents inside the directory created by Certbot for your domain name. The output should contain the fullchain.pem and privkey.pem files. It may take up to five minutes to generate the SSL certificate. If it takes longer than that, you can troubleshoot by viewing the logs using the docker-compose logs certbot command.
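You can also script the same check. The hypothetical helper below only verifies that both certificate files exist at the per-domain path the compose file mounts (for example certbot/conf/live/pytorch.example.com):

```python
from pathlib import Path

def certs_ready(live_dir):
    """Return True when Certbot has written both certificate files.

    live_dir is the per-domain directory, e.g.
    certbot/conf/live/pytorch.example.com (illustrative path).
    """
    live = Path(live_dir)
    return all((live / name).is_file()
               for name in ("fullchain.pem", "privkey.pem"))
```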
Update the nginx.conf file.
# nano nginx/nginx.conf
Add the following contents to the file.
events {}

http {
    server_tokens off;
    charset utf-8;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

    server {
        listen 80 default_server;
        server_name _;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name pytorch.example.com;

        ssl_certificate /etc/letsencrypt/live/pytorch.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/pytorch.example.com/privkey.pem;

        location / {
            proxy_pass http://jupyter:8888;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header X-Scheme $scheme;
            proxy_buffering off;
        }

        location ~ /.well-known/acme-challenge/ {
            root /var/www/certbot;
        }
    }
}
The above configuration uses the SSL certificate generated by Certbot. It configures a reverse proxy server that channels the incoming traffic to the container on port 8888. It also defines a location block to serve ACME challenge files for SSL renewals using Cron.
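The map block in the configuration turns the client's Upgrade header into the Connection header that JupyterLab's WebSocket connections require: any non-empty Upgrade value maps to "upgrade", and an empty one maps to "close". A rough Python equivalent of that lookup, for illustration only:

```python
def connection_header(http_upgrade):
    """Mirror the nginx map block: a non-empty Upgrade header value
    (e.g. "websocket") yields "upgrade"; an empty value yields "close".
    """
    return "close" if http_upgrade == "" else "upgrade"
```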
Restart the Nginx service.
# docker-compose restart nginx
The above command restarts the Nginx container to enable the updated configuration. You can confirm the deployment of the workspace by opening https://pytorch.example.com in your web browser.
Set Up Automatic SSL Renewal
Cron is a built-in job scheduler in the Linux operating system to run the specified commands at a scheduled time. Refer to How to Use the Cron Task Scheduler to learn more.
Edit the Cron table.
# crontab -e
The above command opens the Cron table editor.
Add the following entries to the table.
0 5 1 */2 * /usr/local/bin/docker-compose -f /root/pytorch-environment/docker-compose.yaml start certbot
5 5 1 */2 * /usr/local/bin/docker-compose -f /root/pytorch-environment/docker-compose.yaml restart nginx
The above statements define two tasks that start the Certbot container to regenerate the SSL certificate and restart the Nginx container to reload the configuration using the latest SSL certificate.
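To sanity-check the schedule, note that the five fields of 0 5 1 */2 * mean minute 0, hour 5, day 1 of every second month, any weekday. The hypothetical matcher below evaluates such a schedule (without the trailing command) against a datetime, and supports only the syntax used here: plain numbers, *, and */n steps.

```python
from datetime import datetime

# Cron ranges for minute, hour, day-of-month, month, day-of-week
# start at these values; */n steps count from the range start.
FIELD_STARTS = (0, 0, 1, 1, 0)

def matches(schedule, when):
    """Return True if a five-field cron schedule fires at `when`."""
    fields = schedule.split()
    values = (when.minute, when.hour, when.day, when.month,
              when.isoweekday() % 7)  # cron counts Sunday as 0

    def ok(spec, value, start):
        if spec == "*":
            return True
        if spec.startswith("*/"):
            return (value - start) % int(spec[2:]) == 0
        return value == int(spec)

    return all(ok(s, v, st) for s, v, st in zip(fields, values, FIELD_STARTS))

# 05:00 on 1 January matches; February (an even month) does not,
# because */2 in the month field selects months 1, 3, 5, and so on.
print(matches("0 5 1 */2 *", datetime(2024, 1, 1, 5, 0)))
```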
To exit the editor, press Esc, type :wq, and press Enter.
Configure the Firewall Rules
Add the firewall rules.
# ufw allow 22
# ufw allow 80
# ufw allow 443
The above commands allow incoming connections on port 22 for SSH, port 80 for HTTP traffic, and port 443 for HTTPS traffic.
Enable the firewall.
# ufw enable
Conclusion
This article demonstrated the steps to inherit the PyTorch container image and install JupyterLab to create a new container image. It also walked you through the deployment using Docker and Docker Compose on Vultr Cloud GPU servers. You can also push the container image you built to your DockerHub account for deploying temporary PyTorch workspaces in the future.