How to Set up a TensorFlow Workspace on a Vultr Cloud GPU Instance
Introduction
TensorFlow is a popular open-source machine learning platform that helps users implement deep learning and machine learning models to solve common business problems. TensorFlow offers an ecosystem for developers and enterprises to build scalable machine learning applications. For example, it expresses computations such as neural network training as stateful dataflow graphs, where each graph node represents an operation and the edges carry the multi-dimensional arrays (tensors) flowing between them.
This article demonstrates the steps to deploy a temporary or a persistent TensorFlow workspace using the official Docker image and the NVIDIA Docker Toolkit.
Prerequisites
Before you begin, you should:
- Deploy a Vultr Cloud GPU instance with the NVIDIA NGC Marketplace Application.
- Point a subdomain to your server using an A record. This article uses tensorflow.example.com for demonstration.
Verify the GPU Availability
The Vultr Cloud GPU servers feature NVIDIA GPUs for machine learning, artificial intelligence, and so on. They come with licensed NVIDIA drivers and the CUDA Toolkit, which are essential for the proper functioning of the GPUs. This section demonstrates the steps to verify the GPU availability on the server and inside a container.
Execute the nvidia-smi command on the server.
# nvidia-smi
The above command outputs information about the attached GPU, including the driver version, CUDA version, GPU model, available memory, and GPU usage.
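For a compact summary, you can also use nvidia-smi's query mode to print only specific fields; the fields selected below are just a suggestion.
# nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv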
Execute the nvidia-smi command inside a container.
# docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi
The above command uses the official nvidia/cuda image to verify GPU access inside a container. The NVIDIA Docker Toolkit enables you to use the GPU inside containers through the --gpus option. The --rm option removes the container from the disk once the container exits.
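On servers with more than one GPU, you can optionally grant a container access to a specific device instead of all of them. The following variation exposes only the first GPU; the device index 0 is illustrative.
# docker run --rm --gpus '"device=0"' nvidia/cuda:10.2-base nvidia-smi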
Deploy a Temporary Workspace
The Vultr Cloud GPU servers offer access to high-end GPUs that you can leverage for training your machine learning models, saving a lot of time without paying the upfront cost of the hardware. This section explains the steps to deploy a temporary TensorFlow workspace on a Vultr Cloud GPU server.
Disable the firewall.
# ufw disable
The above command disables the firewall to allow inbound connections on all ports.
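If you prefer not to expose the notebook port publicly, a common alternative is to keep the firewall enabled and tunnel the port over SSH from your local machine instead. The following is a minimal sketch, assuming you connect as root and replace PUBLIC_IP with your server's IP address; the workspace then becomes available at http://localhost:8888.
$ ssh -N -L 8888:127.0.0.1:8888 root@PUBLIC_IP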
Deploy a new Docker container.
# docker run -p 8888:8888 --gpus all -it --rm -v /root/notebooks:/tf/notebooks tensorflow/tensorflow:latest-gpu-jupyter
The above command uses the official tensorflow/tensorflow image with the latest-gpu-jupyter tag, which contains the GPU-accelerated TensorFlow environment and the Jupyter notebook server. Copy the token from the output of this command to access the Jupyter notebook interface.
The following is the explanation for each parameter used in the above command.
- -p 8888:8888: Expose the service on port 8888.
- --gpus all: Enable GPU access inside the container.
- -it: Run an interactive session that allows keyboard interrupts.
- --rm: Remove the container when it stops.
- -v /root/notebooks:/tf/notebooks: Store all the notebooks in the /root/notebooks directory on the host.
Verify the GPU availability using the TensorFlow module.
- Open http://PUBLIC_IP:8888 in your web browser and use the copied token to log in to the interface.
- In the Jupyter notebook interface, navigate to the directory where you want to create your new notebook.
- Click the "New" button in the top right corner of the interface, and select the "Python 3" option from the dropdown menu. This creates a new Python 3 notebook in the selected directory.
- Give your notebook a name by clicking the "Untitled" title at the top of the page and typing a new name.
- To add a new cell, click the "Insert" menu at the top of the interface and select the "Insert Cell Below" option. You can then enter your code into the cell and run it by clicking the "Run" button in the toolbar at the top of the page or by pressing Shift + Enter.
Run the following code in a new cell.
import tensorflow as tf
tf.config.list_physical_devices()
Output.
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
The output confirms that the GPU is available in the TensorFlow module. You can stop the workspace by opening the terminal window and pressing Ctrl + C. Stopping the container does not delete the notebooks. You can find them in the /root/notebooks directory.
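You can also repeat this verification without the notebook interface by launching a one-off container that runs a short computation. The following is a minimal sketch adapted from TensorFlow's standard verification snippet; it prints the sum of a random tensor computed by the GPU-enabled build.
# docker run --rm --gpus all tensorflow/tensorflow:latest-gpu python -c 'import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))'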
Deploy a Persistent Workspace
Deploying the TensorFlow workspace on a Vultr Cloud GPU server provides more than just access to high-end GPUs. The Jupyter notebook interface allows you to collaborate with others on a machine learning project, offering more flexibility and scalability than a local setup. It also allows you to access and manage your machine learning resources from anywhere with an internet connection. This section demonstrates the steps to deploy a persistent TensorFlow workspace on a Vultr Cloud GPU server using Docker Compose.
Create and enter a new directory named tensorflow-workspace.
# mkdir ~/tensorflow-workspace
# cd ~/tensorflow-workspace
The above commands create and enter a new directory named tensorflow-workspace in the /root directory. You use this directory to store all the configuration files related to the TensorFlow workspace, such as the Nginx configuration, SSL certificate, and so on.
Create a new file named docker-compose.yaml.
# nano docker-compose.yaml
Add the following contents to the file.
services:
jupyter:
image: tensorflow/tensorflow:latest-gpu-jupyter
restart: unless-stopped
volumes:
- "/root/notebooks:/tf/notebooks"
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
nginx:
image: nginx
restart: unless-stopped
ports:
- 80:80
- 443:443
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
- ./nginx/dhparam.pem:/etc/ssl/certs/dhparam.pem
- ./certbot/conf:/etc/letsencrypt
- ./certbot/www:/var/www/certbot
certbot:
image: certbot/certbot
container_name: certbot
volumes:
- ./certbot/conf:/etc/letsencrypt
- ./certbot/www:/var/www/certbot
command: certonly --webroot -w /var/www/certbot --force-renewal --email YOUR_EMAIL -d tensorflow.example.com --agree-tos
The above configuration defines three services. The jupyter service runs the container with the GPU-accelerated TensorFlow workspace and uses the volumes attribute to store all the notebooks in the /root/notebooks directory. The nginx service runs a container based on the official Nginx image that acts as a reverse proxy between clients and the jupyter service. The certbot service runs a container based on the official Certbot image that issues a Let's Encrypt SSL certificate for the specified domain name. Replace YOUR_EMAIL with your email address.
Save the file and exit the editor by pressing Ctrl + X, then Y and Enter.
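As a quick check, you can confirm that the file parses correctly before deploying. The following command validates the Compose file in the current directory and prints the resolved configuration.
# docker-compose config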
Create a new directory named nginx.
# mkdir nginx
Create a new file named nginx.conf.
# nano nginx/nginx.conf
Add the following contents to the file.
events {}
http {
server_tokens off;
charset utf-8;
server {
listen 80 default_server;
server_name _;
location ~ /.well-known/acme-challenge/ {
root /var/www/certbot;
}
}
}
The above configuration instructs the Nginx server to serve the ACME challenge generated by Certbot. You must perform this step for the Certbot container to verify the ownership of the domain name and issue an SSL certificate for it. You swap this configuration in the later steps to set up the reverse proxy server.
Save the file and exit the editor by pressing Ctrl + X, then Y and Enter.
Create a new file named dhparam.pem using the openssl command.
# openssl dhparam -dsaparam -out nginx/dhparam.pem 4096
The above command generates Diffie-Hellman (DH) parameters, which strengthen the key exchange used to secure communications between two parties. This adds another layer of security that helps protect traffic between the server and the client from attackers who might try to intercept or decrypt it.
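If you want to confirm that the generated file contains valid parameters, you can run openssl's check mode against it.
# openssl dhparam -in nginx/dhparam.pem -check -noout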
Before starting the services, you must point your domain name to the server's IP address using an A record.
Deploy the Docker Compose services.
# docker-compose up -d
The above command starts the services defined in the docker-compose.yaml file in detached mode. This means the services run in the background, and you can use your terminal for other commands.
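To confirm that all three services started, list them along with their current state and exposed ports.
# docker-compose ps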
Verify the SSL issuance.
# ls certbot/conf/live/tensorflow.example.com/
The above command lists the contents of the directory created by Certbot for your domain name. The output should contain the fullchain.pem and privkey.pem files. It may take up to five minutes to generate the SSL certificate. If the command takes longer than that, you can troubleshoot by viewing the logs with the docker-compose logs certbot command.
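Once the files appear, you can optionally inspect the certificate's validity window with the standard openssl x509 utility; the path matches the directory listed above.
# openssl x509 -in certbot/conf/live/tensorflow.example.com/fullchain.pem -noout -dates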
Stop the nginx container.
# docker-compose stop nginx
The above command stops the nginx container so you can swap the Nginx configuration in the next steps.
Swap the Nginx configuration.
# rm -f nginx/nginx.conf
# nano nginx/nginx.conf
Add the following contents to the file.
events {}
http {
server_tokens off;
charset utf-8;
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
listen 80 default_server;
server_name _;
return 301 https://$host$request_uri;
}
server {
listen 443 ssl http2;
server_name tensorflow.example.com;
ssl_certificate /etc/letsencrypt/live/tensorflow.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/tensorflow.example.com/privkey.pem;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/certs/dhparam.pem;
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:50m;
ssl_stapling on;
ssl_stapling_verify on;
add_header Strict-Transport-Security max-age=15768000;
location / {
proxy_pass http://jupyter:8888;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Scheme $scheme;
proxy_buffering off;
}
location ~ /.well-known/acme-challenge/ {
root /var/www/certbot;
}
}
}
The above configuration uses the SSL certificate generated by Certbot and additional SSL parameters to increase the security of the workspace. It configures a reverse proxy that forwards incoming traffic to the jupyter container on port 8888, including the WebSocket upgrade headers that the Jupyter interface requires. It also defines a location block that serves ACME challenge files for the SSL renewals scheduled with Cron.
Save the file and exit the editor by pressing Ctrl + X, then Y and Enter.
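Before starting the service, you can optionally test the new configuration. The following command runs a temporary container from the nginx service definition, which mounts the same volumes, and checks the configuration syntax.
# docker-compose run --rm nginx nginx -t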
Start the nginx container.
# docker-compose start nginx
The above command starts the nginx container with the new configuration. You can confirm the deployment of the workspace by opening https://tensorflow.example.com in your web browser.
Fetch the token from the Docker logs.
# docker-compose logs jupyter
The above command outputs the logs generated by the jupyter container. They contain the token required to access the Jupyter notebook interface.
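If the log output is long, you can filter it for the token line. The exact log text may vary between image versions, so treat this as a convenience rather than a guaranteed match.
# docker-compose logs jupyter | grep "token="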
You can also set up a password for accessing the Jupyter notebook interface by following the steps given below.
- Open the Jupyter notebook interface in your web browser using https://tensorflow.example.com.
- Scroll down to the "Setup a Password" section.
- Enter the token fetched from the Docker Compose logs.
- Enter the password you want to use for protecting the interface. Ensure you use a strong password to protect your environment from brute-force attacks.
- Click the "Log in and set new password" button.
Append two entries to the Cron table.
# crontab -e
The above command opens the Cron table editor. cron is a built-in job scheduler in the Linux operating system that runs specified commands at scheduled times. Refer to How to Use the Cron Task Scheduler to learn more.
Add the following lines to the table.
0 5 1 */2 * /usr/local/bin/docker-compose -f /root/tensorflow-workspace/docker-compose.yaml start certbot
5 5 1 */2 * /usr/local/bin/docker-compose -f /root/tensorflow-workspace/docker-compose.yaml restart nginx
The above statements define two tasks: the first starts the certbot container to regenerate the SSL certificate, and the second restarts the nginx container to reload the configuration with the latest certificate. The -f option must precede the docker-compose subcommand and point to the Compose file's full path, because Cron does not run the jobs from the project directory.
Save and exit the editor by pressing Esc, then typing :wq and pressing Enter.
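You can optionally verify that renewal will work without waiting for the schedule. The following command overrides the certbot service's default command to perform a dry run against the Let's Encrypt staging environment.
# docker-compose -f /root/tensorflow-workspace/docker-compose.yaml run --rm certbot renew --dry-run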
Enable the firewall.
# ufw allow 80,443,22/tcp
# ufw enable
The above commands enable the firewall and allow incoming connections on port 80 for HTTP traffic, port 443 for HTTPS traffic, and port 22 for SSH connections.
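To verify the active rules, check the firewall status. The output should show the firewall as active and list the allowed ports.
# ufw status verbose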
Conclusion
This article demonstrated the steps to deploy a temporary or a persistent TensorFlow workspace using the official Docker image and the NVIDIA Docker Toolkit. You can deploy a temporary workspace to leverage the high-end hardware offered by Vultr to perform resource-hungry tasks like training a model or performing visualizations. You can also deploy a persistent workspace for an efficient remote development environment, as the Jupyter notebook interface allows you to work with others on a machine-learning project.