Deploy a Machine Learning Model to Production using TorchServe
Introduction
TorchServe is an open-source tool for deploying PyTorch models in production. It allows you to serve PyTorch models as RESTful services with minimal configuration and supports multi-model serving, model versioning, and monitoring. This lets you focus on model development and training without worrying about the underlying infrastructure and deployment.
Deploying PyTorch models with TorchServe on Vultr Cloud GPU servers allows for efficient, decoupled serving of models that takes advantage of the high-performance capabilities of the underlying hardware. This approach lets you scale the models dynamically as traffic and usage increase. Additionally, deploying the models on a remote server allows you to access and use them from any location and any internet-connected device without needing local computational resources.
This article demonstrates the steps to package a PyTorch model into a model archive file, deploy the model archive files using TorchServe, run inference using the REST API, and manage the models using the management API.
Prerequisites
Before you begin, you should:
- Deploy a Cloud GPU server with the NVIDIA NGC marketplace application.
- Point 2 subdomains to the server using an A record. This article uses inference.torchserve.example.com and management.torchserve.example.com for demonstration.
Understanding TorchServe APIs
TorchServe features 3 different APIs, each designed to offer specific functionality.
- Management API
- Inference API
- Metrics API
The Management API allows you to manage and organize your models, such as registering, unregistering, and listing models. It also enables you to specify configurations for the models, such as the number of workers and the batch size. It listens on port 8081. Refer to Management API to learn more.
The Inference API provides an interface for making predictions using your models. It allows you to send input data and receive output predictions in a standard format. It listens on port 8080. Refer to Inference API to learn more.
The Metrics API allows you to monitor the performance of your models. It provides real-time metrics such as request rate, response time, and error rate. It listens on port 8082. Refer to TorchServe Metrics to learn more.
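As a quick way to verify that all three APIs respond, you can query their default endpoints. The following Python sketch is illustrative; it assumes TorchServe is already running locally with the default port configuration and that the requests package is installed.
import requests

# Inference API (port 8080): health check endpoint.
print(requests.get("http://localhost:8080/ping").json())

# Management API (port 8081): list the registered models.
print(requests.get("http://localhost:8081/models").json())

# Metrics API (port 8082): Prometheus-formatted metrics as plain text.
print(requests.get("http://localhost:8082/metrics").text)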
Create a Model Archive File
TorchServe uses model archive files to load and serve PyTorch models. A Model Archive (MAR) file is a format used to package a model. Creating a model archive file is a convenient way to store and distribute models, as it includes the dependencies, architecture, and pre-trained weights in a single file.
Before you create a model archive file, you need to export your model into a serialized file.
>>> torch.save(model, 'path/to/model.pth')
The .pt or .pth file is the serialized version of a PyTorch model that consists of the weights, parameters, and architecture. You can use this file to load the model for inference or to train it further. This article does not cover the steps to train or save a model. You can refer to the Demo Notebook that trains a model on a subset of the Food101 dataset using the ResNet18 pre-trained weights.
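For reference, the following is a minimal sketch of a model definition file (model.py) that you pass to the archiver in the next step alongside the serialized weights. The class name, the torchvision ResNet18 backbone, and the five output classes are assumptions for illustration and may differ from the demo notebook.
# model.py: declares the model architecture packaged into the archive.
import torch.nn as nn
from torchvision.models import resnet18

class DessertsClassifier(nn.Module):
    """ResNet18 backbone with a classification head for five dessert classes."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.backbone = resnet18()  # no pretrained weights; they load from the .pth file
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, x):
        return self.backbone(x)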
Install the torch-model-archiver package using pip.
# pip install torch-model-archiver
You can use the torch-model-archiver command to create a model archive file using the following parameters.
- --model-name: Set the name of the model.
- --version: Set the version of the model.
- --model-file: The path to the Python file that declares the model architecture.
- --serialized-file: The path to the serialized file.
- --extra-files: Any other dependencies, separated with a comma.
- --handler: Choose from the default handlers or provide the path to the file that declares custom handler logic.
The following are the default handler options that you can use for creating a model archive file.
- image_classifier
- text_classifier
- object_detector
- image_segmenter
Refer to Default Handler Documentation or Custom Handler Documentation for more information.
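If a built-in handler covers your use case, you can pass its name directly to --handler. When you need small tweaks, a custom handler can extend a default one instead of implementing the full request lifecycle. The following handler.py is a hypothetical sketch rather than the exact handler from the demo notebook; the class name, the top-3 limit, and the preprocessing values are assumptions.
# handler.py: extends the built-in image_classifier handler.
from torchvision import transforms
from ts.torch_handler.image_classifier import ImageClassifier

class DessertsHandler(ImageClassifier):
    """Image classifier handler that returns only the top 3 labels."""

    topk = 3  # the default handler returns the top 5 classes

    # Preprocessing applied to each uploaded image before inference.
    image_processing = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])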
The following is an example of the command used for creating a model archive file.
# torch-model-archiver --model-name desserts \
--version 1.0 \
--model-file model.py \
--serialized-file desserts_resnet18.pth \
--handler handler.py \
--extra-files index_to_name.json
The above command creates a new file named desserts.mar in the working directory. As TorchServe supports serving multiple models, you can store multiple model archive files in a directory, which you use in the later steps to serve the models.
You can download the desserts.mar and subset.mar files into the /root/model-store directory to test the workflow as shown in the later steps.
Build the Container Image
The TorchServe container image available on Docker Hub may not be compatible with your hardware due to constantly changing drivers. The best approach is to build the container image using the build-image.sh script, providing options that match your specifications. This section demonstrates the steps to clone the GitHub repository, fetch the CUDA version, and build the container image.
Clone the TorchServe GitHub repository.
# git clone https://github.com/pytorch/serve
The above command clones the TorchServe repository into the serve directory.
Enter the docker directory.
# cd serve/docker
The above command enters the docker directory, which contains all the files related to the TorchServe container image.
Fetch the CUDA version.
# nvidia-smi
The above command outputs information about the GPUs connected to the host machine. Note the CUDA version in the top-right corner of the output. You use this in the next command to specify the version.
Build the container image.
# ./build-image.sh -g -cu 116
The above command uses the build-image.sh script to build the container image named pytorch/torchserve:latest-gpu. This process may take up to 10 to 15 minutes. It uses the -g option to specify that you want to build a GPU-accelerated container image. It also uses -cu 116 to specify the CUDA version, formed by removing the . symbol from the CUDA version fetched in the previous command. Refer to the Create TorchServe docker image section to learn more.
Deploy TorchServe using Docker
You built the TorchServe container image in the previous section using the build-image.sh script. This section demonstrates how to deploy TorchServe using Docker, setting up all the options, and binding the model store directory using Docker volumes.
Run a temporary container using the pytorch/torchserve:latest-gpu image.
# docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -v /root/model-store:/model-store pytorch/torchserve:latest-gpu torchserve --model-store /model-store --models desserts=desserts.mar subset=subset.mar
The above command creates a new container using the defined image. It overrides the container's entry command to define the model names and archive files, instructing TorchServe to serve the defined models. However, pre-defining the models is not the best approach. In a later step, you run the container without overriding the entry command.
The following is the explanation for each parameter used in the above command.
- --rm: Remove the container from the disk when stopped.
- -it: Interactive session. Allow keyboard interrupt.
- --gpus all: Enable GPU access inside the container.
- -p 8080:8080: Bind port 8080 with the host machine, inference API.
- -p 8081:8081: Bind port 8081 with the host machine, management API.
- -v /root/model-store:/model-store: Bind the /root/model-store directory with the /model-store directory inside the container.
Exit the container using Ctrl + C.
Run a temporary container using the pytorch/torchserve:latest-gpu image without overriding the entry command.
# docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 -v /root/model-store:/home/model-server/model-store pytorch/torchserve:latest-gpu
The above command creates a new container using the defined image without overriding the entry command. It uses the same options as the previous command, but for binding the model store, you now use the /home/model-server/model-store directory inside the container, matching the default directory defined in the config.properties file inside the serve/docker directory.
Creating the container without overriding the entry command does not start serving any models until you register them using the management API, as demonstrated in the next section.
Register a New Model using the Management API
This section demonstrates the steps to register a new model on TorchServe using the management API.
Register a new model.
# curl -X POST "http://localhost:8080/models?url=desserts.mar&initial_workers=1"
# curl -X POST "http://localhost:8080/models?url=subset.mar&initial_workers=1"
The above commands send a POST request to the /models
endpoint for registering a new model. It uses the url
parameter to define the path to the model archive file. It also uses the initial_workers
parameter to set the number of workers to 1
as the default value is 0
.
Fetch the list of models.
# curl http://localhost:8081/models
The above command requests the list of available models from TorchServe.
Fetch individual model details.
# curl http://localhost:8081/models/desserts
# curl http://localhost:8081/models/subset_resnet18
The above commands send a GET request to fetch the details of the individual models. The output contains the metadata and the list of workers running for the specified model.
Set the number of minimum workers.
# curl -X PUT "http://localhost:8080/models/desserts?min_workers=3"
The above command sends a PUT request to the specified model to set the number of minimum workers to 3. The minimum worker values define the number of workers that are always up. TorchServe spawns a new worker if an existing worker crashes due to unexpected behavior.
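You can issue the same management calls from Python instead of curl. The following sketch assumes TorchServe is reachable on the default management port 8081 and that the requests package is installed.
import requests

MANAGEMENT = "http://localhost:8081"

# Register the model archive and start one worker.
response = requests.post(f"{MANAGEMENT}/models",
                         params={"url": "desserts.mar", "initial_workers": 1})
print(response.json())

# List every registered model.
print(requests.get(f"{MANAGEMENT}/models").json())

# Scale the desserts model to a minimum of three workers.
print(requests.put(f"{MANAGEMENT}/models/desserts",
                   params={"min_workers": 3}).json())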
Fetch Model Predictions using the Inference API
This section demonstrates the steps to run inference on a model served by TorchServe using the inference API.
Run inference on the model.
# curl -T dessert-example.jpg http://localhost:8080/predictions/desserts
# curl -T subset-example.jpg http://localhost:8080/predictions/subset_resnet18
The above commands upload the example dessert and subset images to the demo models and return the predictions.
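The same request works from Python, which is convenient when the client is an application rather than a terminal. This sketch assumes the inference API is reachable on the default port 8080 and that dessert-example.jpg exists in the working directory.
import requests

# Read the image and send its raw bytes to the prediction endpoint.
with open("dessert-example.jpg", "rb") as image_file:
    response = requests.post("http://localhost:8080/predictions/desserts",
                             data=image_file.read())

print(response.json())  # a mapping of predicted labels to confidence scores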
Deploy TorchServe using Docker Compose
Deploying TorchServe using Docker Compose allows you to serve your PyTorch models behind a Let's Encrypt SSL certificate and run all the containers in the background. This section demonstrates the steps to deploy TorchServe using Docker Compose and set up a reverse proxy server using Nginx.
Create a new file named docker-compose.yaml.
# nano docker-compose.yaml
The docker-compose.yaml file allows you to run multi-container Docker applications using the docker-compose command.
Add the following contents to the file.
services:
  torchserve:
    image: pytorch/torchserve:latest-gpu
    restart: unless-stopped
    volumes:
      - "/root/model-store:/home/model-server/model-store"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
  nginx:
    image: nginx
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./certbot/conf:/etc/letsencrypt
      - ./certbot/www:/var/www/certbot
  certbot:
    image: certbot/certbot
    container_name: certbot
    volumes:
      - ./certbot/conf:/etc/letsencrypt
      - ./certbot/www:/var/www/certbot
    command: certonly --webroot -w /var/www/certbot --force-renewal --email YOUR_EMAIL -d inference.torchserve.example.com -d management.torchserve.example.com --agree-tos
The above configuration defines three services. The torchserve service runs the GPU-accelerated TorchServe container and uses the volumes attribute to mount the model archive files from the /root/model-store directory. The nginx service runs a container using the official Nginx image that acts as a reverse proxy server between clients and the torchserve service. The certbot service runs a container using the official Certbot image that issues a Let's Encrypt SSL certificate for the specified domain names. Replace YOUR_EMAIL with your email address.
Create a new directory named nginx.
# mkdir nginx
Create a new file named nginx/nginx.conf inside the directory.
# nano nginx/nginx.conf
Add the following contents to the file.
events {}

http {
    server_tokens off;
    charset utf-8;

    server {
        listen 80 default_server;
        server_name _;

        location ~ /.well-known/acme-challenge/ {
            root /var/www/certbot;
        }
    }
}
The above configuration instructs the Nginx server to serve the ACME challenge generated by Certbot. You must perform this step for the Certbot container to verify the ownership of the subdomains and issue an SSL certificate for them. You swap this configuration in the later steps to set up the reverse proxy server.
Start the services.
# docker-compose up -d
The above command starts the services defined in the docker-compose.yaml file in detached mode. This means that the services start in the background, and you can use your terminal for other commands.
Verify the SSL issuance.
# ls certbot/conf/live/inference.torchserve.example.com
The above command outputs the list of contents inside the directory created by Certbot for your domain name. The output should contain the fullchain.pem and privkey.pem files. It may take up to five minutes to generate the SSL certificate. If it takes longer than that, you can troubleshoot by viewing the logs using the docker-compose logs certbot command.
Update the nginx.conf file.
# nano nginx/nginx.conf
Add the following contents to the file.
events {}

http {
    server_tokens off;
    charset utf-8;

    # Map required by the Connection header set in the management server block.
    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' close;
    }

    server {
        listen 80 default_server;
        server_name _;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name inference.torchserve.example.com;

        ssl_certificate /etc/letsencrypt/live/inference.torchserve.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/inference.torchserve.example.com/privkey.pem;

        location / {
            proxy_pass http://torchserve:8080;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        location ~ /.well-known/acme-challenge/ {
            root /var/www/certbot;
        }
    }

    server {
        listen 443 ssl http2;
        server_name management.torchserve.example.com;

        ssl_certificate /etc/letsencrypt/live/inference.torchserve.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/inference.torchserve.example.com/privkey.pem;

        location / {
            proxy_pass http://torchserve:8081;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header X-Scheme $scheme;
            proxy_buffering off;
        }

        location ~ /.well-known/acme-challenge/ {
            root /var/www/certbot;
        }
    }
}
The above configuration declares a server block that redirects all HTTP traffic to HTTPS and two HTTPS server blocks. The first HTTPS server block listens for incoming traffic on inference.torchserve.example.com and forwards it to the TorchServe container on port 8080. The second HTTPS server block listens for incoming traffic on management.torchserve.example.com and forwards it to the TorchServe container on port 8081. Both server blocks use the SSL certificate generated by Certbot and contain a location block to serve the ACME challenge files for SSL renewals using Cron.
Restart the Nginx service.
# docker-compose restart nginx
The above command restarts the Nginx container to enable the updated configuration. You can confirm the deployment by opening https://inference.torchserve.example.com/ping in your web browser. After confirming the deployment, you can register the models using the management API as demonstrated in the previous sections. You can additionally restrict access to the management API to specific IP addresses using the allow and deny directives in the Nginx configuration. Refer to the Module ngx_http_access_module documentation for more information.
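After the reverse proxy is in place, remote clients can reach the APIs over HTTPS using the subdomains instead of the local ports. The following sketch assumes the desserts model is registered and that the requests package is installed on the client machine.
import requests

INFERENCE_URL = "https://inference.torchserve.example.com"

# Health check served by the inference API behind the Nginx proxy.
print(requests.get(f"{INFERENCE_URL}/ping").json())

# Run a prediction against the desserts model using a local image.
with open("dessert-example.jpg", "rb") as image_file:
    response = requests.post(f"{INFERENCE_URL}/predictions/desserts",
                             data=image_file.read())
print(response.json())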
Set Up Automatic SSL Renewal
Cron is a built-in job scheduler in the Linux operating system to run the specified commands at a scheduled time. Refer to How to Use the Cron Task Scheduler to learn more.
Edit the Cron table.
# crontab -e
The above command opens the Cron table editor.
Add the following entries to the table.
0 5 1 */2 * /usr/local/bin/docker-compose -f /root/torchserve/docker-compose.yaml start certbot
5 5 1 */2 * /usr/local/bin/docker-compose -f /root/torchserve/docker-compose.yaml restart nginx
The above statements define two tasks that start the Certbot container to regenerate the SSL certificate and restart the Nginx container to reload the configuration using the latest SSL certificate.
To exit the editor, press Esc, type :wq, and press Enter.
Configure the Firewall Rules
Add the firewall rules.
# ufw allow 22
# ufw allow 80
# ufw allow 443
The above commands allow incoming connections on port 22 for SSH, port 80 for HTTP traffic, and port 443 for HTTPS traffic.
Enable the firewall.
# ufw enable
Conclusion
This article demonstrated the steps to package a PyTorch model into a model archive file, deploy the model archive files using TorchServe, run inference using the REST API, and manage the models using the management API. You can also refer to the Performance Guide to optimize PyTorch models for serving them efficiently with TorchServe.