How to Deploy Services with dstack on Vultr

Updated on January 31, 2025

dstack services simplify the deployment of long-running applications like APIs, ML models, and web apps. They ensure scalability, reliability, and efficient resource management across cloud and on-premise environments. With easy YAML-based configuration, developers can deploy, scale, and update services seamlessly without complex infrastructure overhead.

In this guide, you'll deploy the DeepSeek R1 Distill Llama 8B LLM using a dstack service on L40S Vultr Cloud GPUs. This setup combines dstack's streamlined AI deployment with SGLang's optimized inference for efficient, high-performance model serving. By leveraging Vultr's L40S GPUs, you'll achieve fast, scalable inference with minimal setup.

Create a Virtual Environment

In this section, you are going to create a virtual environment on your machine and prepare it for the dstack service deployment.

  1. Install the venv package.

    console
    $ sudo apt install python3-venv -y
    
  2. Create a virtual environment.

    console
    $ python3 -m venv dstack-env
    
  3. Activate the virtual environment.

    console
    $ source dstack-env/bin/activate
    
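    You can confirm that the virtual environment is active by checking which Python interpreter is in use; the path should point inside the dstack-env directory:

    console
    $ which python3
    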

Install dstack

In this section, you are going to install all the necessary dependencies for dstack and start the dstack server for the service deployment in the later sections.

  1. Create a directory to store the backend configuration file.

    console
    $ mkdir -p ~/.dstack/server
    
  2. Create a config.yml file to declare Vultr as the backend provider.

    console
    $ nano ~/.dstack/server/config.yml
    
  3. Copy and paste the below configuration.

    YAML
    projects:
    - name: main
      backends:
      - type: vultr
        creds:
          type: api_key
          api_key: <vultr-account-api-key>
    

    Replace <vultr-account-api-key> with your Vultr API key, which you can retrieve from the Vultr Customer Portal.

    Save and close the file.

  4. Install dstack.

    console
    $ pip install "dstack[all]" -U
    
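    To verify that the installation succeeded, check the installed package details:

    console
    $ pip show dstack
    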
  5. Activate the dstack server.

    console
    $ dstack server
    

    Note down the URL on which the dstack server is running and the admin token provided in the output.
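
    The output is similar to the following, with <TOKEN> standing in for the generated admin token:

    Applying ~/.dstack/server/config.yml...
    
    The admin token is <TOKEN>
    The server is running at http://127.0.0.1:3000/
    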

  6. Point the CLI to the dstack server.

    console
    $ dstack config --url <URL> \
        --project main \
        --token <TOKEN>
    
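    The command stores the server address, project name, and token in ~/.dstack/config.yml. You can inspect that file to confirm the CLI is configured:

    console
    $ cat ~/.dstack/config.yml
    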

Deploy dstack SGLang Service

  1. Still in the dstack-env virtual environment, create a project directory and navigate into it.

    console
    $ mkdir quickstart && cd quickstart
    
  2. Initialize the directory.

    console
    $ dstack init
    
  3. Create a YAML file to define the dstack service configuration.

    console
    $ nano .dstack.yaml
    
  4. Copy and paste the below configuration.

    YAML
    type: service
    name: deepseek-r1-nvidia
    
    image: lmsysorg/sglang:latest
    env:
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    commands:
      - python3 -m sglang.launch_server
          --model-path $MODEL_ID
          --port 8000
          --trust-remote-code
    
    port: 8000
    
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    
    resources:
      gpu: l40s:48GB
    

    Save and close the file.

    The above YAML file defines a dstack service that deploys the DeepSeek R1 Distill Llama 8B model using SGLang for inference. It pulls the lmsysorg/sglang:latest image and sets the MODEL_ID environment variable to specify the model. The commands section launches sglang.launch_server on port 8000 with the --trust-remote-code flag so the model's custom code can be executed. The service requests an NVIDIA L40S GPU with 48 GB of memory for efficient inference. This setup enables seamless, GPU-accelerated model serving on dstack. Learn more about defining GPU resources in dstack services.
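
    If you later need more throughput, dstack services also support running multiple replicas with optional auto-scaling. The keys below are a minimal sketch of what you could add to the configuration above; the values are illustrative, and auto-scaling may require a dstack gateway:

    YAML
    replicas: 1..4
    scaling:
      metric: rps
      target: 10
    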

  5. Apply the configuration.

    console
    $ dstack apply -f .dstack.yaml
    

    The configuration may take up to 10 minutes to execute. You can monitor its progress by visiting the Vultr Customer Portal and checking the status of the server provisioning, which is automatically initiated when you apply the configuration.
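
    You can also check the status of the run directly from the CLI:

    console
    $ dstack ps
    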

  6. Navigate to http://<URL>/projects/main/models in any web browser, where <URL> is the dstack server address from the previous section, to interact with the model.

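    You can also query the model programmatically through dstack's OpenAI-compatible endpoint. The request below is a sketch that assumes the proxy path used by recent dstack versions; replace <URL> and <TOKEN> with the server address and token you noted earlier:

    console
    $ curl http://<URL>/proxy/models/main/chat/completions \
        -X POST \
        -H 'Authorization: Bearer <TOKEN>' \
        -H 'Content-Type: application/json' \
        -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "messages": [{"role": "user", "content": "Hello!"}]}'
    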

Conclusion

In this guide, you deployed the DeepSeek R1 Distill Llama 8B model as a dstack service on L40S Vultr Cloud GPUs. You also set up a virtual environment, installed dstack, and configured Vultr as your cloud provider, ensuring a smooth and scalable deployment process. With this setup, you can now efficiently serve large language models in the cloud, enabling fast, reliable AI applications.

For more information, refer to the following documentation: