How to Deploy Services with dstack on Vultr

Updated on January 31, 2025

dstack services simplify the deployment of long-running applications like APIs, ML models, and web apps. They ensure scalability, reliability, and efficient resource management across cloud and on-premise environments. With easy YAML-based configuration, developers can deploy, scale, and update services seamlessly without complex infrastructure overhead.

In this guide, you'll deploy the DeepSeek R1 Distill Llama 8B LLM using a dstack service on L40S Vultr Cloud GPUs. This setup combines dstack's streamlined AI deployment with SGLang's optimized inference for efficient, high-performance model serving. By leveraging Vultr's L40S GPUs, you'll achieve fast, scalable inference with minimal setup.

Create a Virtual Environment

In this section, you are going to create a virtual environment on your machine and prepare it for the dstack service deployment.

  1. Install the venv package.

    console
    $ sudo apt install python3-venv -y
    
  2. Create a virtual environment.

    console
    $ python3 -m venv dstack-env
    
  3. Activate the virtual environment.

    console
    $ source dstack-env/bin/activate
    
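    You can confirm that the virtual environment is active by checking which Python interpreter is in use; the path should point inside the dstack-env directory:

    console
    $ which python3
    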

Install dstack

In this section, you are going to install all the necessary dependencies for dstack and start the dstack server for the service deployment in the later sections.

  1. Create a directory to store the backend configuration file.

    console
    $ mkdir -p ~/.dstack/server
    
  2. Create a config.yml file to declare Vultr as the backend provider.

    console
    $ nano ~/.dstack/server/config.yml
    
  3. Copy and paste the below configuration.

    YAML
    projects:
    - name: main
      backends:
      - type: vultr
        creds:
          type: api_key
          api_key: <vultr-account-api-key>
    

    Replace <vultr-account-api-key> with your Vultr API key, which you can retrieve from the Vultr Customer Portal.

    Save and close the file.

  4. Install dstack.

    console
    $ pip install "dstack[all]" -U
    
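    To verify that the installation succeeded, check the installed package details:

    console
    $ pip show dstack
    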
  5. Activate the dstack server.

    console
    $ dstack server
    

    Note down the URL on which the dstack server is running and the admin token provided in the output.
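
    The output is similar to the following, with <TOKEN> standing in for the generated admin token:

    Applying ~/.dstack/server/config.yml...
    
    The admin token is <TOKEN>
    The server is running at http://127.0.0.1:3000/
    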

  6. Point the CLI to the dstack server.

    console
    $ dstack config --url <URL> \
        --project main \
        --token <TOKEN>
    
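    The command stores the server address, project name, and token in ~/.dstack/config.yml. You can inspect that file to confirm the CLI is configured:

    console
    $ cat ~/.dstack/config.yml
    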

Deploy dstack SGLang Service

  1. Still in the dstack-env virtual environment, create a project directory and navigate into it.

    console
    $ mkdir quickstart && cd quickstart
    
  2. Initialize the directory.

    console
    $ dstack init
    
  3. Create a YAML file to define the dstack service configuration.

    console
    $ nano .dstack.yaml
    
  4. Copy and paste the below configuration.

    YAML
    type: service
    name: deepseek-r1-nvidia
    
    image: lmsysorg/sglang:latest
    env:
      - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    commands:
      - python3 -m sglang.launch_server
          --model-path $MODEL_ID
          --port 8000
          --trust-remote-code
    
    port: 8000
    
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    
    resources:
      gpu: l40s:48GB
    

    Save and close the file.

    The above YAML file defines a dstack service that deploys the DeepSeek R1 Distill Llama 8B model using SGLang for inference. It pulls the lmsysorg/sglang:latest image and sets the MODEL_ID environment variable to specify the model. The commands section launches sglang.launch_server on port 8000 with the --trust-remote-code flag so the model's custom code can be executed. The service requests an NVIDIA L40S GPU with 48 GB of memory for efficient inference. This setup enables seamless, GPU-accelerated model serving on dstack. Learn more about defining GPU resources in dstack services.
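
    If you later need more throughput, dstack services also support running multiple replicas with optional auto-scaling. The keys below are a minimal sketch of what you could add to the configuration above; the values are illustrative, and auto-scaling may require a dstack gateway:

    YAML
    replicas: 1..4
    scaling:
      metric: rps
      target: 10
    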

  5. Apply the configuration.

    console
    $ dstack apply -f .dstack.yaml
    

    The configuration may take up to 10 minutes to execute. You can monitor its progress by visiting the Vultr Customer Portal and checking the status of the server provisioning, which is automatically initiated when you apply the configuration.
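
    You can also check the status of the run directly from the CLI:

    console
    $ dstack ps
    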

  6. Navigate to http://<URL>/projects/main/models in any web browser, where <URL> is the dstack server address from the previous section, to interact with the model.

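    You can also query the model programmatically through dstack's OpenAI-compatible endpoint. The request below is a sketch that assumes the proxy path used by recent dstack versions; replace <URL> and <TOKEN> with the server address and token you noted earlier:

    console
    $ curl http://<URL>/proxy/models/main/chat/completions \
        -X POST \
        -H 'Authorization: Bearer <TOKEN>' \
        -H 'Content-Type: application/json' \
        -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "messages": [{"role": "user", "content": "Hello!"}]}'
    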

Conclusion

In this guide, you deployed the DeepSeek R1 Distill Llama 8B model as a dstack service on L40S Vultr Cloud GPUs. You also set up a virtual environment, installed dstack, and configured Vultr as your cloud provider, ensuring a smooth and scalable deployment process. With this setup, you can now efficiently serve large language models in the cloud, enabling fast, reliable AI applications.

For more information, refer to the following documentation: