How to Provision Vultr Serverless Inference

Updated on February 7, 2025

Vultr Serverless Inference is an efficient AI model hosting service that provides seamless scalability and reduced operational complexity for Generative AI applications. With reliable performance across six continents, Vultr ensures minimal latency for AI models while meeting stringent security and data compliance requirements.

Follow this guide to provision a Vultr Serverless Inference instance using the Vultr Customer Portal, API, or CLI.

Vultr Customer Portal
  1. Navigate to Products, click Serverless, and then click Inference.

    Serverless Inference option in products menu

  2. Click Add Serverless Inference.

    Add serverless inference button

  3. Provide a Label, acknowledge the charges note, and click Add Serverless Inference.

    Button for serverless inference creation

Vultr API

  1. Send a POST request to the Create Inference endpoint to create a Serverless Inference service.

    console
    $ curl "https://api.vultr.com/v2/inference" \
        -X POST \
        -H "Authorization: Bearer ${VULTR_API_KEY}" \
        -H "Content-Type: application/json" \
        --data '{
            "label" : "example-inference"
        }'
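
    The create response includes the new service's ID. To inspect a single service, send a GET request to the Get Inference endpoint, substituting that ID into the path (the ID below is a placeholder):

    console
    $ curl "https://api.vultr.com/v2/inference/{inference-id}" \
        -X GET \
        -H "Authorization: Bearer ${VULTR_API_KEY}"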
    
  2. Send a GET request to the List Inference endpoint to list all the available Serverless Inference services.

    console
    $ curl "https://api.vultr.com/v2/inference" \
        -X GET \
        -H "Authorization: Bearer ${VULTR_API_KEY}"
    
Vultr CLI

  1. Create a Serverless Inference service.

    console
    $ vultr-cli inference create --label example-service
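
    The command prints the new service's details, including its ID. Newer vultr-cli releases also expose a get subcommand for inspecting a single service (an assumption; confirm with vultr-cli inference --help on your installed version). The ID below is a placeholder:

    console
    $ vultr-cli inference get <inference-id>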
    
  2. List all the available Serverless Inference services.

    console
    $ vultr-cli inference list
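
Once a service is provisioned, requests to it authenticate with the service's own inference API key rather than your Vultr account key; the key is shown in the Customer Portal and returned by the API and CLI. As a minimal sketch, assuming Vultr's OpenAI-compatible chat completions endpoint at api.vultrinference.com and an illustrative model name:

    console
    $ curl "https://api.vultrinference.com/v1/chat/completions" \
        -X POST \
        -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "llama2-13b-chat-Q5_K_M",
            "messages": [{"role": "user", "content": "Hello, world"}]
        }'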