How to Monitor Vultr Serverless Inference

Updated on November 27, 2024

Monitoring the Vultr Serverless Inference service is essential for maintaining the performance and cost-efficiency of your AI deployments. By tracking the usage of various AI workloads, such as "Prompt, Chat, & Embeddings" and "Text-to-Speech," you can gain valuable insights into resource consumption, optimize performance, and prevent potential bottlenecks. This proactive monitoring ensures that your AI applications run smoothly, delivering consistent and reliable results while keeping operational costs under control.

Follow this guide to monitor the usage of Serverless Inference on your Vultr account using the Vultr Customer Portal, API, or CLI.

  • Vultr Customer Portal
  • Vultr API
  • Vultr CLI
Vultr Customer Portal

  1. Navigate to Products, click Serverless, and then click Inference.

  2. Click your target inference service to open its management page.

  3. Open the Usage page.

  4. View the usage statistics for all inference endpoints.

Vultr API

  1. Send a GET request to the List Inference endpoint and note the target inference service's ID.

    console
    $ curl "https://api.vultr.com/v2/inference" \
        -X GET \
        -H "Authorization: Bearer ${VULTR_API_KEY}"
    
  2. Send a GET request to the Inference Usage endpoint, replacing <inference-id> with the ID you noted in the previous step. A combined shell sketch follows these steps.

    console
    $ curl "https://api.vultr.com/v2/inference/<inference-id>/usage" \
        -X GET \
        -H "Authorization: Bearer ${VULTR_API_KEY}"
    
Vultr CLI

  1. List all available inference services and note the target inference service's ID.

    console
    $ vultr-cli inference list
    
  2. Get the target inference service's usage, replacing <inference-id> with the ID from the previous step. A complete session, including authentication, is sketched after these steps.

    console
    $ vultr-cli inference usage get <inference-id>
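
The Vultr CLI uses the same API key as the direct API calls and reads it from the VULTR_API_KEY environment variable. A minimal end-to-end session, assuming the key has already been created in the Customer Portal and <inference-id> is copied from the list output:

    console
    $ # Authenticate the CLI via the environment
    $ export VULTR_API_KEY="<your-api-key>"
    $ # Find the target service's ID, then fetch its usage
    $ vultr-cli inference list
    $ vultr-cli inference usage get <inference-id>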