Vultr Cloud Inference

Updated on April 22, 2024
Vultr Cloud Inference header image

Introduction

Vultr Cloud Inference allows you to run inference workloads for large language models such as Mixtral 8x7B, Mistral 7B, Meta Llama 2 70B, and more. Using Vultr Cloud Inference, you can run inference workloads without having to worry about the infrastructure, and you only pay for the input and output tokens.

This article demonstrates the step-by-step process of getting started with Vultr Cloud Inference.

Create a Vultr Cloud Inference Subscription

  1. Log in to the Vultr Customer Portal.

  2. Navigate to Products > Cloud Inference.

  3. Click the Add Cloud Inference button.

    Add Vultr Cloud Inference Subscription

  4. Fill in the label field and click the Add Cloud Inference button.

    Fill in the Label Field

  5. Once the subscription is created, you will see the subscription page.

Manage Vultr Cloud Inference Subscription

  1. Log in to the Vultr Customer Portal.

  2. Navigate to Products > Cloud Inference.

  3. Click the three dots on the right side of the subscription and select Manage.

  4. You can see the subscription details and manage the subscription.

    Vultr Cloud Inference Manage Subscription

Details Tab

On the details tab, you can see the subscription details such as the API key, the number of tokens used, and the number of tokens remaining.

Vultr Cloud Inference Details Tab

Here you can find:

  1. General Information: Find the subscription label, creation date, monthly cost, and included tokens.
  2. API Key: Find the API key for the subscription.
  3. Usage Details: Find the number of tokens used and the number of tokens remaining.

Prompt Tab

On the prompt tab, you can enter a prompt, select the machine learning model, and run test inference workloads.

Vultr Cloud Inference Prompt Tab

When you populate the prompt field, you can set:

  1. Model: Select the machine learning model.
  2. Max Tokens: Set the maximum number of tokens to generate.
  3. Seed: Set the random seed for reproducibility.
  4. Temperature: Set the sampling temperature for generation. Higher values produce more varied output; lower values make the output more deterministic.
  5. Top-K: Set the top-k value for generation. This restricts sampling to the k most probable tokens at each step.
  6. Top-P: Set the top-p (nucleus sampling) value for generation. This restricts sampling to the smallest set of tokens whose cumulative probability exceeds p, controlling output diversity.
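The prompt-tab controls above map directly onto the fields of an inference request. The sketch below assembles such a request body; the field names follow common OpenAI-style conventions and are assumptions for illustration, not confirmed Vultr Cloud Inference field names.

```python
# Sketch: build a request body mirroring the Prompt tab controls.
# Field names follow OpenAI-style conventions and are assumptions,
# not confirmed Vultr Cloud Inference field names.

def build_prompt_request(prompt, model, max_tokens=256,
                         seed=None, temperature=0.8, top_k=40, top_p=0.9):
    """Assemble a request body from the Prompt tab parameters."""
    body = {
        "model": model,              # machine learning model to use
        "prompt": prompt,            # the prompt text
        "max_tokens": max_tokens,    # cap on the number of generated tokens
        "temperature": temperature,  # sampling randomness
        "top_k": top_k,              # restrict sampling to the k most likely tokens
        "top_p": top_p,              # nucleus sampling cutoff
    }
    if seed is not None:
        body["seed"] = seed          # fixed seed for reproducible output
    return body

request = build_prompt_request("Explain inference in one sentence.",
                               model="mistral-7b", seed=42)
```

Omitting the seed leaves it out of the request entirely, so repeated runs remain non-deterministic by default.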

Chat Tab

On the chat tab, you can select the machine learning model and run chat completions with conversation history.

Vultr Cloud Inference Chat Tab

Here you can set:

  1. Model: Select the machine learning model.
  2. Max Tokens: Set the maximum number of tokens to generate.
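Conversation history is simply a growing list of alternating messages that is sent with every completion. A minimal sketch of that structure, assuming the OpenAI-compatible `role`/`content` message format:

```python
# Sketch: maintain a chat history as alternating role/content messages,
# the structure the OpenAI-compatible Chat Completions format expects.

history = []

def add_user_message(text):
    history.append({"role": "user", "content": text})

def add_assistant_message(text):
    history.append({"role": "assistant", "content": text})

add_user_message("What is Vultr Cloud Inference?")
add_assistant_message("A serverless inference service.")
add_user_message("How is it billed?")  # the model sees the full history above
```

Because the whole list is sent each time, follow-up questions like "How is it billed?" can rely on earlier turns for context.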

Frequently Asked Questions (FAQs)

Frequently asked questions about Vultr Cloud Inference.

What is Vultr Cloud Inference?

Vultr Cloud Inference is a service that allows you to run inference workloads for large language models such as Mixtral 8x7B, Mistral 7B, Meta Llama 2 70B, and more. Using Vultr Cloud Inference, you can run inference workloads without having to worry about the infrastructure, and you only pay for the input and output tokens.

Is Vultr Cloud Inference compatible with OpenAI?

Yes, Vultr Cloud Inference is compatible with OpenAI client libraries. You can swap out the OpenAI base API URL for the Vultr Cloud Inference API URL and use your Vultr Cloud Inference API key to run inference workloads. Note that only Chat Completions is currently supported.

What is the pricing model for Vultr Cloud Inference?

Vultr Cloud Inference subscriptions have a fixed monthly cost of $10 that includes 5 million tokens. Additional tokens are billed at $0.0002 per 1,000 tokens. Note that the pricing model may change in the future; for the latest pricing, refer to the Vultr Cloud Inference pricing page.
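As a worked example of that pricing model, the helper below estimates a monthly bill from token usage, using the figures quoted in this article; check the pricing page for current rates before relying on these numbers.

```python
# Sketch: estimate the monthly bill from token usage, using the figures in
# this article ($10 base including 5M tokens, $0.0002 per 1,000 extra tokens).
# Pricing may change; check the pricing page for current rates.

BASE_COST = 10.00            # fixed monthly cost in USD
INCLUDED_TOKENS = 5_000_000  # tokens covered by the base cost
OVERAGE_PER_1000 = 0.0002    # USD per 1,000 tokens beyond the allowance

def monthly_cost(tokens_used):
    """Return the estimated monthly cost in USD for a given token count."""
    overage = max(0, tokens_used - INCLUDED_TOKENS)
    return BASE_COST + (overage / 1000) * OVERAGE_PER_1000

print(monthly_cost(4_000_000))   # within the allowance: 10.0
print(monthly_cost(10_000_000))  # 5M extra tokens: 10.0 + 5000 * 0.0002 = 11.0
```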

How do the Vultr API and the Vultr Cloud Inference API differ from each other?

The Vultr API is used to manage infrastructure such as compute instances, block storage, networking, and other cloud resources. The Vultr Cloud Inference API is used to run inference workloads for machine learning models. When you create a Vultr Cloud Inference subscription, you get an API key that is used to access the Vultr Cloud Inference API.

How to find all API endpoints for Vultr Cloud Inference?

You can find all the API endpoints for Vultr Cloud Inference in the Vultr Cloud Inference API documentation.

How to list the available large language models in Vultr Cloud Inference?

You can list the available language models in Vultr Cloud Inference using the List Chat Completion Models API endpoint.
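A standard-library sketch of calling that endpoint. The path `/v1/chat/models` and the base URL are assumptions; confirm the exact endpoint in the Vultr Cloud Inference API documentation.

```python
# Sketch: query the model-list endpoint with only the standard library.
# The base URL and the "/v1/chat/models" path are assumptions; confirm
# the exact endpoint in the Vultr Cloud Inference API documentation.
import json
import urllib.request

API_BASE = "https://api.vultrinference.com"  # assumed API base URL

def build_list_models_request(api_key):
    """Build (but do not send) an authenticated GET request for the model list."""
    return urllib.request.Request(
        f"{API_BASE}/v1/chat/models",
        headers={"Authorization": f"Bearer {api_key}"},  # subscription API key
        method="GET",
    )

def list_models(api_key):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_list_models_request(api_key)) as resp:
        return json.load(resp)

req = build_list_models_request("example-key")
```

The same bearer-token header authenticates every Vultr Cloud Inference API call.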

How to get the API key for Vultr Cloud Inference?

You can get the API key for Vultr Cloud Inference from the subscription details page in the Vultr Customer Portal.

Vultr Cloud Inference API Key

How to regenerate the API key for Vultr Cloud Inference?

You can regenerate the API key for Vultr Cloud Inference from the subscription details page in the Vultr Customer Portal.

Vultr Cloud Inference Regenerate API Key

Why am I not getting high-quality outputs from Vultr Cloud Inference?

Vultr does not create the machine learning models; it only provides the infrastructure to run inference workloads. The quality of the outputs depends on the machine learning model you use. If you are not getting high-quality outputs, try a different machine learning model or enhance the system and user prompts.
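One common way to enhance prompts is to prepend a system message that steers tone and format, assuming the OpenAI-compatible Chat Completions message format:

```python
# Sketch: steer output quality by prepending a system message to the
# message list, per the OpenAI-compatible Chat Completions format.
# The prompts below are illustrative only.

def with_system_prompt(system_prompt, user_prompt):
    """Return a message list with a system message steering the reply."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = with_system_prompt(
    "You are a concise technical writer. Answer in at most two sentences.",
    "Explain what an inference workload is.",
)
```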

How to delete the Vultr Cloud Inference subscription?

You can delete the Vultr Cloud Inference subscription from the subscription details page in the Vultr Customer Portal.

Vultr Cloud Inference Delete Subscription

Learn More