These are the frequently asked questions for Vultr Serverless Inference.
Currently, Vultr Serverless Inference is optimized for running inference workloads on large language models such as Mixtral 8x7B, Mistral 7B, and Meta Llama 2 70B. Support for other model types may be added in the future; for now, the focus is on language model inference.
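If you want to check which models your subscription can reach programmatically, a common pattern with OpenAI-compatible services is to query the models endpoint. The sketch below assumes the service exposes an OpenAI-compatible /models route; the base URL is a placeholder, so substitute the URL and API key shown in your subscription details.

```python
from openai import OpenAI

# Placeholder base URL and key; use the values from your
# Vultr Serverless Inference subscription details.
client = OpenAI(
    base_url="https://api.vultrinference.com/v1",
    api_key="YOUR_VULTR_INFERENCE_API_KEY",
)

# Print the model identifiers the endpoint currently serves
# (assumes an OpenAI-compatible /models route).
for model in client.models.list().data:
    print(model.id)
```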
You can monitor your usage and costs by navigating to the "Details" tab of your Vultr Serverless Inference subscription in the Vultr Customer Portal. Here, you will find information on the number of tokens used, remaining tokens, and associated costs. Additionally, you can view your API key and other subscription details.
Yes, you can integrate Vultr Serverless Inference with your existing machine learning pipeline. By replacing the OpenAI base API URL with the Vultr Serverless Inference API URL and using your Vultr Serverless Inference API key, you can incorporate the service into your workflow, as the sketch below illustrates.
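Here is a minimal sketch of that swap, assuming the official OpenAI Python client and an OpenAI-compatible chat completions endpoint. The base URL and model identifier are placeholders; copy the real values from your subscription details in the Vultr Customer Portal.

```python
from openai import OpenAI

# Point the standard OpenAI client at Vultr Serverless Inference.
# Base URL, key, and model name below are placeholders.
client = OpenAI(
    base_url="https://api.vultrinference.com/v1",
    api_key="YOUR_VULTR_INFERENCE_API_KEY",
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize what serverless inference is."}],
)
print(response.choices[0].message.content)
```

Because only the base URL and API key change, the rest of your pipeline code can stay as it is.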
If you exceed the 5 million tokens included in your Vultr Serverless Inference subscription, additional tokens will be billed at $0.0002 per 1,000 tokens. You can continue running workloads without interruption, and the overage charges will be added to your monthly bill.
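As a worked example: a month in which your workloads consume 7 million tokens would use the 5 million included tokens plus 2 million overage tokens, billed at 2,000 × $0.0002 = $0.40 on top of the subscription price.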
You can regenerate your Vultr Serverless Inference API key from the subscription details page in the Vultr Customer Portal. This will invalidate the previous API key and generate a new one for enhanced security.
The quality of the outputs from Vultr Serverless Inference depends on the machine learning model you are using. If the outputs do not meet your expectations, try a different model or refine your prompts. Vultr provides the infrastructure; the model itself is the key factor in output quality.
Yes, you can test inference workloads by using the "Prompt" tab in the Vultr Serverless Inference section of the Customer Portal. This allows you to input prompts, select a model, and adjust settings such as max tokens and temperature to see how the model responds before running larger workloads.
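The same kind of test can be scripted once you are ready to move beyond the portal. The sketch below assumes the OpenAI-compatible chat endpoint described above; max_tokens and temperature correspond to the settings in the Prompt tab, and the base URL and model identifier are placeholders.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.vultrinference.com/v1",  # placeholder; see subscription details
    api_key="YOUR_VULTR_INFERENCE_API_KEY",
)

# Mirror the Prompt tab: pick a model, cap the response length,
# and set the sampling temperature.
response = client.chat.completions.create(
    model="mixtral-8x7b-instruct",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Write a haiku about servers."}],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)
```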
Vultr takes data security seriously. All data transmitted to and from Vultr Serverless Inference is encrypted, and the service follows security best practices to keep your data and workloads protected.