Frequently Asked Questions (FAQs) About Vultr Serverless Inference

Updated on 10 March, 2026



These are the frequently asked questions for Vultr Serverless Inference.

Can I run inference workloads for models other than large language models on Vultr serverless inference?

Currently, Vultr Serverless Inference supports a range of production-ready models across multiple categories:

- Language models: Mistral-7B-v0.3, DeepSeek-R1, Llama-3.1-70B-Instruct-FP8, and Qwen2.5-32B-Instruct
- Chat-optimized models: deepseek-r1-distill-qwen-32b, qwen2.5-coder-32b-instruct, deepseek-r1-distill-llama-70b, gpt-oss-120b, and kimi-k2-instruct
- Speech generation and text-to-speech: bark, bark-small, and xtts
- Image generation: flux.1-dev, stable-diffusion-3.5-large, and stable-diffusion-3.5-medium

Support for additional model types may be added in the future.
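As a minimal sketch of how one of the chat-optimized models above might be invoked, the snippet below builds an OpenAI-style chat-completion request payload. The endpoint URL, the `build_chat_request` helper, and the assumption of an OpenAI-compatible API are illustrative, not taken from this FAQ; consult the Vultr Serverless Inference documentation for the exact endpoint and authentication details.

```python
import json

# Hypothetical endpoint; check the official docs for the real URL.
ENDPOINT = "https://api.vultrinference.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload (illustrative helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Example: target one of the chat-optimized models listed above.
payload = build_chat_request("deepseek-r1-distill-llama-70b", "Summarize what serverless inference is.")
print(json.dumps(payload, indent=2))
```

The payload would then be sent as a POST request with an `Authorization: Bearer <API key>` header, the usual pattern for OpenAI-compatible APIs.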
