Serverless Inference supports model versioning through containerized deployments, enabling operators to run versions in parallel, conduct A/B tests, and roll out updates without disruption.
Vultr Serverless Inference manages models as containerized deployments, each tagged with a specific version. This enables operators to run multiple versions in parallel, perform controlled A/B tests, and deploy updates without affecting live workloads. Rollbacks are handled at the container level, and traffic can be switched back to a previous version instantly through the Vultr API or Customer Portal without manual infrastructure changes. Decoupling model versions from the underlying infrastructure ensures continuous availability, predictable performance, and precise control over the model lifecycle.
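As a minimal sketch of how such version-aware traffic control might be driven programmatically, the Python example below shows an A/B split followed by an instant rollback. The endpoint paths (`/inference/{id}/versions`, `/inference/{id}/traffic`), payload fields, and the deployment and version identifiers are illustrative assumptions, not the documented Vultr API surface; consult the API reference for the actual endpoints.

```python
import os
import requests

# Hypothetical sketch: the endpoint paths, payload shape, and IDs below are
# illustrative assumptions, not the documented Vultr API.
API_BASE = "https://api.vultr.com/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['VULTR_API_KEY']}"}

DEPLOYMENT_ID = "example-deployment-id"  # placeholder deployment ID


def list_versions(deployment_id: str) -> list[dict]:
    """List the containerized model versions attached to a deployment."""
    resp = requests.get(
        f"{API_BASE}/inference/{deployment_id}/versions",  # assumed path
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["versions"]  # assumed response field


def route_traffic(deployment_id: str, weights: dict[str, int]) -> None:
    """Set per-version traffic weights: a partial split for an A/B test,
    or 100% to one version for an instant rollback."""
    resp = requests.patch(
        f"{API_BASE}/inference/{deployment_id}/traffic",  # assumed path
        headers=HEADERS,
        json={"weights": weights},  # assumed payload shape
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # A/B test: send 10% of requests to v2 while v1 keeps serving.
    route_traffic(DEPLOYMENT_ID, {"v1": 90, "v2": 10})
    # Rollback: shift all traffic back to v1; no infrastructure changes needed.
    route_traffic(DEPLOYMENT_ID, {"v1": 100})
```

Because both versions remain deployed as containers, the rollback is only a routing change, which is what makes it instantaneous and non-disruptive to live workloads.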