How to Use Vultr Cloud Inference in Python
Introduction
Vultr Cloud Inference allows you to run inference workloads for large language models such as Mixtral 8x7B, Mistral 7B, Meta Llama 2 70B, and more. With Vultr Cloud Inference, you can run inference workloads without managing the underlying infrastructure, and you pay only for the input and output tokens you use.
This article demonstrates the step-by-step process of using Vultr Cloud Inference in Python.
Prerequisites
Before you begin, you must:
- Create a Vultr Cloud Inference Subscription
- Fetch the API key for Vultr Cloud Inference
- Install Python 3.10 or later
Set Up the Environment
Create a new project directory and navigate into it.
$ mkdir vultr-cloud-inference-python
$ cd vultr-cloud-inference-python
Create a new Python virtual environment.
$ python3 -m venv venv
$ source venv/bin/activate
Install the required Python packages.
(venv) $ pip install requests
(venv) $ pip install openai
You only need to install the openai package if you are using the OpenAI SDK for Vultr Cloud Inference.
Inference via Direct API Calls
Vultr Cloud Inference provides a RESTful API to run inference workloads. You can use the requests package to make the API calls.
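Before writing the inference script, you can optionally list the available models. The sketch below queries the models endpoint referenced throughout this article; the response shape (an OpenAI-style body with a data array of objects carrying an id field) is an assumption, so adjust the parsing if the actual response differs.
import os
import requests

api_key = os.environ.get('VULTR_CLOUD_INFERENCE_API_KEY')

# Query the models endpoint to see which model IDs you can use
response = requests.get(
    'https://api.vultrinference.com/v1/chat/models',
    headers={'Authorization': f'Bearer {api_key}'},
)
response.raise_for_status()

# Assumption: an OpenAI-style body such as {"data": [{"id": "..."}, ...]}
for entry in response.json().get('data', []):
    print(entry.get('id'))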
Create a new Python file named inference.py.
(venv) $ nano inference.py
Add the following code to inference.py.
import os
import requests

# Read the API key from the environment
api_key = os.environ.get('VULTR_CLOUD_INFERENCE_API_KEY')

# Set the model
# List of available models: https://api.vultrinference.com/v1/chat/models
model = ''

messages = [
    {
        'role': 'user',
        'content': 'What is the capital of India?'
    }
]

headers = {
    'Authorization': f'Bearer {api_key}',
}

data = {
    'model': model,
    'messages': messages
}

response = requests.post('https://api.vultrinference.com/v1/chat/completions', headers=headers, json=data)
response.raise_for_status()

llm_response = response.json()['choices'][0]['message']['content']
print(llm_response)
Run the Python script.
(venv) $ export VULTR_CLOUD_INFERENCE_API_KEY=<your_api_key>
(venv) $ python inference.py
Here, you make a POST request to https://api.vultrinference.com/v1/chat/completions with the required headers and body. The messages list contains the conversation messages for which you want to generate completions; each message's role can be system, user, or assistant, and content is the message text.
To maintain conversation context, add the previous messages to the messages list. You can also use the stream parameter to receive completions in real time, as shown in the sketch below. For more information, refer to the Vultr Cloud Inference API documentation.
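The following sketch illustrates both ideas with the requests package. It assumes the endpoint follows the OpenAI-compatible streaming convention of server-sent events, where each chunk arrives as a line of the form data: {...} and the stream ends with data: [DONE]; verify the exact chunk format against the Vultr Cloud Inference API documentation.
import json
import os
import requests

api_key = os.environ.get('VULTR_CLOUD_INFERENCE_API_KEY')
model = ''  # Set a model ID from the models endpoint

# Include earlier turns in messages to maintain conversation context
messages = [
    {'role': 'user', 'content': 'What is the capital of India?'},
    {'role': 'assistant', 'content': 'The capital of India is New Delhi.'},
    {'role': 'user', 'content': 'What is its population?'}
]

# stream=True asks the API to send the completion incrementally
data = {'model': model, 'messages': messages, 'stream': True}

with requests.post(
    'https://api.vultrinference.com/v1/chat/completions',
    headers={'Authorization': f'Bearer {api_key}'},
    json=data,
    stream=True,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = line.decode('utf-8')
        # Assumption: OpenAI-style server-sent events ("data: {...}")
        if chunk.startswith('data: '):
            payload = chunk[len('data: '):]
            if payload == '[DONE]':
                break
            delta = json.loads(payload)['choices'][0]['delta']
            print(delta.get('content', ''), end='', flush=True)
print()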
Inference via OpenAI SDK
Because Vultr Cloud Inference is compatible with the OpenAI API, you can also use the openai package to make the API calls.
Create a new Python file named inference_openai.py.
(venv) $ nano inference_openai.py
Add the following code to inference_openai.py.
import os
import openai

# Point the OpenAI client at the Vultr Cloud Inference endpoint
client = openai.OpenAI(
    api_key=os.environ.get('VULTR_CLOUD_INFERENCE_API_KEY'),
    base_url="https://api.vultrinference.com/v1",
)

# Set the model
# List of available models: https://api.vultrinference.com/v1/chat/models
model = ''

messages = [
    {
        'role': 'user',
        'content': 'What is the capital of India?'
    }
]

response = client.chat.completions.create(model=model, messages=messages)

llm_response = response.choices[0].message.content
print(llm_response)
Run the Python script.
(venv) $ export VULTR_CLOUD_INFERENCE_API_KEY=<your_api_key>
(venv) $ python inference_openai.py
Here, you use the openai package instead of raw HTTP calls. The messages list works exactly as in the direct API example: role can be system, user, or assistant, and content is the message text. To maintain conversation context, add the previous messages to the messages list, and pass the stream parameter to receive completions in real time, as sketched below. For more information, refer to the Vultr Cloud Inference API documentation.
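The following sketch shows a multi-turn conversation and streaming with the OpenAI SDK against the same base_url. The chunk handling follows the standard OpenAI SDK streaming interface, which should apply here given the OpenAI-compatible endpoint; the system message and conversation turns are illustrative.
import os
import openai

client = openai.OpenAI(
    api_key=os.environ.get('VULTR_CLOUD_INFERENCE_API_KEY'),
    base_url="https://api.vultrinference.com/v1",
)

model = ''  # Set a model ID from the models endpoint

# Include earlier turns to maintain conversation context
messages = [
    {'role': 'system', 'content': 'You are a concise assistant.'},
    {'role': 'user', 'content': 'What is the capital of India?'},
    {'role': 'assistant', 'content': 'The capital of India is New Delhi.'},
    {'role': 'user', 'content': 'What is its population?'}
]

# stream=True yields chunks as the model generates tokens
stream = client.chat.completions.create(model=model, messages=messages, stream=True)
for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='', flush=True)
print()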
Conclusion
In this article, you learned how to use Vultr Cloud Inference in Python to run inference workloads for large language models, making API calls with both the requests package and the OpenAI SDK. You can now integrate Vultr Cloud Inference into your Python applications to generate completions from large language models.