
Vultr Serverless Inference allows you to run inference workloads for large language models such as Mixtral 8x7B, Mistral 7B, Meta Llama 2 70B, and more. Using Vultr Serverless Inference, you can run inference workloads without having to worry about the infrastructure, and you only pay for the input and output tokens.
This guide demonstrates the step-by-step process of using Vultr Serverless Inference in Node.js.
Prerequisites
Before you begin, you must:
- Create a Vultr Serverless Inference Subscription
- Fetch the API key for Vultr Serverless Inference
- Install Node.js 20.x or later
Set Up the Environment
Create a new project directory and navigate to it.
```console
$ mkdir vultr-serverless-inference-nodejs
$ cd vultr-serverless-inference-nodejs
```
Create a new Node.js project.
```console
$ npm init -y
```
Install the required Node.js packages.
```console
$ npm install openai
```
Note: The `openai` package is required only if you plan to use the OpenAI SDK with Vultr Serverless Inference.
Inference via Direct API Calls
Vultr Serverless Inference provides a RESTful API to run inference workloads. You can use the `fetch` API, built into Node.js 18 and later, to make the API calls.
Create a new JavaScript file named `inference.js`.
```console
$ nano inference.js
```
Add the following code to `inference.js`.
```javascript
const apiKey = process.env.VULTR_SERVERLESS_INFERENCE_API_KEY;

// Set the model
// List of available models: https://api.vultrinference.com/v1/chat/models
const model = '';

const messages = [
    {
        "role": "user",
        "content": "What is the capital of India?"
    }
];

const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
};

const url = 'https://api.vultrinference.com/v1/chat/completions';

const options = {
    method: 'POST',
    headers,
    body: JSON.stringify({ model, messages })
};

fetch(url, options)
    .then(response => response.json())
    .then(output => {
        const llmResponse = output.choices[0].message.content;
        console.log(llmResponse);
    });
```
Set your API key in an environment variable, then run the script.
```console
$ export VULTR_SERVERLESS_INFERENCE_API_KEY=<your_api_key>
$ node inference.js
```
Here, you make a POST request to `https://api.vultrinference.com/v1/chat/completions` with the required headers and request body. The `messages` array contains the messages for which you want to generate completions; each message's `role` can be `system`, `user`, or `assistant`, and `content` is the message text.
To maintain conversation context, add the previous messages to the `messages` array, as shown in the sketch below. You can also use the `stream` parameter to receive completions in real time. For more information, refer to the Vultr Serverless Inference API documentation.
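For example, a follow-up request that keeps the earlier exchange in context might build the `messages` array as follows; the assistant reply shown here is illustrative, not real model output.
```javascript
// Append prior turns so the model can resolve references such as "its".
// The assistant message below is an illustrative earlier reply, not actual model output.
const messages = [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of India?" },
    { "role": "assistant", "content": "The capital of India is New Delhi." },
    { "role": "user", "content": "What is its population?" }
];
```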
Inference via the OpenAI SDK
Because Vultr Serverless Inference exposes an OpenAI-compatible API, you can also use the `openai` package to make the API calls.
Create a new JavaScript file named `inference_openai.js`.
```console
$ nano inference_openai.js
```
Add the following code to `inference_openai.js`.
```javascript
const { OpenAI } = require("openai");

const apiKey = process.env.VULTR_SERVERLESS_INFERENCE_API_KEY;

const client = new OpenAI({
    apiKey,
    baseURL: "https://api.vultrinference.com/v1"
});

// Set the model
// List of available models: https://api.vultrinference.com/v1/chat/models
const model = '';

const messages = [
    {
        "role": "user",
        "content": "What is the capital of India?"
    }
];

async function main() {
    const output = await client.chat.completions.create({ messages, model });
    const llmResponse = output.choices[0].message.content;
    console.log(llmResponse);
}

main();
```
Set your API key in an environment variable, then run the script.
```console
$ export VULTR_SERVERLESS_INFERENCE_API_KEY=<your_api_key>
$ node inference_openai.js
```
Here, you use the `openai` package, pointed at the Vultr Serverless Inference base URL, to make the API calls. As before, the `messages` array contains the messages for which you want to generate completions; `role` can be `system`, `user`, or `assistant`, and `content` is the message text.
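If you want the real-time completions mentioned earlier, the SDK can also consume a streamed response. The following is a minimal sketch, assuming the `stream` parameter behaves as it does with the standard OpenAI chat completions API, that prints tokens as they arrive:
```javascript
const { OpenAI } = require("openai");

const client = new OpenAI({
    apiKey: process.env.VULTR_SERVERLESS_INFERENCE_API_KEY,
    baseURL: "https://api.vultrinference.com/v1"
});

// Set the model
// List of available models: https://api.vultrinference.com/v1/chat/models
const model = '';

async function main() {
    // With stream: true, the SDK returns an async iterable of chunks
    // instead of a single completed response.
    const stream = await client.chat.completions.create({
        model,
        messages: [{ role: "user", content: "What is the capital of India?" }],
        stream: true
    });

    for await (const chunk of stream) {
        // Each chunk carries an incremental piece of the reply in delta.content.
        process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }
    process.stdout.write("\n");
}

main();
```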
Conclusion
In this guide, you learned how to use Vultr Serverless Inference in Node.js, both by making direct API calls and by using the OpenAI SDK. You can now integrate Vultr Serverless Inference into your Node.js applications to generate completions from large language models.