---
title: Chat
url: https://docs.vultr.com/products/compute/serverless-inference/management/usage/chat
publish_date: 2024-09-23T20:20:49.777546Z
last_updated: 2026-05-26T19:11:20.411598Z
---

# How to Use the Chat Endpoint in Vultr Serverless Inference

The Vultr Serverless Inference chat endpoint lets you hold real-time chat conversations with Large Language Models (LLMs). The endpoint also supports tool calling, which lets a model invoke functions you define (for example, to fetch live data or call an API) during a conversation and produce data-driven responses. Integrating this endpoint adds conversational AI to your applications.

Follow this guide to use the chat endpoint on your Vultr account with the Vultr Console or API.

=== "Vultr Console"

    1. Navigate to **Products**, click **Serverless**, and then click **Inference**.

    1. Click your target inference subscription to open its management page.

    1. Open the **Chat** page.

    1. Select a preferred model.

    1. Provide a **Max Tokens** value.

    1. Send a message in the chat window.

    1. Click **History** to view chat history.

    1. Click **New Conversation** to start a new chat window.

=== "Vultr API"

    ### Chat with a Model Using the API

    1. Send a `GET` request to the [**List Serverless Inference** endpoint](https://www.vultr.com/api/#tag/serverless-inference/operation/list-inference) and note the target inference subscription's ID.

        ```console
        $ curl "https://api.vultr.com/v2/inference" \
            -X GET \
            -H "Authorization: Bearer ${VULTR_API_KEY}"
        ```

    1. Send a `GET` request to the [**Serverless Inference** endpoint](https://www.vultr.com/api/#tag/serverless-inference/operation/get-inference) and note the target inference subscription's API key.

        ```console
        $ curl "https://api.vultr.com/v2/inference/{inference-id}" \
            -X GET \
            -H "Authorization: Bearer ${VULTR_API_KEY}"
        ```

    1. Send a `GET` request to the [**List Models** endpoint](https://api.vultrinference.com/#tag/Models/operation/list-models) and note the preferred inference model's ID.

        ```console
        $ curl "https://api.vultrinference.com/v1/models" \
            -X GET \
            -H "Authorization: Bearer ${INFERENCE_API_KEY}"
        ```
    
    1. Send a `POST` request to the [**Create Chat Completion** endpoint](https://api.vultrinference.com/#tag/Chat/operation/create-chat-completion) to chat with the preferred Large Language Model.

        ```console
        $ curl "https://api.vultrinference.com/v1/chat/completions" \
            -X POST \
            -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
            -H "Content-Type: application/json" \
            --data '{
                "model": "{model-id}",
                "messages": [
                    {
                        "role": "user",
                        "content": "{user-input}"
                    }
                ],
                "max_tokens": 512
            }'
        ```

        Visit the [**Create Chat Completion** API page](https://api.vultrinference.com/#tag/Chat/operation/create-chat-completion) to view additional attributes you can apply for greater control when interacting with the preferred inference model.
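
    The same request can also be made from application code. The following is a minimal Python sketch that uses only the standard library and mirrors the `curl` command above. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of any Vultr SDK, and the model ID and API key are values you supply.

    ```python
    import json
    import urllib.request

    CHAT_URL = "https://api.vultrinference.com/v1/chat/completions"


    def build_chat_request(api_key, model_id, user_input, max_tokens=512):
        """Build (but do not send) the POST request shown in the curl example."""
        payload = {
            "model": model_id,
            "messages": [{"role": "user", "content": user_input}],
            "max_tokens": max_tokens,
        }
        return urllib.request.Request(
            CHAT_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )


    def send_chat_request(request, timeout=60):
        """Send the request and return the decoded JSON response body."""
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    ```

    To send a message, pass the result of `build_chat_request` to `send_chat_request`, reading the key from the `INFERENCE_API_KEY` environment variable rather than hard-coding it. This page does not document the response layout; if it follows the OpenAI-style convention, the reply text would be under `choices[0].message.content`, but treat that as an assumption and confirm it against the API reference.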

    ### Use Tool Calling with the Chat Endpoint

    > [!NOTE]
    > Tool calling is currently supported only on the `kimi-k2-instruct` model.

    1. Define your tools using the `"tools"` parameter in the request body.
    1. Set `"tool_choice"` to `"auto"`, `"required"`, or `"none"` to control when the model triggers a tool call.
    1. Send a `POST` request to the [**Create Chat Completion** endpoint](https://api.vultrinference.com/#tag/Chat/operation/create-chat-completion) to send a message that can trigger tool calls.

        ```console
        $ curl "https://api.vultrinference.com/v1/chat/completions" \
            -X POST \
            -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
            -H "Content-Type: application/json" \
            --data '{
                "model": "kimi-k2-instruct",
                "messages": [
                    { "role": "user", "content": "Ask a question that requires a tool response." }
                ],
                "tools": [
                    {
                        "type": "function",
                        "function": {
                            "name": "function_name",
                            "description": "Briefly describe the purpose of the function.",
                            "parameters": {
                                "type": "object",
                                "properties": {
                                    "parameter_name": {
                                        "type": "string",
                                        "description": "Describe the expected input parameter."
                                    }
                                },
                                "required": ["parameter_name"]
                            }
                        }
                    }
                ],
                "tool_choice": "auto"
            }'
        ```

        The model responds with a structured tool call such as:

        ```json
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {
                        "name": "function_name",
                        "arguments": "{\"parameter_name\": \"example_value\"}"
                    }
                }
            ]
        }
        ```

        Execute the requested function yourself, either locally or through an API, then send its output back to the model in a second request so it can produce the final response.

        For a complete implementation example, see [**How to Use Tool Calling with Vultr Serverless Inference**](https://docs.vultr.com/how-to-use-tool-calling-with-vultr-serverless-inference).
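
    The execute-and-reply step described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from the linked guide: `get_example_value` is a hypothetical local function standing in for `function_name`, and the follow-up message format (`"role": "tool"` with an optional `tool_call_id`) assumes the OpenAI-style convention, which this page does not confirm.

    ```python
    import json


    def get_example_value(parameter_name):
        """Hypothetical local tool; stands in for `function_name` above."""
        return {"echo": parameter_name}


    # Map tool names the model may request to local callables.
    LOCAL_TOOLS = {"function_name": get_example_value}


    def run_tool_calls(assistant_message, tools=LOCAL_TOOLS):
        """Execute each requested tool and return follow-up `tool` messages."""
        follow_ups = []
        for call in assistant_message.get("tool_calls") or []:
            function = call["function"]
            # `arguments` arrives as a JSON-encoded string, not an object.
            arguments = json.loads(function["arguments"])
            result = tools[function["name"]](**arguments)
            message = {"role": "tool", "content": json.dumps(result)}
            # Assumption: if the response carries a call ID, echo it back.
            if "id" in call:
                message["tool_call_id"] = call["id"]
            follow_ups.append(message)
        return follow_ups
    ```

    For the second request, append the assistant message and the messages returned by `run_tool_calls` to the original `"messages"` array, then send the array to the same chat completions endpoint. Keeping the name-to-function mapping explicit means the model can only trigger functions you have deliberately exposed.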
