How to build a Chatbot in Rust with Large Language Models on Vultr Cloud GPU

Updated on July 25, 2024

Introduction

Large language models (LLMs) are artificial intelligence models that process natural language inputs and generate human-like outputs. You can apply LLMs to tasks such as chatbots, virtual assistants, and customer service automation. Rust is a statically and strongly typed programming language that focuses on performance and safety, which makes it a good choice for developing chatbots and other LLM-powered applications.

This article explains how to build a Chatbot in Rust with Large Language Models on a Vultr Cloud GPU server. You will use the Leptos web framework to build a web application in Rust, and then integrate a Large Language Model to generate the chatbot responses.

Prerequisites

Before you begin:

Set Up the Development Server

  1. Update the server.

    console
    $ sudo apt-get update
    
  2. Install build-essential and libssl-dev dependency packages.

    console
    $ sudo apt-get install -y build-essential libssl-dev
    
  3. Install the latest Rust toolchain and Cargo packages.

    console
    $ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    

    When prompted for your desired installation type, enter 1 to select the Proceed with installation (default) option.

  4. Activate Rust in your server session.

    console
    $ source "$HOME/.cargo/env"
    
  5. View the installed Rust and Cargo versions.

    console
    $ rustc --version && cargo --version
    
  6. Install the cargo-leptos package using cargo.

    console
    $ cargo install cargo-leptos@0.2.4
    
  7. Create a new project rust-chatbot using Cargo and the start-axum-workspace template.

    console
    $ cargo leptos new --git https://github.com/quanhua92/start-axum-workspace --name rust-chatbot
    
  8. Switch to the rust-chatbot directory.

    console
    $ cd rust-chatbot
    
  9. Install the wasm32 target using rustup.

    console
    $ rustup target add wasm32-unknown-unknown
    

Build the Rust Chatbot Application

  1. Edit the Cargo.toml file in the app directory using a text editor such as Nano.

    console
    $ nano app/Cargo.toml
    
  2. Add the following serde crate to the [dependencies] section of the file.

    toml
    [dependencies]
    serde = { version = "1.0.188", features = ["derive"] }
    
  3. Create a models sub-directory in the app/src directory.

    console
    $ mkdir -p app/src/models
    
  4. Create a new file conversation.rs in the app/src/models directory to implement the conversation area flow.

    console
    $ nano app/src/models/conversation.rs
    
  5. Add the following code to the file.

    rust
    use serde::Deserialize;
    use serde::Serialize;
    
    #[derive(Serialize, Deserialize, Clone, Debug)]
    pub struct Message {
        pub text: String,
        pub sender: String,
    }
    
    #[derive(Serialize, Deserialize, Clone, Debug)]
    pub struct Conversation {
        pub messages: Vec<Message>,
    }
    
    impl Conversation {
        pub fn new() -> Self {
            Self {
                messages: Vec::new(),
            }
        }
    }
    

    Save and close the file.

  6. Create a new mod.rs in the app/src/models directory.

    console
    $ nano app/src/models/mod.rs
    
  7. Add the following code to the file.

    rust
    pub mod conversation;
    pub use conversation::{Conversation, Message};
    

    The above code registers the conversation module and re-exports the Conversation and Message structs that manage the chat data.
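
    The following minimal sketch (not part of the project files) shows how these structs build up a chat: create an empty Conversation and push Message values onto it. It assumes the library crate name app from the workspace template.

    rust
    // Illustrative usage of the Conversation and Message structs defined above.
    use app::models::{Conversation, Message};

    fn main() {
        // Start with an empty conversation.
        let mut conversation = Conversation::new();

        // Append a user message, as the chatbot interface does for every prompt.
        conversation.messages.push(Message {
            text: "Hello, chatbot!".to_string(),
            sender: "User".to_string(),
        });

        println!("{:?}", conversation);
    }
    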

Create the Conversation Area

  1. Create a components sub-directory in the app/src directory.

    console
    $ mkdir -p app/src/components
    
  2. Create a new file conversation_area.rs in the app/src/components directory.

    console
    $ nano app/src/components/conversation_area.rs
    
  3. Add the following code to the file.

    rust
    use crate::models::Conversation;
    use leptos::html::Div;
    use leptos::logging::log;
    use leptos::*;
    
    #[component]
    pub fn ConversationArea(conversation: ReadSignal<Conversation>) -> impl IntoView {
        let div_ref = create_node_ref::<Div>();
    
        create_effect(move |_| {
            let c = conversation.get();
            log!("ConversationArea: {:?}", c);
            if let Some(div) = div_ref.get() {
                request_animation_frame(move || {
                    div.set_scroll_top(div.scroll_height());
                });
            }
        });
    
        view! {
            <div class="conversation-area" node_ref=div_ref>
                { move || conversation.get().messages.iter().map(move |message| {
                    view! {
                        <div class="message">
                            <span class="message-sender">{message.sender.clone()}</span>
                            <p class="message-text">{message.text.clone()}</p>
                        </div>
                    }
                })
                .collect::<Vec<_>>()
                }
    
            </div>
        }
    }
    

    Save and close the file.

    This creates a component named ConversationArea that displays all messages in the conversation and automatically scrolls to the newest message whenever the conversation updates.

Create the Application Input Area

  1. Create a new file input_area.rs in the app/src/components directory.

    console
    $ nano app/src/components/input_area.rs
    
  2. Add the following code to the file.

    rust
    use crate::models::Conversation;
    use leptos::html::Input;
    use leptos::*;
    
    #[component]
    pub fn InputArea(submit: Action<String, Result<Conversation, ServerFnError>>) -> impl IntoView {
        let text_ref = create_node_ref::<Input>();
        view! {
            <form class="input-area" on:submit=move |ev| {
                ev.prevent_default();
                let input = text_ref.get().expect("input exists");
                let user_input = input.value();
                let user_input = user_input.trim();
                if !user_input.is_empty() {
                    submit.dispatch(user_input.to_string());
                    input.set_value("");
                }
            }>
                <input type="text" class="input-area-text" placeholder="Enter a prompt here" node_ref=text_ref/>
                <input type="submit" class="input-area-button" value="Send"/>
            </form>
        }
    }
    

    Save and close the file.

    The above code creates a new component InputArea that displays a form with a text input field and a submit button for sending a new prompt. When the form is submitted, the component dispatches the submit action to pass the new prompt to the parent component and then clears the input field.

  3. Create a new mod.rs file in the app/src/components directory to export the components.

    console
    $ nano app/src/components/mod.rs
    
  4. Add the following code to the file.

    rust
    pub mod conversation_area;
    pub mod input_area;
    
    pub use conversation_area::ConversationArea;
    pub use input_area::InputArea;
    

    Save and close the file.

Apply CSS Styling to the Application Interface

  1. Back up the original style/main.scss file.

    console
    $ mv style/main.scss style/main.scss.ORIG
    
  2. Create a new style/main.scss file to include the application style elements.

    console
    $ nano style/main.scss
    
  3. Add the following CSS code to the file.

    css
    body {
      font-family: sans-serif;
      text-align: center;
      margin: 0;
      padding: 0;
    }
    
    .chat-area {
      display: flex;
      flex-direction: column;
      height: 100vh;
      justify-content: space-between;
    }
    
    .conversation-area {
      overflow: auto;
      display: flex;
      flex-direction: column;
      padding: 0.25rem;
    }
    
    .conversation-area > .message {
      display: flex;
      align-items: center;
      gap: 0.5rem;
      border-bottom: 1px solid hsl(0, 0%, 0%, 10%);
    }
    
    .conversation-area > .message > .message-sender {
      min-width: 40px;
      height: 40px;
      border-radius: 20px;
      background-color: hsl(0, 0%, 0%, 10%);
      display: flex;
      align-items: center;
      justify-content: center;
      font-size: 0.7em;
    }
    
    .input-area {
      display: flex;
      justify-content: space-between;
      gap: 0.5rem;
      padding: 0.25rem;
    }
    
    .input-area-text {
      flex-grow: 1;
      min-height: 2em;
    }
    

    Save and close the file.

Create the Application Server Function

In this section, create a server function to handle the chat conversation. Server functions in the Leptos framework let the frontend call backend code as if it were a local function: when the client calls a server function, Leptos sends a fetch request to the server, serializes the arguments, and deserializes the return value from the response.
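
The following standalone sketch (not part of the project files) illustrates the mechanism with a hypothetical Echo server function. The #[server] macro generates a client-side stub that posts the serialized arguments to the /api endpoint, while the function body compiles and runs only on the server.

    rust
    use leptos::*;

    // Hypothetical example: the name "Echo" and the "/api" prefix mirror the
    // pattern used by the process_conversation function created below.
    #[server(Echo, "/api")]
    pub async fn echo(text: String) -> Result<String, ServerFnError> {
        // This body runs only on the server; the client calls echo(text).await
        // and receives the deserialized Result from the HTTP response.
        Ok(format!("Echo: {text}"))
    }
    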

  1. Create a new file api.rs in the app/src directory.

    console
    $ nano app/src/api.rs
    
  2. Add the following code to the file.

    rust
    use crate::models::Conversation;
    use crate::models::Message;
    use leptos::logging::log;
    use leptos::*;
    
    #[server(ProcessConversation, "/api")]
    pub async fn process_conversation(
        conversation: Conversation,
    ) -> Result<Conversation, ServerFnError> {
        log!("process_conversation {:?}", conversation);
        let mut conversation = conversation;
    
        conversation.messages.push(Message {
            text: "Response from AI".to_string(),
            sender: "AI".to_string(),
        });
        Ok(conversation)
    }
    

    Save and close the file.

    The above code creates a server function process_conversation that appends a placeholder message with the text Response from AI to the conversation, which the application then displays in the conversation area.

  3. Back up the original lib.rs in the app/src directory.

    console
    $ mv app/src/lib.rs app/src/lib.ORIG
    
  4. Create the lib.rs file.

    console
    $ nano app/src/lib.rs
    
  5. Add the following contents to the file.

    rust
    use leptos::*;
    use leptos_meta::*;
    use leptos_router::*;
    
    pub mod api;
    pub mod components;
    pub mod error_template;
    pub mod models;
    
    use crate::api::process_conversation;
    use crate::components::{ConversationArea, InputArea};
    use crate::models::{Conversation, Message};
    
    #[component]
    pub fn App() -> impl IntoView {
        // Provides context that manages stylesheets, titles, meta tags, etc.
        provide_meta_context();
    
        view! {
            // injects a stylesheet into the document <head>
            // id=leptos means cargo-leptos will hot-reload this stylesheet
            <Stylesheet id="leptos" href="/pkg/start-axum-workspace.css"/>
    
            // sets the document title
            <Title text="Welcome to Rust Chatbot"/>
    
            // content for this welcome page
            <Router>
                <main>
                    <Routes>
                        <Route path="" view=|| view! { <HomePage/> }/>
                    </Routes>
                </main>
            </Router>
        }
    }
    
    /// Renders the home page of your application.
    #[component]
    fn HomePage() -> impl IntoView {
        // Creates a reactive value to update the button
        let (conversation, set_conversation) = create_signal(Conversation::new());
        let send_message = create_action(move |input: &String| {
            let message = Message {
                text: input.clone(),
                sender: "User".to_string(),
            };
            set_conversation.update(move |c| {
                c.messages.push(message);
            });
    
            process_conversation(conversation.get())
        });
    
        create_effect(move |_| {
            if let Some(_) = send_message.input().get() {
                set_conversation.update(move |c| {
                    c.messages.push(Message {
                        text: "...".to_string(),
                        sender: "AI".to_string(),
                    });
                });
            }
        });
    
        create_effect(move |_| {
            if let Some(Ok(response)) = send_message.value().get() {
                set_conversation.set(response);
            }
        });
    
        view! {
            <div class="chat-area">
                <ConversationArea conversation />
                <InputArea submit=send_message />
            </div>
        }
    }
    

    Save and close the file.

    In the above code, the web application renders the HomePage component that contains the ConversationArea and InputArea components. The send_message action calls the process_conversation server function and updates the conversation signal with the response.

  6. Build the application using cargo.

    console
    $ cargo leptos build
    

    Verify that the build process completes without any errors.

  7. By default, UFW is active on Vultr Ubuntu servers. To enable access to the application interface, allow the HTTP server port 3000 through the firewall.

    console
    $ sudo ufw allow 3000
    
  8. Allow the web socket port 3001 that cargo-leptos uses for live reloading.

    console
    $ sudo ufw allow 3001
    
  9. Reload the Firewall rules to apply changes.

    console
    $ sudo ufw reload
    
  10. Run the application using cargo to test the application interface.

    console
    $ LEPTOS_SITE_ADDR=0.0.0.0:3000 cargo leptos watch
    
  11. Visit your Server IP on port 3000 to access the application.

    http://<SERVER-IP>:3000

    The Rust Chatbot Application Interface

  12. In your server terminal session, press Ctrl + C on your keyboard to stop the running application process.

Add a Language Model to the Application

To generate real chat responses, integrate a pre-trained Large Language Model (LLM) into the application. The model loads once and is saved to the Axum framework shared state. On every API call, the process_conversation server function retrieves the model from the shared state to generate the chat response. In this section, implement the OpenLLaMA 7B model in your application to enable the chatbot operations.

  1. Edit the Cargo.toml file in your root project directory.

    console
    $ nano Cargo.toml
    
  2. Add the following [profile.dev.package.ggml-sys] section below the [workspace] declarations.

    ini
    [profile.dev.package.ggml-sys]
    opt-level = 3
    
  3. Back up the Cargo.toml file in the app directory.

    console
    $ mv app/Cargo.toml app/Cargo.toml.ORIG
    
  4. Create a new Cargo.toml file.

    console
    $ nano app/Cargo.toml
    
  5. Add the following contents to the file.

    ini
    [package]
    name = "app"
    version = "0.1.0"
    edition = "2021"
    
    # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
    
    [dependencies]
    leptos.workspace = true
    leptos_meta.workspace = true
    leptos_router.workspace = true
    leptos_axum = { workspace = true, optional = true }
    axum = { workspace = true, optional = true }
    serde = { version = "1.0.188", features = ["derive"] }
    llm = { git = "https://github.com/rustformers/llm" , branch = "main", default-features = false, features = ["models"], optional = true}
    rand = "0.8.5"
    num_cpus = { version = "1.16.0", optional = true }
    
    http.workspace = true
    cfg-if.workspace = true
    thiserror.workspace = true
    
    [features]
    default = []
    hydrate = ["leptos/hydrate", "leptos_meta/hydrate", "leptos_router/hydrate"]
    ssr = ["leptos/ssr", "leptos_meta/ssr", "leptos_router/ssr", "dep:leptos_axum", "dep:llm", "dep:axum", "dep:num_cpus"]
    

    Save and close the file.

    The above code adds the llm crate and other dependencies to the app crate.

  6. Back up the original Cargo.toml file in the server directory.

    console
    $ mv server/Cargo.toml server/Cargo.toml.ORIG
    
  7. Create a new Cargo.toml file.

    console
    $ nano server/Cargo.toml
    
  8. Add the following code to the file.

    ini
    [package]
    name = "server"
    version = "0.1.0"
    edition = "2021"
    
    # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
    
    [dependencies]
    app = { path = "../app", default-features = false, features = ["ssr"] }
    leptos = { workspace = true, features = [ "ssr" ]}
    dotenv = { version = "0.15.0" }
    llm = { git = "https://github.com/rustformers/llm" , branch = "main", default-features = false, features = ["models"]}
    
    leptos_axum.workspace = true
    
    axum.workspace = true
    simple_logger.workspace = true
    tokio.workspace = true
    tower.workspace = true
    tower-http.workspace = true
    log.workspace = true
    
    [features]
    clblast = ["llm/clblast"]
    

    Save and close the file.

    The above code adds the llm crate and other dependencies to the server crate. It also defines a new feature, clblast, which enables GPU-accelerated inference through the CLBlast OpenCL library for production environments.
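
    For example, to build the application with the clblast feature outside of Docker, on a machine that has the CLBlast development libraries installed (the Dockerfile later in this article installs libclblast-dev), pass the feature to the server binary with the --bin-features flag:

    console
    $ cargo leptos build --release --bin-features clblast
    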

Prepare the Large Language Model

  1. Create a new directory rust-chatbot in a system-wide location such as /opt to store the model files.

    console
    $ sudo mkdir -p /opt/rust-chatbot
    
  2. Grant your user ownership privileges to the directory.

    console
    $ sudo chown $USER /opt/rust-chatbot
    
  3. Download and save your target model to the directory. For example, download the open_llama_7b-q5_1-ggjt.bin file from the OpenLLaMA Hugging Face model page using the wget utility.

    console
    $ wget -O /opt/rust-chatbot/open_llama_7b-q5_1-ggjt.bin https://huggingface.co/rustformers/open-llama-ggml/resolve/e261e2b5f5bd3dc88507a76b97431cab257eeaee/open_llama_7b-q5_1-ggjt.bin
    
  4. Create a new .env file in your rust-chatbot project directory.

    console
    $ nano .env
    
  5. Add the following string to the file. Replace the path with your actual model file location.

    shell
    MODEL_PATH="/opt/rust-chatbot/open_llama_7b-q5_1-ggjt.bin"
    

Load the Large Language Model

  1. Create a new file state.rs in the app/src directory.

    console
    $ nano app/src/state.rs
    
  2. Add the following code to the file.

    rust
    use cfg_if::cfg_if;
    
    cfg_if! {
        if #[cfg(feature = "ssr")] {
    
    use leptos::LeptosOptions;
    use axum::extract::FromRef;
    use llm::models::Llama;
    use std::sync::Arc;
    
    #[derive(Clone, FromRef)]
    pub struct AppState {
        pub leptos_options: LeptosOptions,
        pub model: Arc<Llama>,
        pub model_path: String,
    }
    
        }
    }
    

    Save and close the file.

    The above code defines a new struct AppState that stores the language model in your application.

  3. Edit the file lib.rs in the app/src directory.

    console
    $ nano app/src/lib.rs
    
  4. Add the following line below the pub mod models; declaration.

    rust
    pub mod state;
    
  5. Back up the main.rs in the server/src directory.

    console
    $ mv server/src/main.rs server/src/main.rs.ORIG
    
  6. Create a new main.rs file.

    console
    $ nano server/src/main.rs
    
  7. Add the following code to the file.

    rust
    use app::state::AppState;
    use app::*;
    use axum::{
        body::Body as AxumBody,
        extract::{Path, RawQuery, State},
        http::{header::HeaderMap, Request},
        response::IntoResponse,
    };
    use axum::{routing::post, Router};
    use dotenv;
    use fileserv::file_and_error_handler;
    use leptos::logging::log;
    use leptos::*;
    use leptos_axum::{generate_route_list, handle_server_fns_with_context, LeptosRoutes};
    use std::{env, sync::Arc};
    
    pub mod fileserv;
    
    async fn handle_server_fns_with_state(
        State(state): State<AppState>,
        path: Path<String>,
        headers: HeaderMap,
        raw_query: RawQuery,
        request: Request<AxumBody>,
    ) -> impl IntoResponse {
        handle_server_fns_with_context(
            path,
            headers,
            raw_query,
            move || {
                provide_context(state.clone());
            },
            request,
        )
        .await
    }
    
    #[tokio::main]
    async fn main() {
        simple_logger::init_with_level(log::Level::Debug).expect("couldn't initialize logging");
    
        dotenv::dotenv().ok();
    
        // Setting get_configuration(None) means we'll be using cargo-leptos's env values
        // For deployment these variables are:
        // <https://github.com/leptos-rs/start-axum#executing-a-server-on-a-remote-machine-without-the-toolchain>
        // Alternately a file can be specified such as Some("Cargo.toml")
        // The file would need to be included with the executable when moved to deployment
        let conf = get_configuration(None).await.unwrap();
        let leptos_options = conf.leptos_options;
        let addr = leptos_options.site_addr;
        let routes = generate_route_list(|| view! { <App/> });
    
        // Load model
        let model_path = env::var("MODEL_PATH").expect("MODEL_PATH must be set");
        let model_parameters = llm::ModelParameters {
            use_gpu: true,
            ..llm::ModelParameters::default()
        };
    
        let model = llm::load::<llm::models::Llama>(
            std::path::Path::new(&model_path),
            llm::TokenizerSource::Embedded,
            model_parameters,
            llm::load_progress_callback_stdout,
        )
        .unwrap_or_else(|err| panic!("Failed to load model: {err}"));
    
        let state = AppState {
            leptos_options,
            model: Arc::new(model),
            model_path,
        };
    
        // build our application with a route
        let app = Router::new()
            .route("/api/*fn_name", post(handle_server_fns_with_state))
            .leptos_routes(&state, routes, || view! { <App/> })
            .fallback(file_and_error_handler)
            .with_state(state);
    
        // run our app with hyper
        // `axum::Server` is a re-export of `hyper::Server`
        log!("listening on http://{}", &addr);
        axum::Server::bind(&addr)
            .serve(app.into_make_service())
            .await
            .unwrap();
    }
    

    Save and close the file.

    The above code loads the language model and saves it to the Axum web framework state within your application.

Create the Chatbot Response

  1. Back up the api.rs file in the app/src directory.

    console
    $ mv app/src/api.rs app/src/api.rs.ORIG
    
  2. Create a new api.rs file.

    console
    $ nano app/src/api.rs
    
  3. Add the following code to the file.

    rust
    use crate::models::Conversation;
    use crate::models::Message;
    use leptos::logging::log;
    use leptos::*;
    
    #[server(ProcessConversation, "/api")]
    pub async fn process_conversation(
        conversation: Conversation,
    ) -> Result<Conversation, ServerFnError> {
        use crate::state::AppState;
        use llm::Model;
    
        let state: AppState = use_context::<AppState>()
            .ok_or(ServerFnError::ServerError("No server state".to_string()))?;
    
        let model = state.model;
        let prelude = r#"A chat between a human ("User") and an AI assistant ("AI"). The AI assistant gives helpful, detailed, and polite answers to the human's questions."#;
        let mut prompt = format!("{prelude}\n").to_string();
        for message in conversation.messages.clone() {
            let sender = message.sender;
            let text = message.text;
            prompt.push_str(format!("{sender}: {text}\n").as_str());
        }
        prompt.push_str(format!("AI:").as_str());
        let stop_sequence = "User:";
        let maximum_token_count = 100;
    
        let mut output: String = String::new();
        let mut buffer: String = String::new();
        let mut session = model.start_session(llm::InferenceSessionConfig {
            n_threads: num_cpus::get_physical(),
            ..Default::default()
        });
        log!("Generating response...");
        log!("Prompt: {}", prompt);
        let res = session.infer::<std::convert::Infallible>(
            model.as_ref(),
            &mut rand::thread_rng(),
            &llm::InferenceRequest {
                prompt: prompt.as_str().into(),
                parameters: &llm::InferenceParameters::default(),
                play_back_previous_tokens: false,
                maximum_token_count: Some(maximum_token_count),
            },
            &mut Default::default(),
            |r| match r {
                llm::InferenceResponse::InferredToken(token) => {
                    let mut buf = buffer.clone();
                    buf.push_str(&token);
    
                    if buf.starts_with(stop_sequence) {
                        buffer.clear();
                        return Ok(llm::InferenceFeedback::Halt);
                    } else if stop_sequence.starts_with(&buf) {
                        buffer = buf;
                        return Ok(llm::InferenceFeedback::Continue);
                    }
                    buffer.clear();
                    output.push_str(&buf);
                    Ok(llm::InferenceFeedback::Continue)
                }
                llm::InferenceResponse::EotToken => Ok(llm::InferenceFeedback::Halt),
                _ => Ok(llm::InferenceFeedback::Continue),
            },
        );
    
        println!("Output: {output}");
    
        match res {
            Ok(result) => println!("\n\nInference stats: \n {result}"),
            Err(err) => println!("\n{err}"),
        }
    
        let mut conversation = conversation;
        conversation.messages.push(Message {
            text: output,
            sender: "AI".to_string(),
        });
        Ok(conversation)
    }
    

    Save and close the file.

    The above process_conversation function prepares a prompt from the conversation, retrieves the model from the shared state, creates a new inference session, and runs the inference process to generate the output. Inference runs on either the CPU or the GPU, but a GPU significantly reduces the response generation time.
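
    For example, if the user sends the single prompt What is Rust? (a hypothetical input), the assembled prompt string passed to the model looks like the following. The trailing AI: cues the model to answer as the assistant, and generation halts at the User: stop sequence or after 100 tokens.

    A chat between a human ("User") and an AI assistant ("AI"). The AI assistant gives helpful, detailed, and polite answers to the human's questions.
    User: What is Rust?
    AI:
    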

  4. Build the application using cargo.

    console
    $ cargo leptos build
    
  5. Run the application to test the new LLM model performance.

    console
    $ LEPTOS_SITE_ADDR=0.0.0.0:3000 cargo leptos watch
    
  6. In a new web browser window, access your application interface using your Server IP Address on port 3000.

    http://SERVER-IP:3000
  7. Enter a prompt of your choice in the input area to test the model performance.

Note
A CPU-only server may take up to 30 seconds to generate a chat response within your application. For the best possible results, experiment with multiple fine-tuned models and apply the one that best fits your application.

Containerize the Application

  1. Create a new Dockerfile file.

    console
    $ nano Dockerfile
    
  2. Add the following contents to the file.

    dockerfile
    FROM rustlang/rust:nightly-bullseye as builder
    
    # Add GPU support
    RUN apt-get update && apt-get install -y libclblast-dev
    
    RUN wget https://github.com/cargo-bins/cargo-binstall/releases/latest/download/cargo-binstall-x86_64-unknown-linux-musl.tgz
    
    RUN tar -xvf cargo-binstall-x86_64-unknown-linux-musl.tgz
    
    RUN cp cargo-binstall /usr/local/cargo/bin
    
    RUN cargo binstall cargo-leptos --version 0.2.4 -y
    
    RUN mkdir -p /app
    WORKDIR /app
    COPY . .
    
    RUN rustup target add wasm32-unknown-unknown
    RUN cargo leptos build --release --bin-features clblast -vv
    
    FROM rustlang/rust:nightly-bullseye as runner
    COPY --from=builder /app/target/release/server /app/
    COPY --from=builder /app/target/site /app/site
    COPY --from=builder /app/Cargo.toml /app/
    WORKDIR /app
    
    RUN apt-get update && apt-get -y upgrade \
        && apt-get install -y \
        ocl-icd-libopencl1 \
        opencl-headers \
        clinfo \
        libclblast-dev
    
    RUN mkdir -p /etc/OpenCL/vendors && \
        echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
    
    ENV NVIDIA_VISIBLE_DEVICES all
    ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
    ENV RUST_LOG="info"
    ENV APP_ENVIRONMENT="production"
    ENV LEPTOS_SITE_ADDR="0.0.0.0:8080"
    ENV LEPTOS_SITE_ROOT="site"
    ENV MODEL_PATH="/opt/rust-chatbot/open_llama_7b-q5_1-ggjt.bin"
    EXPOSE 8080
    
    CMD ["/app/server"]
    

    Save and close the file.

    The above configuration uses the rustlang/rust:nightly-bullseye image, which includes the Rust nightly toolchain, to build the application. The runner stage installs the packages required to work with NVIDIA GPUs through OpenCL and copies the rust-chatbot build artifacts from the builder stage.

  3. Log in to your Vultr Container Registry to set it as the default Docker registry.

    console
    $ docker login
    
  4. Build the Docker image to include all application files in the working directory.

    console
    $ docker build -t example_user/rust-chatbot .
    
  5. Push the Docker Image to your Vultr Container Registry.

    console
    $ docker push example_user/rust-chatbot:latest
    

Deploy the Application to your Vultr Cloud GPU Production Server

Access your Vultr Cloud GPU production server using SSH as a non-root user with sudo and Docker privileges, then complete the following steps to deploy the application.
  1. View the available GPU memory on your GPU server and verify that the total memory is at least 8192 MiB.

    console
    $ nvidia-smi --query-gpu=memory.total,memory.free,memory.used --format=csv
    
  2. Create a new directory rust-chatbot in a system-wide location such as /opt to store your application model image.

    console
    $ sudo mkdir -p /opt/rust-chatbot
    
  3. Grant your user account ownership privileges to the /opt/rust-chatbot directory.

    console
    $ sudo chown $USER /opt/rust-chatbot
    
  4. Download your application model to the directory. For example, open_llama_7b-q5_1-ggjt.bin.

    console
    $ wget -O /opt/rust-chatbot/open_llama_7b-q5_1-ggjt.bin https://huggingface.co/rustformers/open-llama-ggml/resolve/e261e2b5f5bd3dc88507a76b97431cab257eeaee/open_llama_7b-q5_1-ggjt.bin
    
  5. Log in to your Vultr Container Registry.

    console
    $ docker login
    
  6. Pull your application image from the Vultr Container Registry.

    console
    $ docker pull example_user/rust-chatbot
    
  7. Deploy a new Docker container with a restart always policy.

    console
    $ docker run -d --name rust-chatbot --restart always --gpus all -p 8080:8080 -v /opt/rust-chatbot:/opt/rust-chatbot example_user/rust-chatbot
    
  8. Allow the application port 8080 through the firewall to enable access to the Chatbot interface.

    console
    $ sudo ufw allow 8080
    
  9. In a new web browser window, visit your GPU Server IP Address on port 8080 to access the application.

    console
    http://SERVER-IP:8080
    
  10. Enter a prompt in the input area to test the application model performance.

    A Production Rust Chatbot Model Performance

  11. View the rust-chatbot Docker container logs to verify the application processes.

    console
    $ docker logs -f rust-chatbot
    

Conclusion

You have developed a full-stack chat application with Rust and deployed it on a Vultr Cloud GPU server for production serving. To improve the application, experiment with different fine-tuned models and adapt the interface design to match your needs. For more information, visit the Rust Cargo documentation.