How to Build a Llama.cpp Container Image

Updated on April 24, 2024

Introduction

Llama.cpp is a high-performance inference engine for Large Language Models (LLMs) such as Llama, Falcon, and Mistral. Written in C/C++, it runs with minimal setup on both CPU and GPU systems.

This article explains how to build a Llama.cpp container image using the Vultr Container Registry.

Prerequisites

Before you begin:

  • Deploy an instance using Vultr's GPU Marketplace App.

  • Access the server using SSH.

  • Start the Docker service.

    console
    $ sudo systemctl start docker
    
  • Add your non-root user to the docker group to run Docker commands without root privileges. Replace linuxuser with your actual username.

    console
    $ sudo usermod -aG docker linuxuser
    
  • Switch to the user to apply the new group membership:

    console
    $ su - linuxuser
    
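  • Optionally, verify that the user can communicate with the Docker daemon. The command below lists running containers and fails with a permission error if the docker group membership has not yet taken effect.

    console
    $ docker ps
    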

Set Up the Server

  1. Create a new directory to store your Llama.cpp project files.

    console
    $ mkdir llama-project
    
  2. Switch to the directory.

    console
    $ cd llama-project
    
  3. Clone the Llama.cpp project repository using Git.

    console
    $ git clone https://github.com/ggerganov/llama.cpp/
    
  4. List files and verify that a new llama.cpp directory is available.

    console
    $ ls
    

    Output:

    llama.cpp
  5. Switch to the llama.cpp project directory.

    console
    $ cd llama.cpp
    
  6. List all hidden files and verify that a new .devops directory is available.

    console
    $ ls -a
    

    Output:

    .   ..   .devops   .git   CMakeLists.txt   cmake   convert.py   flake.lock   ggml-common.h   ggml-metal.h   ggml-quants.h   llama.cpp

    The .devops directory contains the following Dockerfile resources:

    • main.Dockerfile: Contains the build instructions for CPU systems.
    • main-cuda.Dockerfile: Contains the build instructions for GPU systems.

    Use the above resources in the next sections to build a CPU or GPU system container image.
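    Optionally, list the contents of the .devops directory to confirm that both Dockerfile resources are present. The exact set of files may differ between llama.cpp versions.

    console
    $ ls .devops/
    

    The output should include main.Dockerfile and main-cuda.Dockerfile.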

Build a Llama.cpp Container Image for CPU Systems

Follow the steps below to build a new Llama.cpp container image using main.Dockerfile, which contains the build instructions and installs all dependency packages required for CPU-based systems.

  1. Copy the main.Dockerfile from the .devops directory to the main llama.cpp project directory.

    console
    $ cp .devops/main.Dockerfile .
    
  2. Build a new container image from main.Dockerfile, using the project directory as the build context. Replace llama-image with your desired image name.

    console
    $ docker build -f main.Dockerfile -t llama-image .
    
  3. View all Docker images on the server and verify that a new llama-image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                  TAG       IMAGE ID       CREATED              SIZE
    llama-image                                 latest    f4fcb571a2f7   About a minute ago   80.1MB
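    To test the image, run a container with a GGUF model file mounted from the host. The command below is a sketch: /path/to/models and model.gguf are placeholders for your actual model location, and it assumes the image entrypoint is the compiled main binary as set in main.Dockerfile, so any arguments after the image name pass directly to Llama.cpp.

    console
    $ docker run --rm -v /path/to/models:/models llama-image -m /models/model.gguf -p "Hello" -n 64
    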

Build a Llama.cpp Container Image for GPU Systems

The Llama.cpp main-cuda.Dockerfile resource contains the build instructions for NVIDIA GPU systems running recent CUDA driver packages. Follow the steps below to build a Llama.cpp container image compatible with GPU systems.

  1. Copy main-cuda.Dockerfile to the Llama.cpp project directory.

    console
    $ cp .devops/main-cuda.Dockerfile .
    
  2. Build a new container image named llama-gpu-image from main-cuda.Dockerfile, using the project directory as the build context.

    console
    $ docker build -f main-cuda.Dockerfile -t llama-gpu-image .
    
  3. View all Docker images on the server and verify that the new llama-gpu-image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                  TAG       IMAGE ID       CREATED          SIZE
    llama-gpu-image                             latest    40421f1469cf   2 minutes ago    2.08GB

    To run the Llama.cpp GPU container image, verify that your target host runs a CUDA version equal to or higher than the version set by the ARG CUDA_VERSION= directive in main-cuda.Dockerfile. Run the following command to view the CUDA version the image requires.

    console
    $ grep CUDA_VERSION= main-cuda.Dockerfile
    

    Output:

    ARG CUDA_VERSION=12.3.1
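
    Compare this value against the CUDA version available on your host. The nvidia-smi utility prints the installed driver version and the highest CUDA version it supports.

    console
    $ nvidia-smi
    

    To run the GPU image, the host also requires the NVIDIA Container Toolkit so that Docker can expose GPUs to containers. The command below is a sketch: /path/to/models and model.gguf are placeholders, and the --n-gpu-layers value depends on your model size and available GPU memory.

    console
    $ docker run --rm --gpus all -v /path/to/models:/models llama-gpu-image -m /models/model.gguf -p "Hello" -n 64 --n-gpu-layers 32
    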

Upload the Llama.cpp Container Image to the Vultr Container Registry

  1. Open the Vultr Customer Portal.

  2. Click Products on the main navigation menu and select Container Registry.

    Manage a Vultr Container Registry

  3. Click your target Vultr Container Registry to open the management panel and view the registry access credentials.

  4. Copy the Registry URL value, Username, and API Key to use when accessing the registry.

    Open the Vultr Container Registry

  5. Switch to your server terminal session and log in to your Vultr Container Registry. Replace exampleregistry, exampleuser, and registry-password with your actual registry details.

    console
    $ docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser -p registry-password
    
  6. Tag the Llama.cpp container image with your desired Vultr Container Registry tag. For example, sjc.vultrcr.com/exampleregistry/llama-gpu-image.

    console
    $ docker tag llama-gpu-image sjc.vultrcr.com/exampleregistry/llama-gpu-image
    
  7. View all Docker images on the server and verify that the new tagged image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                        TAG       IMAGE ID       CREATED          SIZE
    sjc.vultrcr.com/exampleregistry/llama-gpu-image   latest    40421f1469cf   3 minutes ago    2.08GB
    llama-gpu-image                                   latest    40421f1469cf   3 minutes ago    2.08GB
  8. Push the tagged image to your Vultr Container Registry repository.

    console
    $ docker push sjc.vultrcr.com/exampleregistry/llama-gpu-image
    
  9. Open your Vultr Container Registry management panel and click Repositories on the top navigation bar to verify that the new repository is available.

    View Vultr Container Registry Repositories
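
  10. Optionally, pull the image from your registry to confirm that it's available for deployment on other hosts. On a new machine, log in first; you can pipe the API key to docker login using --password-stdin instead of -p so that the key does not appear in your shell history. The values below match the earlier placeholders.

    console
    $ echo 'registry-password' | docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser --password-stdin
    $ docker pull sjc.vultrcr.com/exampleregistry/llama-gpu-image
    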