How to Build a Llama.cpp Container Image
Introduction
Llama.cpp is a high-performance inference engine for Large Language Models (LLMs) such as Llama, Falcon, and Mistral. Written in C/C++ with minimal dependencies, it supports both CPU and GPU systems.
This article explains how to build a Llama.cpp container image using the Vultr Container Registry.
Prerequisites
Before you begin:
Deploy an instance using Vultr's GPU Marketplace App.
Access the server using SSH.
Start the Docker service.
$ sudo systemctl start docker
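Optionally, enable the Docker service to start automatically at boot so it persists across server reboots:
$ sudo systemctl enable docker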
Add the non-root user to the Docker group. For example, linuxuser.
$ sudo usermod -aG docker linuxuser
Switch to the user:
$ su - linuxuser
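To confirm that the group change took effect and the user can run Docker without sudo, run Docker's standard hello-world test image:
$ docker run --rm hello-world
If the command prints a greeting message instead of a permission error, the user was added to the Docker group correctly.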
Set Up the Server
Create a new directory to store your Llama.cpp project files.
$ mkdir llama-project
Switch to the directory.
$ cd llama-project
Clone the Llama.cpp project repository using Git.
$ git clone https://github.com/ggerganov/llama.cpp/
List the files and verify that a new llama.cpp directory is available.
$ ls
Output:
llama.cpp
Switch to the llama.cpp project directory.
$ cd llama.cpp
List all files, including hidden files, and verify that a new .devops directory is available.
$ ls -a
Output:
.  ..  .devops  .git  CMakeLists.txt  cmake  convert.py  flake.lock  ggml-common.h  ggml-metal.h  ggml-quants.h  llama.cpp
The .devops directory contains the following Dockerfile resources:
main.Dockerfile: Contains the build context for CPU systems.
main-cuda.Dockerfile: Contains the build context for GPU systems.
Use these resources in the following sections to build a CPU or GPU container image.
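To preview the available Dockerfile resources yourself, list the directory contents. The exact file set varies with the repository version, and additional variants (such as ROCm or server images) may also be present:
$ ls .devops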
Build a Llama.cpp Container Image for CPU Systems
Follow the steps below to build a new Llama.cpp container image using main.Dockerfile, which contains the required build context and installs all dependency packages necessary for CPU-based systems.
Copy main.Dockerfile from the .devops directory to the main llama.cpp project directory.
$ cp .devops/main.Dockerfile .
Build a new container image using main.Dockerfile with all files in the project directory. Replace llama-image with your desired image name.
$ docker build -f main.Dockerfile -t llama-image .
View all Docker images on the server and verify that the new llama-image is available.
$ docker images
Output:
REPOSITORY    TAG      IMAGE ID       CREATED              SIZE
llama-image   latest   f4fcb571a2f7   About a minute ago   80.1MB
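To test the new image, run a container from it. The sketch below assumes the image's entrypoint is the llama.cpp main binary (as defined in main.Dockerfile) and that a GGUF model file exists at a hypothetical host path /home/linuxuser/models/model.gguf; adjust both paths to match your setup.
# Hypothetical model path; mount your own model directory into the container
$ docker run --rm -v /home/linuxuser/models:/models llama-image -m /models/model.gguf -p "Hello, world" -n 64
The -m option selects the model file, -p supplies the prompt, and -n limits the number of generated tokens.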
Build a Llama.cpp Container Image for GPU Systems
The main-cuda.Dockerfile resource contains the build context for NVIDIA GPU systems running recent CUDA driver packages. Follow the steps below to build a Llama.cpp container image compatible with GPU systems.
Copy main-cuda.Dockerfile to the llama.cpp project directory.
$ cp .devops/main-cuda.Dockerfile .
Build a new container image llama-gpu-image using main-cuda.Dockerfile with all files in the working project directory.
$ docker build -f main-cuda.Dockerfile -t llama-gpu-image .
View all Docker images on the server and verify that the new llama-gpu-image is available.
$ docker images
Output:
REPOSITORY        TAG      IMAGE ID       CREATED         SIZE
llama-gpu-image   latest   40421f1469cf   2 minutes ago   2.08GB
To run the Llama.cpp GPU container image, verify that your target host runs the CUDA version referenced by the ARG CUDA_VERSION= directive in main-cuda.Dockerfile, or a later one. Run the following command to view the required CUDA version.
$ grep CUDA_VERSION= main-cuda.Dockerfile
Output:
ARG CUDA_VERSION=12.3.1
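On the host, nvidia-smi reports the CUDA version the installed driver supports. Once confirmed, run the GPU image by passing the GPUs through to the container. The sketch below assumes the NVIDIA Container Toolkit is installed on the host and reuses the hypothetical model path from the CPU example; --n-gpu-layers offloads model layers to the GPU.
# Check the driver's supported CUDA version on the host
$ nvidia-smi
# Hypothetical model path; --gpus all requires the NVIDIA Container Toolkit
$ docker run --rm --gpus all -v /home/linuxuser/models:/models llama-gpu-image -m /models/model.gguf -p "Hello, world" -n 64 --n-gpu-layers 35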
Upload the Llama.cpp Container Image to the Vultr Container Registry
Open the Vultr Customer Portal.
Click Products and select Container Registry on the main navigation menu.
Click your target Vultr Container Registry to open the management panel and view the registry access credentials.
Copy the Registry URL value, Username, and API Key to use when accessing the registry.
Switch to your server terminal session and log in to your Vultr Container Registry. Replace exampleregistry, exampleuser, and registry-password with your actual registry details.
$ docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser -p registry-password
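Passing a password with -p exposes it in your shell history and process list, and Docker prints a warning to that effect. A safer alternative is to supply it on standard input:
# Reads the password from stdin instead of the command line
$ echo "registry-password" | docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser --password-stdin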
Tag the Llama.cpp container image with your desired Vultr Container Registry tag. For example, sjc.vultrcr.com/exampleregistry/llama-gpu-image.
$ docker tag llama-gpu-image sjc.vultrcr.com/exampleregistry/llama-gpu-image
View all Docker images on the server and verify that the new tagged image is available.
$ docker images
Output:
REPOSITORY                                        TAG      IMAGE ID       CREATED         SIZE
sjc.vultrcr.com/exampleregistry/llama-gpu-image   latest   40421f1469cf   3 minutes ago   2.08GB
llama-gpu-image                                   latest   40421f1469cf   3 minutes ago   2.08GB
Push the tagged image to your Vultr Container Registry repository.
$ docker push sjc.vultrcr.com/exampleregistry/llama-gpu-image
Open your Vultr Container Registry management panel and click Repositories on the top navigation bar to verify that the new repository is available.
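To deploy the image on another host, such as a second Vultr instance, log in to the registry from that host and pull the tagged image. This sketch reuses the example registry details from the steps above:
# docker login prompts for the password when -p is omitted
$ docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser
$ docker pull sjc.vultrcr.com/exampleregistry/llama-gpu-image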