How to Build a Llama.cpp Container Image

Updated on April 24, 2024

Introduction

Llama.cpp is a high-performance inference engine for Large Language Models (LLMs) such as Llama, Falcon, and Mistral. Written in C/C++, it runs with minimal setup on both CPU and GPU systems.

This article explains how to build a Llama.cpp container image using the Vultr Container Registry.

Prerequisites

Before you begin:

  • Deploy an instance using Vultr's GPU Marketplace App.

  • Access the server using SSH.

  • Start the Docker service.

    console
    $ sudo systemctl start docker
    
  • Add your non-root user to the docker group to run Docker commands without root privileges. Replace linuxuser with your actual username.

    console
    $ sudo usermod -aG docker linuxuser
    
  • Switch to the user to apply the new group membership:

    console
    $ su - linuxuser
    
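  • Optionally, verify that the user can communicate with the Docker daemon. The command below lists running containers and fails with a permission error if the docker group membership has not yet taken effect.

    console
    $ docker ps
    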

Set Up the Server

  1. Create a new directory to store your Llama.cpp project files.

    console
    $ mkdir llama-project
    
  2. Switch to the directory.

    console
    $ cd llama-project
    
  3. Clone the Llama.cpp project repository using Git.

    console
    $ git clone https://github.com/ggerganov/llama.cpp/
    
  4. List files and verify that a new llama.cpp directory is available.

    console
    $ ls
    

    Output:

    llama.cpp
  5. Switch to the llama.cpp project directory.

    console
    $ cd llama.cpp
    
  6. List all hidden files and verify that a new .devops directory is available.

    console
    $ ls -a
    

    Output:

    .   ..   .devops   .git   CMakeLists.txt   cmake   convert.py   flake.lock   ggml-common.h   ggml-metal.h   ggml-quants.h   llama.cpp

    The .devops directory contains the following Dockerfile resources:

    • main.Dockerfile: Contains the build instructions for CPU systems.
    • main-cuda.Dockerfile: Contains the build instructions for GPU systems.

    Use the above resources in the next sections to build a CPU or GPU system container image.
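    Optionally, list the contents of the .devops directory to confirm that both Dockerfile resources are present. The exact set of files may differ between llama.cpp versions.

    console
    $ ls .devops/
    

    The output should include main.Dockerfile and main-cuda.Dockerfile.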

Build a Llama.cpp Container Image for CPU Systems

Follow the steps below to build a new Llama.cpp container image using main.Dockerfile, which contains the build instructions and installs all dependency packages required for CPU-based systems.

  1. Copy the main.Dockerfile from the .devops directory to the main llama.cpp project directory.

    console
    $ cp .devops/main.Dockerfile .
    
  2. Build a new container image from main.Dockerfile, using the project directory as the build context. Replace llama-image with your desired image name.

    console
    $ docker build -f main.Dockerfile -t llama-image .
    
  3. View all Docker images on the server and verify that a new llama-image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                  TAG       IMAGE ID       CREATED              SIZE
    llama-image                                 latest    f4fcb571a2f7   About a minute ago   80.1MB
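    To test the image, run a container with a GGUF model file mounted from the host. The command below is a sketch: /path/to/models and model.gguf are placeholders for your actual model location, and it assumes the image entrypoint is the compiled main binary as set in main.Dockerfile, so any arguments after the image name pass directly to Llama.cpp.

    console
    $ docker run --rm -v /path/to/models:/models llama-image -m /models/model.gguf -p "Hello" -n 64
    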

Build a Llama.cpp Container Image for GPU Systems

The Llama.cpp main-cuda.Dockerfile resource contains the build instructions for NVIDIA GPU systems running recent CUDA driver packages. Follow the steps below to build a Llama.cpp container image compatible with GPU systems.

  1. Copy main-cuda.Dockerfile to the Llama.cpp project directory.

    console
    $ cp .devops/main-cuda.Dockerfile .
    
  2. Build a new container image named llama-gpu-image from main-cuda.Dockerfile, using the project directory as the build context.

    console
    $ docker build -f main-cuda.Dockerfile -t llama-gpu-image .
    
  3. View all Docker images on the server and verify that the new llama-gpu-image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                  TAG       IMAGE ID       CREATED          SIZE
    llama-gpu-image                             latest    40421f1469cf   2 minutes ago    2.08GB

    To run the Llama.cpp GPU container image, verify that your target host runs a CUDA version equal to or higher than the version set by the ARG CUDA_VERSION= directive in main-cuda.Dockerfile. Run the following command to view the CUDA version the image requires.

    console
    $ grep CUDA_VERSION= main-cuda.Dockerfile
    

    Output:

    ARG CUDA_VERSION=12.3.1
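
    Compare this value against the CUDA version available on your host. The nvidia-smi utility prints the installed driver version and the highest CUDA version it supports.

    console
    $ nvidia-smi
    

    To run the GPU image, the host also requires the NVIDIA Container Toolkit so that Docker can expose GPUs to containers. The command below is a sketch: /path/to/models and model.gguf are placeholders, and the --n-gpu-layers value depends on your model size and available GPU memory.

    console
    $ docker run --rm --gpus all -v /path/to/models:/models llama-gpu-image -m /models/model.gguf -p "Hello" -n 64 --n-gpu-layers 32
    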

Upload the Llama.cpp Container Image to the Vultr Container Registry

  1. Open the Vultr Customer Portal.

  2. Click Products on the main navigation menu and select Container Registry.

    Manage a Vultr Container Registry

  3. Click your target Vultr Container Registry to open the management panel and view the registry access credentials.

  4. Copy the Registry URL value, Username, and API Key to use when accessing the registry.

    Open the Vultr Container Registry

  5. Switch to your server terminal session and log in to your Vultr Container Registry. Replace exampleregistry, exampleuser, and registry-password with your actual registry details.

    console
    $ docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser -p registry-password
    
  6. Tag the Llama.cpp container image with your desired Vultr Container Registry tag. For example, sjc.vultrcr.com/exampleregistry/llama-gpu-image.

    console
    $ docker tag llama-gpu-image sjc.vultrcr.com/exampleregistry/llama-gpu-image
    
  7. View all Docker images on the server and verify that the new tagged image is available.

    console
    $ docker images
    

    Output:

    REPOSITORY                                        TAG       IMAGE ID       CREATED          SIZE
    sjc.vultrcr.com/exampleregistry/llama-gpu-image   latest    40421f1469cf   3 minutes ago    2.08GB
    llama-gpu-image                                   latest    40421f1469cf   3 minutes ago    2.08GB
  8. Push the tagged image to your Vultr Container Registry repository.

    console
    $ docker push sjc.vultrcr.com/exampleregistry/llama-gpu-image
    
  9. Open your Vultr Container Registry management panel and click Repositories on the top navigation bar to verify that the new repository is available.

    View Vultr Container Registry Repositories
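
  10. Optionally, pull the image from your registry to confirm that it's available for deployment on other hosts. On a new machine, log in first; you can pipe the API key to docker login using --password-stdin instead of -p so that the key does not appear in your shell history. The values below match the earlier placeholders.

    console
    $ echo 'registry-password' | docker login https://sjc.vultrcr.com/exampleregistry -u exampleuser --password-stdin
    $ docker pull sjc.vultrcr.com/exampleregistry/llama-gpu-image
    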