How to Deploy Deepseek R1 Reasoning Large Language Model (LLM) Using SGLang
Deepseek R1 is a first-generation reasoning model designed to excel in mathematical, coding, and logical reasoning tasks. It leverages reinforcement learning (RL) with a carefully integrated cold-start phase to enhance readability, coherence, and reasoning capabilities. This approach helps the model generate clear, well-structured responses while minimizing issues like repetition and language mixing. Deepseek R1 is optimized for high-quality reasoning, making it a powerful tool for tackling complex problem-solving tasks.
In this article, you will deploy Deepseek R1 on MI300X Vultr Cloud GPU due to large VRAM requirements using SGlang and configure the model for inference. By leveraging Vultr’s high-performance cloud infrastructure, you can efficiently set up Deepseek R1 for advanced reasoning tasks.
Prerequsites
- Contact sales to gain access to an AMD Instinct™ MI300X instance.
Deployment Steps
In this section, you will install the necessary dependencies, build a ROCm-supported container image, and deploy the SGlang inference server with Deepseek R1 on Vultr Cloud GPU. You will then verify the deployment by sending an HTTP request to test the model's inference response.
Install Hugging Face Command Line Interface (CLI) package.
console$ pip install huggingface_hub[cli]
Download the Deepseek R1 model.
console$ huggingface-cli download deepseek-ai/DeepSeek-R1
The above command downloads the model on to the
$HOME/.cache/huggingface
directory. It is recommended to download the model in the background and proceed with the next steps, as the model is very large in size and is not required until you run the container image.Clone the SGLang inference server repository.
console$ git clone https://github.com/sgl-project/sglang.git
Build a ROCm supported container image.
console$ cd sglang/docker $ docker build --build-arg SGL_BRANCH=v0.4.2 -t sglang:v0.4.2-rocm620 -f Dockerfile.rocm .
The above command builds a container image named
sglang:v0.4.2-rocm620
using theDockerfile.rocm
manifest. This step may require upto 30 minutes.If you face the
error: RPC failed; curl 56 GnuTLS recv error
error at the time of container image build, you can try to add the following lines to theDockerfile.rocm
file before the statements for cloning repositories.DockerfileRUN git config --global http.postBuffer 1048576000 RUN git config --global https.postBuffer 1048576000
Additionally, if you face connection timeouts during the build time, you can try to run the process again to re-establish the connection. Docker is able to cache portions of the build process to ensure efficient use of time and resources.
Run the SGlang inference server container.
console$ docker run -d --device=/dev/kfd --device=/dev/dri --ipc=host \ --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ -v $HOME/dockerx:/dockerx -v $HOME/.cache/huggingface:/root/.cache/huggingface \ --shm-size 16G -p 30000:30000 sglang:v0.4.2-rocm620 \ python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0 --port 30000
The above command runs the SGlang inference server container in detached mode with ROCm support, enabling GPU access and necessary permissions. It mounts required directories, allocates shared memory, and starts the server on port
30000
using the Deepseek R1 model with tensor parallelism (TP) set to 8.Send a HTTP request to verify inference response.
console$ curl http://localhost:30000/v1/chat/completions \ -H "Content-Type: application/json" \ -d "{\"model\": \"deepseek-ai/DeepSeek-R1\", \"messages\": [{\"role\": \"user\", \"content\": \"I am running Deepseek on Vultr powered by AMD Instinct MI300X. What's next?\"}], \"temperature\": 0.7}"
Optional: Allow incoming connections on port 30000.
console$ sudo ufw allow 30000
Conclusion
In this article, you successfully deployed Deepseek R1 on MI300X Vultr Cloud GPU using SGlang and prepared the model for inference. By leveraging Vultr’s high-performance infrastructure, you have set up an optimized environment for running Deepseek R1 efficiently. With the model now ready, you can utilize its advanced reasoning capabilities for various applications.