
NVIDIA drivers are essential for enabling GPU acceleration on Bare Metal instances, but mismatched versions or failed updates can cause instability, performance drops, or NVML-related errors. In such cases, you may need to downgrade to a stable release or reinstall the current driver to restore proper functionality. This guide covers two instance types:
- Bare Metal: Drivers are installed directly on dedicated hardware.
- Passthrough: Drivers are installed inside a virtual machine to which a physical GPU is assigned from the host. These instances do not require Fabric Manager, because GPU initialization and communication are handled entirely by the guest driver, and NVSwitch is managed at the host level.
Follow this guide to remove, reinstall, or downgrade NVIDIA drivers on Vultr Bare Metal and Passthrough GPU instances to keep your workloads running efficiently.
Prerequisites
Before you begin, you need to:
- Have access to a GPU-enabled Bare Metal instance or Passthrough GPU instance as a non-root user with sudo privileges.
- Ensure that no GPU workloads are running during the driver removal or installation process.
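One way to confirm that no GPU workloads are running is to count the compute processes that nvidia-smi reports. The following is a minimal sketch, not part of the official procedure: the helper name count_gpu_processes is hypothetical, and it assumes the currently installed driver still responds to nvidia-smi. If the driver is already broken, nvidia-smi itself fails and this check is not usable.

```shell
# Hypothetical helper: count GPU compute processes.
# Reads on stdin the CSV output of:
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader
# and prints the number of non-empty lines (one line per process).
count_gpu_processes() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps
    # the helper from being treated as a failure in that case.
    grep -c '[^[:space:]]' || true
}

# Usage on a GPU instance with a working driver:
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader | count_gpu_processes
```

A result of 0 suggests that no compute processes are attached to the GPUs. Any other value means workloads should be stopped before you continue.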
Install DKMS Package
The NVIDIA driver uses the Dynamic Kernel Module Support (DKMS) framework to automatically rebuild kernel modules whenever the kernel updates. This ensures the NVIDIA driver remains functional after system upgrades.
Update the package index.
console$ sudo apt update
Install the dkms package.
console$ sudo apt install -y dkms
View the dkms package version.
console$ dkms --version
A version number in the output verifies that DKMS is installed correctly.
Remove Existing NVIDIA Drivers
Before downgrading or reinstalling, you must completely remove the existing NVIDIA drivers and related CUDA packages. This ensures there are no conflicts during the new installation.
Remove CUDA, cuBLAS, and Nsight packages.
console$ sudo apt-get --assume-yes --purge remove "*cublas*" "cuda*" "nsight*"
Remove NVIDIA drivers and libraries.
console$ sudo apt-get --assume-yes --purge remove "*nvidia*"
Reboot the system to unload any remaining driver modules.
console$ sudo reboot
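After the reboot, you may want to confirm that the purge left nothing behind. The sketch below filters the output of dpkg -l for packages that are still fully installed and whose names match the patterns removed above. The helper name leftover_gpu_packages is hypothetical and the name patterns are an assumption based on the commands in this section.

```shell
# Hypothetical helper: list still-installed GPU-related packages.
# Reads `dpkg -l` output on stdin. Only rows in state "ii" (installed)
# are reported, so removed-but-not-purged rows ("rc") are ignored.
leftover_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|cuda|cublas|nsight/ { print $2 }'
}

# Usage:
#   dpkg -l | leftover_gpu_packages
# Loaded kernel modules can be checked separately with:
#   lsmod | grep nvidia
```

Empty output from both commands indicates that the packages are gone and no NVIDIA kernel modules are loaded.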
Configure the Official NVIDIA Repository
To install the NVIDIA drivers, you need access to the official NVIDIA repository. Adding the repository lets you fetch specific driver versions and stay consistent with NVIDIA's distribution.
Set your Ubuntu version.
console$ UBUNTU_VERSION=$(lsb_release -rs | sed -e 's/\.//')
Download the NVIDIA keyring package.
console$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
Install the keyring package.
console$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
Download the repository signing key.
console$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-archive-keyring.gpg
Move the key to the keyrings directory.
console$ sudo mv cuda-archive-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg
Add the CUDA repository.
console$ echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-ubuntu${UBUNTU_VERSION}-x86_64.list
Update the package index.
console$ sudo apt update
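The repository path depends on UBUNTU_VERSION, which is the release number with the dot removed. The following sketch shows that transformation in isolation, so you can check the resulting URL before downloading anything. The function names ubuntu_repo_tag and cuda_repo_url are hypothetical, and the URL layout is taken from the commands in this section.

```shell
# Hypothetical helper: turn a release such as "22.04" into the tag
# used in the repository path ("2204") by dropping the dot.
ubuntu_repo_tag() {
    printf '%s\n' "$1" | sed -e 's/\.//'
}

# Hypothetical helper: build the repository base URL for a release.
# Assumes the x86_64 architecture, as in the commands above.
cuda_repo_url() {
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu%s/x86_64/\n' \
        "$(ubuntu_repo_tag "$1")"
}

# Usage:
#   cuda_repo_url "$(lsb_release -rs)"
```

If the printed URL does not exist for your release, the keyring download in the steps above fails, so checking it first can save a failed run.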
Install Latest NVIDIA Drivers
Install the appropriate driver and toolkit packages for your GPU model. The required packages differ depending on whether you are using B200 and newer GPUs or H100/A100 GPUs that require Fabric Manager. The following steps apply equally to Bare Metal GPU instances and Passthrough GPU instances, since both provide direct access to a physical NVIDIA GPU.
B200 and Newer GPUs (no Fabric Manager required)
For B200 and newer GPUs, install the NVIDIA open drivers, CUDA toolkit, NVLink libraries, and container runtime support.
Install the NVIDIA open drivers, CUDA toolkit, and NVLink libraries.
console$ sudo apt install --assume-yes nvidia-open cuda-toolkit nvlink5
Install the NVIDIA container runtime and supporting libraries.
console$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system to load the new drivers.
console$ sudo reboot
Verify that the NVIDIA drivers are installed.
console$ nvidia-smi
H100 and Older GPUs (with Fabric Manager)
H100 and older data center GPUs, such as the A100, use NVLink and NVSwitch to enable high-bandwidth, peer-to-peer communication across multiple GPUs in the same system. To support these features, NVIDIA requires both the CUDA proprietary drivers and the Fabric Manager service. Without Fabric Manager running alongside the proper drivers, NVLink/NVSwitch interconnects will not function correctly.
Install the CUDA drivers, Fabric Manager, and CUDA toolkit.
console$ sudo apt install --assume-yes cuda-drivers-fabricmanager cuda-toolkit
Install the NVIDIA container runtime and supporting libraries.
console$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system to load the new drivers.
console$ sudo reboot
Verify that the NVIDIA drivers are installed.
console$ nvidia-smi
Enable and start the Fabric Manager service.
console$ sudo systemctl enable --now nvidia-fabricmanager
Verify that Fabric Manager is running.
console$ sudo systemctl status nvidia-fabricmanager
Output:
● nvidia-fabricmanager.service - NVIDIA fabric manager service
     Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2025-09-02 13:47:35 UTC; 1h 17min ago
   Main PID: 4811 (nv-fabricmanage)
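The two package sets in this section can be summarized as a small lookup. This is only an illustrative sketch: the function name driver_packages_for and the family labels it accepts are hypothetical, and it covers only the GPU models named in this guide.

```shell
# Hypothetical helper: map a GPU family label to the package set
# listed in this section. The labels are illustrative only.
driver_packages_for() {
    case "$1" in
        B200|b200)
            # Open drivers, no Fabric Manager
            echo "nvidia-open cuda-toolkit nvlink5" ;;
        H100|h100|A100|a100)
            # Proprietary drivers with Fabric Manager
            echo "cuda-drivers-fabricmanager cuda-toolkit" ;;
        *)
            echo "unknown GPU family: $1" >&2
            return 1 ;;
    esac
}

# Usage (the substitution is left unquoted on purpose so that each
# package name becomes a separate argument):
#   sudo apt install --assume-yes $(driver_packages_for H100)
```

Keeping the mapping in one place makes it harder to mix packages from the two paths on the same system.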
Install Specific Versions of NVIDIA Drivers and Packages
NVIDIA maintains driver branches to simplify installation of specific versions. Each GPU generation requires a strict minimum driver version.
- NVIDIA HGX-2 and HGX A100 systems: Minimum driver version 450.xx
- NVIDIA HGX H100 systems: Minimum driver version 525.xx
- NVIDIA HGX B200 and HGX B100 systems: Minimum driver version 570.xx
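The minimum versions above can be checked against the running driver. The sketch below compares only the leading branch number of a version string with the minimum for a platform. The function names and platform labels are hypothetical, and the version strings in the usage comments are example values, not recommendations.

```shell
# Hypothetical helper: minimum driver branch per platform,
# taken from the list above.
min_driver_branch() {
    case "$1" in
        HGX-2|A100) echo 450 ;;
        H100)       echo 525 ;;
        B200|B100)  echo 570 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: succeed when a full driver version
# (for example "570.86.15") is at or above the platform minimum.
# Returns 2 when the platform label is not recognized.
driver_meets_minimum() {
    branch=${1%%.*}                      # keep the part before the first dot
    minimum=$(min_driver_branch "$2") || return 2
    [ "$branch" -ge "$minimum" ]
}

# Usage on a GPU instance with a working driver:
#   version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_meets_minimum "$version" H100 && echo "driver branch is sufficient"
```

This is useful before a downgrade: a branch below the minimum for your GPU generation is not expected to work.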
Append the version to the package name to install a specific driver branch. For example:
- nvidia-open-570: Installs the open driver 570 branch (recommended for B200 systems).
- cuda-drivers-550: Installs the proprietary driver 550 branch (suitable for H100 systems).
- cuda-12-8 and cuda-toolkit-12-8: Install CUDA 12.8 and its toolkit.
To install a specific version of the NVIDIA Container Toolkit, replace VERSION with the version you need.
console$ sudo apt install nvidia-container-toolkit=VERSION
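To find a valid value for VERSION, you can list the versions that the configured repositories offer with apt-cache madison, which prints rows of the form "package | version | source". The sketch below extracts the version column from that output. The helper name madison_versions is hypothetical.

```shell
# Hypothetical helper: print the unique versions found in
# `apt-cache madison <package>` output, in the order they appear.
# Reads that output on stdin.
madison_versions() {
    awk -F'|' 'NF >= 3 {
        v = $2
        gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
        if (!seen[v]++) print v          # skip repeats from multiple sources
    }'
}

# Usage:
#   apt-cache madison nvidia-container-toolkit | madison_versions
```

Any version printed this way can be used in place of VERSION in the install command.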
Example Installation using 570 Drivers with CUDA 12.8
NVIDIA provides multiple driver branches depending on the GPU generation. Follow the steps below to install the 570 driver branch with CUDA 12.8.
B200 and Newer GPUs
For B200 and other newer GPUs, install the open 570 driver branch and the CUDA 12.8 toolkit.
Install the NVIDIA drivers, CUDA toolkit, and NVLink support.
console$ sudo apt install --assume-yes nvidia-open-570 cuda-toolkit-12-8 nvlink5-570
Install the NVIDIA container runtime and supporting libraries.
console$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system to load the new drivers.
console$ sudo reboot
Verify that the NVIDIA drivers are installed.
console$ nvidia-smi
H100 and Older GPUs
For H100 and older GPUs, install the CUDA 570 driver branch with Fabric Manager and the CUDA 12.8 toolkit. Fabric Manager is required to enable NVLink/NVSwitch functionality in multi-GPU systems.
Install the CUDA drivers, Fabric Manager, and CUDA toolkit.
console$ sudo apt install --assume-yes cuda-drivers-fabricmanager-570 cuda-toolkit-12-8
Install the NVIDIA container runtime and supporting libraries.
console$ sudo apt install --assume-yes nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Reboot the system to load the new drivers.
console$ sudo reboot
Verify that the NVIDIA drivers are installed.
console$ nvidia-smi
Enable and start the Fabric Manager service.
console$ sudo systemctl enable --now nvidia-fabricmanager
Verify that Fabric Manager is running.
console$ sudo systemctl status nvidia-fabricmanager
Output:
● nvidia-fabricmanager.service - NVIDIA fabric manager service
     Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2025-09-02 13:47:35 UTC; 1h 17min ago
   Main PID: 4811 (nv-fabricmanage)
Conclusion
You have successfully downgraded or reinstalled NVIDIA drivers on your Bare Metal or Passthrough GPU instance and verified that the correct version is active. For systems that require Fabric Manager, you ensured NVLink and NVSwitch features are properly enabled. With the drivers, CUDA toolkit, and container runtime installed, your environment is now ready for high-performance GPU workloads and containerized deployments.