
Install NVIDIA Drivers in a Warewulf Container


Stephen Simpson
Senior Customer Support Engineer

Jun 10, 2024

Introduction

There are multiple ways to create an image for Warewulf that includes the NVIDIA drivers. This article will explore how to work with a Containerfile or modify a regular image.

Prerequisites

This guide assumes that a Warewulf server has been installed and configured, and at least one node has been deployed.

Instructions

Create an image using a Containerfile

Warewulf supports creating images directly from containers, enabling you to build a custom image that includes the required NVIDIA drivers. The Warewulf-images GitHub repository has an example Containerfile that we can work from. Start by installing Podman on the server you want to build the image on. To keep things simple, we will install this on our Warewulf host server to streamline the import process:

sudo dnf install -y podman

Next, we'll create a directory to add our Containerfile to:

mkdir rockylinux-9-nvidia
cd rockylinux-9-nvidia

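Now create a new file called Containerfile with the following contents. The heredoc delimiter is quoted ('EOF') so that your shell writes $(arch), $dir, and the trailing backslashes to the file literally instead of expanding them before the build runs:

tee ./Containerfile <<'EOF'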
FROM ghcr.io/warewulf/warewulf-rockylinux:9

RUN dnf -y install dnf-plugins-core epel-release kernel-headers \
    && dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(arch)/cuda-rhel9.repo \
    && dnf -y module install nvidia-driver:latest-dkms \
    && dnf -y install datacenter-gpu-manager \
    && dnf clean all \
    && for dir in /usr/src/kernels/*; do dkms autoinstall --kernelver "$(basename "$dir")"; done \
    && dkms status
EOF

Modify the Containerfile as needed to suit your environment. Next, build the image:

podman build -t rockylinux-9-nvidia:v1 .

Feel free to replace v1 with a tag or versioning scheme that fits your own preference or your organization's policy. If the build fails with the error "cannot apply additional memory protection after relocation: Permission denied", SELinux is likely blocking it, and you may need to rerun the build with SELinux label separation disabled:

podman build --security-opt label=disable -t rockylinux-9-nvidia:v1 .

Once the build finishes, save this to a tar file and then import the image into Warewulf:

podman save -o rockylinux-9-nvidia-v1.tar localhost/rockylinux-9-nvidia:v1
wwctl image import file://rockylinux-9-nvidia-v1.tar rockylinux-9-nvidia-v1

Once the import finishes, you can assign the image to a profile or node as you would any other image. An example of this appears at the end of the next section. Another benefit of Containerfiles is that they can be integrated into a CI/CD pipeline, which enables version control and automated builds.
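As a rough illustration of how the build, save, and import steps above might be wrapped for a CI job, the sketch below chains them in one function. The names run and build_and_import and the DRY_RUN variable are hypothetical helpers introduced here, not part of Podman or Warewulf; the commands they call are the ones used earlier in this section.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build, save, and import one tagged image.
# Usage: build_and_import NAME VERSION   (run from the Containerfile directory)
# The body runs in a subshell so its variables do not leak into the caller.
build_and_import() (
    name=$1
    version=$2
    tarball="${name}-${version}.tar"

    # Stop at the first step that fails.
    run podman build -t "${name}:${version}" . &&
    run podman save -o "$tarball" "localhost/${name}:${version}" &&
    run wwctl image import "file://${tarball}" "${name}-${version}"
)
```

Setting DRY_RUN=1 before calling build_and_import rockylinux-9-nvidia v2 prints the three commands without running them, which can be handy for reviewing a pipeline change before it touches the Warewulf server.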

Modifying a regular image

Warewulf lets you open a shell inside an image. This allows you to modify the image directly and quickly install components such as the NVIDIA drivers. Start by either copying an existing image or importing a new one. You can use wwctl image list to view your current images if you would like to build on an existing one. You can, of course, edit any existing image in place; however, we always recommend working from a copy or creating a backup first.

In this example, you are going to start with a fresh copy of Rocky Linux 9. You can do so by importing an image from the Warewulf repository:

wwctl image import docker://ghcr.io/warewulf/warewulf-rockylinux:9 rockylinux-nvidia-9

Here the image is named rockylinux-nvidia-9. Feel free to change this to match your environment or to add information such as the date or version. Once the image has finished importing, you can shell into it and install the NVIDIA drivers:

wwctl image shell rockylinux-nvidia-9

You can find more detailed instructions for installing NVIDIA drivers here. The following are example steps for Rocky Linux 9:

# Install necessary packages and add the NVIDIA repository
dnf -y install dnf-plugins-core epel-release kernel-headers
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(arch)/cuda-rhel9.repo
# Install the latest NVIDIA driver
dnf -y module install nvidia-driver:latest-dkms

We can verify that the driver module was installed correctly by checking its DKMS (Dynamic Kernel Module Support) status:

dkms status | grep nvidia
nvidia/580.95.05, 5.14.0-570.49.1.el9_6.x86_64, x86_64: installed
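If you would rather script this check than read the output by eye, a minimal sketch follows. The function name nvidia_dkms_ok is hypothetical, and it assumes that dkms status prints one line per module and kernel in the format shown above.

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed only if
# at least one nvidia entry exists and every nvidia entry reports "installed".
nvidia_dkms_ok() {
    awk '
        /^nvidia\// {
            seen = 1
            # Any state other than "installed" (for example "added" or
            # "built") means the module is not ready for that kernel.
            if ($0 !~ /: installed/) bad = 1
        }
        END {
            if (seen && !bad)
                exit 0
            exit 1
        }
    '
}
```

Inside the image shell, this could be used as dkms status | nvidia_dkms_ok && echo "NVIDIA module installed".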

Once you have finished installing the driver and any other applications you need, type exit to leave the image shell. Warewulf should rebuild the image automatically when you exit; however, you can run the following command manually to make sure the image has been built:

wwctl image build rockylinux-nvidia-9

This command prints nothing if the image has already been built and is up to date. If applicable, you can also sync local users into the image. You can read more about the syncuser subcommand here:

wwctl image syncuser --write rockylinux-nvidia-9

If you see the messages "WARN : syncuser cannot determine what name should be assigned to id number" and "ERROR: error in synchronize", add the users named in the WARN message with useradd <USERNAME> and then run the wwctl image syncuser command again. You may need to add multiple users.

Once the syncuser command completes, you can assign this new image to a node. Assigning it directly to a node allows you to boot and test your newly created image before pushing it out to a profile. You can skip directly to assigning this to a profile if you desire:

wwctl node set node1 --image rockylinux-nvidia-9

Once you have tested your image, you can assign it to the necessary profile. In the example below, we are using the default profile:

wwctl profile set default --image rockylinux-nvidia-9

Finally, we rebuild the overlays:

wwctl overlay build
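The node-first, profile-second workflow above could be wrapped in a small helper, sketched below. The function name assign_image and the WWCTL override variable are hypothetical; the wwctl subcommands are the ones shown in this section. Note that wwctl may ask for confirmation before changing a node or profile, and this sketch does not suppress that prompt.

```shell
# Hypothetical helper: assign IMAGE to a node or profile, then rebuild overlays.
# Usage: assign_image node|profile TARGET IMAGE
# WWCTL can be overridden (for example WWCTL=echo) to print the arguments
# instead of calling wwctl.
assign_image() (
    target_type=$1   # "node" or "profile"
    target=$2        # for example node1 or default
    image=$3
    wwctl_cmd=${WWCTL:-wwctl}

    case $target_type in
        node|profile) ;;
        *) echo "usage: assign_image node|profile TARGET IMAGE" >&2; exit 2 ;;
    esac

    # Rebuild overlays only if the assignment succeeded.
    "$wwctl_cmd" "$target_type" set "$target" --image "$image" &&
    "$wwctl_cmd" overlay build
)
```

A test run would be assign_image node node1 rockylinux-nvidia-9, followed by assign_image profile default rockylinux-nvidia-9 once the node has booted and been checked.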

Upon reboot, your nodes will now boot with the newly created image! You can find more information about Warewulf images as well as more advanced configuration examples and instructions in the Warewulf documentation.
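After a node has booted the new image, one way to confirm that the driver sees the hardware is to count the GPUs reported by nvidia-smi -L. The sketch below assumes the usual one-line-per-GPU output ("GPU 0: ...") and that you can reach the node over SSH; the function name expect_gpus is hypothetical.

```shell
# Hypothetical helper: succeed if the `nvidia-smi -L` output on stdin lists
# exactly EXPECTED GPUs. Usage: nvidia-smi -L | expect_gpus EXPECTED
expect_gpus() (
    expected=$1
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    found=$(grep -c '^GPU [0-9][0-9]*:' || true)
    echo "found ${found} GPU(s), expected ${expected}"
    [ "$found" -eq "$expected" ]
)
```

For a node with four GPUs, this might be run as ssh node1 nvidia-smi -L | expect_gpus 4.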

References & related articles

Warewulf Documentation
Warewulf Rocky Linux 9 NVIDIA Container Example
Hands on Warewulf: Solving Cluster Provisioning & Management
Warewulf: Deep Dive, Use Cases, and Examples