Install Nvidia Drivers in a WareWulf Container
Introduction
WareWulf lets you open a shell inside a container image. From that shell you can modify the image and install components such as the Nvidia drivers quickly.
Resolution
Start by either copying an existing container or downloading a new one. Use wwctl container list to view your current images if you would like to build from an existing container. You can edit any existing container directly, but we always recommend either working from a copy or creating a backup first.
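As a minimal sketch of working from a copy, the helper below wraps the copy step. It assumes your Warewulf version provides the wwctl container copy subcommand; the function name copy_container, the DRY_RUN variable, and the source name rockylinux-9 are hypothetical and only used for illustration.

```shell
# Hypothetical helper: copy a container before editing it.
# Assumption: `wwctl container copy SOURCE DEST` is available in your
# Warewulf version. Set DRY_RUN=1 to print the command instead of running it.
copy_container() {
    src="$1"
    dest="$2"
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: copy_container SOURCE DEST" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wwctl container copy $src $dest"
    else
        wwctl container copy "$src" "$dest"
    fi
}

# Example (container names are placeholders):
#   DRY_RUN=1 copy_container rockylinux-9 rockylinux-nvidia-9
```

Printing the command first makes it easy to confirm the source and destination names before anything is changed on the Warewulf server.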
In this example we are going to start by downloading a fresh copy of Rocky Linux 9. We can do so by grabbing an image from the WareWulf repository.
wwctl container import docker://ghcr.io/warewulf/warewulf-rockylinux:9 rockylinux-nvidia-9
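The final argument to the import command is the name the container is stored under. If you want that name to carry a date, a small helper along these lines could generate it. This is only a sketch: the function name dated_container_name and the YYYYMMDD format are our own choices, not anything Warewulf requires.

```shell
# Hypothetical helper: append a date stamp to a base container name.
# The optional second argument overrides the stamp (format YYYYMMDD);
# it defaults to today's date.
dated_container_name() {
    base="$1"
    stamp="${2:-$(date +%Y%m%d)}"
    printf '%s-%s\n' "$base" "$stamp"
}

# Example: import under a dated name.
#   name="$(dated_container_name rockylinux-nvidia-9)"
#   wwctl container import docker://ghcr.io/warewulf/warewulf-rockylinux:9 "$name"
```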
The name of our container will be rockylinux-nvidia-9 in this example. Feel free to change this to match your environment or to add additional information such as the date or version. Once the container has finished downloading, we can open a shell in the container and install the Nvidia drivers.
wwctl container shell rockylinux-nvidia-9
You can read more about how to install the Nvidia drivers here. The following are example steps for Rocky Linux 9.
# Install necessary packages and add the Nvidia repository
dnf -y install dnf-plugins-core epel-release kernel-headers
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(arch)/cuda-rhel9.repo
# Install the latest Nvidia driver
dnf -y module install nvidia-driver:latest-dkms
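The repository URL in the steps above embeds the machine architecture through $(arch). The sketch below shows how that URL expands. The function name cuda_repo_url is hypothetical, and the mapping of aarch64 to an sbsa directory is an assumption about how the Nvidia repository is laid out for ARM servers; check the repository index before relying on it.

```shell
# Hypothetical helper: build the CUDA repository URL for a machine type.
# Assumption: ARM servers (aarch64) are published under "sbsa" rather than
# "aarch64"; verify against the repository index.
cuda_repo_url() {
    machine="${1:-$(uname -m)}"
    case "$machine" in
        aarch64) repo_arch="sbsa" ;;
        *)       repo_arch="$machine" ;;
    esac
    echo "https://developer.download.nvidia.com/compute/cuda/repos/rhel9/${repo_arch}/cuda-rhel9.repo"
}

# Example, inside the container shell:
#   dnf config-manager --add-repo "$(cuda_repo_url)"
```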
We can verify that the driver installed correctly by checking the DKMS (Dynamic Kernel Module Support) status of the nvidia module:
# dkms status | grep nvidia
nvidia/555.42.02, 5.14.0-427.20.1.el9_4.x86_64, x86_64: installed
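If you script this check, a helper like the one below could turn the output into a pass or fail result. It is a sketch that assumes the status line has the same shape as the example output above; the function name nvidia_dkms_installed is hypothetical.

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed only
# if an nvidia module line ends its status field with "installed".
# Assumption: lines look like "nvidia/<version>, <kernel>, <arch>: installed".
nvidia_dkms_installed() {
    grep -Eq '^nvidia/[^,]+,.*: installed'
}

# Example, inside the container shell:
#   dkms status | nvidia_dkms_installed && echo "nvidia module installed"
```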
Once you have finished installing the driver and any other applications you need, type exit to leave the container. Warewulf should rebuild the container when you exit, but you can run the build manually to verify that the container was built.
wwctl container build
Running this command will not return anything if the image has already been built or updated. If applicable, we can sync our local users into our container. You can read more about the syncuser subcommand here.
wwctl container syncuser --write rockylinux-nvidia-9
Once the syncuser command completes, we can assign this new image to a node. Assigning it directly to a node allows us to boot and test our newly created container before pushing it out to a profile. You can skip directly to assigning this to a profile if you desire.
wwctl node set node1 --container rockylinux-nvidia-9
Once you have tested your container image, you can assign the container to the necessary profile. In our example we are sticking with the default profile:
wwctl profile set default --container rockylinux-nvidia-9
Finally, we can build the overlay:
wwctl overlay build
Upon reboot, your nodes will boot with the newly created container image. You can find more information about WareWulf containers, as well as more advanced configuration examples and instructions, in the WareWulf documentation.
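To recap the host-side steps, the sketch below prints them in order for a given container, test node, and profile. It only prints the commands and does not run them; the function name deploy_plan is hypothetical, and you would still test the node before applying the profile step.

```shell
# Hypothetical helper: print the host-side commands from this article, in
# order, for a container, a test node, and a profile (default: "default").
# Nothing is executed; the output is a checklist to run by hand.
deploy_plan() {
    container="$1"
    node="$2"
    profile="${3:-default}"
    cat <<EOF
wwctl container build $container
wwctl container syncuser --write $container
wwctl node set $node --container $container
wwctl profile set $profile --container $container
wwctl overlay build
EOF
}

# Example:
#   deploy_plan rockylinux-nvidia-9 node1
```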
References & related articles
WareWulf Documentation
WareWulf Rocky Linux 9 Nvidia Container Example
Hands on Warewulf: Solving Cluster Provisioning & Management
Warewulf: Deep Dive, Use Cases, and Examples