How to Enable Kdump in Warewulf Nodes
Introduction
Kdump uses kexec to quickly boot into a dump-capture kernel whenever a memory dump of the system kernel is required, such as during a kernel panic. The system kernel's memory image is preserved across the reboot and remains accessible to the dump-capture kernel.
This guide shows how to enable kdump on a compute node in your Warewulf cluster, so that a memory dump is available for in-depth, kernel-level analysis after a kernel panic.
Prerequisites
This guide assumes that a Warewulf server is successfully installed and that nodes are able to boot from a Rocky Linux image.
In many cases, the kdump service is installed and activated by default on new Linux installations, but some installation options do not install or enable kdump by default.
If you do not know whether kdump is installed in your image, you can "shell into" the image with wwctl image shell, as in the example below. For Warewulf 4.5.x versions, use wwctl container shell instead.
[root@Warewulf ~]# wwctl image shell rockylinux-9
Image build will be skipped if the shell ends with a non-zero exit code.
From there, you can check whether the RPM is installed by running rpm -q kexec-tools:
[warewulf:rockylinux-9] /# rpm -q kexec-tools
package kexec-tools is not installed
If kexec-tools is not installed, run dnf install kexec-tools. This command installs the userspace tools for kexec:
[warewulf:rockylinux-9] /# dnf install kexec-tools
Last metadata expiration check: 0:01:12 ago on Wed Mar 5 18:07:55 2025.
Dependencies resolved.
=============================================================================================================================================================================================
Package Architecture Version Repository Size
=============================================================================================================================================================================================
Installing:
kexec-tools x86_64 2.0.27-16.el9_5.1 baseos 477 k
...
Complete!
Installation instructions
Configure kdump
The Linux kernel accepts several boot-time arguments. For example, Warewulf currently specifies the following arguments by default: quiet crashkernel=no vga=791 net.naming-scheme=v238.
You can specify a different set of kernel arguments for a node or profile using --kernelargs. This value is recorded on the node or profile as the Kernel.Args field.
To set this at the profile level, run this command:
wwctl profile set default --kernelargs 'quiet,crashkernel=512M,vga=791,net.naming-scheme=v238'
Similarly, to set this at the node level, run this command:
wwctl node set node01 --kernelargs 'quiet,crashkernel=512M,vga=791,net.naming-scheme=v238'
To specify the memory reserved for the kdump kernel, set the crashkernel= option to the required amount. For example, to reserve 128 MB of memory, use crashkernel=128M.
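Once a node has booted with the new arguments, the value it actually received can be read back from its kernel command line. The following is a minimal sketch; crashkernel_value is a hypothetical helper name, not a Warewulf or kexec-tools command. It reports the last crashkernel= entry it finds, which is useful when several entries end up on the same command line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last crashkernel= entry on a
# kernel command line. With no argument it reads /proc/cmdline on the
# machine where it runs.
crashkernel_value() {
    cmdline=${1:-$(cat /proc/cmdline)}
    value=""
    for arg in $cmdline; do
        case $arg in
            crashkernel=*) value=${arg#crashkernel=} ;;
        esac
    done
    # Succeed only if an entry was found.
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Example with an explicit command line:
crashkernel_value 'quiet crashkernel=512M vga=791 net.naming-scheme=v238'
```

On a booted compute node, calling crashkernel_value without arguments checks the running kernel instead of the example string.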
NOTE: In recent Warewulf versions, kernel arguments are stored as a list so that profile and node values can be combined. As a result, new values may be appended to the defaults rather than replacing them. You may need to negate a default value by prefixing it with "~", for example ~crashkernel=no.
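Because the full argument list has to be passed to --kernelargs, it can help to derive it from the existing list instead of retyping it. The sketch below is an illustration only: set_crashkernel is a hypothetical helper that replaces (or appends) the crashkernel= entry and prints a comma-separated list in the form used by the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Warewulf): replace or append the
# crashkernel= entry in a space- or comma-separated kernel argument list
# and print the result as a comma-separated list.
set_crashkernel() {
    args=$1
    size=$2
    out=""
    found=no
    # Treat both spaces and commas as separators.
    for arg in $(printf '%s' "$args" | tr ',' ' '); do
        case $arg in
            crashkernel=*) arg="crashkernel=$size"; found=yes ;;
        esac
        out="${out:+$out,}$arg"
    done
    # Append the entry if the list did not contain one.
    if [ "$found" = no ]; then
        out="${out:+$out,}crashkernel=$size"
    fi
    printf '%s\n' "$out"
}

# Start from the default arguments quoted earlier in this guide:
set_crashkernel 'quiet crashkernel=no vga=791 net.naming-scheme=v238' 512M
```

The printed list can then be passed to wwctl profile set or wwctl node set with --kernelargs.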
Changing the excludes file inside the container to allow copying the /boot directory to the nodes
Warewulf can exclude files from an image to prevent them from being delivered to the compute node. This is typically used to reduce the size of the image when some files are unnecessary. Patterns for excluded files are read from the file /etc/warewulf/excludes in the image itself. For example, the default Rocky Linux images exclude these paths:
/boot/
/usr/share/GeoIP
To remove /boot/ from the excludes file, shell into the image:
[root@Warewulf ~]# wwctl image shell rockylinux-9
Image build will be skipped if the shell ends with a non-zero exit code.
[warewulf:rockylinux-9] /# sed -i '/\/boot\//d' /etc/warewulf/excludes
[warewulf:rockylinux-9] /# cat /etc/warewulf/excludes
/usr/share/GeoIP
Since kdump needs the vmlinuz file to boot the dump kernel, and this file is not present in /boot, you need to add a symlink in /boot that points to /usr/lib/modules/<kernel_version>/<vmlinuz file>:
[warewulf:rockylinux-9] /# KERNEL_VERSION=$(ls /usr/lib/modules | head -n 1)
[warewulf:rockylinux-9] /# ln -s /usr/lib/modules/$KERNEL_VERSION/vmlinuz /boot/vmlinuz-$KERNEL_VERSION
[warewulf:rockylinux-9] /# ls /boot/ | grep vmlinuz
vmlinuz-5.14.0-503.19.1.el9_5.x86_64
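If the image contains more than one kernel, head -n 1 only links the first directory listed. The sketch below links every kernel it finds. It assumes that each kernel image is stored as a file named vmlinuz inside its module directory, as on Rocky Linux 9; link_kernels and its root argument are illustrative names, not Warewulf commands.

```shell
#!/bin/sh
# Sketch: create <root>/boot/vmlinuz-<version> symlinks for every kernel
# found under <root>/usr/lib/modules. Inside "wwctl image shell" the root
# is "/", which is the default.
link_kernels() {
    root=${1:-/}
    mkdir -p "$root/boot"
    for kernel in "$root"/usr/lib/modules/*/vmlinuz; do
        [ -e "$kernel" ] || continue    # pattern matched nothing
        version=$(basename "$(dirname "$kernel")")
        # A relative target keeps the link valid wherever the image is mounted.
        ln -sf "../usr/lib/modules/$version/vmlinuz" "$root/boot/vmlinuz-$version"
        echo "linked vmlinuz-$version"
    done
}
```

Inside the image shell, running link_kernels with no argument operates on /boot directly.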
Configuring the kdump target
When a kernel crash is captured, the core dump can be stored in various ways: as a file in a local file system, directly on a device, or sent over a network using the NFS (Network File System) or SSH (Secure Shell) protocols. Only one of these options can be configured at a time. By default, the vmcore file is stored in the /var/crash directory of the local file system. To verify this, check the contents of the /etc/kdump.conf file. In this example, NFS is used, because Warewulf nodes run their image from memory: anything written to /var/crash is lost when the node reboots.
An example from the /etc/kdump.conf file:
# nfs <nfs mount>
# - Will mount nfs to <mnt>, and copy /proc/vmcore to
# <mnt>/<path>/%HOST-%DATE/, supports DNS.
Configure kdump manually on the container
Shell into the image and run the following commands:
[warewulf:rockylinux-9] /# grep -v ^# /etc/kdump.conf
auto_reset_crashkernel yes
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31
[warewulf:rockylinux-9] /# vi /etc/kdump.conf
# Edit the fields accordingly
# If adding nfs please comment path
auto_reset_crashkernel yes
#path /var/crash
nfs nfs_server_ip:/kdump/path
core_collector makedumpfile -l --message-level 7 -d 31
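The same edit can be made without an interactive editor, which is convenient when preparing several images. This is a minimal sketch under stated assumptions: set_kdump_nfs is a hypothetical helper, and it assumes the file uses plain "path" and "nfs" directives at the start of a line, as in the output above.

```shell
#!/bin/sh
# Sketch: switch a kdump.conf from a local "path" target to an NFS target.
# Arguments: the configuration file and the NFS export (server:/path).
set_kdump_nfs() {
    conf=$1
    export_path=$2
    tmp=$(mktemp) || return 1
    # Comment out active "path" lines and drop any existing "nfs" line.
    sed -e 's/^path[[:space:]]/#&/' -e '/^nfs[[:space:]]/d' "$conf" > "$tmp" || return 1
    printf 'nfs %s\n' "$export_path" >> "$tmp"
    # Write back in place so the file keeps its ownership and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Inside the image shell, a call such as set_kdump_nfs /etc/kdump.conf nfs_server_ip:/kdump/path would produce the same active directives as the manual edit, with the nfs line placed at the end of the file.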
Configuring kdump using overlays
To create the kdump overlay, run:
$ wwctl overlay create kdump
Edit the kdump.conf.ww file (this command also creates the overlay template):
$ wwctl overlay edit --parents kdump /etc/kdump.conf.ww
Add the following configuration after the # This file is autogenerated by warewulf line:
auto_reset_crashkernel yes
core_collector makedumpfile -l --message-level 7 -d 31
{{ if .Tags.kdump_nfs_location }}
nfs {{ .Tags.kdump_nfs_location }}
{{ else }}
path /var/crash
{{ end }}
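The template's branching is simple: if the kdump_nfs_location tag is set, an nfs line is written, otherwise the local path is used. The shell function below only mirrors that logic for illustration. It is not how Warewulf renders templates, and render_kdump_conf is a hypothetical name.

```shell
#!/bin/sh
# Illustration only: reproduce the template's if/else in plain shell.
# The optional argument stands in for the kdump_nfs_location tag.
render_kdump_conf() {
    nfs_location=${1:-}
    echo 'auto_reset_crashkernel yes'
    echo 'core_collector makedumpfile -l --message-level 7 -d 31'
    if [ -n "$nfs_location" ]; then
        echo "nfs $nfs_location"
    else
        echo 'path /var/crash'
    fi
}

# Without a tag value, the local path is emitted:
render_kdump_conf
```

Passing an export such as 172.16.131.5:/var/crash/ as the argument switches the last line to the nfs directive.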
Check that the new changes were applied successfully:
wwctl overlay show kdump /etc/kdump.conf.ww
Display the changes on a node to confirm it is working as expected:
$ wwctl overlay show -r node01 kdump /etc/kdump.conf
backupFile: true
writeFile: true
Filename: /etc/kdump.conf
# This is a Warewulf Template file.
#
# This file (suffix '.ww') will be automatically rewritten without the suffix
# when the overlay is rendered for the individual nodes. Here are some examples
# of macros and logic which can be used within this file:
#
# Node FQDN = node01
# Node Cluster = oso
# Network Config = <no value>, <no value>, etc.
#
# Go to the documentation pages for more information:
# https://warewulf.org/docs/main/contents/overlays.html
#
# Keep the following for better reference:
# ---
# This file is autogenerated by warewulf
auto_reset_crashkernel yes
core_collector makedumpfile -l --message-level 7 -d 31
path /var/crash
Set the tag that holds the NFS path by using the --tagadd flag. In the following example, the path is 172.16.131.5:/var/crash/:
$ wwctl profile set default --tagadd kdump_nfs_location=172.16.131.5:/var/crash/
Are you sure you want to modify 1 profile(s): y
Render the changes on a node to confirm the changes:
$ wwctl overlay show -r node01 kdump /etc/kdump.conf
# Removing the extra lines to shorten the output
...
nfs 172.16.131.5:/var/crash/
Add the newly created overlay to the default profile, and boot the node:
wwctl profile set default -O $(wwctl profile list default -a | grep SystemOverlay | awk '{print $3 ",kdump"}')
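Running the command above more than once would list kdump twice. A small guard can avoid that when building the overlay list. This is a sketch only: append_overlay is a hypothetical helper, and the overlay names in the example are placeholders for whatever your profile currently lists.

```shell
#!/bin/sh
# Hypothetical helper: append an overlay name to a comma-separated list
# unless the list already contains it.
append_overlay() {
    list=$1
    name=$2
    case ",$list," in
        *",$name,"*) printf '%s\n' "$list" ;;               # already present
        *)           printf '%s\n' "${list:+$list,}$name" ;;
    esac
}

# Example with placeholder overlay names:
append_overlay 'wwinit,wwclient' kdump
```

The resulting list can be passed to wwctl profile set default -O in place of the awk expression.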
Testing the kdump configuration
⚠️ WARNING The commands below cause the kernel to crash. Use caution when following these steps, and by no means run them on a production system.
To test the configuration, ssh into the node and make sure that the service is running:
$ systemctl status kdump.service
#If the service is active, please run the following commands:
$ echo 1 > /proc/sys/kernel/sysrq
$ echo c > /proc/sysrq-trigger
The above commands force the Linux kernel to crash, and the address-YYYY-MM-DD-HH:MM:SS/vmcore file is copied to the location you selected in the configuration (by default, /var/crash/).
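After the node has rebooted, you can confirm that a dump was written by listing the vmcore files in the target directory (on the NFS server, if you configured an NFS target). The sketch below assumes the one-subdirectory-per-crash layout described above; list_vmcores is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every <crash_dir>/<subdirectory>/vmcore file.
# The directory defaults to /var/crash.
list_vmcores() {
    crash_dir=${1:-/var/crash}
    for core in "$crash_dir"/*/vmcore; do
        if [ -f "$core" ]; then
            printf '%s\n' "$core"
        fi
    done
}

# Prints nothing if no dump has been captured yet.
list_vmcores /var/crash
```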