
Udev Rules Causing RAID Device Creation to Hang

Introduction

udev provides a dynamic device directory containing only the files for devices that are actually present. It creates or removes device node files, usually located in the /dev directory, and it renames network interfaces.

As part of the hotplug subsystem, udev is executed when a kernel device is added to or removed from the system. On device creation, udev reads the sysfs directory of the given device to collect device attributes such as the label, serial number, or bus device number. These attributes may be used as keys to determine a unique name for the device.

RAID devices, on the other hand, are virtual devices created from two or more real block devices. This allows multiple devices (typically disk drives or partitions thereof) to be combined into a single device to hold, for example, a single filesystem. Some RAID levels include redundancy and so can survive some degree of device failure. Linux software RAID devices are implemented through the md (Multiple Devices) device driver. More information can be found on the mdadm man page.

Problem

The user may have created udev rules to handle RAID device creation, causing the md0_resync process to hang.

Symptoms

Note that reproducing this issue is unpredictable, as the race condition does not occur every time. You may have to run the mdadm command below a few times to trigger it.

The following udev rules were used:

SUBSYSTEM=="block", ACTION=="add|change", KERNEL=="md*",\
ATTR{md/sync_speed_max}="2000000",\
ATTR{md/group_thread_cnt}="64",\
ATTR{md/stripe_cache_size}="8192",\
ATTR{queue/nomerges}="2",\
ATTR{queue/nr_requests}="1023",\
ATTR{queue/rotational}="0",\
ATTR{queue/rq_affinity}="2",\
ATTR{queue/scheduler}="none",\
ATTR{queue/add_random}="0",\
ATTR{queue/max_sectors_kb}="4096"

Then to create the RAID device, the following command was run:

$ mdadm --create --verbose --chunk=128 --level=6 -n 4 /dev/md0 /dev/vda /dev/vdb /dev/vdc /dev/vdd
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: size set to 52395008K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

You can check the status of the RAID creation by checking the /proc/mdstat file:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 vdd[3] vdc[2] vdb[1] vda[0]
      104790016 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  resync =  4.2% (2203904/52395008) finish=3.7min speed=220390K/sec

When you check a second time, you may find the file empty or the resync completion percentage unchanged.
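You can also poll the resync position directly from sysfs instead of /proc/mdstat. The following is a minimal sketch, assuming the md0 array from the example above; md/sync_completed reports "<done> / <total>" in sectors, or "none" when no sync is running:

```shell
#!/bin/sh
# Sketch: read resync progress from sysfs rather than /proc/mdstat.

progress_pct() {
    # Convert a value such as "2203904 / 52395008" into an integer percentage
    done_s=${1%% *}
    total=${1##* }
    echo $(( done_s * 100 / total ))
}

if [ -e /sys/block/md0/md/sync_completed ]; then
    val=$(cat /sys/block/md0/md/sync_completed)
    [ "$val" != "none" ] && echo "resync at $(progress_pct "$val")%"
fi
```

If the percentage stops advancing between polls while sync_action still reports a running sync, you are likely seeing the hang described here.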

You may find the following errors in the logs:

Sep 11 18:39:21 rockylinux.com systemd-udevd[3394]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 10:22:33 rockylinux.com systemd-udevd[570894]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 10:43:13 rockylinux.com systemd-udevd[3450]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 12:21:44 rockylinux.com systemd-udevd[65370]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
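A common cause of this "Invalid argument" failure is that the value the rule writes (4096 in the rule above) exceeds the device's hardware limit, max_hw_sectors_kb, which the kernel rejects with EINVAL. A quick check, sketched with the comparison factored out (the device name is taken from the log lines above):

```shell
#!/bin/sh
# Sketch: compare the rule's max_sectors_kb value against the hardware limit.

exceeds_hw_limit() {
    # Is the requested value greater than the hardware maximum?
    [ "$1" -gt "$2" ]
}

dev=nvme2n1   # device name from the log lines above
if [ -e "/sys/block/$dev/queue/max_hw_sectors_kb" ]; then
    hw=$(cat "/sys/block/$dev/queue/max_hw_sectors_kb")
    if exceeds_hw_limit 4096 "$hw"; then
        echo "$dev: requested 4096 exceeds max_hw_sectors_kb ($hw)"
    fi
fi
```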

The md0_resync task hangs with the following logs (this may cause a kernel panic and crash if you have kernel.hung_task_panic = 1 set in your /etc/sysctl.conf file):

Sep 10 10:40:38 rockylinux.com kernel: INFO: task md0_resync:2365161 blocked for more than 122 seconds.
Sep 10 10:40:38 rockylinux.com kernel:      Tainted: G S      W          6.6.32-120240613.el9.x86_64 #1
Sep 10 10:40:38 rockylinux.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 10 10:40:38 rockylinux.com kernel: task:md0_resync      state:D stack:0     pid:2365161 ppid:2      flags:0x00004000
Sep 10 10:40:38 rockylinux.com kernel: Call Trace:
Sep 10 10:40:38 rockylinux.com kernel: <TASK>
Sep 10 10:40:38 rockylinux.com kernel: __schedule+0x222/0x660
Sep 10 10:40:38 rockylinux.com kernel: schedule+0x5e/0xd0
Sep 10 10:40:38 rockylinux.com kernel: md_do_sync+0xcef/0x1030
Sep 10 10:40:38 rockylinux.com kernel: ? sched_clock_cpu+0xf/0x190
Sep 10 10:40:38 rockylinux.com kernel: ? membarrier_register_private_expedited+0xa0/0xa0
Sep 10 10:40:38 rockylinux.com kernel: md_thread+0xb0/0x160
Sep 10 10:40:38 rockylinux.com kernel: ? super_90_load.part.0+0x350/0x350
Sep 10 10:40:38 rockylinux.com kernel: kthread+0xe3/0x110
Sep 10 10:40:38 rockylinux.com kernel: ? kthread_complete_and_exit+0x20/0x20
Sep 10 10:40:38 rockylinux.com kernel: ret_from_fork+0x31/0x50
Sep 10 10:40:38 rockylinux.com kernel: ? kthread_complete_and_exit+0x20/0x20
Sep 10 10:40:38 rockylinux.com kernel: ret_from_fork_asm+0x11/0x20
Sep 10 10:40:38 rockylinux.com kernel: </TASK>

Resolution

From the above findings, we can conclude that the udev rule for the RAID device triggers sysfs changes too early, leading to a race condition with the md0_resync task. Removing this rule fixes the issue.

One potential solution is to use udevadm settle, or to introduce a delay, to prevent the race condition from occurring.
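A minimal sketch of the settle approach: create the array, then block until udev has processed all queued events before anything else touches /dev/md0. The device names repeat the example above and are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: wrap array creation with "udevadm settle" so later device access
# does not race with udev rule processing.

create_array() {
    # Same creation command as in the example above
    mdadm --create --verbose --chunk=128 --level=6 -n 4 /dev/md0 \
        /dev/vda /dev/vdb /dev/vdc /dev/vdd
}

wait_for_udev() {
    # Block until the udev event queue is empty, or 30 seconds at most
    udevadm settle --timeout=30
}

# Only act when running as root on a machine that has the member disks
if [ "$(id -u)" = "0" ] && [ -b /dev/vda ]; then
    create_array
    wait_for_udev
fi
```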

Alternatively, a script that monitors the md0 status before applying device changes can help ensure the resync completes without udev interference.
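Such a monitoring script could poll md/sync_action, which reads "idle" when no resync or recovery is running, and apply the tuning only afterwards. A sketch, assuming the md0 array and one of the tuning values from the rule above:

```shell
#!/bin/sh
# Sketch: apply md tuning only after the initial resync has finished,
# instead of from a udev rule.

sync_idle() {
    # "idle" in md/sync_action means no resync/recovery is running
    [ "$1" = "idle" ]
}

MD=md0
ACTION_FILE="/sys/block/$MD/md/sync_action"

if [ -e "$ACTION_FILE" ]; then
    until sync_idle "$(cat "$ACTION_FILE")"; do
        sleep 10
    done
    # Now safe to apply the tuning the udev rule used to set, for example:
    echo 8192 > "/sys/block/$MD/md/stripe_cache_size"
fi
```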

Root Cause

This issue is related to common udev limitations. At the time of a udev query, some storage devices might not be accessible. There might be a delay between event generation and processing, especially with numerous devices involved, causing a lag between kernel detection and link availability. External programs invoked by udev rules, such as blkid, might briefly open the device, making it temporarily inaccessible for other tasks.

References & related articles

udev man page
mdadm man page
udevadm man page