Udev Rules Causing RAID Device Creation to Hang
Introduction
udev provides a dynamic device directory containing only the files for devices that are actually present. It creates or removes device node files, usually located in the /dev directory, or it renames network interfaces.
As part of the hotplug subsystem, udev is executed when a kernel device is added to or removed from the system. On device creation, udev reads the sysfs directory of the given device to collect device attributes such as the label, serial number, or bus device number. These attributes may be used as keys to determine a unique name for the device.
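For example, you can list the sysfs attributes udev collects for a device, and which could therefore serve as match keys in rules, by walking its attribute chain (the device /dev/vda below is only a placeholder):
$ udevadm info --attribute-walk /dev/vda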
RAID devices, on the other hand, are virtual devices created from two or more real block devices. This allows multiple devices (typically disk drives or partitions thereof) to be combined into a single device to hold, for example, a single filesystem. Some RAID levels include redundancy and so can survive some degree of device failure. Linux Software RAID devices are implemented through the md (Multiple Devices) device driver. More information can be found in the mdadm man page.
Problem
The user may have created udev rules to handle the RAID device creation, causing the md0_resync process to hang.
Symptoms
Please note that reproducing this issue is random, as the race condition does not occur every time. You may have to run the mdadm command below a few times in order to recreate the race condition.
The following udev rules were used:
SUBSYSTEM=="block",ACTION=="add|change",KERNEL=="md*",\
ATTR{md/sync_speed_max}="2000000",\
ATTR{md/group_thread_cnt}="64",\
ATTR{md/stripe_cache_size}="8192",\
ATTR{queue/nomerges}="2",\
ATTR{queue/nr_requests}="1023",\
ATTR{queue/rotational}="0",\
ATTR{queue/rq_affinity}="2",\
ATTR{queue/scheduler}="none",\
ATTR{queue/add_random}="0",\
ATTR{queue/max_sectors_kb}="4096"
Then to create the RAID device, the following command was run:
$ mdadm --create --verbose --chunk=128 --level=6 -n 4 /dev/md0 /dev/vda /dev/vdb /dev/vdc /dev/vdd
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: size set to 52395008K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
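Once the array exists, you can simulate the udev event for it to see how the rules above are processed; this is a diagnostic sketch, and the output is verbose:
$ udevadm test /sys/block/md0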
You can check the status of the RAID creation by reading the /proc/mdstat file:
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 vdd[3] vdc[2] vdb[1] vda[0]
104790016 blocks super 1.2 level 6, 128k chunk, algorithm 2 [4/4] [UUUU]
[>....................] resync = 4.2% (2203904/52395008) finish=3.7min speed=220390K/sec
When you check a second time, you may find that the file is empty or that the resync completion percentage has not moved.
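A simple way to keep an eye on whether the resync is progressing is to poll /proc/mdstat; the 5-second interval below is an arbitrary choice:
$ watch -n 5 cat /proc/mdstat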
You may find the following errors in the logs:
Sep 11 18:39:21 rockylinux.com systemd-udevd[3394]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 10:22:33 rockylinux.com systemd-udevd[570894]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 10:43:13 rockylinux.com systemd-udevd[3450]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
Sep 12 12:21:44 rockylinux.com systemd-udevd[65370]: nvme2n1: /etc/udev/rules.d/udev.rules:53 Failed to write ATTR{/sys/devices/pci0000:60/0000:60:03.3/0000:65:00.0/nvme/nvme2/nvme2n1/queue/max_sectors_kb}, ignoring: Invalid argument
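The Invalid argument errors above are typically seen when the value written to queue/max_sectors_kb exceeds the hardware limit the device reports in queue/max_hw_sectors_kb; you can compare the two values for the device named in the logs:
$ cat /sys/block/nvme2n1/queue/max_hw_sectors_kb
$ cat /sys/block/nvme2n1/queue/max_sectors_kb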
The md0_resync task will hang with the following logs (it may cause the kernel to panic and crash if you have kernel.hung_task_panic = 1 set in your /etc/sysctl.conf file):
Sep 10 10:40:38 rockylinux.com kernel: INFO: task md0_resync:2365161 blocked for more than 122 seconds.
Sep 10 10:40:38 rockylinux.com kernel: Tainted: G S W 6.6.32-120240613.el9.x86_64 #1
Sep 10 10:40:38 rockylinux.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 10 10:40:38 rockylinux.com kernel: task:md0_resync state:D stack:0 pid:2365161 ppid:2 flags:0x00004000
Sep 10 10:40:38 rockylinux.com kernel: Call Trace:
Sep 10 10:40:38 rockylinux.com kernel: <TASK>
Sep 10 10:40:38 rockylinux.com kernel: __schedule+0x222/0x660
Sep 10 10:40:38 rockylinux.com kernel: schedule+0x5e/0xd0
Sep 10 10:40:38 rockylinux.com kernel: md_do_sync+0xcef/0x1030
Sep 10 10:40:38 rockylinux.com kernel: ? sched_clock_cpu+0xf/0x190
Sep 10 10:40:38 rockylinux.com kernel: ? membarrier_register_private_expedited+0xa0/0xa0
Sep 10 10:40:38 rockylinux.com kernel: md_thread+0xb0/0x160
Sep 10 10:40:38 rockylinux.com kernel: ? super_90_load.part.0+0x350/0x350
Sep 10 10:40:38 rockylinux.com kernel: kthread+0xe3/0x110
Sep 10 10:40:38 rockylinux.com kernel: ? kthread_complete_and_exit+0x20/0x20
Sep 10 10:40:38 rockylinux.com kernel: ret_from_fork+0x31/0x50
Sep 10 10:40:38 rockylinux.com kernel: ? kthread_complete_and_exit+0x20/0x20
Sep 10 10:40:38 rockylinux.com kernel: ret_from_fork_asm+0x11/0x20
Sep 10 10:40:38 rockylinux.com kernel: </TASK>
Resolution
To conclude from the above findings, the udev rule for the RAID device may be triggering changes too early, leading to a race condition during the md0_resync task. Removing this rule fixes the issue.
One potential solution is to use the udevadm settle approach, or to introduce a delay, to prevent the race condition from occurring.
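For example, a minimal sketch of the settle approach, which waits for the udev event queue to drain after the array is created (the 30-second timeout is an arbitrary choice):
$ mdadm --create --verbose --chunk=128 --level=6 -n 4 /dev/md0 /dev/vda /dev/vdb /dev/vdc /dev/vdd
$ udevadm settle --timeout=30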
Alternatively, monitoring the md0 status via a script before applying device changes could help ensure that the resync completes without udev interference.
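A minimal sketch of such a script, assuming md0 is the only array on the system and reusing two of the tuning values from the udev rule above, applied only after the resync has finished:
#!/bin/bash
# Wait until /proc/mdstat no longer reports an ongoing resync.
while grep -q 'resync' /proc/mdstat; do
    sleep 10
done
# Apply the tuning the udev rule was attempting to set during the resync.
echo 8192 > /sys/block/md0/md/stripe_cache_size
echo 64 > /sys/block/md0/md/group_thread_cnt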
Root Cause
This issue is related to common udev limitations. At the time of a udev query, some storage devices might not be accessible. There might also be a delay between event generation and processing, especially when numerous devices are involved, causing a lag between kernel detection and link availability. External programs invoked by udev rules, such as blkid, might briefly open the device, making it temporarily inaccessible to other tasks.
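To observe this lag between kernel detection and udev processing while an array is assembled, you can watch both event streams side by side (a diagnostic sketch):
$ udevadm monitor --kernel --udev --subsystem-match=block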