Maximizing NVIDIA GPU Performance on Rocky Linux 8
Introduction
Optimizing NVIDIA GPU performance on Rocky Linux 8 is essential for computationally intensive tasks.
Performance can become inconsistent, particularly when GPUs enter low-power states or when rendering incorrectly falls back to the CPU, which can significantly slow down workloads.
Problem
NVIDIA GPUs have been known to intermittently drop to very low performance levels because they remain in an incorrect, low-power performance state.
Symptoms
Users experience initially acceptable GPU performance that rapidly deteriorates.
Tasks such as rotating graphical models take multiple seconds, indicating a fallback to CPU rendering or low GPU utilization.
Monitoring tools such as nvidia-smi sometimes show GPUs stuck in performance state P8 (a low-power state) and report minimal or zero GPU utilization while workloads are running.
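The symptom check can be scripted. The sketch below assumes the CSV output of nvidia-smi --query-gpu=index,pstate,utilization.gpu --format=csv,noheader,nounits (one line per GPU, such as "0, P8, 0") and defines a hypothetical helper, check_low_pstate, that flags every GPU reporting P8 or a lower-power state. Using P8 as the threshold is an assumption taken from the symptom above; run the check while the workload is active, since an idle GPU is expected to sit in a low-power state.

```shell
#!/bin/sh
# check_low_pstate (hypothetical helper): reads nvidia-smi CSV lines on stdin in the form
#   index, pstate, utilization.gpu
# as produced by --format=csv,noheader,nounits, prints a line for every GPU in
# P8 or a lower-power state (P9 and above), and returns 1 if any GPU was flagged.
check_low_pstate() {
  awk -F',[ ]*' '
    BEGIN { flagged = 0 }
    {
      num = substr($2, 2) + 0   # "P8" -> 8; higher numbers mean lower power
      if (num >= 8) {
        printf "GPU %s: %s, utilization %s%% (low-power state)\n", $1, $2, $3
        flagged = 1
      }
    }
    END { exit flagged }
  '
}
```

Example, run while a workload is active:
nvidia-smi --query-gpu=index,pstate,utilization.gpu --format=csv,noheader,nounits | check_low_pstate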
Resolution
Environment Prerequisites:
- Rocky Linux 8
- NVIDIA GPU
- root or sudo privileges
For NVIDIA GPU driver installation, follow the NVIDIA GPU Driver Installation on Rocky Linux guide listed under References & related articles.
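The prerequisites can be checked before making any changes. This is a minimal sketch with hypothetical helper names (is_rocky8, nvidia_loaded); it assumes the standard /etc/os-release file and the /proc/modules list of loaded kernel modules, and takes the file path as an optional argument so each check can also be pointed at a test file.

```shell
#!/bin/sh
# Hypothetical prerequisite checks. File paths are optional arguments so the
# functions can also be run against test files.

# is_rocky8 [os-release file]: succeed if the file describes Rocky Linux 8.x.
is_rocky8() {
  (
    # os-release uses shell-compatible KEY=value lines, so it can be sourced;
    # the subshell keeps ID and VERSION_ID from leaking into the caller.
    . "${1:-/etc/os-release}" || exit 1
    [ "$ID" = "rocky" ] && [ "${VERSION_ID%%.*}" = "8" ]
  )
}

# nvidia_loaded [modules file]: succeed if the nvidia kernel module is loaded.
nvidia_loaded() {
  # Each /proc/modules line starts with the module name followed by a space.
  grep -q '^nvidia ' "${1:-/proc/modules}"
}
```

Example: is_rocky8 && nvidia_loaded && echo "prerequisites met"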
Steps to Resolution:
The example steps below were tested using an NVIDIA GeForce GTX 1060 GPU on Rocky Linux 8.10.
- Verify and set the optimal tuned profile (for an HPC setup, the recommended profile is hpc-compute):
tuned-adm list
sudo tuned-adm profile hpc-compute
tuned-adm active
- Set the GPU P-State persistently (forcing P0 through an nvidia kernel module option), then rebuild the initramfs and reboot:
echo 'options nvidia NVreg_RegistryDwords="RMForcePstate=0"' | sudo tee /etc/modprobe.d/nvidia.conf
sudo dracut --force
sudo reboot
- GPU state checked with nvidia-smi before the above changes were made (observe that P8 is listed, equivalent to Basic HD video playback):
| 0 NVIDIA GeForce GTX 1060 6GB Off |
| 40% 35C P8 10W / 120W |
- GPU state after the above changes, checked with nvidia-smi (observe that the P-State has now changed to P0, or Maximum 3D performance):
| 0 NVIDIA GeForce GTX 1060 6GB Off |
| 40% 35C P0 24W / 120W |
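After the reboot, the two persistent settings from the steps above can be confirmed. The helpers below are hypothetical (has_force_pstate, active_tuned_profile): the first looks for the RMForcePstate=0 option in /etc/modprobe.d/nvidia.conf, and the second extracts the profile name from the "Current active profile: <name>" line printed by tuned-adm active. Note that tee without -a replaces /etc/modprobe.d/nvidia.conf, so any options that file held before are lost and may need to be added back.

```shell
#!/bin/sh
# Hypothetical helpers for confirming the persistent settings after the reboot.

# has_force_pstate [conf file]: succeed if an "options nvidia" line in the
# modprobe configuration contains RMForcePstate=0.
has_force_pstate() {
  grep -q '^options nvidia .*RMForcePstate=0' "${1:-/etc/modprobe.d/nvidia.conf}"
}

# active_tuned_profile: read `tuned-adm active` output on stdin and print the
# profile name from its "Current active profile: <name>" line.
active_tuned_profile() {
  sed -n 's/^Current active profile: //p'
}
```

Example:
has_force_pstate && echo "RMForcePstate=0 is configured"
[ "$(tuned-adm active | active_tuned_profile)" = "hpc-compute" ] && echo "hpc-compute is active"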
Root cause
Because of incorrect GPU persistence settings, the GPU is not kept permanently in a high-power state and can drop into a low-power state such as P8 even while a workload is running.
Notes
- Useful host-specific performance troubleshooting commands:
sudo dmidecode -t bios
sudo cpupower frequency-info
sudo lshw -class processor
sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
tuned-adm active
tuned-adm list
- NVIDIA GPU-specific troubleshooting commands:
nvidia-smi
nvidia-smi -q -d PERFORMANCE
nvidia-smi -q -d POWER
nvidia-smi -q | grep -i "Persistence Mode"
nvidia-smi -q -d CLOCK
nvidia-smi --query-gpu=clocks.current.graphics,clocks.current.sm,clocks.current.memory --format=csv
- Keeping the GPU persistently in a high-performance state resolves many performance problems in computational and graphical workloads, particularly those caused by the GPU dropping into low-power states.
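The scaling_governor check in the host-specific commands can be turned into a pass/fail test. This is a sketch under the assumption that the hpc-compute profile is expected to leave every CPU on the performance governor (hpc-compute builds on tuned's latency-performance profile, which requests that governor); list_cpu_governors is a hypothetical helper that prints each CPU's governor from /sys/devices/system/cpu and returns non-zero if any CPU uses a different one.

```shell
#!/bin/sh
# list_cpu_governors [cpu sysfs dir] (hypothetical helper): print "cpuN governor"
# for every CPU that exposes cpufreq and return 1 if any governor is not
# "performance", the governor the hpc-compute profile is assumed to select.
list_cpu_governors() {
  local dir rc f gov cpu
  dir=${1:-/sys/devices/system/cpu}
  rc=0
  # cpu[0-9]* matches cpu0, cpu1, ... but not the cpufreq or cpuidle directories.
  for f in "$dir"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue   # skip the literal pattern if nothing matched
    gov=$(cat "$f")
    cpu=$(basename "$(dirname "$(dirname "$f")")")
    printf '%s %s\n' "$cpu" "$gov"
    [ "$gov" = "performance" ] || rc=1
  done
  return "$rc"
}
```

Example: list_cpu_governors || echo "some CPUs are not using the performance governor"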
References & related articles
NVIDIA GPU Driver Installation on Rocky Linux
NVIDIA GPU P-State Descriptions