I have hesitated a long time before I decided to write this.
The fact that it may help others made me decide to do it anyway.
I have run this little gadget for years now and it is probably the most robust and unobtrusive piece of software on my system.
AMD GPU cooling
Although I really love my XFX RX 570 discrete GPU and it renders everything I throw at it with remarkable speed and without a hitch (yes, even FarCry 6) it has two drawbacks:
- The fans are absolutely great but also very loud when running fullspeed (4000 rpm).
- The card gets extremely hot under Linux (on MS-Windows it's ok)
The reason it runs hot under Linux is that there is no fancontrol so the fans are at a fixed speed of around 1453 rpm which is nice and silent but in my opinion inadequate when running very heavy loads.
For MS-Windows there is the AMD driver software suite that takes care of it but for Linux there was nothing, at least not when I wrote this script.
The family of AMD RX chips can handle a maximum temperature of a stunning 94°C (about 200°F) before they either throttle or shut down, but running the GPU close to that temperature is in my opinion way too hot to be healthy and may shorten the lifespan of the GPU considerably.
Goals:
- I want the maximum temperature set according to my wishes.
- The fan should be at full speed at maximum temperature regardless of any other settings: survival of the card is of utmost importance. The card should never exceed the maximum temperature that I configure.
- The fan should stop (preventing unneeded wear and tear) when the temperature is below a certain level
- The script must allow for the possibility to balance temperature with fan noise because every build and every person is different.
- The script must be extremely simple. The reasons are:
a. less chance of bugs, easier to spot them
b. easier to reason about how it works
c. much more robust (which is very important because GPUs are pretty expensive).
- It must be an invisible "set it and forget it" script.
What it turned out to be:
All of the above. You can change the following parameters in the script:
- temperature for maximum fanspeed
- temperature where the fan kicks in
- temperature where the fan shuts off
- fansetting for minimum rpm (also prevents stalling)
At the end of the script there is a sleep statement that defines the time interval between measurements. It is set to 10 seconds, which is frequent enough. You may set it to 5 or even less, but I wouldn't go much lower than that: it will only consume more CPU time without any advantage in return.
What to check before saving the script
The GPU userspace interface usually resides here: '/sys/class/drm/card0/device/hwmon/hwmon1'
The script will exit immediately if that directory does not exist, so make sure that it really is there. The card number and the hwmon number can differ from system to system.
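Since the hwmon number is not the same on every system, one way to find the right directory is to look for the hwmon entry whose 'name' file reads amdgpu. A minimal sketch, assuming the standard sysfs layout. The function name find_amdgpu_hwmon and its optional root argument are made up for this example and are not part of the script:

```bash
#!/bin/bash
## find_amdgpu_hwmon [ROOT]  (hypothetical helper, not part of the script)
## Prints the first hwmon directory whose 'name' file says "amdgpu".
## ROOT defaults to /sys; the argument only exists to make testing easy.
find_amdgpu_hwmon() {
    local root=${1:-/sys} dir
    for dir in "$root"/class/drm/card*/device/hwmon/hwmon*
    do
        ## skip entries without a readable name file or with another driver
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = amdgpu ] || continue
        echo "$dir"
        return 0
    done
    return 1
}

find_amdgpu_hwmon || echo "no amdgpu hwmon directory found" >&2
```

If it prints a different directory than the one shown above, that is the path to use in the cd line of the script. With more than one AMD GPU installed it only reports the first match.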
This is the code:
#!/bin/bash
## This is where the GPU-unit resides
cd /sys/class/drm/card0/device/hwmon/hwmon1 || exit
## give control back to the GPU-unit whenever we exit:
trap "echo 2 >pwm1_enable ; exit" INT TERM EXIT
## BEWARE!! temperature is expressed in °C * 1000 !!
t_max=80000 ## temperature for maximum fanspeed
t_fan_on=50000 ## temperature where the fan kicks in
t_fan_off=45000 ## temperature where the fan shuts off
f_min=100 ## fansetting for minimum rpm (also prevents stalling)
## don't touch the values below
f_max=255 ; f_off=0
## start with the fan off; this value is used as long as the
## temperature sits between t_fan_off and t_fan_on
f_set=$f_off
while :
do
    ## get current temperature
    t_current=$(<temp1_input)
    ## low temperature -> shut fan off
    (( t_current < t_fan_off )) && (( f_set = f_off ))
    ## when in operational range ...
    if (( t_current > t_fan_on ))
    then
        ## ... calculate fansetting ...
        (( f_set = f_max * ( t_current - t_fan_on ) / ( t_max - t_fan_on ) ))
        ## ... prevent fan from dropping below minimum rpm ...
        (( f_set < f_min )) && (( f_set = f_min ))
        ## ... and never ask for more than full speed (above t_max)
        (( f_set > f_max )) && (( f_set = f_max ))
    fi
    echo 1 >pwm1_enable  ## always take control of the GPU first
    echo $f_set >pwm1    ## write preferred fanspeed setting to GPU
    /usr/bin/sleep 10    ## polling interval in seconds (default: 10)
done
How does it work
I have to reveal that I am artistically challenged so forgive my doodling:
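The curve the script produces can also be traced in numbers. A minimal sketch that uses the default values from the script and the same formula. The helper fan_setting is made up for this example, and it caps the result at f_max so that readings above t_max stay within the 0-255 range that pwm1 accepts:

```bash
#!/bin/bash
## same defaults as in the script
t_max=80000 t_fan_on=50000 t_fan_off=45000
f_min=100 f_max=255 f_off=0

## fan_setting TEMP PREVIOUS  (hypothetical helper, not part of the script)
## Prints the pwm value for TEMP (in °C * 1000). Between t_fan_off and
## t_fan_on nothing changes, so PREVIOUS is printed back unchanged.
fan_setting() {
    local t=$1 f=$2
    if [ "$t" -lt "$t_fan_off" ]; then
        f=$f_off
    elif [ "$t" -gt "$t_fan_on" ]; then
        ## straight line from (t_fan_on, 0) to (t_max, f_max)
        f=$(( f_max * (t - t_fan_on) / (t_max - t_fan_on) ))
        [ "$f" -lt "$f_min" ] && f=$f_min   ## floor at minimum rpm
        [ "$f" -gt "$f_max" ] && f=$f_max   ## cap at full speed
    fi
    echo "$f"
}

## walk the temperature up and then down again
prev=$f_off
for t in 40000 47000 51000 65000 72000 80000 52000 47000 44000
do
    prev=$(fan_setting "$t" "$prev")
    echo "$(( t / 1000 )) C -> pwm $prev"
done
```

With these defaults the walk gives 0, 0, 100, 127, 187, 255, 100, 100, 0. Note that 47°C gives 0 on the way up and 100 on the way down: inside the band between t_fan_off and t_fan_on the previous setting is simply kept.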
Things to keep in mind
- Adjusting t_fan_on to a higher temperature (that is: to the right) will result in marginally less fan noise and a higher constant temperature.
- Adjusting t_fan_on to a higher temperature will also result in a steeper curve because the f_max/t_max point stays where it is.
Reason: the fan must always be at full speed at the maximum temperature point.
- Setting the t_fan_on value too high will result in significant and reasonably fast rpm changes whenever the temperature reaches the diagonal part of the curve, which might get on your nerves.
- The distance between t_fan_on and t_fan_off prevents the controller from oscillating around one switch point; this is known as hysteresis.
- Setting t_fan_off to a value lower than the case temperature or the minimum GPU temperature will, of course, result in an always-running fan.
- Setting f_min equal to f_max will result in maximum fanspeed whenever the fan is on, always.
- Keep f_min at least at 80 or 90 to prevent stalling of the fan.
- Keep t_fan_on at least 10°C lower than t_max.
- Keep t_fan_off at least a few °C lower than t_fan_on.
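The three "Keep ..." rules can be checked before you trust a new set of values. A minimal sketch, assuming you pass in the same numbers you put in the script. The helper check_settings is made up for this example. It only tests that t_fan_off is lower than t_fan_on (how many degrees "a few" should be is left to you), and the upper limit of 255 for f_min is an added assumption based on the pwm range:

```bash
#!/bin/bash
## check_settings T_MAX T_FAN_ON T_FAN_OFF F_MIN  (hypothetical helper)
## Temperatures in °C * 1000, as in the script.
## Returns non-zero and prints a message for every rule that is violated.
check_settings() {
    local t_max=$1 t_fan_on=$2 t_fan_off=$3 f_min=$4 bad=0
    if [ "$f_min" -lt 80 ] || [ "$f_min" -gt 255 ]; then
        echo "f_min should be between 80 and 255" >&2; bad=1
    fi
    if [ "$t_fan_on" -gt $(( t_max - 10000 )) ]; then
        echo "t_fan_on should be at least 10 C below t_max" >&2; bad=1
    fi
    if [ "$t_fan_off" -ge "$t_fan_on" ]; then
        echo "t_fan_off should be lower than t_fan_on" >&2; bad=1
    fi
    return "$bad"
}

## the defaults from the script pass all three checks
check_settings 80000 50000 45000 100 && echo "settings look sane"
```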
Testing
Save the script as '/usr/local/bin/fanctl', make it executable and run it with
sudo /usr/local/bin/fanctl
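While the script is running you can check from a second terminal what it is doing. A minimal sketch, assuming the hwmon directory used in the script and the standard amdgpu files temp1_input, fan1_input, pwm1 and pwm1_enable. The helper show_status is made up for this example and is much simpler than the monitoring tool described further down:

```bash
#!/bin/bash
## show_status [DIR]  (hypothetical helper)
## DIR defaults to the hwmon directory used in the script.
show_status() {
    local dir=${1:-/sys/class/drm/card0/device/hwmon/hwmon1}
    local temp rpm pwm mode
    [ -r "$dir/temp1_input" ] || { echo "cannot read $dir/temp1_input" >&2; return 1; }
    temp=$(cat "$dir/temp1_input")            ## °C * 1000
    rpm=$(cat "$dir/fan1_input" 2>/dev/null)  ## fan speed in rpm
    pwm=$(cat "$dir/pwm1")                    ## 0..255
    mode=$(cat "$dir/pwm1_enable")            ## 1 = manual, 2 = automatic
    echo "temp=$(( temp / 1000 ))C fan=${rpm:-?}rpm pwm=$pwm mode=$mode"
}

show_status || true
```

A mode of 1 means the fan is under manual control, which is what the script sets. After stopping the script with Ctrl-C it should read 2 again, because the trap hands control back to the GPU.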
Starting it automatically at boot time
Paste the code block below in a terminal and press Enter:
cat<<\FANCTL | sudo tee /lib/systemd/system/fanctl.service
[Unit]
Description=AMDGPU Fancontrol Daemon
ConditionPathExists=/usr/local/bin/fanctl
ConditionFileIsExecutable=/usr/local/bin/fanctl
After=multi-user.target
[Service]
ExecStart=/usr/local/bin/fanctl
[Install]
WantedBy=multi-user.target
FANCTL
sudo ln -s /lib/systemd/system/fanctl.service /etc/systemd/system/multi-user.target.wants/fanctl.service
Extra
I also wrote a monitoring tool to keep an eye on temperature and fan behaviour.
It has the following features:
- showing fan rpm
- showing GPU temperature
- showing if the card is controlled by a fancontroller.
- showing if my fancontroller is doing that job
- starting and stopping of the fancontroller
- editing the fancontroller
I will post the code of this monitoring tool in a followup if any of you is interested.
Below you see a picture of this monitoring tool in action (monitoring temperature and fanspeed) while both Unigine Heaven and Furmark are heavily torturing (a.k.a. stress testing) my GPU at the same time. They had been running for 10 minutes when I took the screenshot, and the fancontroller is keeping the GPU at a nice constant 72°C. Settings of the fancontroller are the same as in the code above.
Conclusion: project succeeded
If you have any questions or discover an error (despite me having checked everything in this post carefully), please let me know in this thread.
EDIT 2024-08-09:
Today I have found some excellent documentation on the sysfs interface of AMDGPU.
You'll recognize one or two things that I've used in the script.
https://www.kernel.org/doc/html/v6.8/gpu/amdgpu/thermal.html