Help - first kernel issue

kerv · 28 January 2022 22:06

Every few days of uptime, I seem to get a kernel issue. The system is completely unresponsive in this state. This is a media server with 3 USB drives attached. I have basic working knowledge of Ubuntu, but no debugging skills when it comes to things like this.

Any thoughts?

pavlos_kairis · 28 January 2022 22:37

It seems you are running Ubuntu 18.04 LTS with kernel 5.4.0-96.

No idea why you get a kernel dump though. I would disconnect the USB drives and reboot. Does it come up? If so, can you update everything? I can't remember the latest kernel for 18.04 LTS.

kerv · 28 January 2022 23:17

Yeah, I might try updating to 20.04 with the latest kernel. This is just a personal machine, so if something happens, it's not the end of the world.

kerv · 28 January 2022 23:20

Also - the problem doesn't show up immediately. The machine works fine for a few days and then all of a sudden (seemingly at random) this dump happens with no warning.

I enabled kdump and such, so hopefully I'll see something in /var/crash. Plus, I hope the version + kernel upgrade fixes something. It's been a while since this was updated.
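For anyone following along, here is a minimal sketch of how one might check that the crash kernel is actually armed and whether anything has landed in /var/crash. The function names are made up for illustration, and it assumes the sysfs file /sys/kernel/kexec_crash_loaded is present (it reads 1 when a crash kernel is loaded).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of kdump-tools.

crash_kernel_state() {
    # Optional argument lets you point at a different file (useful for testing).
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]; then
        echo "crash kernel loaded"
    else
        echo "crash kernel NOT loaded"
    fi
}

list_crash_dumps() {
    # Defaults to /var/crash, the directory mentioned in the post above.
    dir="${1:-/var/crash}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        ls -lt "$dir"          # newest entries first
    else
        echo "no crash dumps in $dir"
    fi
}
```

After the next freeze and reboot, running crash_kernel_state and list_crash_dumps should show whether a dump was captured. If kdump-tools is installed, kdump-config show should give more detail on the configuration.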

Norbert_X · 29 January 2022 07:05

@kerv welcome to the Ubuntu MATE community!

I think there is no need to upgrade your system to 20.04 LTS. I'm running many 18.04 LTS systems without problems, using both the GA (4.15 version) and HWE (5.4 version) kernels on various hardware.
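As a rough sketch, this is one way to check which of those two kernel series a machine is running. The function name is hypothetical, and the mapping only covers the two 18.04 series mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel release string for Ubuntu 18.04.
# With no argument it inspects the running kernel via uname -r.
kernel_series() {
    release="${1:-$(uname -r)}"
    case "$release" in
        4.15.*) echo "GA (4.15)" ;;
        5.4.*)  echo "HWE (5.4)" ;;
        *)      echo "other ($release)" ;;
    esac
}
```

For example, kernel_series 5.4.0-96-generic reports the HWE series, which matches the kernel identified earlier in this thread.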

In your case I see a BIOS date from 2014, so the system is old. It may still be good, but it may also be faulty because of its age.
So I suspect faulty hardware. I would recommend testing the RAM first. If you are using a legacy/BIOS system, there should be a Memtest entry in the GRUB menu:

[screenshot: GRUB menu]

(if you do not see it there, install it first with sudo apt-get install memtest86+ and reboot)

Select it on the next boot and allow it to run 3-5 full cycles.

(the above image is just an illustration from a VM; the test has not completed, it has only just started)

The Memtest utility is also available on the Ubuntu MATE installation media and, for UEFI systems, from its own site (select the free version there).
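If you are not sure which case applies to your machine, the following is a minimal sketch for checking it from a running system. The function names are made up for illustration. It assumes the usual Linux convention that /sys/firmware/efi exists only when the system was booted via UEFI.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

boot_mode() {
    # Optional argument overrides the directory to check (useful for testing).
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS"
    fi
}

memtest_installed() {
    # Succeeds only if dpkg reports the package as fully installed.
    dpkg-query -W -f='${Status}' memtest86+ 2>/dev/null \
        | grep -q "install ok installed"
}
```

On a legacy BIOS machine where memtest_installed fails, the apt-get command above should add the GRUB entry. On a UEFI machine, booting Memtest from separate media is the route described here.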

In any case, Memtest should keep showing a blue window, and it will show a line like "Pass completed without errors" after each full RAM testing cycle.
Errors (if any) will be shown in red, which means that your RAM is faulty.

So please run the RAM test with Memtest and then report the results here. Good luck!

gordon · 3 February 2022 00:35

Sorry, couldn't resist...

Says the guy who, in the screengrab below that quote, shows that he is running Memtest86+ on an Ivy Bridge-based system -- probably also from 2014...!

More seriously, to elaborate on what @Norbert_X already said, oftentimes I find out that there is no real hardware failure that causes these issues -- I've seen plenty of instances of odd behavior that are fixed by removing all memory modules and then re-seating them. I've noticed that every time a computer is powered on, things inside get warm and expand, and when the machine is turned off, things cool down and contract; as such I've actually seen memory modules "walk" out of their sockets over a period of months or years! Even if the modules look properly seated, try reseating them anyway. It usually works for me.