Very Slow Boot after updating from 16.04 to 18.04

Hi again,

I've just remembered from somewhere that sometimes the password doesn't show up as you type it. I tried sudo update-grub and got the following:

Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-65-generic
Found initrd image: /boot/initrd.img-4.15.0-65-generic
Found linux image: /boot/vmlinuz-4.15.0-64-generic
Found initrd image: /boot/initrd.img-4.15.0-64-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done

I tried again with the services and the first two seemed to work:

$ sudo systemctl disable snapd.service
Removed /etc/systemd/system/multi-user.target.wants/snapd.service.
$ sudo systemctl disable NetworkManager-wait-online.service
Removed /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.

Nothing seemed to happen with the last one:

$ sudo systemctl disable apt-daily.service

I tried rebooting and it was still slow but I did see the text scrolling past. Then it went into the usual sequence of black screens.

I put dmesg| tail into a terminal and got the following:

[ 350.021664] ESI: 00000001 EDI: c05064a7 EBP: 0000000e ESP: bfcab9e8
[ 350.021667] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 350.021672] Code: 83 c4 0c e9 1a ff ff ff e8 5d 7b 0b d6 8b 45 d0 8d 55 dc e8 22 a6 0f d6 85 f6 0f 85 f9 fe ff ff 57 68 94 b5 fd f7 e8 cf 78 0b d6 <0f> 0b 58 5a e9 e5 fe ff ff 8d b6 00 00 00 00 66 66 66 66 90 55
[ 350.021778] ---[ end trace fb34033ec11338b0 ]---
[ 360.160137] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [CRTC:41:pipe B] flip_done timed out
[ 370.400150] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] ERROR [PLANE:35:plane B] flip_done timed out
[ 392.084628] [UFW BLOCK] IN=enp2s0 OUT= MAC=00:1e:33:65:3f:73:e0:b9:e5:1a:b2:a6:08:00 SRC=192.168.1.254 DST=192.168.1.64 LEN=291 TOS=0x00 PREC=0x00 TTL=64 ID=34989 PROTO=UDP SPT=1900 DPT=41066 LEN=271
[ 392.085412] [UFW BLOCK] IN=enp2s0 OUT= MAC=00:1e:33:65:3f:73:e0:b9:e5:1a:b2:a6:08:00 SRC=192.168.1.254 DST=192.168.1.64 LEN=296 TOS=0x00 PREC=0x00 TTL=64 ID=34990 PROTO=UDP SPT=1900 DPT=41066 LEN=276
[ 443.864229] [UFW BLOCK] IN=enp2s0 OUT= MAC=01:00:5e:00:00:01:e0:b9:e5:1a:b2:a6:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=35129 DF PROTO=2
[ 568.958987] [UFW BLOCK] IN=enp2s0 OUT= MAC=01:00:5e:00:00:01:e0:b9:e5:1a:b2:a6:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=35145 DF PROTO=2

Thanks for your help btw and no problem about sudo update-grub

I did that too after you mentioned it.

Also see - 10 minute Long Boot Time?

Thanks mdooley, that has made a big difference, the boot time is down to two minutes now.

Sweet. :slight_smile:

This section in your syslog:

[ 350.021667] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 350.021672] Code: 83 c4 0c e9 1a ff ff ff e8 5d 7b 0b d6 8b 45 d0 8d 55 dc e8 22 a6 0f d6 85 f6 0f 85 f9 fe ff ff 57 68 94 b5 fd f7 e8 cf 78 0b d6 <0f> 0b 58 5a e9 e5 fe ff ff 8d b6 00 00 00 00 66 66 66 66 90 55
[ 350.021778] ---[ end trace fb34033ec11338b0 ]---

shows that something crashed on startup. Whatever that "something" is, it's worth investigating. Look a few (or possibly a few dozen!) lines above that in the log - usually easiest via dmesg |less, pressing END, and then paging back upwards - and you'll find what the dying process is.

Thanks arQon,

I tried dmesg |less but couldn't find that section in the log, so maybe it's resolved itself? If there's anywhere else I should look then please do let me know.

Thanks again for all your help.

Though i doubt we have the same problem - is your laptop rather old? I found the problem with my Toshiba A305 is any Kernel newer than 4.4, and have read that anything beyond 4.4.0.91 will cause problems. Thing is I had been running various distros, with kernels after 4.4 with no problem until MATE 18.04?

EDIT - NEVERMIND. Thank you mdooley! I did the edit to grub and the first reboot was about 90 seconds instead of 8 minutes, and each subsequent boot takes less than 20 seconds. However, I did this on 19.04 thinking it may help my boot times to upgrade from 18.04 to 19.04 - the upgrade didn't help. I assume mdooley's solution would work in 18.04 as well. That first reboot displayed two "atomic helper"......"kms"...."Flip flop" errors. which I didn't have time to record or even read properly. Thank you again

Just as a point to remember later, the 19.10 beta doesn't appear to need the "video=SVIDEO-1:d" addition to grub that works around the flip_done timed out bug .

This is another bug that has been fixed in the beta.

1 Like

Hi jmarkus, yes, it's pretty old, I bought it second hand but 16.04 was fine on it. Which is why it was so strange when I updated to 18.04 and everything ground to a halt!

Then it's probably from the piece mdooley mentioned, so that's good. :slight_smile:

Two minutes is obviously a big improvement over where you were, but it's still pretty slow. What was it for you on 16.04?
The POST (the part of boot before it even gets to grub) is generally fairly constant, but with the crashy piece seemingly resolved, you should now hopefully be able to get useful diagnostics from systemd. I'd be tempted to run blame and critical-chain reports again and see if the numbers add up properly now.

Thanks arQon, I did wonder if I could shave a bit more off that two minutes. It's been so long since I updated to 18.04 that I can't actually remember how long it took, my gut feeling is that it was slightly quicker but not by much.

systemd-analyze blame now gives me this:

     17.828s dev-sda1.device
     17.457s systemd-journal-flush.service
     11.742s systemd-hwdb-update.service
     11.507s ufw.service
     10.730s systemd-modules-load.service
     10.631s keyboard-setup.service
     10.143s systemd-tmpfiles-setup-dev.service
      6.387s snapd.seeded.service
      5.824s snapd.service
      4.681s NetworkManager.service
      4.635s networkd-dispatcher.service
      3.239s udisks2.service
      3.058s tlp.service
      3.008s motd-news.service
      2.988s accounts-daemon.service
      2.787s gpu-manager.service
      2.764s upower.service
      2.514s ModemManager.service
      2.096s dev-loop4.device
      2.077s dev-loop3.device
      2.073s dev-loop2.device
      2.071s dev-loop1.device
      2.063s dev-loop0.device

lines 1-23

which adds up to roughly 140 seconds which sounds about right.

This is the output from systemd-analyze critical-chain but I'm not sure how to add these times up, do they overlap with each other? Thanks again.

graphical.target [at]31.173s
└─multi-user.target [at]31.170s
└─snapd.seeded.service [at]24.782s +6.387s
└─snapd.service [at]24.885s +5.824s
└─basic.target [at]24.214s
└─sockets.target [at]24.214s
└─snapd.socket [at]24.211s +2ms
└─sysinit.target [at]24.138s
└─systemd-timesyncd.service [at]23.975s +162ms
└─systemd-tmpfiles-setup.service [at]23.621s +291ms
└─systemd-journal-flush.service [at]6.161s +17.457s
└─systemd-journald.service [at]5.398s +760ms
└─systemd-journald.socket [at]5.298s
└─system.slice [at]5.296s
└─-.slice [at]5.255s

No, the chain is exactly that (in theory) - each of the pieces that SEQUENTIALLY are blocking boot. So snapd.seeded.service [at]24.782s +6.387s means the next link in the chain should start at 31.169s: and that's multi-user.target [at]31.170s, so it pretty much does.

So by removing snapd, for example, you'd boot up to 6.4s faster (in reality it would be less of an improvement than that, because snapd is waiting on other things, like internet access, so you'd need to fix that too. But if you cleared that out too then you'd save 6+s: I don't think any of the others are more than a few ms long).

That chain shows that systemd-journal-flush.service is the biggest problem in your boot time, at 17s. I THINK that's new to 18.04 (I know Debian doesn't use it). Basically it copies log files around on your hard drive, so aside from slowing boot in its own right it also blocks the drive from being able to do any other IO for services that would otherwise be starting up in parallel.

The systemd developers apparently fixed one piece of it in v240 (https://github.com/systemd/systemd/issues/10469). IDK what version 18.04 is using (systemd --version will tell you) but the fix was "only" a year ago, so it's not going to be in that unless the Ubuntu maintainers backported it (which they should have, so you may well have it already).

You can apply the fix it yourself without much trouble. That said, I don't see how it would actually help: it's still a dependency of systemd-tmpfiles-setup.service, and that's the one that's triggering it on your machine. So I'm completely puzzled by that, I admit. It certainly won't HURT to do it, I just don't see it actually helping for you.

sudo pluma /lib/systemd/system/systemd-journal-flush.service and change the line that says
Before=systemd-user-sessions.service systemd-tmpfiles-setup.service to just Before=systemd-tmpfiles-setup.service

None of my machines are running this, so I'm very much just guessing here, and like I say I don't see how that would really help. But systemd is a mess, and "unexpected behavior" might as well be its slogan, so I'd say go for it and see. :slight_smile:
If that fixes things, great. If not, some of the other ways to deal with it get you into much darker territory, and you probably don't really want to go there. But, one thing at a time...


Note that according to systemd, you should have a graphical login prompt (or autologin) about 31s after "userspace" starts, not 140s.

There are 3 pieces to boot: BIOS/POST, kernel, and userspace.
BIOS ends when GRUB (the boot menu) starts, and depends entirely on your hardware. Consumer PCs generally take anything from a few seconds to 20+ in this stage.
The kernel depends almost entirely on your CPU, but is usually just a couple of seconds even on old PCs.
The rest is userspace, which is what systemd is supposed to be reporting on. So 140s of total elapsed time is WAYYY longer than it "should" be.

Hi again arQon,

Thanks for the explanation.

I ran the fixes that you suggested but it didn't seem to alter the boot time.

Here's the output from systemd --version afterwards:

systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

So, if I understand correctly from your last paragraph, systemd is not reporting everything that is happening in the userspace stage?

Possibly *mis-*reporting rather than not reporting, but either way, either your estimate of 140s is very wrong or the systemd's claim of 31s is.

Incidentally, systemd-analyze time will give you the kernel / userspace split.

I'm not surprised that fixing systemd's journaling ordering issue didn't change things. "Sooner" vs "later" is fairly minor when the problem is "shouldn't be happening at all" (or at least "shouldn't be doing something clearly broken").

Even on a 2.5" HDD (they're quite a lot slower than 3.5" ones), 17s just to copy a tiny amount of logging to the disk is beyond ridiculous. The only way it should be taking even a tenth of that much time is if the disk is failing, big time. Which I suppose is worth checking (search for "SMART" if you don't know how), but for it to have coincidentally happened at the same time as your upgrade is a bit of a stretch. :slight_smile:

So that really only leaves "systemd is doing something stupid". Copying the logs a byte at a time, or starting over if something new gets written to it, and so on. Also something of a stretch, but still much more credible as a starting point.

I'd be tempted to just disable it outright, at least temporarily: the worst-case outcome is that you lose old boot-time logs, and that's just not going to be relevant to you.

edit /etc/systemd/journald.conf and change #Storage=auto to Storage=volatile. Note the hash change.

See what happens next time you reboot. At a minimum, it should be 17s faster. If not, Something Went Wrong. gl!

Thanks arQon

Here's the output from systemd-analyze time:

Startup finished in 7.102s (kernel) + 1min 39.689s (userspace) = 1min 46.791s
graphical.target reached after 31.747s in userspace

So the delay appears to be in the userspace

graphical.target should be the last thing IN userspace, so I have no idea where that last 68 seconds is coming from or what it's doing, but whatever it is it's clearly "the" problem. systemd's reporting is being impressively useless, since it clearly sees that piece, but is hiding it in both of the reports that would actually let you identify it easily.

Since systemd doesn't log anything properly, its own tools are all you have, so when those fail you're screwed. About the only path left to you is to use systemd-analyze dump > /tmp/foo.txt and search that for After: graphical.target and see if anything interesting shows up - that is, something with Enter and Exit timestamps that are far enough apart to account for your missing minute-plus.
Someone more familiar with it might know a better way, but I don't.

Enable auto-login for your account, then reboot and try systemd-analyze critical-chain user-1000.slice. (If you aren't user 1000 - id will tell you what you are - change that part to match). That should tell you if something is blocking your logins completely. It would be strange for X to be up and that part not to be, but your whole situation is strange anyway. And that's everything I can think of except booting into recovery mode and seeing how long that takes to get you to a console login (which you should also check).

Thanks arQon,

I tried systemd-analyze dump > /tmp/foo.txt but nothing happened, it just returned me to the opening line of the terminal.

This is the output from systemd-analyze critical-chain user-1000.slice:

user-1000.slice [at]39.804s
└─systemd-user-sessions.service [at]32.957s +42ms
└─network.target [at]32.934s
└─NetworkManager.service [at]26.363s +6.570s
└─dbus.service [at]25.690s
└─basic.target [at]25.626s
└─sockets.target [at]25.626s
└─snapd.socket [at]25.619s +5ms
└─sysinit.target [at]25.530s
└─systemd-timesyncd.service [at]25.403s +127ms
└─systemd-tmpfiles-setup.service [at]25.084s +272ms
└─systemd-journal-flush.service [at]6.314s +18.767s
└─systemd-journald.service [at]5.548s +763ms
└─syslog.socket [at]5.545s
└─system.slice [at]5.453s
└─-.slice [at]5.403s

If that gives you any ideas then do let me know but if not don't worry, I can live with a 2 minute boot. It's a lot better than it was and you've been very helpful. Thanks again.

Ah - I should have explained that eh? :slight_smile:

That creates a file with all the (VERY LARGE) output, which is why you don't see it on the screen.
pluma /tmp/foo.txt after running that command will let you see all the details of your startup, hopefully including whatever the missing piece(s) from blame and chain are.

You CAN live with a 2-minute boot, certainly. My Dad had a laptop with Vista on it once upon a time, which took literally 15+ minutes to "truly" start up, thanks to the AV etc. But you shouldn't HAVE to live with a 2-minute boot.

I'm curious enough to want to know how this story ends, but what you should really do is try AskUbuntu: there are a lot more people there, and chances are good someone else has had the exact same problem. (Sorry, I thought I'd suggested that before, but apparently that was another thread).

gl, and if you find a solution please let us know what it turned out to be!