Very Slow Boot after updating from 16.04 to 18.04

Thanks arQon, I did wonder if I could shave a bit more off that two minutes. It's been so long since I updated to 18.04 that I can't actually remember how long it took, my gut feeling is that it was slightly quicker but not by much.

systemd-analyze blame now gives me this:

     17.828s dev-sda1.device
     17.457s systemd-journal-flush.service
     11.742s systemd-hwdb-update.service
     11.507s ufw.service
     10.730s systemd-modules-load.service
     10.631s keyboard-setup.service
     10.143s systemd-tmpfiles-setup-dev.service
      6.387s snapd.seeded.service
      5.824s snapd.service
      4.681s NetworkManager.service
      4.635s networkd-dispatcher.service
      3.239s udisks2.service
      3.058s tlp.service
      3.008s motd-news.service
      2.988s accounts-daemon.service
      2.787s gpu-manager.service
      2.764s upower.service
      2.514s ModemManager.service
      2.096s dev-loop4.device
      2.077s dev-loop3.device
      2.073s dev-loop2.device
      2.071s dev-loop1.device
      2.063s dev-loop0.device

lines 1-23

which adds up to roughly 140 seconds which sounds about right.

This is the output from systemd-analyze critical-chain but I'm not sure how to add these times up, do they overlap with each other? Thanks again.

graphical.target [at]31.173s
└─multi-user.target [at]31.170s
└─snapd.seeded.service [at]24.782s +6.387s
└─snapd.service [at]24.885s +5.824s
└─basic.target [at]24.214s
└─sockets.target [at]24.214s
└─snapd.socket [at]24.211s +2ms
└─sysinit.target [at]24.138s
└─systemd-timesyncd.service [at]23.975s +162ms
└─systemd-tmpfiles-setup.service [at]23.621s +291ms
└─systemd-journal-flush.service [at]6.161s +17.457s
└─systemd-journald.service [at]5.398s +760ms
└─systemd-journald.socket [at]5.298s
└─system.slice [at]5.296s
└─-.slice [at]5.255s

No, the chain is exactly that (in theory) - each of the pieces that SEQUENTIALLY are blocking boot. So snapd.seeded.service [at]24.782s +6.387s means the next link in the chain should start at 31.169s: and that's multi-user.target [at]31.170s, so it pretty much does.

So by removing snapd, for example, you'd boot up to 6.4s faster (in reality it would be less of an improvement than that, because snapd is waiting on other things, like internet access, so you'd need to fix that too. But if you cleared that out too then you'd save 6+s: I don't think any of the others are more than a few ms long).

That chain shows that systemd-journal-flush.service is the biggest problem in your boot time, at 17s. I THINK that's new to 18.04 (I know Debian doesn't use it). Basically it copies log files around on your hard drive, so aside from slowing boot in its own right it also blocks the drive from being able to do any other IO for services that would otherwise be starting up in parallel.

The systemd developers apparently fixed one piece of it in v240 (https://github.com/systemd/systemd/issues/10469). IDK what version 18.04 is using (systemd --version will tell you) but the fix was "only" a year ago, so it's not going to be in that unless the Ubuntu maintainers backported it (which they should have, so you may well have it already).

You can apply the fix it yourself without much trouble. That said, I don't see how it would actually help: it's still a dependency of systemd-tmpfiles-setup.service, and that's the one that's triggering it on your machine. So I'm completely puzzled by that, I admit. It certainly won't HURT to do it, I just don't see it actually helping for you.

sudo pluma /lib/systemd/system/systemd-journal-flush.service and change the line that says
Before=systemd-user-sessions.service systemd-tmpfiles-setup.service to just Before=systemd-tmpfiles-setup.service

None of my machines are running this, so I'm very much just guessing here, and like I say I don't see how that would really help. But systemd is a mess, and "unexpected behavior" might as well be its slogan, so I'd say go for it and see. :slight_smile:
If that fixes things, great. If not, some of the other ways to deal with it get you into much darker territory, and you probably don't really want to go there. But, one thing at a time...


Note that according to systemd, you should have a graphical login prompt (or autologin) about 31s after "userspace" starts, not 140s.

There are 3 pieces to boot: BIOS/POST, kernel, and userspace.
BIOS ends when GRUB (the boot menu) starts, and depends entirely on your hardware. Consumer PCs generally take anything from a few seconds to 20+ in this stage.
The kernel depends almost entirely on your CPU, but is usually just a couple of seconds even on old PCs.
The rest is userspace, which is what systemd is supposed to be reporting on. So 140s of total elapsed time is WAYYY longer than it "should" be.

Hi again arQon,

Thanks for the explanation.

I ran the fixes that you suggested but it didn't seem to alter the boot time.

Here's the output from systemd --version afterwards:

systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

So, if I understand correctly from your last paragraph, systemd is not reporting everything that is happening in the userspace stage?

Possibly *mis-*reporting rather than not reporting, but either way, either your estimate of 140s is very wrong or the systemd's claim of 31s is.

Incidentally, systemd-analyze time will give you the kernel / userspace split.

I'm not surprised that fixing systemd's journaling ordering issue didn't change things. "Sooner" vs "later" is fairly minor when the problem is "shouldn't be happening at all" (or at least "shouldn't be doing something clearly broken").

Even on a 2.5" HDD (they're quite a lot slower than 3.5" ones), 17s just to copy a tiny amount of logging to the disk is beyond ridiculous. The only way it should be taking even a tenth of that much time is if the disk is failing, big time. Which I suppose is worth checking (search for "SMART" if you don't know how), but for it to have coincidentally happened at the same time as your upgrade is a bit of a stretch. :slight_smile:

So that really only leaves "systemd is doing something stupid". Copying the logs a byte at a time, or starting over if something new gets written to it, and so on. Also something of a stretch, but still much more credible as a starting point.

I'd be tempted to just disable it outright, at least temporarily: the worst-case outcome is that you lose old boot-time logs, and that's just not going to be relevant to you.

edit /etc/systemd/journald.conf and change #Storage=auto to Storage=volatile. Note the hash change.

See what happens next time you reboot. At a minimum, it should be 17s faster. If not, Something Went Wrong. gl!

Thanks arQon

Here's the output from systemd-analyze time:

Startup finished in 7.102s (kernel) + 1min 39.689s (userspace) = 1min 46.791s
graphical.target reached after 31.747s in userspace

So the delay appears to be in the userspace

graphical.target should be the last thing IN userspace, so I have no idea where that last 68 seconds is coming from or what it's doing, but whatever it is it's clearly "the" problem. systemd's reporting is being impressively useless, since it clearly sees that piece, but is hiding it in both of the reports that would actually let you identify it easily.

Since systemd doesn't log anything properly, its own tools are all you have, so when those fail you're screwed. About the only path left to you is to use systemd-analyze dump > /tmp/foo.txt and search that for After: graphical.target and see if anything interesting shows up - that is, something with Enter and Exit timestamps that are far enough apart to account for your missing minute-plus.
Someone more familiar with it might know a better way, but I don't.

Enable auto-login for your account, then reboot and try systemd-analyze critical-chain user-1000.slice. (If you aren't user 1000 - id will tell you what you are - change that part to match). That should tell you if something is blocking your logins completely. It would be strange for X to be up and that part not to be, but your whole situation is strange anyway. And that's everything I can think of except booting into recovery mode and seeing how long that takes to get you to a console login (which you should also check).

Thanks arQon,

I tried systemd-analyze dump > /tmp/foo.txt but nothing happened, it just returned me to the opening line of the terminal.

This is the output from systemd-analyze critical-chain user-1000.slice:

user-1000.slice [at]39.804s
└─systemd-user-sessions.service [at]32.957s +42ms
└─network.target [at]32.934s
└─NetworkManager.service [at]26.363s +6.570s
└─dbus.service [at]25.690s
└─basic.target [at]25.626s
└─sockets.target [at]25.626s
└─snapd.socket [at]25.619s +5ms
└─sysinit.target [at]25.530s
└─systemd-timesyncd.service [at]25.403s +127ms
└─systemd-tmpfiles-setup.service [at]25.084s +272ms
└─systemd-journal-flush.service [at]6.314s +18.767s
└─systemd-journald.service [at]5.548s +763ms
└─syslog.socket [at]5.545s
└─system.slice [at]5.453s
└─-.slice [at]5.403s

If that gives you any ideas then do let me know but if not don't worry, I can live with a 2 minute boot. It's a lot better than it was and you've been very helpful. Thanks again.

Ah - I should have explained that eh? :slight_smile:

That creates a file with all the (VERY LARGE) output, which is why you don't see it on the screen.
pluma /tmp/foo.txt after running that command will let you see all the details of your startup, hopefully including whatever the missing piece(s) from blame and chain are.

You CAN live with a 2-minute boot, certainly. My Dad had a laptop with Vista on it once upon a time, which took literally 15+ minutes to "truly" start up, thanks to the AV etc. But you shouldn't HAVE to live with a 2-minute boot.

I'm curious enough to want to know how this story ends, but what you should really do is try AskUbuntu: there are a lot more people there, and chances are good someone else has had the exact same problem. (Sorry, I thought I'd suggested that before, but apparently that was another thread).

gl, and if you find a solution please let us know what it turned out to be!

Came across yet another "boot time is much slower after updating from 16.04" scenario recently, which I don't think affects your case but is interesting enough to be worth mentioning.

In a nutshell, the update turned on swap that didn't exist, causing the boot to hang "invisibly" until it timed out. Probably worth checking next time you're in the mood.

Thanks arQon, I will look into those ideas and let you know if I get anywhere. Apologies for not replying sooner, other things have got in the way recently.