Ubuntu (MATE) memory handling - weird behaviour - or my misunderstanding

Context:

Linux  hostname  6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05

Description:	Ubuntu 22.04.5 LTS  (from "lsb_release -a")

Observations and Comments:

I decided to check to see if using the "--bwlimit=95232" option while doing backups with "rsync" was actually giving me any benefit.

Note that I had no other processes running except the two MATE terminals I had open, one running the backup script, the other I was using for "probing" and "analyzing".

My script was running rsync on a massive 120 GB differential data transfer.

root:~# ps -ef
root      354188       1  0 18:33 pts/0    00:00:00 /bin/sh /site/DB005_F7/Z_backup.DB001_F7.DateSize.batch
root      354192  354188  3 18:33 pts/0    00:03:51 rsync --bwlimit=95232 --one-file-system --recursive --outbuf=Line --
root      354193  354192  0 18:33 pts/0    00:00:08 rsync --bwlimit=95232 --one-file-system --recursive --outbuf=Line --
root      354194  354193 19 18:33 pts/0    00:23:30 rsync --bwlimit=95232 --one-file-system --recursive --outbuf=Line --


I expected that with "--bwlimit" and with 4 GB RAM, I would see only minimal swap usage. Instead, I got this (about 720 MB of swap):

root:~# swapon
NAME       TYPE        SIZE   USED PRIO
/dev/sda11 partition 996.2M 240.4M   99
/dev/sda10 partition 996.2M 239.4M   99
/dev/sdb2  partition     2G 241.2M   99
root:~# sync
root:~# swapon
NAME       TYPE        SIZE   USED PRIO
/dev/sda11 partition 996.2M 240.1M   99
/dev/sda10 partition 996.2M 239.1M   99
/dev/sdb2  partition     2G 240.9M   99
root:~#

When I checked with top to see if anything else would pop up, I didn't see anything out of place:

root@OasisMega1:~# top -d 10

top - 20:36:24 up  9:17,  1 user,  load average: 3.75, 4.53, 4.39
Tasks: 282 total,   1 running, 281 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.2 us,  7.7 sy,  0.0 ni, 61.9 id, 28.8 wa,  0.0 hi,  0.3 si,  0.0 st     
%Cpu1  :  1.0 us, 10.5 sy,  0.0 ni, 65.1 id, 23.4 wa,  0.0 hi,  0.0 si,  0.0 st     
%Cpu2  :  1.6 us,  3.9 sy,  0.0 ni, 61.9 id, 32.6 wa,  0.0 hi,  0.0 si,  0.0 st     
%Cpu3  :  1.6 us,  8.0 sy,  0.0 ni, 23.2 id, 67.1 wa,  0.0 hi,  0.0 si,  0.0 st     
MiB Mem :   3663.5 total,    603.1 free,    609.8 used,   2450.6 buff/cache
MiB Swap:   4040.4 total,   3321.1 free,    719.3 used.   1721.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 354194 root      18  -2   74.1m   2.3m   1.0m D  17.1   0.1  23:59.42 rsync --bwlimit=95232 --outbuf=Line
     47 root      20   0    0.0m   0.0m   0.0m S   6.3   0.0   2:49.60 [kcompactd0]
 354192 root      18  -2   18.2m   3.0m   2.5m S   6.1   0.1   3:58.06 rsync --bwlimit=95232 --outbuf=Line
   7195 ericthe+  20   0  338.2m  10.8m   8.0m S   2.0   0.3  11:09.33 /usr/lib/mate-applets/mate-multiload-applet
   6296 root      20   0  573.1m  23.3m  14.1m S   1.7   0.6  11:02.42 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/+
   8537 root      20   0    0.0m   0.0m   0.0m D   1.5   0.0   2:29.62 [usb-storage]

My backup process lasted about 2 hours, given that I was working with an external 4 TB USB hard drive on a USB2 port:

 Using previously determined bandwidth limit for rsync buffer setting ...

 Will apply parameter to limit flooding of I/O, memory and swap ==>>  --bwlimit=95232

	 Thu 26 Sep 2024 06:33:21 PM EDT |rsync| Start DB001_F7 ...
	 Background 'rsync' working ...


 Expected Log files:
	 /site/DB005_F7/Z_backup.DB001_F7.DateSize.out
	 /site/DB005_F7/Z_backup.DB001_F7.DateSize.err


 Use 'OS_Admin__partitionMirror_Monitor.sh' to monitor rsync process.


 Imported LIBRARY:  INCLUDES__TerminalEscape_SGR.bh ...


	 Thu 26 Sep 2024 06:33:32 PM EDT

	 PID 354194 is RSYNC child process ...
	 PID 354193 is RSYNC child process ...
	 PID 354192 is RSYNC MASTER process ...
	 RSYNC backup process under way ...

	 root      354192  354188 10 18:33 pts/0    00:00:01 rsync
				--bwlimit=95232
				--one-file-system
				--recursive
				--outbuf=Line
				--links
				--perms
				--times
				--group
				--owner
				--devices
				--specials
				--verbose
				--out-format=%t|%i|%M|%b|%f|
				--delete-during
				--whole-file
				--human-readable
				--protect-args
				--ignore-errors
				--msgs2stderr ./ /site/DB005_F7/DB001_F7/


	 Scanning at 10 second intervals ...
	 ..............................   5 min
	 ..............................   10 min
	 ..............................   15 min

	 ..............................   125 min
	 .............................

	 RSYNC process (# 354192) has completed.

	 Thu 26 Sep 2024 08:45:37 PM EDT

Question #1:
Can anyone explain why rsync used so much memory that it overflowed into swap, when I had specified that buffer size with "--bwlimit" ?

Question #2:
I expected the manual "sync" to flush from RAM and swap anything that was already duplicated on disk. The total RAM and swap usage was much larger than expected, given that buffer specification, so why was there no reduction in RAM/swap usage?

Question #3:
Is there a command I could issue (periodically) to force the flushing of the retained "dirty" RAM ?

I saw somewhere a suggestion that the following would flush it, but it had no apparent effect when I entered it:

sync; echo 1 > /proc/sys/vm/drop_caches

Likewise for:

sync; echo 3 > /proc/sys/vm/drop_caches
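For what it's worth, `drop_caches` only evicts *clean* page cache; dirty pages have to be written back first, which is why the `sync` belongs in front of it. A quick way to watch the dirty data drain, using the standard `/proc/meminfo` fields (nothing system-specific assumed):

```shell
# Dirty     = data waiting to be written to disk
# Writeback = data actively being written right now
grep -E '^(Dirty|Writeback):' /proc/meminfo
sync
# Dirty shrinks after sync, unless a writer keeps producing dirty
# pages faster than the disk can drain them (as during the backup).
grep -E '^(Dirty|Writeback):' /proc/meminfo
```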

My current kernel parameters:

vm.admin_reserve_kbytes = 8192
vm.compact_unevictable_allowed = 1
vm.compaction_proactiveness = 20
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 40
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 10000
vm.extfrag_threshold = 500
vm.hugetlb_optimize_vmemmap = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256	256	32	0	0
vm.max_map_count = 65530
vm.memfd_noexec = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 65536
vm.mmap_rnd_bits = 32
vm.mmap_rnd_compat_bits = 16
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.numa_stat = 1
vm.numa_zonelist_order = Node
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 0
vm.page-cluster = 4
vm.page_lock_unfairness = 5
vm.panic_on_oom = 0
vm.percpu_pagelist_high_fraction = 0
vm.stat_interval = 1
vm.swappiness = 20
vm.unprivileged_userfaultfd = 0
vm.user_reserve_kbytes = 90551
vm.vfs_cache_pressure = 50
vm.watermark_boost_factor = 15000
vm.watermark_scale_factor = 1000
vm.zone_reclaim_mode = 0

Yes, Linux/Unix memory management is often somewhat confusing. There are two facts about it that might help you understand it better. First, your kernel is configured with a swappiness of 20, which controls how aggressively it swaps. Second, the Linux kernel silently caches all file reads and writes in the background. That cache memory is effectively free, but the system reports it as cache rather than as free memory.

3 Likes

Several things:

  1. --bwlimit limits the bandwidth of the transfer, not RAM/swap usage. It is a transfer rate limiter.
  2. You set the bwlimit to 95 MB/s (which your source drive will more or less meet), while your target drive (USB2) will do about 20 MB/s. Data will accumulate in the buffer, the buffer will grow, and it will push other (not recently used) pages to swap.
  3. rsync is just a userspace program and has no influence at all on what the kernel decides to do when it comes to buffering or swapping.

What has been swapped to disk are probably not the files in transfer but everything else that was in memory and got pushed out of the way to make room for buffering.
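One way to check what actually landed in swap is to read the `VmSwap` field from each process's `/proc/<pid>/status` (a standard field on modern kernels). A minimal sketch, run as root:

```shell
# Print per-process swap usage, largest first.
for f in /proc/[0-9]*/status; do
    awk '/^Name:/   {name = $2}
         /^VmSwap:/ {if ($2 > 0) printf "%8d kB  %s\n", $2, name}' "$f"
done | sort -rn | head
```

On a system like the one above, this would likely show desktop processes (Xorg, applets, and so on) holding the swapped pages, rather than rsync itself.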

If you want less cache/buffer pressure on your system, lower your swappiness to 5 or less and/or use a bwlimit of 8192.
Your transfers will probably be very slow, but your buffers should stay small and swap should hardly be touched.
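Lowering swappiness can be done on the fly; the sysctl.d file name below is just an example:

```shell
# Takes effect immediately, reverts at reboot:
sysctl vm.swappiness=5

# Persist across reboots (example file name):
echo 'vm.swappiness = 5' | tee /etc/sysctl.d/99-swappiness.conf
sysctl --system
```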

EDIT:
You might want some extra documentation on the vm parameters.
https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html

About questions 2 and 3: as long as you feed data faster than it can be written to disk, there is no way that any of those attempts will help.
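Following that logic, a sketch of the fix: size `--bwlimit` from the measured *write* speed of the target drive. The numbers here are example values, not measurements; rsync's `--bwlimit` takes KiB/s when given a bare number:

```shell
# Measured sustained write speed of the target drive, in MiB/s
# (example value; measure it first, e.g. with pv or dd):
WRITE_MIBS=20

# Use 90% of it as the rsync limit; a bare --bwlimit number is KiB/s.
BWLIMIT=$(( WRITE_MIBS * 1024 * 90 / 100 ))
echo "--bwlimit=${BWLIMIT}"    # -> --bwlimit=18432
```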

6 Likes

Thank you, Eugene! :slight_smile:

1 Like

Thank you, Thom!

After what you said, I realized that what I was doing with hdparm to determine the drive's live bandwidth was wrong, because I had misread the manual page. Those testing options measure only read bandwidth, not write bandwidth! :frowning:

Mea culpa! I just realized, looking at that report again, that I had also completely misinterpreted the hdparm report.

I will revisit my backup script's function that calculates the bwlimit and come back to report.

Thank you also for the URL, but I've seen that many times. :slight_smile: I'm trying to give precedence to my Desktop GUI while my backups are running. Except in rare cases, I don't "feel" the presence of massive backups happening in the background.

I just raised the above questions because I thought I saw something quirky. It appears that I was mistaken.

---- edit ----

Quick sanity check on throughput to the USB drive:

cat /dev/zero | pv --size 500m --stop-at-size > /site/DB005_F8/DB001_F1/tmp/junker
 500MiB 0:00:16 [31.2MiB/s] [=========================================>] 100%            
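The `pv` figure above still goes through the page cache, so it can flatter the drive a little; a variant with `dd` (same example path) forces the data to the device before reporting a rate:

```shell
# oflag=direct bypasses the page cache; conv=fsync flushes before dd
# exits, so the reported rate reflects the device, not RAM.
dd if=/dev/zero of=/site/DB005_F8/DB001_F1/tmp/junker \
   bs=1M count=500 oflag=direct conv=fsync
```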
3 Likes

Yes, and this is the best-case scenario. The throughput will take a dive when copying a lot of very small files, because of seek time.

EDIT:
You might also want to check out this:

ionice is part of the 'util-linux' package and probably already installed on your system

2 Likes

Thank you, Thom.

Are you saying it will take a dive when reading or when writing the files? Again, since the disk drive is USB3 rated, I don't see how feeding it at USB2 rates would have much of an impact on the disk's low-level write throughput.

As for my interactivity, thank you for mentioning ionice, but I am already using that in my backup script:

	COM="ionice -c 2 -n 7 rsync \
		${limitThruput} \
		${doUpdate} \
		${doCheckSum} \
		${showProgress} \
		--one-file-system \
		--recursive \
		--outbuf=Line \
		--links \
		--perms \
		--times \
		--group \
		--owner \
		--devices \
		--specials \
		--verbose --out-format=\"%t|%i|%M|%b|%f|\" \
		--delete-during \
		--whole-file \
		--human-readable \
		--protect-args \
		--ignore-errors \
		--msgs2stderr \
		${EXCLUDES} \
		./ ${PathMirror}${dirC}/"

:slight_smile:

I am saying that writing a lot of small files to an external 4 TB USB hard drive on a USB2 port will bring the throughput down because of physics (= the limitations of the hard drive's mechanical components).

The throughput (low level or not) is limited by the maximum physical transfer rate of the USB2 port. You cannot go faster than that.

But what brings that throughput further down is latency.
It is the random seek time, and it is unrelated to your interface speed.

A mechanical hard drive has to move the heads mechanically from one cylinder to another. This mechanical action takes a relatively long time after the command for it is issued. These latency spikes can bring your 30 MB/s throughput further down to 20 MB/s on average, with short dips as low as 7 MB/s.
This is unrelated to the type of interface.

4 Likes

Thank you, Thom.

I could easily visualize what you are saying being applicable if my external drive's circuitry had been designed for a USB2 context.

However, it was designed for a USB3 context, so I still don't understand how read (or write) head latency could be impacting USB2 transfer rates, given both

  • the computer's caching and buffering on the read, and
  • the computer's caching and buffering on the write,

which must both be well beyond USB2 limits.

Do you see what I am trying to say?

No, internally it is designed for SATA.
They just added a USB3 interface.

USB2 and USB3 drives are mechanically and electronically identical except for the interface.

A USB3 drive will behave like a USB2 drive when connected to a USB2 port.

  1. Since everything has to pass through the USB2 port on its way out to the external drive, it can never go faster than USB2 speed, even if the drive itself is equipped with USB3.

  2. Seek time on all comparable mechanical drives is more or less the same; the interface does not matter.

  3. With a lot of small files, seek time can add up tremendously. While the head is seeking it cannot write: the transfer stalls.

A lot of computer side buffering will not help if the whole chain of events is waiting for the head to find the right track.

sidenote:
There is a little bit of buffering on the disk itself, and that is the only thing that keeps the transfer going until it is full, which can easily happen during a lot of seeks, even when the transfer is at USB2 speed.

One other thing:
A lot of computer side buffering will also not make USB2 spontaneously faster.

To use an analogy:
Think of an aquarium with a little hole in it that leaks water at a maximum rate of 1 liter per hour.
See the buffering as the size of the aquarium and the little hole as the USB2 port.
It makes no difference whether the aquarium holds 10 liters or 100 liters:
the leak rate will not change.

3 Likes

Thank you again, Thom.

... and your aquarium analogy is the perfect model to represent what I was trying to convey! Nice touch! I love imagery! :slight_smile:

Drive head seeking on reads would not cause the USB2 link to "sputter"; it would flow at a steady speed, the interface's maximum.

In my mind it is similar for writing to disk: the drive's head seeking is designed/rated to meet USB3 rate demands, so forcing the interface to operate at only USB2 rates does not change the fact that the circuitry controlling the electro-mechanical mechanism is still designed to operate at one maximum physical speed at all times, a speed conceived to keep up with USB3 flow rates.

I hope that gets my point across better.

No, it isn't.

It's designed to be as fast as reasonably possible, but moving a head over a disk is a mechanical action and therefore a relatively slow event.

This is the dataflow when writing to the external disk:

internal computer buffer (aquarium)
   |
   V
USB2 interface on the computer (leak)
   |
   V
USB3 interface of the external disk
   |
   V
HDD buffer (4MB) built into the external disk
   |
   V
physical disk

The most common (2.5") mobile drives have an internal effective transfer speed from HDD buffer to physical disk of about 110 MB/s, and an average seek time of about 12 ms.

Writing 80 different files of 1 kB often results in 80 seeks, so it might theoretically take almost 1 second to write that total of 80 kB, resulting in a dip to about 80 kB/s.
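The arithmetic above can be checked with a quick awk back-of-envelope, using the 12 ms seek and 110 MB/s figures from this post:

```shell
awk 'BEGIN {
    files = 80; size_kb = 1          # 80 files of 1 kB each
    seek_s = 0.012                   # average seek time per file
    write_kbps = 110 * 1000          # sustained write speed, kB/s
    total_kb = files * size_kb
    # Seeks dominate: the actual writing takes under a millisecond.
    time_s = files * seek_s + total_kb / write_kbps
    printf "%d kB in %.2f s -> %.0f kB/s effective\n",
           total_kb, time_s, total_kb / time_s
}'
# -> 80 kB in 0.96 s -> 83 kB/s effective
```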

You can see that even on USB2 (30 MB/s), some tens of thousands of 1 kB files will fill the small HDD buffer to the max in a very short time and the transfer will stall, with the average USB transfer speed taking a firm nosedive.

The average backup with a mix of small and large files will usually end up somewhere between 18 and 27 MB/s

On a pure USB3 to USB3 connection it will be somewhere between 70 and 100 MB/s

3 Likes

My reaction to the situation, not your response, is not to be shared here where children may be listening. :slight_smile:

Thank you, Thom. Much appreciate your patience with me on this.

2 Likes