Caja - slow writing to USB sticks

Steve · 26 January 2016 12:33

Has anyone experienced really slow writing speeds when copying large files from computer to a USB stick?

I have been copying some video files to a USB stick and found that it starts off fast, then as it got towards the end, it slows down. Even when the progress bar is right at the end it can take a few minutes before it completes.

lah7 · 26 January 2016 13:47

Depends on the USB stick. Older ones (USB 1.1) are slow anyway, newer ones (USB 2.0/3.0) have a bit of a wait at the end.

I watch my CPU usage through a panel applet, and the iowait for CPU raises up when the transfer completes in Caja. iowait is waiting for I/O to finish.

Here’s what I think happens in the background in a simple sense:
Caja is telling to copy to the USB drive. The is on it and will cache the file in memory while also copying to the USB drive. This speeds up the operation for Caja and then when it’s completed, the bottleneck is now trying to finish copying to the physical USB drive as soon as possible… which drives up iowait.

So, I think it appears to look slow, but it’s actually just caching in the background.

Also, slightly related: Never pull out a USB drive before ejecting, files not yet written to the drive may still be in memory waiting to be written, that’s one reason why.

Steve · 26 January 2016 14:01

I never do that. I've seen the result of that in the past.

The caching thing makes sense. They are USB 2.0 sticks.

stevecook172001 · 26 January 2016 14:06

I suppose one way to decide if it is objectively slow, as opposed to possibly “fake” slow (due to caching at the end of the procedure), is to compare total-copy-time on different OS’s using the same data, the same USB-stick and the same USB port.

wolfman · 26 January 2016 14:23

It will also depend on how large your system memory is and if it needs Swap, how much of that is being used!.

Steve · 26 January 2016 14:39

I have two machines of similar specification. One running Windows 10 x64, the other Ubuntu MATE 15.10 x64.

I’ll copy the same files onto the same memory sticks from both systems, timing the transfers from start to end.

I’ll post the results back here.

Steve · 26 January 2016 18:08

Results

Ubuntu MATE 15.10 x64
1.65GB file from desktop to 32GB USB 2.0 stick - 6 minutes 40 seconds
749 MB file from desktop to 32GB USB 2.0 stick - 2 minutes 55 seconds

Windows 10 x 64
1.65GB file from desktop to 32GB USB 2.0 stick - 6 minutes 35 seconds
749 MB file from desktop to 32GB USB 2.0 stick - 2 minutes 53 seconds

Both machines have same motherboard, CPU and memory

Difference is not significant.

On Ubuntu MATE, the smaller file looks like it has copied very quickly, according to the progress bar, but as you say, Luke, the time is taken to write the cache to the stick.

One thing I would say, is that Caja is displaying the progress bar based on the amount of data read from the disk, rather than the amount that has been written. Would this not be better?

stevecook172001 · 26 January 2016 20:00

To be honest, I have always found data transfer “progress” bars to be irritating in that they will say something like “40 seconds left” when I know full well it is going to stay on that for about a minute and then go from 40 seconds to two hours and then back to 10 seconds left all in the space of less than 5 seconds.

It seems to me that the above must be based on some kind of simplistic prediction based on the current data transfer rate at any given moment. Hence the ridiculous jumping about with the prediction. What all of the above usually amounts to is that the prediction of time left is usually wholly inaccurate at the start of the transfer and becomes only really accurate when there are only seconds left of the transfer.

A far better way of doing the prediction would be for the program doing the data transfer to keep a log of snapshots of the data transfer rate all the way through the transfer process. It should then base its ongoing prediction of time remaining on some kind of rolling weighted average based on an analysis of the average data transfer rates (based on the snapshots) up to that particular point and should continue to do this right up to the end of the copying process. I bet that would produce less silly variations in the prediction as the data was transferred.

quonsar · 26 January 2016 20:15

I have found that the sync command is my friend.

sync - flush file system buffers

Steve · 26 January 2016 22:35

I actually wanted to like this post twice (because I really liked it!). Unfortunately the community doesn’t work that way, but I agree with you @stevecook172001. This is a really good idea!

Steve · 26 January 2016 22:38

@quonsar, The sync command is great, but it doesn’t make the system write any faster. The problem here is to do with what Caja is estimating as the time remaining.

lah7 · 27 January 2016 00:14

Good idea, pretty much using the average speed to predict, rather then what is happening at that moment in time.

I copied 500GB+ between two HDD drives a few weeks back, it said it’ll take 2 and a bit hours… that’s great, until it came to copying 1000s of tiny files, which slowed the transfer rate… took about 5 hours or something in the end.

Get your voice heard by telling the Caja developers upstream: https://github.com/mate-desktop/caja/issues

stevecook172001 · 27 January 2016 12:41

I have thought about this a bit more and it could even include a longer term memory of total-data-transfer-time of previous data transfer events and thus could add some kind of a-priori weighting to the current data transfer event based on a meta-averaging of those previous total-data-transfer-time events. Thus, the real-time estimate that is occurring during a given data transfer event in any given moment would be based on a combination of the average data transfer up to that specific point in that particular data transfer event, plus an initial weighting to the prediction based on an average of previous total-transfer-times of previous data transfer events.

Bloody hell, that was easier to think than it was to say!

Steve · 27 January 2016 13:03

I see that you have raised this at https://github.com/mate-desktop/caja/issues @stevecook172001

Let’s hope something comes of it.