"uninterruptible sleep (usually I/O)"

Recently I've encountered a situation that winds up in a state of uninterruptible sleep (usually I/O). In brief: I have some 8TB of data in over 20k files, and I know I have duplicates. I've installed fdupes and rdfind to help find these dupes. However, when I run either one, the process eventually stalls and I see "D" under the [S]tatus column in ps.

A process in the uninterruptible sleep state (D) is waiting for a resource to become available before it can move back to a runnable state, and it does not react to signals.

Running lsof shows me the (last) file opened by the process, but gives me no indication of why it has gone into this state. And since it's uninterruptible, the only remedy is a reboot.

What gives? Is my data set just too large for these processes to build a cache? Is there a way to "pad" my I/O so that I'm not running out of resources? How can I isolate this problem?
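
One way to start isolating this (a sketch of my own, not something from the thread): Linux exposes a blocked task's state and kernel "wait channel" through /proc, which often hints at *what* the process is waiting on. The PID below is a placeholder; here it inspects the current shell just to demonstrate the reads — substitute the PID of the stalled fdupes/rdfind process.

```shell
# Sketch: inspect where a process is blocked in the kernel.
# PID is a placeholder; substitute the stalled process's PID.
PID=$$

# Process state; a stuck process shows "State: D (disk sleep)":
grep '^State:' "/proc/$PID/status"

# Kernel function the task is sleeping in (the "wait channel"):
cat "/proc/$PID/wchan"; echo

# With root, the full kernel stack is often more telling:
#   sudo cat /proc/$PID/stack
```

If wchan points at an NFS, FUSE, or device-driver function, the hang is in that layer rather than in the deduplication tool itself.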

Not a recipe, but some thoughts. I stumbled into a similar situation when writing a client/server application. Based on that experience, I believe the problem is in the fdupes or rdfind implementation: that is, one of them (synchronously?) waits on a file that is in an unreachable state. It is not an OS/kernel error or a resource-exhaustion problem.

Proposals are:

  • unconditionally kill the process in question, e.g. sudo kill -9 <PID>
  • log in to a CLI-only session to avoid extra DE-spawned processes (caja?) that can monitor the FS and prevent the file from reaching its desired state
  • compose a personal script/procedure to find the duplicates
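
The last proposal can be sketched as a small two-pass script (my own sketch, not the poster's; the function name find_dupes and the use of GNU find/md5sum are assumptions). The idea: only files of equal size can be duplicates, so group by size first and checksum only those candidates — this avoids hashing all 8TB.

```shell
# find_dupes: print groups of files under $1 that share both size and MD5.
find_dupes() {
    dir="$1"
    sizes=$(mktemp)
    # Pass 1: record every file's size and path (tab-separated).
    find "$dir" -type f -printf '%s\t%p\n' | sort -n > "$sizes"
    # Pass 2: keep only paths whose size occurs more than once, then
    # hash those candidates and print every group of repeated digests.
    awk -F'\t' 'NR==FNR { n[$1]++; next } n[$1] > 1 { print $2 }' "$sizes" "$sizes" \
        | xargs -r -d '\n' md5sum \
        | sort \
        | uniq -w32 --all-repeated=separate
    rm -f "$sizes"
}
```

fdupes uses a similar size-then-hash strategy with a final byte-by-byte comparison; this sketch skips that last comparison, so in theory a hash collision could produce a false positive.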

P.S. Once upon a time I wrote a script that generated a three-column .csv with filename, path, and file size for my HDD. The generated .csv was opened as a spreadsheet and sorted by the filename column :slight_smile:
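
A sketch of what such a script might look like (the original isn't shown; the function name file_index and the exact CSV layout are my assumptions), using GNU find's -printf directives:

```shell
# file_index: emit a CSV of filename, directory, and size for every
# file under $1. %f = basename, %h = directory, %s = size in bytes.
# Note: filenames containing commas would break naive CSV parsing.
file_index() {
    printf 'filename,path,size\n'
    find "$1" -type f -printf '%f,%h,%s\n'
}
```

The output can be redirected to a file and opened in any spreadsheet, then sorted by the filename or size column to eyeball duplicate candidates.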

Good luck!


Thanks! One thing: kill -9 doesn't help here. The "D+" status is uninterruptible, which means the process won't act on a KILL signal while it's blocked. That's why a reboot is the only solution.

I've considered writing my own solution much as you suggest. I was kind of hoping fdupes or rdfind would keep me from having to do so. :melting_face:

There is some interesting info regarding killing a process in the 'uninterruptible sleep' state:


Some pretty interesting and hoary stuff in those articles! Still, I think it comes down to one of two things (or maybe both):

  • fdupes and rdfind are unable to handle large volumes of data. On one volume (internal storage) I have 1671 directories and 24883 files. On an external drive I have 1166 directories and 18131 files. For these utilities to work, I imagine they have to gather the data, perform a bubble sort, and then spit out their findings. I have 64GB of RAM, so I don't think that's the issue (and top doesn't show excess CPU or MEM utilization).
  • I have problems with disks/sectors/clusters. I've run fsck and not encountered any problems, but that doesn't mean there aren't any.

The fact that I can run find, tree, and du without issue suggests to me that the problem lies within the apps. Hmm.

Hopefully the following discussion is interesting:
