All Indications Point To Disk Corruption

Hi, @OldStrummer (Fred).

You mention that you have "20Tb of disk"?! That's a lot! Is that a single disk (given the huge size, I'm assuming it's an HDD and not an SSD) or is it some kind of RAID setup with several disks?

Before running "fsck" (filesystem check), I would first:

1 - Check the /var/log/syslog and search for disk errors there (messages like "DRDY ERR" and/or "Unrecovered read error" and/or "I/O error" are likely indicators of disk problems and/or a bad cable).
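For the log search in step 1, here is a minimal sketch, assuming a Bash shell and the default Ubuntu log location. The function name `scan_disk_errors` is my own invention for illustration; the three patterns are just the messages mentioned above.

```shell
# Print log lines that commonly accompany failing disks or bad cables.
# Usage: scan_disk_errors [LOGFILE]   (LOGFILE defaults to /var/log/syslog)
scan_disk_errors() {
    local logfile="${1:-/var/log/syslog}"
    # -i: ignore case; -E: extended regex, so the alternatives can be joined with |
    grep -iE 'DRDY ERR|Unrecovered read error|I/O error' "$logfile"
}

# Example:
#   scan_disk_errors /var/log/syslog
```

Older messages may have been rotated into /var/log/syslog.1, so it can be worth running the same search on that file too. No output simply means none of these three patterns were found; it does not prove the disk is healthy.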

2 - Do you have a good backup of your data? If not, I suggest you create one before you run a filesystem check, especially if that check will try to repair things. The article "Repair a damaged filesystem" - https://help.ubuntu.com/stable/ubuntu-help/disk-repair.html.en - includes, among other useful information, the following warning:

"(...) Possible data loss when repairing

If the filesystem structure is damaged it can affect the files stored in it. In some cases these files can not be brought into a valid form again and will be deleted or moved to a special directory. It is normally the lost+found folder in the top level directory of the filesystem where these recovered file parts can be found.

If the data is too valuable to be lost during this process, you are advised to back it up by saving an image of the volume before repairing.

This image can be then processed with forensic analysis tools like sleuthkit to further recover missing files and data parts which were not restored during the repair, and also previously removed files. (...)"

3 - I think you should also check the S.M.A.R.T. / SMART (Self-Monitoring, Analysis, and Reporting Technology) information for your hard drive. The article "Check your hard disk for problems" - https://help.ubuntu.com/stable/ubuntu-help/disk-check.html.en - mentions the following (among other information):

"(...) Checking the hard disk

Hard disks have a built-in health-check tool called SMART (Self-Monitoring, Analysis, and Reporting Technology), which continually checks the disk for potential problems. SMART also warns you if the disk is about to fail, helping you avoid loss of important data.

Although SMART runs automatically, you can also check your disk’s health by running the Disks application:

Check your disk’s health using the Disks application

1. Open Disks from the Activities overview.
2. Select the disk you want to check from the list of storage devices on the left. Information and status of the disk will be shown.
3. Click the menu button and select SMART Data & Self-Tests…. The Overall Assessment should say “Disk is OK”.
4. See more information under SMART Attributes, or click the Start Self-test button to run a self-test. (...)"

(Alternatively, if you're familiar with the "smartctl" command that comes with the "smartmontools" package, you can also use that for checking the "SMART" Data and doing the "SMART Self-tests")
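For the "smartctl" route, a rough sketch could look like the following. The function names `smart_check` and `smart_selftest_log` and the device name /dev/sdX are placeholders of mine, not something from this thread; the smartctl options themselves (-H, -A, -t, -l) are standard ones from smartmontools.

```shell
# Run the usual SMART checks on one drive (needs the "smartmontools" package).
# Usage: smart_check /dev/sdX
# Find the real device name first with: lsblk -d -o NAME,SIZE,MODEL
smart_check() {
    local dev="$1"
    sudo smartctl -H "$dev"        # overall health self-assessment
    sudo smartctl -A "$dev"        # vendor attributes, e.g. reallocated sector counts
    sudo smartctl -t short "$dev"  # start a short self-test; it runs in the background
}

# Once the self-test has had time to finish, read its result from the self-test log.
# Usage: smart_selftest_log /dev/sdX
smart_selftest_log() {
    sudo smartctl -l selftest "$1"
}
```

If the 20 TB turns out to be a RAID array rather than one drive, each member disk would need to be checked separately, and drives behind some controllers may need an extra device-type option that I can't guess from here.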

4 - If you don't already have a good backup of your data, you may want to consider using "ddrescue" (I've never used it myself; see "Ddrescue - GNU Project - Free Software Foundation (FSF)") to create a disk image before running "fsck" in a mode that may try to repair things. HOWEVER, creating a disk image can take a LOT of time, and using "ddrescue" on a 20 TB disk like yours will very likely take a REALLY LONG time! There is also a post about "ddrescue" here in the "Ubuntu MATE Community", posted on 10th April 2018 by @andyp6.
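To give an idea of what the "ddrescue" imaging step might look like, here is a sketch modelled on the two-pass pattern from the GNU ddrescue manual. It is untested on my side; the function name `image_disk`, the device name and the paths are placeholders. On Ubuntu the program is installed by the "gddrescue" package. As a ballpark for the time needed: at an assumed sustained 150 MB/s, 20 TB works out to roughly 37 hours, and a drive with read errors will usually be much slower than that.

```shell
# Two-pass ddrescue imaging of a whole drive into an image file.
# The source drive is only read; the destination needs at least as much free space as the source.
# Usage: image_disk /dev/sdX /mnt/backup/disk.img /mnt/backup/disk.map
image_disk() {
    local src="$1" img="$2" map="$3"
    # Pass 1: copy everything that reads cleanly, skipping the slow scraping of bad areas (-n)
    sudo ddrescue -n "$src" "$img" "$map" || return 1
    # Pass 2: go back to the bad areas with direct disc access (-d), retrying up to 3 times (-r3)
    sudo ddrescue -d -r3 "$src" "$img" "$map"
}
```

The third argument is the mapfile, where ddrescue records which areas it has already copied, so an interrupted run can be resumed by running the same command again with the same mapfile.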

Other than that, your approach looks good to me (entering " 'emergency boot mode' at the startup screen by pressing escape, then 'e' to edit the boot command. Once at the grub script, I should just add systemd.unit=emergency.target ."). It matches what is described, for instance, in the article "How to Boot Ubuntu 20.04 LTS in Rescue / Emergency Mode", particularly in the section "Booting Ubuntu 20.04 LTS in Emergency Mode" of that article.

Having said that, in Emergency Mode I would first run "fsck" in its "dry run" mode by using the -N (minus uppercase N) switch which, according to "man fsck", means "Don’t execute, just show what would be done.", together with the "-V" (minus uppercase V) switch to get verbose output. Note that with -N "fsck" only prints the command(s) it would run; it does not examine the filesystem itself. After checking that output, I would run the "fsck" command again without the -N switch.
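The dry run followed by the real run could be sketched like this. The function names `fsck_preview` and `fsck_run` and the partition name /dev/sdX1 are placeholders of mine; in Emergency Mode you are already root, so no "sudo" is used here.

```shell
# Dry run: show which checker fsck would call, and how, without executing it.
# Usage: fsck_preview /dev/sdX1
fsck_preview() {
    # -N: don't execute, just show what would be done; -V: verbose output
    fsck -N -V "$1"
}

# Real run: the same command without -N.
# The filesystem should not be mounted read-write while this runs.
# Usage: fsck_run /dev/sdX1
fsck_run() {
    fsck -V "$1"
}
```

If the filesystem is ext2/ext3/ext4 (an assumption on my part, since the thread doesn't say), "e2fsck -n" on the partition is another cautious step: it opens the filesystem read-only and answers "no" to every question, so it reports problems without changing anything.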

I hope this helps. Good luck! :slight_smile:
