General Linux Find command question

I’m trying to find a way to delete some old directories with a command in a script and place it in a crontab to run on a daily basis.
I have an ftp directory with sub-directories named by date, and under those, sub-directories named by time for the top and bottom of each hour, with images in those half-hour directories. Like so:

ftp/20160728/1300/file1.jpg
ftp/20160728/1330/file1.jpg
ftp/20160729/1300/file1.jpg
ftp/20160729/1330/file1.jpg

What I want to do is delete the day folder (e.g. 20160728), along with everything under it, after a few days.
I found this command:
find /ftp/* -maxdepth 0 -type d -ctime +4 -exec rm -rf {} \;

But it’s not deleting the day folder, just the time sub-directories and files under it. I’ve tried other find commands that also didn’t work, and somewhere along the way the modification time on the day folder changed to today’s date.
What is the correct way to delete these old directories under the ftp directory without changing the modification date in the process?

Hi @t3kg33k, after some quick testing I’m not sure why it doesn’t get the day directory either. But I have a suggestion: consider forming the old directory name with something like:

echo /ftp/$(date +%Y%m%d -d "2 days ago")/

Replace the echo command with rm -rf to do the job. Also, ctime seems a bit unstable to rely on. I hope I didn’t misunderstand.
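
Putting that together, a minimal sketch of what the daily job could look like (the 2-day cutoff and the /ftp path are taken from above; adjust to taste):

# sketch only: first just print the directory that would be removed
echo /ftp/$(date +%Y%m%d -d "2 days ago")/

# once the printed path looks right, switch to the real thing
rm -rf /ftp/$(date +%Y%m%d -d "2 days ago")/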


ctime is indeed unreliable for this purpose, as @Bill_MI points out. In fact, mtime is too. Both are metadata fields that have nothing to do with the directory’s creation date: Linux changes a directory’s ctime and mtime as files are created, changed, or deleted inside it, and changes ctime when the directory’s permissions or ownership are altered.
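
You can watch this happen with stat (the /tmp path here is just a throwaway example):

# a directory's mtime/ctime move forward when its contents change
mkdir -p /tmp/mtime_demo
stat -c 'mtime: %y   ctime: %z' /tmp/mtime_demo
sleep 2
touch /tmp/mtime_demo/newfile                    # create a file inside it
stat -c 'mtime: %y   ctime: %z' /tmp/mtime_demo  # both timestamps are now newer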

In Linux there is currently no reliable way to check a file’s creation date, so any attempt to do what you are trying to do has to go through tricks like the one @Bill_MI showed. ext4 already supports file creation dates (called the birth time in this filesystem), but neither the kernel nor coreutils have implemented it yet. Currently in Ubuntu, the only way to read this information is through debugfs, but this a) requires sudo, b) needs the file’s inode number and, worst of all, c) doesn’t currently work for directories, only regular files.
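
For the curious, the debugfs route looks roughly like this on ext4 (the device /dev/sda1 is only a placeholder for whatever filesystem holds /ftp, and per the above it only works on regular files):

# placeholder device name: replace /dev/sda1 with the filesystem holding /ftp
inode=$(stat -c %i /ftp/20160729/1300/file1.jpg)
sudo debugfs -R "stat <$inode>" /dev/sda1 2>/dev/null | grep crtime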

The state of affairs on file creation time in Linux is this: you don’t. There is no standard way of doing it yet, every filesystem that implemented it did it its own way, and neither the kernel developers nor the coreutils developers seem interested in putting this feature in the hands of end users.

(Note: find already has support for birth time through the -newerXY option, but it doesn’t yet work on any filesystem.)
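
For reference only, the test would look something like the line below once birth time is actually usable; on current setups it simply fails:

# speculative sketch: fails where birth time isn't supported by the kernel/filesystem
find /ftp/* -maxdepth 0 -type d ! -newerBt '4 days ago' -exec rm -rf {} +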

So the above is to bring you up to speed on why what you want requires just a tad more work. Concerning @Bill_MI’s trick, I just want to stress that you should first use the echo command to be absolutely sure you get the right directory name. Test with echo, and only when you are sure should you drop the echo from your script.


So as not to leave this post without an alternative solution: another option is to use a timestamp file together with the -newer option and a negation operator. I have not tested the command lines below, but there’s no reason they shouldn’t work:

# here's your cron job
find /ftp/* -maxdepth 0 -type d ! -newer ~/.ftpcrontime -exec rm -rf {} +
touch ~/.ftpcrontime

A quick summary:
.ftpcrontime is an empty file you use as a flag indicating when the last cron job took place. You want to create this file manually once, before the first cron job executes.

-newer ~/.ftpcrontime will find all files (or directories in this case, as indicated by -type d) newer than .ftpcrontime – in case you are curious, most Linux commands read mtime by default. But we want the older directories, not the newer ones, so we negate this test with the !.

touch ~/.ftpcrontime: you probably already know what this does by now. It updates (or recreates) your flag file, setting its timestamp to the time this cron job last executed.
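
If it helps, here is one way to wire the two commands into a single crontab entry (the 03:00 schedule is arbitrary, and $HOME is used instead of ~ to be safe under cron’s shell):

# run daily at 03:00; touch only runs if find succeeded
0 3 * * * find /ftp/* -maxdepth 0 -type d ! -newer $HOME/.ftpcrontime -exec rm -rf {} + && touch $HOME/.ftpcrontime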

Notes:
Note that the job behaves differently on its first two runs. The first time it runs, it deletes all directories older than your manually created .ftpcrontime, then recreates .ftpcrontime. The second time, it won’t delete any directories created in the meantime because they are newer than the recreated .ftpcrontime, but it will still update .ftpcrontime again. The third run (and every run thereafter) deletes the older directories it skipped on the second run, while leaving the newer ones untouched.

It is perfectly OK to let the job delete everything on the first run. Just make a copy of anything you want to keep and, after cron deletes it, copy it back (copying gives the destination a fresh timestamp).

Alternatively, just manually create .ftpcrontime as you would for the first run, but set your first cron job to only start a day or two later and then repeat once a day.


Wow. That was a very thorough answer and I really appreciate it. I’m still learning Linux and your response has contributed to that learning process. Thanks.
I’ll try this out and see how it goes.