Find command for the million Jul 29 2018
One of the most useful, yet most underused, Unix commands is find. To get the most out of it, there are two key features that we need to explore and understand:
- How to iterate over the results
- Limitation on the number of results we can operate on
Learn by doing
Let's create a few files and directories to use as examples:
mkdir -p one/two/three
for i in {1..10}; do touch "./one/file_${i}"; done
for i in {1..10}; do touch "./one/two/file_${i}"; done
for i in {1..10}; do touch "./one/two/three/file_${i}"; done
This should generate the following file structure:
.
└── one
    ├── file_1
    ├── file_10
    ├── file_2
    ├── file_3
    ├── file_4
    ├── file_5
    ├── file_6
    ├── file_7
    ├── file_8
    ├── file_9
    └── two
        ├── file_1
        ├── file_10
        ├── file_2
        ├── file_3
        ├── file_4
        ├── file_5
        ├── file_6
        ├── file_7
        ├── file_8
        ├── file_9
        └── three
            ├── file_1
            ├── file_10
            ├── file_2
            ├── file_3
            ├── file_4
            ├── file_5
            ├── file_6
            ├── file_7
            ├── file_8
            └── file_9
Now if we wish to "find" all the directories inside the current directory we could use the following command:
find ./ -type d
and we should see on the screen:
./
.//one
.//one/two
.//one/two/three
Similarly, if we want all the regular files under the current directory, we can use:
find ./ -type f
And we should see on the screen:
.//one/file_10
.//one/file_7
.//one/file_9
.//one/file_8
.//one/file_1
.//one/file_6
.//one/two/file_10
.//one/two/file_7
.//one/two/file_9
.//one/two/file_8
.//one/two/file_1
.//one/two/file_6
.//one/two/three/file_10
.//one/two/three/file_7
.//one/two/three/file_9
.//one/two/three/file_8
.//one/two/three/file_1
.//one/two/three/file_6
.//one/two/three/file_3
.//one/two/three/file_4
.//one/two/three/file_5
.//one/two/three/file_2
.//one/two/file_3
.//one/two/file_4
.//one/two/file_5
.//one/two/file_2
.//one/file_3
.//one/file_4
.//one/file_5
.//one/file_2
Notice that the results might not come back in the order we expect. find prints entries in whatever order it reads them from the filesystem, which is not necessarily alphabetical. If you need a specific order, you have to ask for it, most likely by piping the results to the sort command:
find ./ -type f | sort
Iterate through results
OK, now that we have the basics covered, let's see how to iterate through the results. A common and useful pattern is resetting file permissions to something like the following:
- For regular files, read and write for the owner and read-only for everyone else (644)
- For directories, read, write and execute for the owner and read and execute(*) for everyone else (755)
(* On a directory, the execute permission lets users enter it and access the files inside; listing its contents requires the read permission.)
Now that we have defined our objective, let's see how to do it using find.
Exec
We can pass the -exec option to find to execute a command on each of the results. The command for regular files looks like this:
find . -type f -exec chmod 644 {} \;
and the command for directories will look like this:
find . -type d -exec chmod 755 {} \;
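Before running these against real data, the two commands can be tried on a scratch directory. This is a minimal sketch, assuming mktemp is available; the `demo` variable and the file names are made up for illustration.

```shell
# Work in a throwaway directory so no real permissions are touched.
demo=$(mktemp -d)
mkdir -p "$demo/one/two"
touch "$demo/one/file_1" "$demo/one/two/file_2"

# Deliberately start from overly open permissions.
chmod 777 "$demo/one" "$demo/one/two"
chmod 666 "$demo/one/file_1" "$demo/one/two/file_2"

# Same two commands as above, pointed at the scratch directory.
find "$demo" -type d -exec chmod 755 {} \;
find "$demo" -type f -exec chmod 644 {} \;

# Show the resulting modes.
ls -lR "$demo"
```

Directories should now show drwxr-xr-x and regular files -rw-r--r-- in the listing.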
We already know what the first part, find . -type f and find . -type d, does. The interesting part is -exec ... {} \;. When find encounters {} in the command, it replaces it with the path of the current match.
Run the following command, and it'll be clear.
find . -type f -name "*4" -exec echo "Find found {} AND again: {}!" \;
You should see:
Find found ./one/two/three/file_4 AND again: ./one/two/three/file_4!
Find found ./one/two/file_4 AND again: ./one/two/file_4!
Find found ./one/file_4 AND again: ./one/file_4!
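One caveat: POSIX only guarantees the replacement when {} is an argument on its own. Whether {} embedded in a longer string (as in the echo example) is replaced is left to the implementation, although GNU and BSD find both do it. A more portable way to get the same message is to hand the path to a small inline script as a positional parameter. The sketch below assumes a scratch directory; `demo` and the file names are hypothetical.

```shell
# Scratch data: two files ending in 4 and one that should not match.
demo=$(mktemp -d)
mkdir -p "$demo/one/two"
touch "$demo/one/file_4" "$demo/one/two/file_4" "$demo/one/file_5"

# {} is a separate argument here; the inline script receives it as $1.
# The second "sh" only fills $0, the name the inline script runs under.
out=$(find "$demo" -type f -name "*4" \
    -exec sh -c 'echo "Find found $1 AND again: $1!"' sh {} \;)
echo "$out"
```

Because the path arrives as "$1", it is also safe for names containing spaces or quotes.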
So now we are left with one more piece of the puzzle: \;. The -exec option expects the command to end with either ; or +. The semicolon tells find to execute the command once for each result, and the plus sign tells find to pass as many matches as possible to a single invocation of the command. The backslash is only there to stop the shell from interpreting the semicolon itself. This is better explained by an example:
find . -type f -name "*4" -exec ls {} \;
This will call ls on each file that matches so we would see:
./one/two/three/file_4
./one/two/file_4
./one/file_4
and if we call it with the plus sign:
find . -type f -name "*4" -exec ls {} \+
ls will receive a list of files to list. The generated command would look something like:
ls ./one/file_4 ./one/two/file_4 ./one/two/three/file_4
and you'll see:
./one/file_4 ./one/two/file_4 ./one/two/three/file_4
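To make the difference visible, we can have the command report how many arguments it received on each invocation. This is a small sketch on a scratch directory (the `demo` variable and the five file names are made up); the inline sh script simply prints its argument count.

```shell
# Scratch directory with five files.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$demo/file_$i"; done

# With \; the inline script runs once per file, so $# is always 1.
per_file=$(find "$demo" -type f -exec sh -c 'echo "got $# argument(s)"' sh {} \;)

# With + the matches are batched; five short names fit in a single call.
batched=$(find "$demo" -type f -exec sh -c 'echo "got $# argument(s)"' sh {} +)

echo "$per_file"   # five lines, each reporting 1 argument
echo "$batched"    # one line reporting 5 arguments
```

Five process launches versus one is negligible here, but the gap grows with the number of matches.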
The limitation on the length of the command line
The operating system defines a limit on the length of the command line. We can check it with:
getconf ARG_MAX
find uses this value to decide how many arguments it can pass to the command in one go; if the matches don't fit, it runs the command again with the next batch. This is very useful when we have a long list of files to operate on. Let's say we have close to a million files in a directory and we want to delete them. We would normally try:
rm *
But we will get an error: Argument list too long. This is because the shell expands the star into every matching file name on the command line and runs into the ARG_MAX limit. We can now see how useful -exec with the plus option is for overcoming this limitation.
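The batching can be observed directly. The sketch below creates a couple of thousand empty files in a scratch directory (the count, the `demo` variable and the cache_N.tmp names are arbitrary choices for illustration), asks find to report the size of every batch it hands to the command, and then deletes the files with the plus form. How many batches you get depends on your system's limit and on the path lengths, so the script only adds the batch sizes up.

```shell
# Scratch directory with 2000 empty files.
demo=$(mktemp -d)
i=1
while [ "$i" -le 2000 ]; do
    : > "$demo/cache_$i.tmp"
    i=$((i + 1))
done

echo "ARG_MAX on this system: $(getconf ARG_MAX) bytes"

# Each invocation prints how many paths it was given (one number per batch).
batches=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

# No path is dropped or repeated: the batch sizes add up to the file count.
total=0
for n in $batches; do total=$((total + n)); done
echo "paths handed to the command: $total"

# The same mechanism lets rm work through the files without hitting the limit.
find "$demo" -type f -exec rm {} +
remaining=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"
```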
Fast file deletion
If you have ever tried deleting cached files on a filesystem that contains hundreds of thousands of files, you know it's like watching paint dry: it takes a long time. find can make this process faster.
We could use the command:
find . -type f -exec rm {} \;
This calls rm once for each file, so the end result is similar to rm * (except that find also descends into subdirectories), but it gets there slowly. find gives us the -delete option, which makes this process faster. Instead of launching a command for each element, we can just run:
find . -type f -delete
And this will be much faster than deleting the files one by one. To see the difference, you could run the following test:
for i in $(seq 1 500000); do echo testing >> "$i.txt"; done
And then we can measure:
time find ./ -type f -exec rm {} \;
Create the half a million files again and compare it with:
time find ./ -type f -exec rm {} \+
Create the half a million files once more and compare it with:
time find ./ -type f -delete
These are my results:
Using rm with the semicolon:
real 23m7.661s
user 5m37.056s
sys 11m28.556s
Using rm with plus option:
real 1m34.861s
user 0m1.203s
sys 1m14.445s
Using -delete:
real 1m30.699s
user 0m0.885s
sys 1m10.668s
As you can see, -delete was the fastest. rm with the plus option was close, only about 4 seconds slower, but going one by one, which is what most people do, took roughly 15 times longer. If you are interested in more details and comparisons to other methods (using Perl and rsync) check this link.
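If you'd like to repeat the comparison without waiting for half a million files, here is a scaled-down sketch. It is not the script that produced the numbers above: the helper names make_files and run_case, the variable `n` and its default of 2000 are made up for illustration, and it assumes your date command supports +%s (GNU and BSD date both do), which only gives one-second resolution.

```shell
n=2000   # small on purpose; raise it to get timings worth comparing

# Hypothetical helper: fill a fresh scratch directory with $1 small files
# and print the directory name.
make_files() {
    dir=$(mktemp -d)
    i=1
    while [ "$i" -le "$1" ]; do
        echo testing > "$dir/$i.txt"
        i=$((i + 1))
    done
    echo "$dir"
}

# Hypothetical helper: run one deletion method ($2 onwards) on fresh files
# and report the elapsed seconds plus how many files survived.
run_case() {
    label=$1
    shift
    dir=$(make_files "$n")
    start=$(date +%s)
    find "$dir" -type f "$@"
    end=$(date +%s)
    left=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$label: $((end - start))s, $left files left"
    rmdir "$dir"
}

results=$(
    run_case "exec with ;" -exec rm {} \;
    run_case "exec with +" -exec rm {} +
    run_case "delete" -delete
)
echo "$results"
```

With only a couple of thousand files the three timings may be indistinguishable; the gap opens up as `n` grows.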
I hope this gives you a better understanding of the find command. Explore its other options and share any tricks you discover with me.
P.S. If you want me to write about a specific topic, let me know and I'll do my best to share my experience.