27/09/2015

Efficient and quick search with find and grep - Bash

When you need to search inside files as a Linux user, "grep" is the first thing that comes to mind. But when you have to search through a zillion files (or just thousands), grep on its own isn't the best option!

It can be very slow, especially with the "-r" (recursive) option, even if you use "fgrep" or "grep -F" (--fixed-strings), which interprets the pattern as a list of fixed strings. That variant is a little bit faster because it treats the strings literally and doesn't try to parse them as regexes.
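To see what "-F" actually changes, here is a small sketch. The directory and file names are made up for the demo, and it only shows the difference in matching, not in speed:

```shell
# Throwaway directory with two hypothetical sample files.
demo_dir=$(mktemp -d)
printf 'price is a.b\n' > "$demo_dir/literal.txt"
printf 'price is axb\n' > "$demo_dir/regex.txt"

# As a regex, "." matches any character, so both files match.
regex_hits=$(grep -rl 'a.b' "$demo_dir" | wc -l | tr -d ' ')

# With -F the pattern is a literal string, so only literal.txt matches.
fixed_hits=$(grep -rlF 'a.b' "$demo_dir" | wc -l | tr -d ' ')

echo "regex: $regex_hits, fixed: $fixed_hits"   # regex: 2, fixed: 1
rm -rf "$demo_dir"
```

So "-F" is not only about speed: it is also the safer choice when the string you search for contains characters like "." or "*".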

Here comes "find". find is one of the most powerful and lovely commands in Linux/Unix systems :D So, if you want better performance, especially when you have to search in many files, always use grep with find:
find PATH -type f -name "FILES_PATTERN" -exec grep "STRING" {} \;
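Here is a minimal sketch of that command on a throwaway directory. The paths and file names are hypothetical. I added "-l" so grep prints the file name, because with one file per grep call it would otherwise print only the matching line. The second command is a variant that ends "-exec" with "+" instead of "\;", which lets find pass many files to each grep process:

```shell
# Throwaway directory with hypothetical file names, only for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/logs/old"
printf 'error: disk full\n' > "$demo_dir/logs/app.log"
printf 'all good\n'         > "$demo_dir/logs/old/app.log"
printf 'error: ignored\n'   > "$demo_dir/logs/notes.txt"

# One grep process per file; -l prints the name of each matching file.
per_file=$(find "$demo_dir" -type f -name "*.log" -exec grep -l "error" {} \;)

# Variant: "+" hands many files to each grep process.
# find can exit non-zero if a grep batch finds nothing, hence "|| true".
batched=$(find "$demo_dir" -type f -name "*.log" -exec grep -l "error" {} + || true)

echo "$per_file"
echo "$batched"
rm -rf "$demo_dir"
```

Both commands report only logs/app.log here, since notes.txt is filtered out by "-name" and the old log has no match. Which form is faster will depend on how many files you have, so it is worth timing both on your own data.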
Is that enough? No, it isn't!
In many cases you may need it to be faster (at the expense of the CPU). You can use the "xargs" command to run several grep processes in parallel, like this:
find PATH -type f -name "FILES_PATTERN" -print0 | xargs -0 -n4 -P4 grep -l "STRING"
Here "-print0" makes "find" separate the file names it prints with a null character instead of a newline "\n", and "-0" makes "xargs" split its input on that null character, so file names that contain spaces or newlines are still read correctly. "xargs" then runs up to 4 "grep" processes at the same time (-P4) and passes at most 4 files to each one (-n4). So the commands that actually run look like:
grep -l "STRING" file1 file2 file3 file4
grep -l "STRING" file5 file6 file7 file8
grep -l "STRING" file9 file10 file11 file12
grep -l "STRING" file13 file14 file15 file16
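You can check this grouping yourself without touching any real files. In this sketch the file names are hypothetical, and "echo" is placed in front of grep so xargs only prints the command lines it would run. I left out "-P" on purpose, so the lines come out in a stable order:

```shell
# Feed 8 hypothetical, null-separated file names to xargs.
# "echo" prints each grep command line instead of running it.
batches=$(printf '%s\0' file1 file2 file3 file4 file5 file6 file7 file8 \
  | xargs -0 -n4 echo grep -l "STRING")

echo "$batches"
# grep -l STRING file1 file2 file3 file4
# grep -l STRING file5 file6 file7 file8
```

Putting "echo" in front of the real command like this is a handy dry run before you launch the search on a big tree.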
In fact, you can make it faster by increasing the number of processes "-P" and the number of arguments "-n", but as I mentioned before, this will put more load on the CPU, so it should be handled with care, especially on production systems.
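One way to pick "-P" with some care is to derive it from the number of CPU cores instead of hard-coding it. This is only a sketch: using half of the cores is my own assumption, not a rule, and the directory and file names are made up for the demo:

```shell
# Number of online CPU cores; fall back to 2 if getconf is not available.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# Assumption: use half of the cores (at least 1) to leave room for other work.
procs=$(( cores > 1 ? cores / 2 : 1 ))

# Throwaway directory with 8 hypothetical files; only file3.txt has the string.
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  printf 'line %s\n' "$i" > "$demo_dir/file$i.txt"
done
printf 'needle\n' >> "$demo_dir/file3.txt"

# xargs exits non-zero when a grep batch finds no match, which is expected
# here, hence "|| true".
found=$(find "$demo_dir" -type f -name "*.txt" -print0 \
  | xargs -0 -n4 -P"$procs" grep -l "needle" || true)

echo "processes: $procs"
echo "$found"
rm -rf "$demo_dir"
```

On a busy production box you may want an even smaller number than this.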

Hello, my name is Ahmed AbouZaid and this is my "lite" technical blog!

I'm a passionate DevOps, Linux system administrator, RedHat Certified Engineer (RHCE), AWS SysOps/Solutions Architect, Free/Open source geek, author, interested in environment, calligraphy, and I believe that “Details Matter”!

Automation, data, and metrics are my preferred areas. I have a built-in monitoring chip, and too lazy to do anything manually :D
