This is how you can grep recursively, starting from the current directory:
find . -name "pattern" -exec grep "search_pattern" '{}' \;
That's been a well-known way to grep recursively. But let me introduce you to a better approach, described in the second answer here: since grep 2.5.2, released in August 2006, we can use functionality built into grep itself. So, check this out:
00:47:50:~/Coding$ time greprepo "*.java" "flint" .
./clojure/repos/my/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/my/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
./clojure/repos/rebcabin/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/rebcabin/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
real 0m0.134s
user 0m0.120s
sys 0m0.014s
00:48:25:~/Coding$ time find . -name "*.java" -exec grep flint '{}' \;
Native.loadLibrary("flint", Flint.class);
System.setProperty("jna.library.path", "../flint2");
Native.loadLibrary("flint", Flint.class);
System.setProperty("jna.library.path", "../flint2");
real 0m0.545s
user 0m0.289s
sys 0m0.212s
Isn't it impressive? And greprepo is my alias:
alias greprepo='grep --exclude-dir ".{git,svn}" -R --mmap --include'
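A note on that alias: as far as I can tell, the shell does not expand braces inside double quotes, so ".{git,svn}" reaches grep as one literal glob and probably excludes nothing. Here is a minimal sketch of a function form that spells out each directory instead. The name greprepo_fn is my own invention, and I left --mmap out because newer GNU grep releases, as far as I know, no longer honor it (check your version).

```shell
# Hypothetical function form of the greprepo alias; the name is an assumption.
# Usage: greprepo_fn GLOB PATTERN [DIR]
greprepo_fn() {
    # One --exclude-dir per directory, so nothing depends on brace expansion.
    # -e keeps patterns that start with a dash from being read as options.
    grep -R --exclude-dir=.git --exclude-dir=.svn --include="$1" -e "$2" "${3:-.}"
}
```

It is called the same way as the alias, for example: greprepo_fn "*.java" "flint" .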
The performance boost is also partly due to mmap, which is OK as long as you don't modify the files you are currently grepping. Don't use mmap on a network share or on files that may change. Anyway, even without mmap the performance is still far more impressive than with find:
01:02:46:~/Coding$ time grep --exclude-dir ".{git,svn}" -R --include "*.java" "flint" .
./clojure/repos/my/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/my/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
./clojure/repos/rebcabin/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/rebcabin/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
real 0m0.146s
user 0m0.130s
sys 0m0.015s
mmap works its magic on really huge files, and ~/Coding is not a place for huge files, as you can presume.
Of course, you may notice I used --exclude-dir with grep, so here is one more run without it:
01:02:56:~/Coding$ time grep -R --include "*.java" "flint" .
./clojure/repos/my/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/my/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
./clojure/repos/rebcabin/jalint2/HelloWorld.java: Native.loadLibrary("flint", Flint.class);
./clojure/repos/rebcabin/jalint2/HelloWorld.java: System.setProperty("jna.library.path", "../flint2");
real 0m0.135s
user 0m0.118s
sys 0m0.017s
That makes me think of pre-caching :) Anyway, find doesn't offer an easy option to exclude directories. There is a "workaround", but I couldn't figure out how to use it easily. And even then, find + grep is much slower than grep alone.
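For what it's worth, the usual way to make find skip directories is -prune. Below is a minimal sketch of how that could look for the .git and .svn case; I am assuming this is the kind of workaround meant above, and the helper name findgrep_pruned is hypothetical. I have not timed it, so it says nothing about speed.

```shell
# Hypothetical helper (the name is an assumption): find + grep that skips VCS directories.
# Usage: findgrep_pruned GLOB PATTERN [DIR]
findgrep_pruned() {
    # Left of -o: directories named .git or .svn are pruned, so find never descends into them.
    # Right of -o: every remaining regular file matching GLOB is handed to grep.
    find "${3:-.}" \( -name .git -o -name .svn \) -type d -prune \
        -o -type f -name "$1" -exec grep -H -e "$2" '{}' \;
}
```

For example: findgrep_pruned "*.java" "flint" .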
UPDATE [Sep 17, 2013]
My friend pointed out that there is a slightly better way to do find + grep than the one I used above: combine find with xargs. So, here are all three approaches, listed from slowest to fastest:
0 (raspberry) 14:43:20:~/.vim$ time find . -name "*.vim" -exec grep -Hn --color=always tab '{}' \; > /dev/null
real 0m1.855s
user 0m0.370s
sys 0m1.070s
0 (raspberry) 14:43:26:~/.vim$ time find . -name "*.vim" -print0 | xargs -0 grep -Hn --color=always tab > /dev/null
real 0m0.213s
user 0m0.080s
sys 0m0.110s
0 (raspberry) 14:43:31:~/.vim$ time greprepo "*.vim" "tab" . > /dev/null
real 0m0.172s
user 0m0.080s
sys 0m0.080s
Once again, grep alone wins. However, it's worth saying that find + xargs is much faster than find -exec. And don't forget that with the grep-only approach you also get the exclusions I talked about in the original post.
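The reason find -exec with \; is slow is that it starts one grep process per file, while xargs packs many file names into each grep call. find can do the same batching on its own if the -exec ends with + instead of \;. Here is a minimal sketch; the helper name findgrep_batched is hypothetical, and I have not timed this variant, so I make no claim about where it lands among the three runs above.

```shell
# Hypothetical helper (the name is an assumption): find + grep with batched file names.
# Usage: findgrep_batched GLOB PATTERN [DIR]
findgrep_batched() {
    # The trailing "+" makes find append many file names to each grep invocation,
    # much like xargs does, instead of running grep once per file.
    find "${3:-.}" -type f -name "$1" -exec grep -Hn -e "$2" '{}' +
}
```

For example: findgrep_batched "*.vim" "tab" . File names with spaces are passed through as single arguments, so no -print0 is needed here.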