Recursively iterate through all subdirectories; if a file with a specific extension exists, run a command in that folder once
I need to recursively iterate through all the subdirectories of a folder. Within the subdirectories, if there's a file with the extension '.xyz', then I need to run a specific command in that folder, once per folder.

Here's what I have so far:

recursive() wc -l)
if [[ $file_count -gt 0 ]]; then
echo "Match found. Going to execute a command"
#execute command
fi
done
(cd /target; recursive)

But the problem is that the "Match found..." message is displayed more than once per folder when there's a match. Is there a simpler way to do this that also fixes this problem?
shell-script shell find recursive
asked Jan 23 at 2:16 by ishanipu (32)
2 Answers
Accepted answer (score 2):
find has a built-in flag to print strings, which is pretty useful here:

find -iname "*.xyz" -printf "%h\n"

prints the names of all directories that contain a file matching your pattern (the %h is just find's magic syntax that expands to the file's directory, and \n is, of course, a line break).

Therefore, this does what you want:

COMMAND='echo'
find `pwd` -iname "*.xyz" -printf "%h\n" | sort -u | while read i; do
    cd "$i" && pwd && $COMMAND
done
There are a few things going on here. To execute the command only once per directory, we pipe the output through sort with the -u flag, which drops all duplicate entries. Then we loop over everything with while. Also note that I used find `pwd`, which is a nice trick to make find output absolute paths instead of relative ones, so we can use cd without having to worry about relative paths.

Edit: Be careful with your directory names when executing this script, as directory names containing a newline (\n) or even just a backslash can break the script (maybe other uncommon characters too, but I haven't tested any more than that). Fixing this is hard and I don't know how to do it, so I can only suggest avoiding such directory names.
Is sort really doing anything here? If it is, you could pass -u to sort to avoid uniq. – Mikael Kjær, Jan 23 at 3:35

@MikaelKjær sort is needed because find can output directory a, then a/b, then a again. You're right about the -u flag, thanks! I'm too used to sort|uniq... – PawkyPenguin, Jan 23 at 4:02

That while read loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shell while read loop is to split on newline only (e.g. ... | while IFS= read -r i; do ... ; done). Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent script's environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that. – cas, Jan 23 at 6:18

@cas Really? The POSIX manual for read says that it reads a single line from stdin. I tested it with directories containing whitespace on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `\` too, which is still uncommon but not quite as eccentric as newlines. – PawkyPenguin, Jan 23 at 14:43

"Read a line from the standard input and split it into fields" ... but you're right about it assigning everything to your $i; it also says "with any leftover words assigned to the last NAME" ... if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces, or a tab followed by a space or similar, then (after splitting) $i will have only a single space instead of what was originally in the input. There are many things that will break a while read loop. BTW, newlines in filenames are quite common on, e.g., Macs. – cas, Jan 23 at 14:58
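The filename caveats discussed in the comments above can be sidestepped by using NUL delimiters instead of newlines. A minimal sketch, assuming GNU find (for -printf) and GNU sort (for -z), and bash for read -d ''; the directory and file names here are illustrative:

```shell
# NUL-delimited variant of the loop above: safe even for directory
# names containing spaces or newlines. Requires GNU find/sort and bash.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b b"                  # "b b" contains a space
touch "$tmp/a/1.xyz" "$tmp/a/2.xyz" "$tmp/b b/3.xyz"

find "$tmp" -iname '*.xyz' -printf '%h\0' | sort -z -u |
while IFS= read -r -d '' dir; do
    echo "match in: $dir"                     # runs once per directory
done

rm -rf "$tmp"
```

Since the directory names are NUL-terminated end to end, no intermediate step ever splits on whitespace, so the loop body runs exactly once per directory that contains a match.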
Answer (score 4):
You're re-inventing find.

Try something like this (using GNU findutils and GNU sort):

find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
  xargs -0 -r -I {} sh -c "cd {} ; yourcommandhere"

The -printf prints the directory names (%h) where '*.xyz' files are found, with NUL bytes (\0) as the delimiter. sort is used to eliminate duplicates, and then xargs is used to cd into each directory and run yourcommandhere.
You can also write a script to run with xargs, e.g.:

find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
  xargs -0 -r /path/to/myscript.sh

A simple myscript.sh example:

#!/bin/sh
for d in "$@" ; do
  cd "$d"
  echo "Match found in $d. Going to execute command"
  # execute command
done

This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.
BTW, neither printf nor sort nor xargs is strictly needed here... but they make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the -printf and sort), it runs a lot faster than using bash alone and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.

Here's another way of doing the same thing, without sort or xargs:

find /target -iname '*.xyz' -exec bash -c '
  typeset -A seen
  for f in "$@"; do
    d="$(dirname "$f")"
    if [[ ! -v seen[$d] ]]; then
      echo "Match found in $d. Going to execute command"
      # Execute command
      seen[$d]=1
    fi
  done' bash {} +

This uses an associative array in bash (seen) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xyz files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may still be run more than once in a given directory.

The script executed by find's -exec option can be a standalone script, as with the xargs version above.

BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
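The associative-array bookkeeping can also be demonstrated in isolation. A minimal sketch (assumes bash 4.3+ for [[ -v ]]; the file paths are hypothetical stand-ins for find's matches):

```shell
#!/usr/bin/env bash
# Process each directory only once by recording it in a "seen" set.
declare -A seen
for f in ./a/1.xyz ./a/2.xyz ./b/3.xyz; do    # stand-ins for find's output
    d=$(dirname "$f")
    if [[ ! -v seen[$d] ]]; then              # first match in this directory?
        echo "first match in $d"
        seen[$d]=1
    fi
done
# prints "first match in ./a" and "first match in ./b" (once each)
```

The second file in ./a is skipped because seen[./a] is already set, which is exactly the run-once-per-directory behaviour the question asks for.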
See also -execdir, which gets rid of the explicit cd. – Kusalananda, Jan 23 at 5:32

Yeah, I know about -execdir, but it can't be used here because a) we're matching filenames (*.xyz), and most significantly b) we only want to execute once in each directory regardless of how many .xyz files are in it, and c) we only want to exec bash once (or as few times as possible), not once per matching file. – cas, Jan 23 at 5:36
Accepted answer posted by PawkyPenguin (686110); answered Jan 23 at 2:59, edited Jan 23 at 14:49.
Issort
really doing anything here? If it is, you could pass-u
to sort to avoiduniq
.
â Mikael Kjær
Jan 23 at 3:35
@MikaelKjærsort
is needed becausefind
can output directorya
, thena/b
, thena
again. You're right about the-u
flag, thanks! I'm too used tosort|uniq
...
â PawkyPenguin
Jan 23 at 4:02
thatwhile read
loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shellwhile read
loop is to split on newline only (e.g.... | IFS= while read -d '' i; do ... ; done
. Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent scripts environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that.
â cas
Jan 23 at 6:18
@cas Really? The POSIX manual forread
says that it reads a single line from stdin. I tested it with directories containing whitespaces on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `` too, which is still uncommon but not quite as eccentric as newlines.
â PawkyPenguin
Jan 23 at 14:43
Read a line from the standard input and split it into fields
...but you're right about it assigning everything to your $i, it also sayswith any leftover words assigned to the last NAME
...if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces or a tab followed by a space or similar then (after splitting),$i
will have only a single space instead of what was originally in the input. There are many things that will break awhile read
loop. BTW, newlines in filenames are quite common on, e.g., Macs.
â cas
Jan 23 at 14:58
 |Â
show 2 more comments
Issort
really doing anything here? If it is, you could pass-u
to sort to avoiduniq
.
â Mikael Kjær
Jan 23 at 3:35
@MikaelKjærsort
is needed becausefind
can output directorya
, thena/b
, thena
again. You're right about the-u
flag, thanks! I'm too used tosort|uniq
...
â PawkyPenguin
Jan 23 at 4:02
thatwhile read
loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shellwhile read
loop is to split on newline only (e.g.... | IFS= while read -d '' i; do ... ; done
. Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent scripts environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that.
â cas
Jan 23 at 6:18
@cas Really? The POSIX manual forread
says that it reads a single line from stdin. I tested it with directories containing whitespaces on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `` too, which is still uncommon but not quite as eccentric as newlines.
â PawkyPenguin
Jan 23 at 14:43
Read a line from the standard input and split it into fields
...but you're right about it assigning everything to your $i, it also sayswith any leftover words assigned to the last NAME
...if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces or a tab followed by a space or similar then (after splitting),$i
will have only a single space instead of what was originally in the input. There are many things that will break awhile read
loop. BTW, newlines in filenames are quite common on, e.g., Macs.
â cas
Jan 23 at 14:58
Is
sort
really doing anything here? If it is, you could pass -u
to sort to avoid uniq
.â Mikael Kjær
Jan 23 at 3:35
Is
sort
really doing anything here? If it is, you could pass -u
to sort to avoid uniq
.â Mikael Kjær
Jan 23 at 3:35
@MikaelKjær
sort
is needed because find
can output directory a
, then a/b
, then a
again. You're right about the -u
flag, thanks! I'm too used to sort|uniq
...â PawkyPenguin
Jan 23 at 4:02
@MikaelKjær
sort
is needed because find
can output directory a
, then a/b
, then a
again. You're right about the -u
flag, thanks! I'm too used to sort|uniq
...â PawkyPenguin
Jan 23 at 4:02
that
while read
loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shell while read
loop is to split on newline only (e.g. ... | IFS= while read -d '' i; do ... ; done
. Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent scripts environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that.â cas
Jan 23 at 6:18
that
while read
loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shell while read
loop is to split on newline only (e.g. ... | IFS= while read -d '' i; do ... ; done
. Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent scripts environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that.â cas
Jan 23 at 6:18
@cas Really? The POSIX manual for
read
says that it reads a single line from stdin. I tested it with directories containing whitespaces on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `` too, which is still uncommon but not quite as eccentric as newlines.â PawkyPenguin
Jan 23 at 14:43
@cas Really? The POSIX manual for
read
says that it reads a single line from stdin. I tested it with directories containing whitespaces on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `` too, which is still uncommon but not quite as eccentric as newlines.â PawkyPenguin
Jan 23 at 14:43
Read a line from the standard input and split it into fields
...but you're right about it assigning everything to your $i, it also says with any leftover words assigned to the last NAME
...if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces or a tab followed by a space or similar then (after splitting), $i
will have only a single space instead of what was originally in the input. There are many things that will break a while read
loop. BTW, newlines in filenames are quite common on, e.g., Macs.â cas
Jan 23 at 14:58
Read a line from the standard input and split it into fields
...but you're right about it assigning everything to your $i, it also says with any leftover words assigned to the last NAME
...if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces or a tab followed by a space or similar then (after splitting), $i
will have only a single space instead of what was originally in the input. There are many things that will break a while read
loop. BTW, newlines in filenames are quite common on, e.g., Macs.â cas
Jan 23 at 14:58
 |Â
show 2 more comments
up vote
4
down vote
You're re-inventing find
.
Try something like this (using GNU findutils
and GNU sort
):
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r -I sh -c "cd ; yourcommandhere"
The -printf
prints the directory names (%h
) where '*.xyz' files are found, with NUL bytes (00
) as the delimiter. sort
is used to eliminate duplicates, and then xargs
is used to cd
into each directory and run yourcommandhere
.
You can also write a script to run with xargs. e.g.
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r /path/to/myscript.sh
simple myscript.sh example:
#!/bin/sh
for d in "$@" ; do
cd "$d"
echo "Match found in $d. Going to execute command"
# execute command
done
This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.
BTW, neither printf
nor sort
nor xargs
are actually needed here....but they do make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the printf and sort), it runs a lot faster than using bash only and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.
Here's another way of doing the same thing, without sort or xargs:
find /target -iname '*.xyz' -exec bash -c
'typeset -A seen
for f in "$@"; do
d="$(dirname "$f")";
if [[ ! -v $seen[$d] ]]; then
echo "Match found in $d. Going to execute command"
# Execute command
seen["$d"]=1
fi
done' +
This uses an associative array in bash ($seen
) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xml
files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may be run more than once in any given directory.
The script executed by find's -exec
option can be a standalone script, as with the xargs version above.
BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
See also-execdir
which gets rid of the explicitcd
.
â Kusalananda
Jan 23 at 5:32
yeah, i know about -execdir but it can't be used here because a) we're matching filenames (*.xyz
), and most significantly b) we only want to execute once in each directory regardless of how many .xyz files are in it, and c) we only want to exec bash once (or as few times as possible), not once per matching file.
â cas
Jan 23 at 5:36
add a comment |Â
up vote
4
down vote
You're re-inventing find
.
Try something like this (using GNU findutils
and GNU sort
):
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r -I sh -c "cd ; yourcommandhere"
The -printf
prints the directory names (%h
) where '*.xyz' files are found, with NUL bytes (00
) as the delimiter. sort
is used to eliminate duplicates, and then xargs
is used to cd
into each directory and run yourcommandhere
.
You can also write a script to run with xargs. e.g.
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r /path/to/myscript.sh
simple myscript.sh example:
#!/bin/sh
for d in "$@" ; do
cd "$d"
echo "Match found in $d. Going to execute command"
# execute command
done
This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.
BTW, neither printf
nor sort
nor xargs
are actually needed here....but they do make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the printf and sort), it runs a lot faster than using bash only and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.
Here's another way of doing the same thing, without sort or xargs:
find /target -iname '*.xyz' -exec bash -c
'typeset -A seen
for f in "$@"; do
d="$(dirname "$f")";
if [[ ! -v $seen[$d] ]]; then
echo "Match found in $d. Going to execute command"
# Execute command
seen["$d"]=1
fi
done' +
This uses an associative array in bash ($seen
) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xml
files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may be run more than once in any given directory.
The script executed by find's -exec
option can be a standalone script, as with the xargs version above.
BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
See also-execdir
which gets rid of the explicitcd
.
â Kusalananda
Jan 23 at 5:32
yeah, i know about -execdir but it can't be used here because a) we're matching filenames (*.xyz
), and most significantly b) we only want to execute once in each directory regardless of how many .xyz files are in it, and c) we only want to exec bash once (or as few times as possible), not once per matching file.
â cas
Jan 23 at 5:36
add a comment |Â
up vote
4
down vote
up vote
4
down vote
You're re-inventing find
.
Try something like this (using GNU findutils
and GNU sort
):
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r -I sh -c "cd ; yourcommandhere"
The -printf
prints the directory names (%h
) where '*.xyz' files are found, with NUL bytes (00
) as the delimiter. sort
is used to eliminate duplicates, and then xargs
is used to cd
into each directory and run yourcommandhere
.
You can also write a script to run with xargs. e.g.
find /target -iname '*.xyz' -printf '%h00' | sort -z -u |
xargs -0 -r /path/to/myscript.sh
simple myscript.sh example:
#!/bin/sh
for d in "$@" ; do
cd "$d"
echo "Match found in $d. Going to execute command"
# execute command
done
This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.
BTW, neither printf
nor sort
nor xargs
are actually needed here....but they do make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the printf and sort), it runs a lot faster than using bash only and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.
Here's another way of doing the same thing, without sort or xargs:
find /target -iname '*.xyz' -exec bash -c
'typeset -A seen
for f in "$@"; do
d="$(dirname "$f")";
if [[ ! -v $seen[$d] ]]; then
echo "Match found in $d. Going to execute command"
# Execute command
seen["$d"]=1
fi
done' +
This uses an associative array in bash ($seen
) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xml
files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may be run more than once in any given directory.
The script executed by find's -exec
option can be a standalone script, as with the xargs version above.
BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
You're re-inventing find. Try something like this (using GNU findutils and GNU sort):
find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
  xargs -0 -r -I {} sh -c 'cd "{}" && yourcommandhere'
The -printf prints the directory names (%h) where '*.xyz' files are found, with NUL bytes (\0) as the delimiter. sort -z -u eliminates the duplicates, and then xargs is used to cd into each directory and run yourcommandhere.
You can also write a script to run with xargs. e.g.
find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
  xargs -0 -r /path/to/myscript.sh
A simple myscript.sh example (each iteration runs in a subshell, so one directory's cd doesn't affect the next and relative paths keep working):
#!/bin/sh
for d in "$@" ; do
  ( cd "$d" || exit
    echo "Match found in $d. Going to execute command"
    # execute command
  )
done
This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.
BTW, neither printf nor sort nor xargs is strictly needed here... but they make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the printf and sort), it runs a lot faster than using bash alone and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.
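For what it's worth, here is a sketch of that bash-only approach (my own illustration, not code from the answer): bash >= 4's globstar does the recursion and an associative array does the de-duplication. The mktemp demo tree and the matched counter are scaffolding to make the example self-contained; replace the demo root with your real /target.

```shell
#!/bin/bash
# Bash-only sketch: globstar recursion + associative array for dedup.
# Requires bash >= 4. The demo tree below only makes this runnable.
shopt -s globstar nullglob

root=$(mktemp -d)                      # stand-in for /target
mkdir -p "$root/a" "$root/b/c"
touch "$root/a/one.xyz" "$root/a/two.xyz" "$root/b/c/three.xyz" "$root/b/note.txt"

declare -A seen
matched=0
for f in "$root"/**/*.xyz; do
    d=${f%/*}                          # directory containing the match
    if [[ -z ${seen[$d]} ]]; then
        echo "Match found in $d. Going to execute command"
        # ( cd "$d" && yourcommandhere )  # subshell so the cd doesn't stick
        seen[$d]=1
        matched=$((matched+1))
    fi
done
rm -rf "$root"
```

As the paragraph above notes, this avoids the extra processes but re-walks the tree in the shell itself, which is slower than letting find do the traversal when there are many directories.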
Here's another way of doing the same thing, without sort or xargs:
find /target -iname '*.xyz' -exec bash -c '
  typeset -A seen
  for f in "$@"; do
    d=$(dirname "$f")
    if [[ -z ${seen[$d]} ]]; then
      echo "Match found in $d. Going to execute command"
      # Execute command
      seen[$d]=1
    fi
  done' bash {} +
This uses an associative array in bash (seen) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xyz files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may be run more than once in any given directory.
The script executed by find's -exec option can be a standalone script, as with the xargs version above.
BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
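To make the awk option concrete, here is a hedged sketch (my illustration, not code from the answer): awk's classic `!seen[$0]++` idiom replaces sort -z -u as the de-duplication step. It is newline-delimited for simplicity, so it assumes path names contain no newlines; the mktemp demo tree and the echo stand in for /target and yourcommandhere.

```shell
# Demo tree so the pipeline is runnable; swap in your real /target.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b"
touch "$root/a/one.xyz" "$root/a/two.xyz" "$root/b/three.xyz"

# awk '!seen[$0]++' keeps only the first occurrence of each directory,
# doing the job sort -z -u did in the NUL-delimited pipeline above.
dirs=$(find "$root" -iname '*.xyz' -printf '%h\n' | awk '!seen[$0]++')

# Run the command once per directory, in a subshell so the cd doesn't stick.
while IFS= read -r d; do
    ( cd "$d" && echo "running in $d" )   # stand-in for yourcommandhere
done <<< "$dirs"
rm -rf "$root"
```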
edited Jan 23 at 5:36
answered Jan 23 at 2:58
cas
37.7k44393
See also -execdir, which gets rid of the explicit cd.
â Kusalananda
Jan 23 at 5:32
yeah, i know about -execdir but it can't be used here because a) we're matching filenames (*.xyz), and most significantly b) we only want to execute once in each directory regardless of how many .xyz files are in it, and c) we only want to exec bash once (or as few times as possible), not once per matching file.
â cas
Jan 23 at 5:36