Recursively iterate through all subdirectories; if a file with a specific extension exists, run a command in that folder once

I need to recursively iterate through all the subdirectories of a folder.
Within the subdirectories, if there's a file with the extension '.xyz', I need to run a specific command in that folder once.



Here's what I have so far:



recursive() {
    for d in */; do
        [ -d "$d" ] && (cd "$d" && recursive)
        file_count=$(ls *.xyz 2>/dev/null | wc -l)
        if [[ $file_count -gt 0 ]]; then
            echo "Match found. Going to execute a command"
            #execute command
        fi
    done
}


(cd /target; recursive)


But the problem is that the "Match found.." message is displayed more than once per folder when there's a match. Is there a simpler way to do this while fixing this problem?







asked Jan 23 at 2:16 by ishanipu

          2 Answers






          Accepted answer (2 votes), answered Jan 23 at 2:59 by PawkyPenguin, edited Jan 23 at 14:49

          find has a built-in action for printing formatted strings, which is pretty useful here:



          find -iname "*.xyz" -printf "%h\n" prints the names of all directories that contain a file that matches your pattern (the %h is just find's magic syntax that expands to the file's directory, and \n is, of course, a linebreak).
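
          For instance, with a hypothetical layout (names made up for illustration), two matching files in the same directory print that directory twice:

          $ mkdir -p a/b && touch a/b/one.xyz a/b/two.xyz
          $ find . -iname '*.xyz' -printf '%h\n'
          ./a/b
          ./a/b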



          Therefore, this does what you want:



          COMMAND='echo'
          find `pwd` -iname "*.xyz" -printf "%h\n" | sort -u | while read i; do
              cd "$i" && pwd && $COMMAND
          done


          There are a few things going on here. To execute the command only once per directory, we just pipe the output through sort with the -u flag, which drops all duplicate entries. Then we loop over everything with while. Also note that I used find `pwd`, which is a nice trick to make find output absolute paths instead of relative ones, which lets us use cd without having to worry about relative paths.



          Edit: Be careful with your directory names when executing this script, as directory names containing a newline (\n) or even just a backslash (\) can break the script (maybe other uncommon characters too, but I haven't tested any more than that). Fixing this is hard and I don't know how to do it, so I can only suggest not using such directories.
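
          That said, the comments below point toward a fix: NUL-delimit the directory names. A minimal sketch, assuming GNU find/sort and bash (not tested against every odd name):

          COMMAND='echo'
          # NUL delimiters survive newlines and backslashes in directory names
          find "$PWD" -iname '*.xyz' -printf '%h\0' | sort -z -u |
          while IFS= read -r -d '' i; do
              (cd "$i" && pwd && $COMMAND)
          done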






          • Is sort really doing anything here? If it is, you could pass -u to sort to avoid uniq.
            – Mikael Kjær
            Jan 23 at 3:35










          • @MikaelKjær sort is needed because find can output directory a, then a/b, then a again. You're right about the -u flag, thanks! I'm too used to sort|uniq...
            – PawkyPenguin
            Jan 23 at 4:02










          • that while read loop splits words on whitespace, and will break if the pathname contains, e.g., spaces, tabs, newlines, etc. The best you can do with a shell while read loop is to split on newline only (e.g. ... | while IFS= read -r i; do ... ; done). Then it will only break if the path/filename contains newline(s) - uncommon but valid. Also, piping into a while loop prevents the loop from changing the parent script's environment (because the pipe is executed in a subshell) - use process substitution instead if you need to do that.
            – cas
            Jan 23 at 6:18











          • @cas Really? The POSIX manual for read says that it reads a single line from stdin. I tested it with directories containing whitespace on both bash and zsh, and it seems to work fine. You're right about the newlines of course. Your remark made me realize that it breaks with a name containing `\` too, which is still uncommon but not quite as eccentric as newlines.
            – PawkyPenguin
            Jan 23 at 14:43










          • "Read a line from the standard input and split it into fields"... but you're right about it assigning everything to your $i; it also says "with any leftover words assigned to the last NAME"... if there's only one var, then the first variable name is also the last variable name. However, if the input line has 2+ spaces or a tab followed by a space or similar then (after splitting), $i will have only a single space instead of what was originally in the input. There are many things that will break a while read loop. BTW, newlines in filenames are quite common on, e.g., Macs.
            – cas
            Jan 23 at 14:58

















          Answer (4 votes), answered Jan 23 at 2:58 by cas, edited Jan 23 at 5:36

          You're re-inventing find.



          Try something like this (using GNU findutils and GNU sort):



          find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
              xargs -0 -r -I {} sh -c "cd {} ; yourcommandhere"


          The -printf prints the directory names (%h) where '*.xyz' files are found, with NUL bytes (\0) as the delimiter. sort is used to eliminate duplicates, and then xargs is used to cd into each directory and run yourcommandhere.
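
          For example, with echo standing in for the real command and a made-up tree holding /target/a/1.xyz, /target/a/2.xyz and /target/b/sub/3.xyz, each directory is visited exactly once:

          find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
              xargs -0 -r -I {} sh -c 'cd "{}" && echo "running once in $PWD"'
          # prints:
          #   running once in /target/a
          #   running once in /target/b/sub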



          You can also write a script to run with xargs, e.g.



          find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
              xargs -0 -r /path/to/myscript.sh


          A simple myscript.sh example:



          #!/bin/sh

          for d in "$@" ; do
          cd "$d"
          echo "Match found in $d. Going to execute command"
          # execute command
          done


          This second version will be significantly faster if there are many matching directories - it only has to fork a shell once (which then iterates over every argument) rather than forking a shell once per directory.




          BTW, neither -printf nor sort nor xargs is actually needed here... but they do make it a lot easier to read and understand what's happening. Just as importantly, by eliminating the duplicates early (with the -printf and sort), it runs a lot faster than using bash alone and eliminates the (fairly minimal) risk of executing the command more than once in any given directory.
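
          A minimal sketch of that bash-only approach, assuming bash 4+ (globstar and associative arrays; note that the glob, unlike -iname, is case-sensitive):

          #!/bin/bash
          shopt -s globstar nullglob      # ** recurses; a pattern with no match expands to nothing
          declare -A seen
          for f in /target/**/*.xyz; do
              d=${f%/*}                   # directory holding the matching file
              if [[ -z ${seen[$d]} ]]; then
                  (cd "$d" && echo "Match found in $d. Going to execute command")
                  # execute command
                  seen[$d]=1
              fi
          done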



          Here's another way of doing the same thing, without sort or xargs:



          find /target -iname '*.xyz' -exec bash -c \
              'typeset -A seen
               for f in "$@"; do
                   d="$(dirname "$f")"
                   if [[ ! -v seen[$d] ]]; then
                       echo "Match found in $d. Going to execute command"
                       # Execute command
                       seen["$d"]=1
                   fi
               done' bash {} +


          This uses an associative array in bash ($seen) to keep track of which directories have already been seen and processed. Note that if there are many thousands of matching *.xyz files (enough to exceed the maximum command-line length, so that the bash script is forked more than once) then your command may be run more than once in any given directory.



          The script executed by find's -exec option can be a standalone script, as with the xargs version above.



          BTW, any of the variants here could just as easily execute an awk or perl or whatever script instead of a sh or bash script.
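
          For instance, a rough sketch of the xargs variant driving a perl one-liner instead (the message text is illustrative):

          find /target -iname '*.xyz' -printf '%h\0' | sort -z -u |
              xargs -0 -r perl -e 'for my $d (@ARGV) { chdir $d or next; print "Match found in $d\n" }'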







          • See also -execdir which gets rid of the explicit cd.
            – Kusalananda
            Jan 23 at 5:32










          • yeah, i know about -execdir but it can't be used here because a) we're matching filenames (*.xyz), and most significantly b) we only want to execute once in each directory regardless of how many .xyz files are in it, and c) we only want to exec bash once (or as few times as possible), not once per matching file.
            – cas
            Jan 23 at 5:36










