Command to find and combine files matching a complex name pattern

My Linux directory contains a dump of files and they look like:



EDW_Infile_ABC_Daily_Activity_20190204.csv
EDW_Infile_ABC_Daily_Activity.zip
EDW_Infile_PQRInc_Daily_Activity_20190204.csv
EDW_Infile_PQRInc_Daily_Activity_zip
EDW_Infile_ABC_Daily_Payment_20190204.csv
EDW_Infile_PQRInc_Daily_Payment_20190204.csv
EDW_Infile_ABC_Daily_Status_20190204.csv
EDW_Infile_PQRInc_Daily_Status_20190204.csv


These files follow a few common name patterns, such as:



EDW_Infile_*<3 to 8 bytes company name>*_Daily_Activity_*YYYYMMDD*.csv
EDW_Infile_*<3 to 8 bytes company name>*_Daily_Payment_*YYYYMMDD*.csv
EDW_Infile_*<3 to 8 bytes company name>*_Daily_Status_*YYYYMMDD*.csv


How can I:

1) Find all files, for all customers and all dates, that follow the pattern EDW_Infile_<3 to 8 bytes, any name>_Daily_Activity_<any date>.csv?

2) Each file contains a header. How can I combine all of them into one file that has only one header?










      linux shell-script shell find
















      asked Feb 11 at 19:07









      Nik




















          2 Answers














          I pushed my zsh knowledge a bit in order to answer more specifically, in case you aren't in control of the filenames, have files named like EDW_Infile_some uninteresting stuff here_Daily_Activity_junk here.csv, and so don't want to use a * wildcard.

          To gather the list of filenames "which follow the pattern EDW_Infile_3 to 8 bytes any name_Daily_Activity_Any Date.csv", I would set up this extended_glob pattern in zsh (don't type the $ -- that's the shell prompt):



          $ set -o extended_glob
          $ files=(EDW_Infile_?(#c3,8)_Daily_Activity_[[:digit:]](#c8).csv)


          The pattern, apart from the plain text, is:




          • ? -- any (single) character


          • (#c3,8) -- require between three and eight characters, inclusive


          • [[:digit:]] -- require a digit


          • (#c8) -- require eight of them

          See the list with:



          $ print -l $files
          EDW_Infile_ABC_Daily_Activity_20190204.csv
          EDW_Infile_PQRInc_Daily_Activity_20190204.csv
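
For shells without zsh's counted-repetition globs, the same filter can be approximated with a plain glob followed by an extended regular expression. This is only a sketch and not part of the answer above: the function name list_activity_files is hypothetical, and it reads "3 to 8 bytes" as "3 to 8 arbitrary characters", as the zsh pattern does.

```shell
#!/bin/sh
# Hypothetical helper: print the names of Activity files in a directory whose
# company part is 3 to 8 characters long and whose date part is 8 digits.
list_activity_files() {
    dir=${1:-.}
    for path in "$dir"/EDW_Infile_*_Daily_Activity_*.csv; do
        [ -e "$path" ] || continue       # the glob matched nothing
        printf '%s\n' "${path##*/}"      # strip the directory part
    done |
        # The loose glob above over-matches; the regex enforces the lengths.
        grep -E '^EDW_Infile_.{3,8}_Daily_Activity_[0-9]{8}\.csv$' || true
}
```

Calling list_activity_files with no argument checks the current directory. Because names are passed through a pipe one per line, the sketch assumes filenames contain no newline characters.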


          To then "combine all of them into one file and have only one header":

          { head -n 1 "$files[1]"; for f in $files; do sed 1d "$f"; done; } > output.csv


          This groups two commands and redirects their output to output.csv. The first command, head, takes the first line from the first file in the array; the second command then loops through all of the files and deletes the first line (default-printing the remainder to stdout).
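
The same header-once idea can be written as a small POSIX sh function that takes the file list as arguments. This is a sketch under assumptions: the name merge_with_one_header is hypothetical, tail -n +2 stands in for sed 1d, and every input file is assumed to end with a newline.

```shell
#!/bin/sh
# Hypothetical helper: write the header of the first file once, then the body
# (everything after line 1) of every file, all to standard output.
merge_with_one_header() {
    [ "$#" -gt 0 ] || return 0    # nothing to merge
    head -n 1 "$1"                # header, taken from the first file only
    for f in "$@"; do
        tail -n +2 "$f"           # skip each file's own header line
    done
}
```

It could be used as merge_with_one_header EDW_Infile_*_Daily_Activity_*.csv > output.csv; the name output.csv does not match that glob, so the result is not read back as an input.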




















            You might want something like this:

            # collect all the "EDW_Infile_ABC_Daily_Activity" prefixes
            # (each filename minus its "_YYYYMMDD.csv" suffix)
            declare -A prefixes
            for f in EDW_Infile_*_Daily_Activity_*.csv; do
                p=${f%_*.csv}
                prefixes[$p]=1
            done

            # per prefix: merge the files, keeping only the first header, then zip
            for p in "${!prefixes[@]}"; do
                awk 'NR==1 {print} FNR==1 {next} {print}' "$p"_*.csv > "$p"_all.csv
                zip "$p".zip "$p"_all.csv
                rm "$p"_all.csv
            done


            This requires bash version 4 or later for associative arrays. Otherwise, we can work with positional parameters.
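
Where associative arrays are unavailable, the distinct prefixes can also be collected without them. The sketch below is one possible approach, not the positional-parameter variant mentioned above: it swaps in sort -u for deduplication, and the function name list_prefixes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "EDW_Infile_<company>_Daily_Activity"
# prefix found in a directory exactly once, without associative arrays.
list_prefixes() {
    dir=${1:-.}
    for f in "$dir"/EDW_Infile_*_Daily_Activity_*.csv; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        printf '%s\n' "${name%_*.csv}"   # drop the trailing "_<date>.csv"
    done | sort -u                       # keep one copy of each prefix
}
```

The ${name%_*.csv} expansion removes the shortest suffix matching _*.csv, which is the date and extension, so files for the same company on different dates collapse to one prefix.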






                  answered Feb 11 at 20:57, edited Feb 12 at 14:17
                  Jeff Schaller

































                          answered Feb 11 at 20:22









                          glenn jackman


























