How can I merge the lines of two files by having common headers?

I want to merge two files based on the header lines they have in common.

Here is an example.

File 1:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g

File 2:

>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r

And here's the kind of output I want:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r

I have tried some awk and sed without success. How can I do this?







edited Jan 4 at 9:52 by αғsнιη
asked Jan 4 at 8:00 by Namrata Patel




















2 Answers

















Accepted answer






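Awk solution:

awk '/^>/{ k=$1 FS $2 }              # header line: build the key
NR==FNR{                             # first file (file1): only collect lines
    if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
}
k in a{                              # file2 header that also occurs in file1
    print $0 ORS a[k]; delete a[k]; next
}
1' file1 file2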



• /^>/{ k=$1 FS $2 } - on a header line (i.e. >Feature ...), build the key k from the 1st and 2nd fields, $1 and $2

• NR==FNR{ ... } - while processing the 1st input file (file1):

  • if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0 - append each non-header line to array a under the current key k

  • next - skip to the next record, so nothing from file1 is printed at this stage

• k in a{ ... } - if the key of the current file2 header is in array a (built from file1):

  • print $0 ORS a[k] - print the header followed by the file1 lines stored for it

  • delete a[k] - delete the entry, so the file2 lines that follow under this header do not match again

• 1 - print every remaining file2 line unchanged



The output:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r
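One thing to be aware of: the command above prints a header only when it meets it in file2, so a header that exists only in file1 would not appear in the result. If that matters, one possible variation is to collect everything first and print at the end. The following is only a sketch under that assumption, not the answer's method; the function name merge_by_header and the array names body and order are made up for illustration. It uses the same key ($1 FS $2) and keeps headers in the order they are first seen.

```shell
# merge_by_header is a hypothetical helper name; it is not from the answer above.
# Usage: merge_by_header file1 file2 [more files...]
merge_by_header() {
    awk '
        /^>/ {                                  # header line: build the key as the answer does
            k = $1 FS $2
            if (!(k in body)) {                 # first time this header is seen
                order[++n] = k                  # remember where it first appeared
                body[k] = ""
            }
            next
        }
        k != "" { body[k] = body[k] $0 ORS }    # data line: store it under the current header
        END {
            for (i = 1; i <= n; i++)            # headers in order of first appearance
                printf "%s%s%s", order[i], ORS, body[order[i]]
        }
    ' "$@"
}
```

Called as merge_by_header file1 file2 on the sample files from the question, this should give the same output as shown above. The trade-off is that both files are held in memory until the end.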





edited Jan 4 at 15:40
answered Jan 4 at 8:30 by RomanPerekhrest






















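Another approach, which may be simpler:

grep -v '^scaffold' <(awk -v RS='>Feature ' \
'NF{s[$1]=s[$1]$0} END{for (x in s)print RS""s[x]}' file[12])

This needs a shell with process substitution (such as bash) and an awk that accepts a multi-character RS (such as gawk or mawk). Because for (x in s) visits keys in an unspecified order, the scaffolds are not guaranteed to come out in their original order.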
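The same record-based idea can also be written so that no grep is needed and the scaffold order is kept, since for (x in s) on its own does not promise any particular order. What follows is a rough sketch only, assuming an awk that accepts a multi-character RS (gawk and mawk do; POSIX only defines a single-character RS). The function name merge_by_record and the arrays order and line are hypothetical, and empty lines inside a block are dropped.

```shell
# merge_by_record is a hypothetical name. Assumes an awk with multi-character RS
# support (gawk and mawk have it; POSIX only defines a single-character RS).
merge_by_record() {
    awk -v RS='>Feature ' '
        NF {
            key = $1                               # first word of the record, e.g. scaffold1
            if (!(key in s)) { order[++n] = key; s[key] = "" }
            m = split($0, line, "\n")              # line[1] is the key line itself
            for (i = 2; i <= m; i++)               # keep only the non-empty data lines
                if (line[i] != "") s[key] = s[key] line[i] "\n"
        }
        END {
            for (i = 1; i <= n; i++)               # same order as first seen in the input
                printf "%s%s\n%s", RS, order[i], s[order[i]]
        }
    ' "$@"
}
```

It would be called as merge_by_record file1 file2. Each record is everything between two ">Feature " markers, so the key line is skipped when storing and the marker is put back once per key when printing.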





answered Jan 4 at 9:46 by αғsнιη






















                               
