How can I merge the lines of two files by having common headers?

I want to merge two files based on the header lines they have in common.

Here is an example.

File 1:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g


File 2:

>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r


And here's the kind of output I want:



>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r


I have tried some awk and sed, but without success. How can I do this?

asked Jan 4 at 8:00 by Namrata Patel
edited Jan 4 at 9:52 by αғsнιη

2 Answers

Awk solution:

    awk '/^>/{ k=$1 FS $2 }
    NR==FNR{
        if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
    }
    k in a{
        print $0 ORS a[k]; delete a[k]; next
    }1' file1 file2

• /^>/{ k=$1 FS $2 } - on encountering a header line (i.e. >Feature ...), compose a key k from the 1st ($1) and 2nd ($2) fields

• NR==FNR{ ... } - while processing the 1st input file (file1):

  • if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0 - accumulate non-header lines into array a under the current key k

  • next - jump to the next record

• k in a{ ... } - if the current key, taken from a file2 header, is in array a (built from file1 records):

  • print $0 ORS a[k] - print the header followed by the stored file1 records

  • delete a[k] - delete the processed item, so the file2 lines that follow under this header no longer match k in a

  • next - jump to the next record

• 1 - print every other file2 line as is

The output:

    >Feature scaffold1
    1 100 g
    101 200 g
    201 300 g
    500 500 r
    900 1000 r
    >Feature scaffold2
    1 100 g
    01 500 g
    200 300 r
    >Feature scaffold3
    10 500 g
    100 200 r
    >Feature scaffold4
    10 300 g
    500 600 r
    >Feature scaffold5
    1 1000 r
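
To try this end to end, here is a minimal sketch that writes the question's sample data to a scratch directory and runs the same awk logic over it, with comments added. The names tmp and merged are only illustrative, and the use of mktemp -d is an assumption about the environment.

```shell
#!/bin/sh
# Scratch directory (hypothetical); file1/file2 hold the sample data from the question.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g
EOF

cat > "$tmp/file2" <<'EOF'
>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r
EOF

merged=$(awk '
/^>/ { k = $1 FS $2 }        # header line: remember it as the current key
NR == FNR {                  # first file only: store data lines under the key
    if (!/^>/) a[k] = (a[k] != "") ? a[k] ORS $0 : $0
    next
}
k in a {                     # second file: header that was also seen in file1
    print $0 ORS a[k]        # header first, then the stored file1 lines
    delete a[k]              # later lines under this header fall through to "1"
    next
}
1                            # any other file2 line is printed unchanged
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
rm -r "$tmp"
```

One behaviour to be aware of: file1 is only stored and file2 drives the printing, so a header that occurs in file1 but not in file2 never reaches the output.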






answered Jan 4 at 8:30 by RomanPerekhrest
edited Jan 4 at 15:40

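Another approach, which keeps it simpler:

    grep -v '^scaffold' <(awk -v RS='>Feature ' \
    'NF{s[$1]=s[$1]$0} END{for (x in s)print RS""s[x]}' file[12])

This needs a shell with process substitution (e.g. bash) and an awk that accepts a multi-character RS (e.g. gawk or mawk). The order in which for (x in s) visits the keys is unspecified, so the groups are not guaranteed to come out in input order.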
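
If input order matters, the same collect-then-print idea can be done line by line. The following is a sketch of an alternative, not the answer's own method: it keys on the whole header line, records the order in which headers are first seen, and needs neither a multi-character RS nor grep. The sample inputs are made up for illustration (scaffold0 exists only in the first file), and tmp, merged, body and order are hypothetical names.

```shell
#!/bin/sh
# Small hypothetical inputs; scaffold0 appears only in the first file.
tmp=$(mktemp -d)
printf '%s\n' '>Feature scaffold0' '5 50 g' '>Feature scaffold1' '1 100 g' > "$tmp/file1"
printf '%s\n' '>Feature scaffold1' '500 500 r' '>Feature scaffold5' '1 1000 r' > "$tmp/file2"

merged=$(awk '
/^>/ {                               # header line: becomes the current key
    k = $0
    if (!(k in body)) {              # first time this header is seen
        body[k] = ""
        order[++n] = k               # remember first-seen order
    }
    next
}
{ body[k] = body[k] $0 ORS }         # data line: append under the current header
END {
    for (i = 1; i <= n; i++)         # replay headers in first-seen order
        printf "%s%s%s", order[i], ORS, body[order[i]]
}
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
rm -r "$tmp"
```

Because nothing is printed until END, this works for any number of input files and keeps headers that occur in only one of them. The trade-off is that all data is held in memory, and headers must match exactly, whitespace included.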






answered Jan 4 at 9:46 by αғsнιη