how to copy a column from another file when the ID's matches

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
0
down vote

favorite












I have 2 files,



head file1

1 115258827 12 HG00099
1 115258827 5 HG00100
1 115258827 8 HG00101
1 115258827 6 HG00103
1 115258827 4 HG00108
1 115258827 3 HG00110
1 115258827 4 HG00111
1 115258827 2 HG00114
1 115258827 8 HG00115
1 115258827 5 HG00116


and another file



head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0


I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like



head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0






share|improve this question























    up vote
    0
    down vote

    favorite












    I have 2 files,



    head file1

    1 115258827 12 HG00099
    1 115258827 5 HG00100
    1 115258827 8 HG00101
    1 115258827 6 HG00103
    1 115258827 4 HG00108
    1 115258827 3 HG00110
    1 115258827 4 HG00111
    1 115258827 2 HG00114
    1 115258827 8 HG00115
    1 115258827 5 HG00116


    and another file



    head file2
    HG00096 0|0
    HG00097 0|0
    HG00099 0|0
    HG00100 0|1
    HG00101 0|0
    HG00102 0|0
    HG00103 0|0
    HG00105 0|0
    HG00106 0|0
    HG00107 0|0


    I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like



    head desired
    1 115258827 12 HG00099 0|0
    1 115258827 5 HG00100 0|1
    1 115258827 8 HG00101 0|0
    1 115258827 6 HG00103 0|0






    share|improve this question





















      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I have 2 files,



      head file1

      1 115258827 12 HG00099
      1 115258827 5 HG00100
      1 115258827 8 HG00101
      1 115258827 6 HG00103
      1 115258827 4 HG00108
      1 115258827 3 HG00110
      1 115258827 4 HG00111
      1 115258827 2 HG00114
      1 115258827 8 HG00115
      1 115258827 5 HG00116


      and another file



      head file2
      HG00096 0|0
      HG00097 0|0
      HG00099 0|0
      HG00100 0|1
      HG00101 0|0
      HG00102 0|0
      HG00103 0|0
      HG00105 0|0
      HG00106 0|0
      HG00107 0|0


      I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like



      head desired
      1 115258827 12 HG00099 0|0
      1 115258827 5 HG00100 0|1
      1 115258827 8 HG00101 0|0
      1 115258827 6 HG00103 0|0






      share|improve this question











      I have 2 files,



      head file1

      1 115258827 12 HG00099
      1 115258827 5 HG00100
      1 115258827 8 HG00101
      1 115258827 6 HG00103
      1 115258827 4 HG00108
      1 115258827 3 HG00110
      1 115258827 4 HG00111
      1 115258827 2 HG00114
      1 115258827 8 HG00115
      1 115258827 5 HG00116


      and another file



      head file2
      HG00096 0|0
      HG00097 0|0
      HG00099 0|0
      HG00100 0|1
      HG00101 0|0
      HG00102 0|0
      HG00103 0|0
      HG00105 0|0
      HG00106 0|0
      HG00107 0|0


      I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like



      head desired
      1 115258827 12 HG00099 0|0
      1 115258827 5 HG00100 0|1
      1 115258827 8 HG00101 0|0
      1 115258827 6 HG00103 0|0








      share|improve this question










      share|improve this question




      share|improve this question









      asked May 3 at 19:11









      Anna1364

      421110




      421110




















          2 Answers
          2






          active

          oldest

          votes

















          up vote
          2
          down vote



          accepted










          Try:



          awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1


          With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.



          The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.






          share|improve this answer






























            up vote
            4
            down vote













            Assuming that the field that you want to merge/join the data on is sorted:



            $ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
            1 115258827 12 HG00099 0|0
            1 115258827 5 HG00100 0|1
            1 115258827 8 HG00101 0|0
            1 115258827 6 HG00103 0|0


            This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).



            The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).



            The benefit of using join is that neither file has to be read into memory completely.




            If the files are not sorted, you may either pre-sort them once and for all using



            sort -k4 -o file1 file1
            sort -o file2 file2


            or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using



            join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )


            It's the sorted data that allows join to only keep a few lines of each file in memory at a time.






            share|improve this answer























              Your Answer







              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "106"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              convertImagesToLinks: false,
              noModals: false,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );








               

              draft saved


              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f441638%2fhow-to-copy-a-column-from-another-file-when-the-ids-matches%23new-answer', 'question_page');

              );

              Post as a guest






























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              2
              down vote



              accepted










              Try:



              awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1


              With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.



              The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.






              share|improve this answer



























                up vote
                2
                down vote



                accepted










                Try:



                awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1


                With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.



                The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.






                share|improve this answer

























                  up vote
                  2
                  down vote



                  accepted







                  up vote
                  2
                  down vote



                  accepted






                  Try:



                  awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1


                  With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.



                  The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.






                  share|improve this answer















                  Try:



                  awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1


                  With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.



                  The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.







                  share|improve this answer















                  share|improve this answer



                  share|improve this answer








                  edited May 3 at 19:32


























                  answered May 3 at 19:16









                  αғsнιη

                  14.8k82462




                  14.8k82462






















                      up vote
                      4
                      down vote













                      Assuming that the field that you want to merge/join the data on is sorted:



                      $ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
                      1 115258827 12 HG00099 0|0
                      1 115258827 5 HG00100 0|1
                      1 115258827 8 HG00101 0|0
                      1 115258827 6 HG00103 0|0


                      This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).



                      The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).



                      The benefit of using join is that neither file has to be read into memory completely.




                      If the files are not sorted, you may either pre-sort them once and for all using



                      sort -k4 -o file1 file1
                      sort -o file2 file2


                      or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using



                      join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )


                      It's the sorted data that allows join to only keep a few lines of each file in memory at a time.






                      share|improve this answer



























                        up vote
                        4
                        down vote













                        Assuming that the field that you want to merge/join the data on is sorted:



                        $ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
                        1 115258827 12 HG00099 0|0
                        1 115258827 5 HG00100 0|1
                        1 115258827 8 HG00101 0|0
                        1 115258827 6 HG00103 0|0


                        This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).



                        The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).



                        The benefit of using join is that neither file has to be read into memory completely.




                        If the files are not sorted, you may either pre-sort them once and for all using



                        sort -k4 -o file1 file1
                        sort -o file2 file2


                        or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using



                        join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )


                        It's the sorted data that allows join to only keep a few lines of each file in memory at a time.






                        share|improve this answer

























                          up vote
                          4
                          down vote










                          up vote
                          4
                          down vote









                          Assuming that the field that you want to merge/join the data on is sorted:



                          $ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
                          1 115258827 12 HG00099 0|0
                          1 115258827 5 HG00100 0|1
                          1 115258827 8 HG00101 0|0
                          1 115258827 6 HG00103 0|0


                          This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).



                          The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).



                          The benefit of using join is that neither file has to be read into memory completely.




                          If the files are not sorted, you may either pre-sort them once and for all using



                          sort -k4 -o file1 file1
                          sort -o file2 file2


                          or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using



                          join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )


                          It's the sorted data that allows join to only keep a few lines of each file in memory at a time.






                          share|improve this answer















                          Assuming that the field that you want to merge/join the data on is sorted:



                          $ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
                          1 115258827 12 HG00099 0|0
                          1 115258827 5 HG00100 0|1
                          1 115258827 8 HG00101 0|0
                          1 115258827 6 HG00103 0|0


                          This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).



                          The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).



                          The benefit of using join is that neither file has to be read into memory completely.




                          If the files are not sorted, you may either pre-sort them once and for all using



                          sort -k4 -o file1 file1
                          sort -o file2 file2


                          or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using



                          join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )


                          It's the sorted data that allows join to only keep a few lines of each file in memory at a time.







                          share|improve this answer















                          share|improve this answer



                          share|improve this answer








                          edited May 3 at 19:42


























                          answered May 3 at 19:31









                          Kusalananda

                          102k13199316




                          102k13199316






















                               

                              draft saved


                              draft discarded


























                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f441638%2fhow-to-copy-a-column-from-another-file-when-the-ids-matches%23new-answer', 'question_page');

                              );

                              Post as a guest













































































                              Popular posts from this blog

                              How to check contact read email or not when send email to Individual?

                              Displaying single band from multi-band raster using QGIS

                              How many registers does an x86_64 CPU actually have?