searching multiple files for a line with bigger number in column 3 of matched lines

I have multiple files with contents similar to:



main file1:



test01:6733:4370:5342
test02:7776:2018:1001
test03:9865:5632:1429
test04:8477:4757:1890
test05:8019:8860:5298
test06:5602:3100:6995
test07:1445:2850:2755
test08:10924:2562:4867
test09:2575:1884:1611


sample file2:



test01:8777:1060:9236
test02:1322:1211:10837
test04:3737:10175:5219
test05:8467:8988:9739
test06:7452:3100:2709
test08:4707:9047:10578
test09:8669:2867:8233
test10:8615:10002:7056


sample file3:



test01:10957:8172:2472
test02:1401:6160:5894
test03:7245:8934:5725
test04:8477:10106:10069
test05:10769:10381:1102
test06:3605:3713:7695
test08:10924:2562:10568
test09:2913:5628:1305
test10:5501:10293:2319


I want to update each line in the main file with the line from another file that has the same first column and the largest number in the 3rd column across all the files.

Only the first-column values present in the main file should be considered (test## entries that exist in the other files but not in the main file should be ignored).

When several lines in the other files qualify (the same, larger number in the 3rd column), any one of them can be used to update the main file.

Here is my non-optimal solution:



$ awk -F: '{print $1,$3}' main|while read a b;do grep ^$a: main file*|sort -t":" -rnk4|awk -F: -vb=$b '{if($4>b){print $0;next}else{print ($1=="main")? $0 : NULL}}'|head -1;done
file3:test01:10957:8172:2472
file3:test02:1401:6160:5894
file3:test03:7245:8934:5725
file2:test04:3737:10175:5219
file3:test05:10769:10381:1102
file3:test06:3605:3713:7695
main:test07:1445:2850:2755
file2:test08:4707:9047:10578
file3:test09:2913:5628:1305


How can I process all such files in awk at once, without the while loop and the many pipes I have in my command?



Update:
@RomanPerekhrest, thank you for your awesome code. How can I also add an :updated suffix to all lines that come from the other files? I'd like to get something like:



test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2913:5628:1305:updated


Update:
I have a new case that I did not anticipate: one of the other files may have a bigger value in $3 but a non-digit value in column $2. Such a line should be ignored (even though its $3 is bigger) because of the invalid value in $2.

To show this case, using the sample files above, I replaced the second column of the "test09" line in file2 with "xxxxx", and now I have:



$ grep test09 *
file2:test09:xxxxx:2867:8233
file3:test09:2913:5628:1305
main:test09:2575:1884:1611
$ awk -F':' 'FILENAME != "main" { if ($2~/^[0-9]+/&&(!($1 in a) || $3 > a[$1])) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] } else print }' file* main
test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2913:5628:1305:updated <- this is now updated from file3


Next, I changed the $2 value on the "test09" line in file3 to non-digits too:



$ grep test09 *
file2:test09:xxxxx:2867:8233
file3:test09:zzzzz:5628:1305
main:test09:2575:1884:1611
$ awk -F':' 'FILENAME != "main" { if ($2~/^[0-9]+/&&(!($1 in a) || $3 > a[$1])) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] } else print }' file* main
test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2575:1884:1611 <-- this is now from the main file


Although it seems to be working fine, could someone please explain the second "if" in the code? Does it also need the $2~/^[0-9]+/ condition?



{ if (($1 in a) && (a[$1] > $3))






      edited Apr 8 at 17:47

























      asked Mar 24 at 19:04









      DonJ

          1 Answer
          accepted










          An optimized awk solution, about 27 times faster on the sample files:



          awk -F':' 'FILENAME != "main" { if (!($1 in a) || $3 > a[$1]) { a[$1] = $3; b[$1] = $0 } next }
          { if (($1 in a) && (a[$1] > $3)) { print b[$1]; delete b[$1] }
            else print
          }' file* main


          The output:



          test01:10957:8172:2472
          test02:1401:6160:5894
          test03:7245:8934:5725
          test04:3737:10175:5219
          test05:10769:10381:1102
          test06:3605:3713:7695
          test07:1445:2850:2755
          test08:4707:9047:10578
          test09:2913:5628:1305
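
          The one-liner above may be easier to follow when spread over several lines. The following is a minimal sketch, not the answer's exact code: it assumes the reference file is literally named main and the other files match file*, it renames the arrays a and b to max and best for readability, and it adds "+ 0" to force numeric comparison. The scratch directory and the reduced sample (a few lines taken from the question's files) are only there to make it self-contained.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with a reduced copy of the sample data.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' test01:6733:4370:5342 test06:5602:3100:6995 \
              test07:1445:2850:2755 test08:10924:2562:4867 > main
printf '%s\n' test01:8777:1060:9236 test06:7452:3100:2709 \
              test08:4707:9047:10578 test10:8615:10002:7056 > file2
printf '%s\n' test01:10957:8172:2472 test06:3605:3713:7695 \
              test08:10924:2562:10568 test10:5501:10293:2319 > file3

result=$(awk -F: '
    # Pass 1: every file except "main". Remember, per key ($1), the
    # largest third column seen so far and the line that carried it.
    FILENAME != "main" {
        if (!($1 in max) || $3 + 0 > max[$1]) {
            max[$1]  = $3 + 0
            best[$1] = $0
        }
        next
    }
    # Pass 2: lines of "main", which is read last. Print the remembered
    # line only when it beats the value in main; otherwise keep the line.
    ($1 in max) && max[$1] > $3 + 0 { print best[$1]; next }
    { print }
' file* main)

printf '%s\n' "$result"
```

          With this reduced sample the script prints the file3 lines for test01 and test06, the file2 line for test08, and the unchanged main line for test07; test10 is ignored because it does not occur in main.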



          Execution Time comparison:



          $ time(awk -F: '{print $1,$3}' main |while read a b; do grep ^$a: main file* | sort -t":" -rnk4 | awk -F':' -vb=$b '{if($4>b){print $0;next}else{print ($1=="main")? $0 : NULL}}' | head -1; done > /dev/null)

          real 0m0.111s
          user 0m0.004s
          sys 0m0.012s

          $ time(awk -F':' 'FILENAME != "main" { if (!($1 in a) || $3 > a[$1]) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]; delete b[$1] } else print }' file* main > /dev/null)

          real 0m0.004s
          user 0m0.000s
          sys 0m0.000s





          • Thank you, it is awesome. I forgot about one thing: would it be possible to add an additional column at the end, e.g. :updated, to all the newer lines (those that come from the other files)?
            – DonJ
            Mar 24 at 22:53










          • @DonJ, welcome. As for the additional :updated suffix, change the 2nd if statement to the following: if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] }
            – RomanPerekhrest
            Mar 25 at 6:36
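
          Put together with the first block of the answer, that change looks roughly like the sketch below. It also includes the digit check on $2 from the question's later update. Two things here are assumptions rather than part of the thread: the pattern is anchored as ^[0-9]+$ (the question's /^[0-9]+/ only checks that $2 starts with a digit), and "+ 0" is added to force numeric comparison. As for the second "if": it is reached only for lines of main, and a[] holds values only from lines that passed the $2 check, so it should not need the $2 test again unless main's own $2 must be validated as well.

```shell
#!/bin/sh
# Sketch only: scratch directory with the test09 lines from the question.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' test07:1445:2850:2755 test09:2575:1884:1611 > main
printf '%s\n' test09:xxxxx:2867:8233 > file2    # invalid $2, must be skipped
printf '%s\n' test09:2913:5628:1305  > file3

result=$(awk -F: '
    FILENAME != "main" {
        # Only lines whose $2 consists of digits may become candidates.
        if ($2 ~ /^[0-9]+$/ && (!($1 in a) || $3 + 0 > a[$1])) {
            a[$1] = $3 + 0
            b[$1] = $0
        }
        next
    }
    {
        # Reached only for lines of "main"; a[] holds validated values only.
        if (($1 in a) && a[$1] > $3 + 0)
            print b[$1] ":updated"
        else
            print
    }
' file* main)

printf '%s\n' "$result"
```

          Here test09 is replaced by the file3 line with the :updated suffix, because the file2 line is rejected by the $2 check, while test07 is printed unchanged.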











          • So that you don't have to hardcode "main" inside awk, take the last filename: BEGIN { last_file = ARGV[ARGC-1] } FILENAME != last_file ...
            – glenn jackman
            Apr 14 at 13:33
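
          A minimal sketch of that idea, assuming the reference file is always passed as the last argument (a trailing var=value operand would break the assumption). The names ref.txt and extra1.txt are hypothetical, chosen only to show that nothing is hardcoded; the data lines are taken from the question's samples.

```shell
#!/bin/sh
# Sketch only: scratch directory with hypothetical file names.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' test03:9865:5632:1429 test07:1445:2850:2755  > ref.txt
printf '%s\n' test03:7245:8934:5725 test10:5501:10293:2319 > extra1.txt

result=$(awk -F: '
    # The last command-line argument is treated as the reference file.
    BEGIN { last_file = ARGV[ARGC-1] }
    FILENAME != last_file {
        if (!($1 in a) || $3 + 0 > a[$1]) { a[$1] = $3 + 0; b[$1] = $0 }
        next
    }
    # Lines of the reference file: replace only when a larger $3 was seen.
    ($1 in a) && a[$1] > $3 + 0 { print b[$1]; next }
    { print }
' extra1.txt ref.txt)

printf '%s\n' "$result"
```

          The reference file can then have any name, as long as it stays last on the command line.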










          answered Mar 24 at 20:25









          RomanPerekhrest
