Searching multiple files for the line with the biggest number in column 3 among matched lines

I have multiple files with contents similar to:

main file1:

test01:6733:4370:5342
test02:7776:2018:1001
test03:9865:5632:1429
test04:8477:4757:1890
test05:8019:8860:5298
test06:5602:3100:6995
test07:1445:2850:2755
test08:10924:2562:4867
test09:2575:1884:1611

sample file2:

test01:8777:1060:9236
test02:1322:1211:10837
test04:3737:10175:5219
test05:8467:8988:9739
test06:7452:3100:2709
test08:4707:9047:10578
test09:8669:2867:8233
test10:8615:10002:7056

sample file3:

test01:10957:8172:2472
test02:1401:6160:5894
test03:7245:8934:5725
test04:8477:10106:10069
test05:10769:10381:1102
test06:3605:3713:7695
test08:10924:2562:10568
test09:2913:5628:1305
test10:5501:10293:2319

I want to replace each line in the main file with the line, taken from any of the files, that has the same first column and the biggest number in the third column.

Only first-column values present in the main file should be considered (test## entries that exist in the other files but not in the main file should be ignored).

When several lines in the other files share the same biggest number in the third column, any one of them may be used to update the main file.

Here is my suboptimal solution:

$ awk -F: '{print $1,$3}' main|while read a b;do grep ^$a: main file*|sort -t":" -rnk4|awk -F: -vb=$b '{if($4>b){print $0;next}else{print ($1=="main")? $0 : NULL}}'|head -1;done
file3:test01:10957:8172:2472
file3:test02:1401:6160:5894
file3:test03:7245:8934:5725
file2:test04:3737:10175:5219
file3:test05:10769:10381:1102
file3:test06:3605:3713:7695
main:test07:1445:2850:2755
file2:test08:4707:9047:10578
file3:test09:2913:5628:1305

How can I process all of these files in a single awk invocation, without the while loop and the many pipes in my command?

Update:
@RomanPerekhrest, thank you for your awesome code. How can I also add an :updated suffix to every line that comes from one of the other files? I'd like to get something like:

test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2913:5628:1305:updated

Update:
I have a new case that I did not anticipate: a line in one of the other files may have a bigger value in $3 but a non-numeric value in $2. Such a line should be ignored, even though its $3 is bigger, because its $2 is invalid.

To demonstrate, using the sample files above, I replaced the second column of the "test09" line in file2 with "xxxxx", so now I have:

$ grep test09 *
file2:test09:xxxxx:2867:8233
file3:test09:2913:5628:1305
main:test09:2575:1884:1611
$ awk -F':' 'FILENAME != "main" { if ($2~/^[0-9]+/ && (!($1 in a) || $3 > a[$1])) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] } else print }' file* main
test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2913:5628:1305:updated <-- this update now comes from file3

Next, I changed the $2 value on the "test09" line in file3 to non-digits too:

$ grep test09 *
file2:test09:xxxxx:2867:8233
file3:test09:zzzzz:5628:1305
main:test09:2575:1884:1611
$ awk -F':' 'FILENAME != "main" { if ($2~/^[0-9]+/ && (!($1 in a) || $3 > a[$1])) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] } else print }' file* main
test01:10957:8172:2472:updated
test02:1401:6160:5894:updated
test03:7245:8934:5725:updated
test04:3737:10175:5219:updated
test05:10769:10381:1102:updated
test06:3605:3713:7695:updated
test07:1445:2850:2755
test08:4707:9047:10578:updated
test09:2575:1884:1611 <-- this is now from the main file

Although it seems to be working fine, could someone please explain the second "if" in the code? Does it also need the $2~/^[0-9]+/ condition?

{ if (($1 in a) && (a[$1] > $3))

edited Apr 8 at 17:47

asked Mar 24 at 19:04

DonJ

768

1 Answer

accepted

Optimized awk solution which is about 27 times faster:

awk -F':' 'FILENAME != "main" { if (!($1 in a) || $3 > a[$1]) { a[$1] = $3; b[$1] = $0 } next }
 {
 if (($1 in a) && (a[$1] > $3)) { print b[$1]; delete b[$1] }
 else print
 }' file* main

The output:

test01:10957:8172:2472
test02:1401:6160:5894
test03:7245:8934:5725
test04:3737:10175:5219
test05:10769:10381:1102
test06:3605:3713:7695
test07:1445:2850:2755
test08:4707:9047:10578
test09:2913:5628:1305
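
For readers following the later updates to the question, here is a minimal sketch (not the answerer's exact code) that spreads the same two-pass idea over several lines and folds in the :updated suffix and a numeric check on column 2. The temporary directory, the cut-down sample data, and the anchored regex `^[0-9]+$` are assumptions made for illustration; the question's own regex `/^[0-9]+/` only requires $2 to start with a digit.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Cut-down sample data (assumption: a subset of the question's files,
# with one non-numeric $2 in file2 to exercise the filter).
printf '%s\n' 'test01:6733:4370:5342' 'test07:1445:2850:2755' 'test09:2575:1884:1611' > main
printf '%s\n' 'test01:8777:1060:9236' 'test09:xxxxx:2867:8233' 'test10:8615:10002:7056' > file2
printf '%s\n' 'test01:10957:8172:2472' 'test09:2913:5628:1305' > file3

result=$(awk -F':' '
    # Pass 1: every file except "main". Per key, remember the line with
    # the largest $3, but only if $2 consists solely of digits.
    FILENAME != "main" {
        if ($2 ~ /^[0-9]+$/ && (!($1 in a) || $3 > a[$1])) {
            a[$1] = $3      # largest $3 seen so far for this key
            b[$1] = $0      # the whole line carrying that value
        }
        next
    }
    # Pass 2: "main" (listed last). Swap in the stored line when its $3
    # is strictly larger, otherwise print the main line unchanged.
    {
        if (($1 in a) && (a[$1] > $3))
            print b[$1] ":updated"
        else
            print
    }
' file* main)

printf '%s\n' "$result"
```

With this data the sketch should print test01 and test09 taken from file3 with the :updated suffix, and test07 unchanged. Because lines with an invalid $2 are never stored in `a`, the second `if` needs no $2 test of its own: `$1 in a` is already false for a key whose only candidates were rejected.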

Execution Time comparison:

$ time(awk -F: '{print $1,$3}' main | while read a b; do grep ^$a: main file* | sort -t":" -rnk4 | awk -F':' -vb=$b '{if($4>b){print $0;next}else{print ($1=="main")? $0 : NULL}}' | head -1; done > /dev/null)

real 0m0.111s
user 0m0.004s
sys 0m0.012s

$ time(awk -F':' 'FILENAME != "main" { if (!($1 in a) || $3 > a[$1]) { a[$1]=$3; b[$1]=$0 } next } { if (($1 in a) && (a[$1] > $3)) { print b[$1]; delete b[$1] } else print }' file* main > /dev/null)

real 0m0.004s
user 0m0.000s
sys 0m0.000s

answered Mar 24 at 20:25

RomanPerekhrest

22.4k12144

Thank you, it is awesome. I forgot about one thing: would it be possible to add an extra column at the end, e.g. :updated, to all the newer lines, i.e. those that come from the other files?
– DonJ
Mar 24 at 22:53

@DonJ, welcome. As for the additional :updated suffix, change the 2nd if statement to the following: if (($1 in a) && (a[$1] > $3)) { print b[$1]":updated"; delete b[$1] }
– RomanPerekhrest
Mar 25 at 6:36

One way to avoid hardcoding "main" inside awk is to take the last filename: BEGIN { last_file = ARGV[ARGC-1] } FILENAME != last_file ...
– glenn jackman
Apr 14 at 13:33
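
The suggestion in the last comment can be sketched as follows. This is only an illustration under stated assumptions: the file names `base.txt`, `extra1.txt`, and `extra2.txt` are hypothetical, the data is a cut-down version of the question's samples, and the file to be updated is assumed to be the last argument on the command line.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not touch real files.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical file names; base.txt plays the role of "main".
printf '%s\n' 'test02:7776:2018:1001' 'test07:1445:2850:2755' > base.txt
printf '%s\n' 'test02:1322:1211:10837' > extra1.txt
printf '%s\n' 'test02:1401:6160:5894' 'test10:5501:10293:2319' > extra2.txt

result=$(awk -F':' '
    # ARGV[ARGC-1] is the last command-line argument, here the file
    # that should be updated.
    BEGIN { last_file = ARGV[ARGC-1] }

    # All earlier files: keep, per key, the line with the largest $3.
    FILENAME != last_file {
        if (!($1 in a) || $3 > a[$1]) { a[$1] = $3; b[$1] = $0 }
        next
    }

    # Last file: print the stored line if its $3 is strictly larger.
    ($1 in a) && (a[$1] > $3) { print b[$1]; next }
    { print }
' extra1.txt extra2.txt base.txt)

printf '%s\n' "$result"
```

Here test02 should be replaced by the line from extra2.txt, test07 stays as it is, and test10 is ignored because it does not occur in the last file. One caveat: if the last argument were an awk `var=value` assignment rather than a file name, `ARGV[ARGC-1]` would not name a file and the comparison would never match.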
