Matching two files and printing lines that appear first time
I have two files that look like this:
file1 (unique IDs):
C84610112
C96209347
C84774620
C84774691
C85594749
C89372772
C89651687
C89845500
C89914896
C91269765
C91526663
C92210411
C92254517
C93709504
C94303303
C95100561
C95100609
C95417520
C95696352
C96045246
C96045496
C96060727
C96076986
and file2:
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
2 C98230482 score: -57.431 nathvy = 47 nconfs = 575
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
4 C36510773 score: -56.502 nathvy = 38 nconfs = 7595
5 C04355288 score: -56.400 nathvy = 41 nconfs = 50502
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
8 C96209347 score: -53.901 nathvy = 24 nconfs = 159
9 C06169346 score: -53.438 nathvy = 22 nconfs = 105
10 C95696352 score: -52.848 nathvy = 38 nconfs = 878
11 C98216318 score: -52.061 nathvy = 52 nconfs = 1092
12 C04285713 score: -52.009 nathvy = 38 nconfs = 1355
13 C96209347 score: -51.477 nathvy = 24 nconfs = 1375
14 C98222837 score: -50.730 nathvy = 34 nconfs = 588
15 C98216318 score: -50.694 nathvy = 52 nconfs = 1136
16 C32832068 score: -50.546 nathvy = 22 nconfs = 548
17 C95696352 score: -50.475 nathvy = 38 nconfs = 3220
18 C32832068 score: -50.457 nathvy = 22 nconfs = 16235
19 C95696352 score: -50.234 nathvy = 38 nconfs = 3048
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
21 C72332782 score: -49.676 nathvy = 41 nconfs = 3942
22 C97970648 score: -49.616 nathvy = 45 nconfs = 17640
23 C04285713 score: -49.594 nathvy = 38 nconfs = 14038
24 C98043133 score: -49.370 nathvy = 43 nconfs = 1236
25 C89372772 score: -49.308 nathvy = 22 nconfs = 471
26 C97970648 score: -49.297 nathvy = 45 nconfs = 17850
27 C85594749 score: -49.122 nathvy = 44 nconfs = 4158
28 C70006381 score: -49.092 nathvy = 24 nconfs = 880
I would like to match the IDs from file1 against the IDs in file2 (second column) and print the lines that match. Also, some IDs in file2 are repeated, such as C96209347 (although the whole lines are not identical). I would like to keep only the line where each ID appears for the first time and skip the others. So in this example, for C96209347 only the third line of file2 should be printed. Can anybody help?
command-line text-processing
edited Aug 31 at 7:45
pa4080
asked Aug 31 at 7:25
sergio
2 Answers
accepted
Try this,
grep -f file1 file2 | awk '!_[$2]++'
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
Explanation
grep -f file1 file2: search file2 for matches of the patterns read from file1.
awk '!_[$2]++': don't print a line if its field $2 has been seen before.
- _ is the array name (it can be anything, e.g. "seen").
- _[$2]++ creates an array entry whose key is the content of field $2 and adds 1 to it.
- If _[$2] was not (!) already set, the line is printed. print is the default action awk performs when the condition matches.
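The explanation above can be tried on a cut-down sample. This is only a sketch under some assumptions: the file names ids.txt and scores.txt and the array name seen are made up for illustration, and the -w and -F options are an addition that is not part of the answer (with GNU grep they make each ID match as a fixed whole word instead of as a regular expression anywhere in the line).

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory (not the asker's real files).
tmp=$(mktemp -d)
printf '%s\n' C96209347 C89372772 C84610112 > "$tmp/ids.txt"
cat > "$tmp/scores.txt" <<'EOF'
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
25 C89372772 score: -49.308 nathvy = 22 nconfs = 471
EOF

# grep keeps the lines that contain any listed ID; awk then prints a line
# only when its second field has not been counted before.
result=$(grep -wFf "$tmp/ids.txt" "$tmp/scores.txt" | awk '!seen[$2]++')
printf '%s\n' "$result"
rm -r "$tmp"
```

With this sample, only the lines numbered 3 and 6 should remain: C95696352 is not in the ID list, and the later lines for C96209347 and C89372772 are repeats.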
edited Aug 31 at 7:52
answered Aug 31 at 7:40
RoVo
This works. Thank you very much! All the best
– sergio Aug 31 at 7:45
Wow, nice and simple solution.
– abu_bua yesterday
With awk alone:
$ awk 'NR==FNR {a[$1]=1; next} $2 in a {print; delete a[$2]}' file1 file2
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
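For anyone wondering how the awk-only version works, here is a spelled-out sketch of the same idea. The file names ids.txt and scores.txt and the array name wanted are illustrative assumptions, and the sample data is a small subset invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory (not the asker's real files).
tmp=$(mktemp -d)
printf '%s\n' C96209347 C89372772 C84610112 > "$tmp/ids.txt"
cat > "$tmp/scores.txt" <<'EOF'
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
25 C89372772 score: -49.308 nathvy = 22 nconfs = 471
EOF

result=$(awk '
    # First file only (NR equals FNR): remember each ID, then skip to the next line.
    NR == FNR { wanted[$1] = 1; next }
    # Second file: print a line whose ID is still wanted, then forget that ID
    # so that later lines with the same ID are skipped.
    $2 in wanted { print; delete wanted[$2] }
' "$tmp/ids.txt" "$tmp/scores.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

Because the ID is looked up as an exact array key on the second field, this variant does not depend on pattern matching elsewhere in the line.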
answered Aug 31 at 10:57
steeldriver