AWK script to compare content of 2 files
I have 2 files.
file1:
abc|123|check
def|456|map
ijk|789|globe
lmn|101112|equator
file2:
check
map
equator
globe
An AWK function should compare the 3rd column of file1 (after cutting out the 3rd column and sorting it) with the sorted content of file2:
- it should return 1 if ALL lines match
- otherwise it should return 2
text-processing awk
edited Nov 20 at 22:30 by Rui F Ribeiro
asked May 6 '15 at 1:32 by Nandini
I'm voting to close this question as off-topic because this site is not a script-writing service and you should at least attempt to do your own homework.
– jordanm
May 6 '15 at 1:36
I tried my level best with the diff command, the comm utility, and uniq/sort, but I was not able to join them all into a single one-liner. That was the reason to ask for help with the command. If I get the idea, I can script it. Thanks
– Nandini
May 6 '15 at 1:46
Thanks for the response. When I run this command:
comm -13 <(cut -d'|' -f3 file1 | sort | uniq) <(cat file2 | sort | uniq)
I get the following error:
ksh: 0403-057 Syntax error: `(' is not expected.
TIA
– Nandini
May 6 '15 at 3:10
@jordanm the OP is not asking you to write a script. He/she is asking how to use a specific command in a specific way. That falls within the boundaries of questions here, AFAIK. A rudimentary question that falls within the guidelines for asking questions is nonetheless a legitimate question.
– njboot
May 6 '15 at 6:21
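The ksh error in the comment above is what a shell without process substitution (`<(...)`) reports; that the OP's shell is an older ksh, likely the AIX one given the `0403-057` message code, is an assumption. A minimal POSIX sh sketch that avoids process substitution by writing both sorted lists to temporary files and comparing them with `cmp`. Comparing the files with `cmp -s` is equivalent to checking that `comm -3` prints nothing. The function name `compare_col3` and the temporary file names are hypothetical.

```shell
#!/bin/sh
# Sketch for shells that reject <( ... ), e.g. older ksh (assumption).
# compare_col3 is a hypothetical helper name used only for illustration.
# Prints 1 if the sorted, de-duplicated 3rd column of file $1 equals the
# sorted, de-duplicated lines of file $2, otherwise prints 2.
compare_col3() {
    t1="${TMPDIR:-/tmp}/col3.$$.1"
    t2="${TMPDIR:-/tmp}/col3.$$.2"
    cut -d'|' -f3 "$1" | sort -u > "$t1"
    sort -u "$2" > "$t2"
    # cmp -s is silent and exits 0 only when the two files are identical
    if cmp -s "$t1" "$t2"; then
        result=1
    else
        result=2
    fi
    rm -f "$t1" "$t2"
    echo "$result"
}
```

With the sample files from the question, `compare_col3 file1 file2` prints 1.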
3 Answers
function are_all_there
local num_diff=$(comm -3 <(cut -d'
answered May 6 '15 at 10:22 by glenn jackman
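The code in this answer is cut off after `cut -d'`, so the rest of it is not recoverable. As an illustration only (not the original answer), a bash function built the same way, counting the lines that `comm -3` prints, could report the question's 1/2 result as its return status. The name `col3_status` is hypothetical, and the sketch needs bash for `local`, `(( ))` and process substitution.

```shell
#!/usr/bin/env bash
# Illustrative sketch, not the truncated original answer.
# col3_status is a hypothetical name. Returns status 1 when the sorted,
# de-duplicated 3rd column of $1 matches the sorted, de-duplicated lines
# of $2, and status 2 otherwise, following the question's convention.
col3_status() {
    local num_diff
    # comm -3 suppresses lines common to both inputs, so every output
    # line is a value present in only one of the two lists
    num_diff=$(comm -3 <(cut -d'|' -f3 "$1" | sort -u) <(sort -u "$2") | wc -l)
    if (( num_diff == 0 )); then
        return 1
    fi
    return 2
}
```

Because a nonzero status normally means failure in the shell, the caller has to inspect `$?` explicitly; for the question's sample files, `col3_status file1 file2; echo $?` prints 1.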
Based on your comments, it seems that awk is not your only option, so here is a non-awk method.
You don't mention the need for a unique comparison in the question, but you used uniq in the example in your comment. If you don't need a unique match, just remove sort's -u option. (Tested in bash.)
# comm -3 prints only lines unique to one input; any output means a mismatch
(( $(comm -3 <(cut -d'|' -f3 file1 | sort -u) \
             <(sort -u file2) | wc -l) )) && echo "2 - not all match" ||
echo "1 - all match"
Or, using awk for the final comparison, with a bit of help from paste:
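# paste puts the two sorted lists side by side, one tab-separated pair per line
paste <(cut -d'|' -f3 file1 | sort -u) \
      <(sort -u file2) |
awk -F'\t' '$1 != $2 { m = 2; exit }
    END { if (m == 2) { print "2 - not all match"; exit }
          print "1 - all match" }'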
Or, with awk comparing the two input files directly:
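# 1st input: store each line by line number; 2nd input: compare line by line
awk 'NR == FNR { a[FNR] = $0; n = FNR; next }
     $0 != a[FNR] { m = 2; exit }
     { c = FNR }
     END { if (m == 2 || c != n) { print "2 - not all match"; exit }
           print "1 - all match" }' \
    <(cut -d'|' -f3 file1 | sort -u) \
    <(sort -u file2)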
edited Apr 13 '17 at 12:36 by Community♦
answered May 6 '15 at 9:38 by Peter.O
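Counting the `comm -3` output only tells you whether something differs. A small bash sketch (an addition, not part of the answer above) prints the differing values themselves: values found only in file1's 3rd column appear in the first column, and values found only in file2 appear after a tab. `show_col3_diff` is a hypothetical name, and setting `LC_ALL=C` for both `sort` and `comm` is an assumption made so that both tools use the same collation order, which `comm` relies on.

```shell
#!/usr/bin/env bash
# show_col3_diff is a hypothetical name used for illustration.
# Prints nothing when the lists match; otherwise lines only in the 3rd
# column of $1 (column 1) and lines only in $2 (tab-indented, column 2).
show_col3_diff() {
    # LC_ALL=C on comm does not reach the process substitutions, so each
    # sort gets its own LC_ALL=C to keep the collation order identical
    LC_ALL=C comm -3 <(cut -d'|' -f3 "$1" | LC_ALL=C sort -u) \
                     <(LC_ALL=C sort -u "$2")
}
```

With the question's files it prints nothing; if file2 gained an extra line such as `pole` (a made-up value), it would print a tab followed by `pole`.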
Amusing CS answers! We do not actually have to sort anything, because this is a pure set comparison.
The input files are representations of a set in which the elements are pairs. For instance, if the line foo occurs 3 times in file1, that represents the element <foo, 3>. If file2 contains foo 3 times, that means both sets contain this element. If file2 doesn't contain foo, or contains a different number of repetitions of foo, then it represents a set which does not contain <foo, 3>.
Furthermore, note that a set of pairs like <foo, 3> can be represented by a hash which maps the key foo to 3.
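To make the <foo, 3> idea concrete, here is a small sketch (not part of the original answer) that prints the pairs a file represents by counting each distinct line in an awk associative array. `multiset_pairs` is a hypothetical name; the output is piped through `sort` only because the order of awk's `for (k in array)` loop is unspecified.

```shell
#!/bin/sh
# multiset_pairs is a hypothetical helper for illustration only.
# Prints one <line, count> pair per distinct line of file $1.
multiset_pairs() {
    awk '{ count[$0]++ }
         END { for (k in count) print "<" k ", " count[k] ">" }' "$1" | sort
}
```

For a file containing foo three times and bar once, it prints `<bar, 1>` and `<foo, 3>`.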
TXR Lisp awk macro:
(awk (:begin (set fs "|"))
(:let (h1 (hash :equal-based)) (h2 (hash :equal-based)))
((= arg 1) (inc [h1 [f 2] 0]))
((= arg 2) (inc [h2 rec 0]))
(:end (exit (equal h1 h2))))
This produces a successful termination status if the files are equal in the required way, otherwise a failed status:
$ txr comp.tl file1 file2
$ echo $?
0
$ echo map >> file2
$ txr comp.tl file1 file2
$ echo $?
1
If we want to complicate things for the calling program by making it parse "1" or "2" output, that can be done by changing the :end rule:
(:end (prn (if (equal h1 h2) "1" "2")))
Here is how things look in regular awk. The main difference is that the syntax is terse and we don't have to define any variable that we reference; on the other hand, we have to write a pair of loops to compare two associative arrays, and generate our own arg variable to track which file we are processing. (GNU Awk has the ARGIND variable for this purpose.)
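BEGIN { FS = "|" }
FNR == 1 { arg++ }          # count input files (an empty file is not counted)
arg == 1 { h1[$3]++ }       # file1: count each 3rd-column value
arg == 2 { h2[$0]++ }       # file2: count each whole line
END {
    same = 1
    for (i in h1)
        if (h1[i] != h2[i]) {
            same = 0
            break
        }
    if (same)
        for (i in h2)
            if (h2[i] != h1[i]) {
                same = 0
                break
            }
    print (same ? "1" : "2")
}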
edited Sep 13 '16 at 16:16
answered Sep 13 '16 at 16:10 by Kaz
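As the answer notes, GNU Awk's `ARGIND` variable (the index in `ARGV` of the file currently being read) can replace the hand-made `arg` counter. A sketch of that variant, assuming `gawk` is installed; unlike the `FNR == 1` counter, `ARGIND` also stays correct when one of the files is empty. `col3_multiset_equal` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch of the gawk ARGIND variant; requires GNU awk (gawk).
# col3_multiset_equal is a hypothetical wrapper name.
# Prints 1 if every 3rd-column value of $1 occurs in $2 the same number
# of times (and vice versa), otherwise prints 2.
col3_multiset_equal() {
    gawk -F'|' '
        ARGIND == 1 { h1[$3]++ }    # first file: count 3rd-column values
        ARGIND == 2 { h2[$0]++ }    # second file: count whole lines
        END {
            same = 1
            for (k in h1) if (h1[k] != h2[k]) { same = 0; break }
            if (same) for (k in h2) if (h2[k] != h1[k]) { same = 0; break }
            print (same ? "1" : "2")
        }' "$1" "$2"
}
```

With the question's sample files this prints 1, and after `echo map >> file2` (the same change used in the TXR demonstration) it prints 2.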