AWK script to compare content of 2 files

I have two files:

file1:

abc|123|check
def|456|map
ijk|789|globe
lmn|101112|equator

file2:

check
map
equator
globe

An AWK function should take the 3rd column of file1 (extracted with cut and sorted) and compare it with the sorted content of file2.

It should return 1 if ALL lines match, and 2 otherwise.

edited Nov 20 at 22:30

Rui F Ribeiro

38.2k1475125

asked May 6 '15 at 1:32

Nandini

I'm voting to close this question as off-topic because this site is not a script writing service and you should at least attempt to do your own homework.
– jordanm
May 6 '15 at 1:36

I tried my level best with the diff command, the comm utility and uniq/sort, but I was not able to join them all into a single one-liner. That is why I asked for help with the command. If I get the idea, I can script it. Thanks
– Nandini
May 6 '15 at 1:46

Thanks for the response. When I run this command: comm -13 <(cut -d'|' -f3 file1 | sort | uniq) <(cat file2 | sort | uniq) I get the following error: ksh: 0403-057 Syntax error: `(' is not expected. TIA
– Nandini
May 6 '15 at 3:10

@jordanm the OP is not asking you to write a script. He/she is asking about how to use a specific command in a specific way. That falls within the boundaries of questions here, AFAIK. A rudimentary question that falls within the guidelines for asking questions is nonetheless a legitimate question.
– njboot
May 6 '15 at 6:21

3 Answers

function are_all_there {
 local num_diff=$(comm -3 <(cut -d'

answered May 6 '15 at 10:22

glenn jackman

49.5k469106
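This copy of the answer above is cut off after cut -d', so its remainder is not reproduced here. As a separate, hypothetical sketch (not the original answer), a shell function can follow the question's convention directly: return status 1 when the sorted 3rd column of the first file equals the sorted second file, and 2 otherwise. It assumes only POSIX sh and uses command substitution instead of process substitution, so it should also sidestep the ksh syntax error from the comments. The name all_there_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not the original answer): return status 1 when the
# sorted 3rd '|' field of file $1 equals the sorted lines of file $2, else 2.
# POSIX sh only: command substitution replaces process substitution.
all_there_status() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    col3=$(cut -d'|' -f3 "$1" | sort)   # sorted 3rd column of file1
    list=$(sort "$2")                   # sorted content of file2
    if [ "$col3" = "$list" ]; then
        return 1                        # all lines match
    fi
    return 2                            # at least one line differs
}

# Usage, capturing the status without tripping "set -e":
# rc=0; all_there_status file1 file2 || rc=$?; echo "$rc"
```

Because there is no -u here, repeated values must occur the same number of times in both files. Command substitution also drops trailing newlines, so trailing blank lines are ignored.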

Based on your comments, it seems that awk is not your only option, so here is a non-awk method.

You don't mention the need for a unique comparison in the question, but you used uniq in the example in your comment. If you don't need a unique match, just remove sort's -u option (tested in bash).

(($(comm -3 <( cut -d'|' -f3 file1 | sort -u ) \
            <( sort -u file2 ) | wc -l))) && echo 2 - not all match ||
  echo 1 - all match
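Whether -u belongs in the pipeline matters once a value repeats. A small sketch, assuming the question's sample data with one extra map line appended to file2 (made-up data), shows that comm -3 reports nothing when both sides are deduplicated and one leftover line when they are not.

```shell
#!/bin/sh
# Sketch: effect of sort -u on the comm -3 comparison when file2 repeats "map".
# Sample data is the question's, plus one duplicated line (an assumption).
export LC_ALL=C                      # same collation for sort and comm
dir=$(mktemp -d)
printf '%s\n' 'abc|123|check' 'def|456|map' 'ijk|789|globe' 'lmn|101112|equator' > "$dir/file1"
printf '%s\n' check map equator globe map > "$dir/file2"

# Deduplicated inputs, as in the answer
cut -d'|' -f3 "$dir/file1" | sort -u > "$dir/a.u"
sort -u "$dir/file2" > "$dir/b.u"
# Plain sort keeps repetitions
cut -d'|' -f3 "$dir/file1" | sort > "$dir/a"
sort "$dir/file2" > "$dir/b"

# comm -3 prints lines found on only one side; tr strips any wc padding
with_u=$(comm -3 "$dir/a.u" "$dir/b.u" | wc -l | tr -d ' ')
without_u=$(comm -3 "$dir/a" "$dir/b" | wc -l | tr -d ' ')
echo "with -u: $with_u, without -u: $without_u"   # prints "with -u: 0, without -u: 1"
rm -rf "$dir"
```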

Or, using awk for the final comparison, with a bit of help from paste:

paste <( cut -d'|' -f3 file1 | sort -u ) \
      <( sort -u file2 ) |
 awk '$1!=$2{m=2; exit}
      END{if(m == 2){print "2 - not all match"; exit}
          print "1 - all match"}'

Or, with awk comparing the two input files directly:

 awk '{if(NR == FNR){a[NR]=$1; n=NR}
       else{c++; if($1 != a[FNR]){m=2; exit}}}
      END{if(c != n) m=2
          if(m == 2){print "2 - not all match"; exit}
          print "1 - all match"}' \
  <( cut -d'|' -f3 file1 | sort -u ) \
  <( sort -u file2 )
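All three variants feed their inputs through process substitution, <( ... ), which is not part of POSIX sh; that is most likely what the ksh in the comments rejected with Syntax error: `(' is not expected. A minimal sketch, assuming temporary files from mktemp (widely available, though not specified by POSIX) are acceptable, runs the two-input awk comparison without it. The function name compare_col3 is hypothetical.

```shell
#!/bin/sh
# Sketch: the two-input awk comparison with temporary files instead of <( ).
# compare_col3 is a hypothetical name; it prints the same messages as above.
compare_col3() {
    cmp_tmp=$(mktemp -d) || return 2
    cut -d'|' -f3 "$1" | sort -u > "$cmp_tmp/col3"   # unique sorted 3rd column
    sort -u "$2" > "$cmp_tmp/list"                     # unique sorted file2
    awk '{if(NR == FNR){a[NR]=$0; n=NR}               # remember first input
          else{c++; if($0 != a[FNR]){m=2; exit}}}     # compare line by line
         END{if(c != n) m=2                            # different line counts
             if(m == 2){print "2 - not all match"; exit}
             print "1 - all match"}' "$cmp_tmp/col3" "$cmp_tmp/list"
    status=$?
    rm -rf "$cmp_tmp"
    return "$status"
}

# Usage: compare_col3 file1 file2
```

Counting the lines of the second input separately (c) and comparing with the first (n) also catches a file2 that is a shorter prefix of the 3rd column.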

edited Apr 13 '17 at 12:36

Community♦

answered May 6 '15 at 9:38

Peter.O

18.7k1791143

Amusing CS answers! We do not actually have to sort anything, because this is a pure set comparison.

The input files are representations of a set in which the elements are pairs. For instance, if the line foo occurs 3 times in file1, that represents the element <foo, 3>. If file2 also contains foo 3 times, both sets contain this element. If file2 doesn't contain foo, or contains a different number of repetitions of foo, then it represents a set which does not contain <foo, 3>.

Furthermore, note that a set of pairs like <foo, 3> can be represented by a hash which maps the key foo to 3.
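The counting view described above can be made visible with ordinary awk. In this sketch, the hypothetical helper count_pairs reads values on standard input and prints each distinct value with its number of occurrences, sorted so that two listings can be compared as plain strings; the repeated map line is made-up data.

```shell
#!/bin/sh
# Sketch: print a multiset as sorted "<value, count>" pairs, one per line.
# count_pairs is a hypothetical helper name.
count_pairs() {
    awk '{ n[$0]++ } END { for (k in n) print "<" k ", " n[k] ">" }' |
        LC_ALL=C sort      # for-in order is unspecified, so sort the pairs
}

# 3rd column of the question's file1
printf '%s\n' 'abc|123|check' 'def|456|map' 'ijk|789|globe' 'lmn|101112|equator' |
    cut -d'|' -f3 | count_pairs
# file2 with "map" repeated: the pair becomes <map, 2>, so the maps differ
printf '%s\n' check map equator globe map | count_pairs
```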

TXR Lisp awk macro:

(awk (:begin (set fs "|"))
 (:let (h1 (hash :equal-based)) (h2 (hash :equal-based)))
 ((= arg 1) (inc [h1 [f 2] 0]))
 ((= arg 2) (inc [h2 rec 0]))
 (:end (exit (equal h1 h2))))

This produces a successful termination status if the files are equal in the required way, otherwise a failed status:


$ txr comp.tl file1 file2
$ echo $?
0
$ echo map >> file2
$ txr comp.tl file1 file2
$ echo $?
1

If we want to complicate things for the calling program by making it parse "1" or "2" output, that can be done by changing the :end rule:

(:end (prn (if (equal h1 h2) "1" "2")))

Here is how things look in regular awk. The main difference is that we have a terse syntax in which we don't have to define the variables we reference; on the other hand, we have to write a pair of loops to compare two associative arrays, and maintain our own arg variable to track which file we are processing. (GNU Awk provides ARGIND for this purpose.)

BEGIN { FS = "|" }
FNR == 1 { arg++ }
arg == 1 { h1[$3]++; }
arg == 2 { h2[$0]++; }
END { same = 1
      for (i in h1)
        if (h1[i] != h2[i]) {
          same = 0
          break
        }
      if (same)
        for (i in h2)
          if (h2[i] != h1[i]) {
            same = 0
            break
          }
      print (same ? "1" : "2")
    }
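The script above advances arg on FNR == 1, which never fires for an empty file, so an empty file1 would shift file2's lines into h1. Besides GNU Awk's ARGIND, a portable alternative is to compare FILENAME with ARGV[1]. A sketch of that variant, assuming the two paths differ and using the hypothetical name same_multiset; it keeps the script's 1/2 output.

```shell
#!/bin/sh
# Hypothetical helper: prints 1 when the multiset of 3rd '|' fields in $1
# equals the multiset of lines in $2, else 2. Portable awk (no ARGIND).
# Assumes $1 and $2 name different paths.
same_multiset() {
    awk -F'|' '
        FILENAME == ARGV[1] { h1[$3]++; next }   # counts from the first file
        { h2[$0]++ }                             # counts from the second file
        END {
            same = 1
            for (k in h1)          # every value of file1, same count in file2
                if (!(k in h2) || h1[k] != h2[k]) { same = 0; break }
            if (same)
                for (k in h2)      # and no value appears only in file2
                    if (!(k in h1)) { same = 0; break }
            print (same ? "1" : "2")
        }' "$1" "$2"
}

# Usage: same_multiset file1 file2
```

Using the in operator for membership tests avoids creating empty array elements while the two maps are compared.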

edited Sep 13 '16 at 16:16

answered Sep 13 '16 at 16:10

Kaz

4,48811431
