Joining entries based off of column using awk/join

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
3
down vote

favorite












I have two files which are pipe-delimited and may have column 1+column2 matches in both, or one file may have the entry while the other does not. Assume my match-key I am going off of equals $1"-"$2 using a pipe '|' as the FS.



file1



1111|AAA|foo|50
1111|BBB|foo|30
2222|BBB|foo|10


file2



1111|AAA|bar|10
1111|CCC|bar|20
3333|AAA|bar|40


The desired output would be the following for the first entry (I have this working)



1111|AAA|50|10


For the second entry file1 (If there is no matching column1+column2 in both files, replace the entry which is missing for foo as 0. And the other way around)



1111|BBB|30|0


And for an entry key (column1+column2) in file2, but not in file1 (This is entry 3 of file 2 expected output)



3333|AAA|0|40


So, desired output overall format is listing ALL unique keys which are represented by column1+column2 in BOTH files. With the 3rd column entries being those values from file1 column 4 (or 0 if value doesn't exist in file1) and the 4th column in output as those values in column 4 of file 2 (or 0 if value doesn't exist in file2).



I have done a lot of research and tried many things but I have values not outputting if the column1+column2 pair exists in file2 but not file1 by using the following:



join -t"|" -e0 -a1 -a2 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F"|" 'print $1"-"$2"' | sort -k1,1) <(<file2 awk -F"|" 'print $1"-"$2"' | sort -k1,1)


The above case gives me expected output if there is a column1+column2 match in file1 but not file2, and appends a 0 for the match not existing... How can I get this to work for ALL scenarios?



Above command will do some process substitution by adding a key in column 1 in both files which is column1+column2, and then join based off of that new key. -e0 will add a 0 if this key exists in file1 but not file2. How can I get it to cover the case of: New key (column1-column2) exists in file 2 but NOT file 1?







share|improve this question




















  • It would be a lot easier to do this with sqlite.
    – Red Cricket
    Mar 7 at 6:33














up vote
3
down vote

favorite












I have two files which are pipe-delimited and may have column 1+column2 matches in both, or one file may have the entry while the other does not. Assume my match-key I am going off of equals $1"-"$2 using a pipe '|' as the FS.



file1



1111|AAA|foo|50
1111|BBB|foo|30
2222|BBB|foo|10


file2



1111|AAA|bar|10
1111|CCC|bar|20
3333|AAA|bar|40


The desired output would be the following for the first entry (I have this working)



1111|AAA|50|10


For the second entry file1 (If there is no matching column1+column2 in both files, replace the entry which is missing for foo as 0. And the other way around)



1111|BBB|30|0


And for an entry key (column1+column2) in file2, but not in file1 (This is entry 3 of file 2 expected output)



3333|AAA|0|40


So, desired output overall format is listing ALL unique keys which are represented by column1+column2 in BOTH files. With the 3rd column entries being those values from file1 column 4 (or 0 if value doesn't exist in file1) and the 4th column in output as those values in column 4 of file 2 (or 0 if value doesn't exist in file2).



I have done a lot of research and tried many things but I have values not outputting if the column1+column2 pair exists in file2 but not file1 by using the following:



join -t"|" -e0 -a1 -a2 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F"|" 'print $1"-"$2"' | sort -k1,1) <(<file2 awk -F"|" 'print $1"-"$2"' | sort -k1,1)


The above case gives me expected output if there is a column1+column2 match in file1 but not file2, and appends a 0 for the match not existing... How can I get this to work for ALL scenarios?



Above command will do some process substitution by adding a key in column 1 in both files which is column1+column2, and then join based off of that new key. -e0 will add a 0 if this key exists in file1 but not file2. How can I get it to cover the case of: New key (column1-column2) exists in file 2 but NOT file 1?







share|improve this question




















  • It would be a lot easier to do this with sqlite.
    – Red Cricket
    Mar 7 at 6:33












up vote
3
down vote

favorite









up vote
3
down vote

favorite











I have two files which are pipe-delimited and may have column 1+column2 matches in both, or one file may have the entry while the other does not. Assume my match-key I am going off of equals $1"-"$2 using a pipe '|' as the FS.



file1



1111|AAA|foo|50
1111|BBB|foo|30
2222|BBB|foo|10


file2



1111|AAA|bar|10
1111|CCC|bar|20
3333|AAA|bar|40


The desired output would be the following for the first entry (I have this working)



1111|AAA|50|10


For the second entry file1 (If there is no matching column1+column2 in both files, replace the entry which is missing for foo as 0. And the other way around)



1111|BBB|30|0


And for an entry key (column1+column2) in file2, but not in file1 (This is entry 3 of file 2 expected output)



3333|AAA|0|40


So, desired output overall format is listing ALL unique keys which are represented by column1+column2 in BOTH files. With the 3rd column entries being those values from file1 column 4 (or 0 if value doesn't exist in file1) and the 4th column in output as those values in column 4 of file 2 (or 0 if value doesn't exist in file2).



I have done a lot of research and tried many things but I have values not outputting if the column1+column2 pair exists in file2 but not file1 by using the following:



join -t"|" -e0 -a1 -a2 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F"|" 'print $1"-"$2"' | sort -k1,1) <(<file2 awk -F"|" 'print $1"-"$2"' | sort -k1,1)


The above case gives me expected output if there is a column1+column2 match in file1 but not file2, and appends a 0 for the match not existing... How can I get this to work for ALL scenarios?



Above command will do some process substitution by adding a key in column 1 in both files which is column1+column2, and then join based off of that new key. -e0 will add a 0 if this key exists in file1 but not file2. How can I get it to cover the case of: New key (column1-column2) exists in file 2 but NOT file 1?







share|improve this question












I have two files which are pipe-delimited and may have column 1+column2 matches in both, or one file may have the entry while the other does not. Assume my match-key I am going off of equals $1"-"$2 using a pipe '|' as the FS.



file1



1111|AAA|foo|50
1111|BBB|foo|30
2222|BBB|foo|10


file2



1111|AAA|bar|10
1111|CCC|bar|20
3333|AAA|bar|40


The desired output would be the following for the first entry (I have this working)



1111|AAA|50|10


For the second entry file1 (If there is no matching column1+column2 in both files, replace the entry which is missing for foo as 0. And the other way around)



1111|BBB|30|0


And for an entry key (column1+column2) in file2, but not in file1 (This is entry 3 of file 2 expected output)



3333|AAA|0|40


So, desired output overall format is listing ALL unique keys which are represented by column1+column2 in BOTH files. With the 3rd column entries being those values from file1 column 4 (or 0 if value doesn't exist in file1) and the 4th column in output as those values in column 4 of file 2 (or 0 if value doesn't exist in file2).



I have done a lot of research and tried many things but I have values not outputting if the column1+column2 pair exists in file2 but not file1 by using the following:



join -t"|" -e0 -a1 -a2 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F"|" 'print $1"-"$2"' | sort -k1,1) <(<file2 awk -F"|" 'print $1"-"$2"' | sort -k1,1)


The above case gives me expected output if there is a column1+column2 match in file1 but not file2, and appends a 0 for the match not existing... How can I get this to work for ALL scenarios?



Above command will do some process substitution by adding a key in column 1 in both files which is column1+column2, and then join based off of that new key. -e0 will add a 0 if this key exists in file1 but not file2. How can I get it to cover the case of: New key (column1-column2) exists in file 2 but NOT file 1?









share|improve this question











share|improve this question




share|improve this question










asked Mar 6 at 22:30









EDubman

1286




1286











  • It would be a lot easier to do this with sqlite.
    – Red Cricket
    Mar 7 at 6:33
















  • It would be a lot easier to do this with sqlite.
    – Red Cricket
    Mar 7 at 6:33















It would be a lot easier to do this with sqlite.
– Red Cricket
Mar 7 at 6:33




It would be a lot easier to do this with sqlite.
– Red Cricket
Mar 7 at 6:33










3 Answers
3






active

oldest

votes

















up vote
3
down vote



accepted










With your approach you have to use join twice (or change your approach to do it with a single join invocation) :



  • print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

  • print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2 and then in the END block iterating over each array indices, comparing them and printing accordingly:



awk -F'|' 'NR==FNR"$2]=$4;nextx[$1"
ENDfor (j in x)if (!(j in z))print j, "0", x[j];
for (i in z)if (i in x)print i, z[i], x[i] else print i, z[i], "0"
' OFS="|" file1 file2





share|improve this answer






















  • This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
    – EDubman
    Mar 7 at 5:48










  • Both options work great! Thank you
    – EDubman
    Mar 7 at 21:49

















up vote
1
down vote













The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.



join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 
<( sed 's/|/@/' file1 | sort )
<( sed 's/|/@/' file2 | sort ) |
tr '@' '|'


In the output field specification (-o), a zero represents the join field and column 3 in either file is actually column 4 from the original data.



For the given input files, this generates



1111|AAA|50|10
1111|BBB|30|0
1111|CCC|0|20
2222|BBB|10|0
3333|AAA|0|40





share|improve this answer




















  • Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
    – don_crissti
    Mar 7 at 11:22

















up vote
1
down vote













Another awk approach:



awk -F'|' 'NR==FNRf1[$1FS$2]=$NF;next f2[$1FS$2]=$NF 
ENDfor (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x];
for (y in f2) print y, 0, f2[y]
' file[12] OFS='|'


Explanation:




  • NR==FNRf1[$1FS$2]=$NF;next, this will run only for file1 and with the key combination of $1FS$2 will store last column value $NF in array called f1 (FS will substitute with | as awk's Field Seperator).


  • f2[$1FS$2]=$NF, same as above but this will run only for file2


  • for (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x], loop within array f1 and print key (x), its value in file1 f1[x] and if there same file1 key in file2, then print it as well, else print 0 (Used ternary condition f2[x]?f2[x]:0), after that we are also deleting record of same key from file2 with delete f2[x].


  • for (y in f2) print y, 0, f2[y], now array f2 has records which exist in file2 only, so we are printing their key (y), 0 because doesn't not exist in file1 and their value in file2 f2[y].





share|improve this answer






















    Your Answer







    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f428633%2fjoining-entries-based-off-of-column-using-awk-join%23new-answer', 'question_page');

    );

    Post as a guest






























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    3
    down vote



    accepted










    With your approach you have to use join twice (or change your approach to do it with a single join invocation) :



    • print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    • print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2 and then in the END block iterating over each array indices, comparing them and printing accordingly:



    awk -F'|' 'NR==FNR"$2]=$4;nextx[$1"
    ENDfor (j in x)if (!(j in z))print j, "0", x[j];
    for (i in z)if (i in x)print i, z[i], x[i] else print i, z[i], "0"
    ' OFS="|" file1 file2





    share|improve this answer






















    • This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
      – EDubman
      Mar 7 at 5:48










    • Both options work great! Thank you
      – EDubman
      Mar 7 at 21:49














    up vote
    3
    down vote



    accepted










    With your approach you have to use join twice (or change your approach to do it with a single join invocation) :



    • print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    • print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2 and then in the END block iterating over each array indices, comparing them and printing accordingly:



    awk -F'|' 'NR==FNR"$2]=$4;nextx[$1"
    ENDfor (j in x)if (!(j in z))print j, "0", x[j];
    for (i in z)if (i in x)print i, z[i], x[i] else print i, z[i], "0"
    ' OFS="|" file1 file2





    share|improve this answer






















    • This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
      – EDubman
      Mar 7 at 5:48










    • Both options work great! Thank you
      – EDubman
      Mar 7 at 21:49












    up vote
    3
    down vote



    accepted







    up vote
    3
    down vote



    accepted






    With your approach you have to use join twice (or change your approach to do it with a single join invocation) :



    • print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    • print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2 and then in the END block iterating over each array indices, comparing them and printing accordingly:



    awk -F'|' 'NR==FNR"$2]=$4;nextx[$1"
    ENDfor (j in x)if (!(j in z))print j, "0", x[j];
    for (i in z)if (i in x)print i, z[i], x[i] else print i, z[i], "0"
    ' OFS="|" file1 file2





    share|improve this answer














    With your approach you have to use join twice (or change your approach to do it with a single join invocation) :



    • print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    • print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1) <(<file2 awk -F'|' 'print $1"-"$2"' | sort -t'|' -k1,1)

    You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2 and then in the END block iterating over each array indices, comparing them and printing accordingly:



    awk -F'|' 'NR==FNR"$2]=$4;nextx[$1"
    ENDfor (j in x)if (!(j in z))print j, "0", x[j];
    for (i in z)if (i in x)print i, z[i], x[i] else print i, z[i], "0"
    ' OFS="|" file1 file2






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Mar 7 at 11:22

























    answered Mar 6 at 23:24









    don_crissti

    46.4k15123153




    46.4k15123153











    • This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
      – EDubman
      Mar 7 at 5:48










    • Both options work great! Thank you
      – EDubman
      Mar 7 at 21:49
















    • This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
      – EDubman
      Mar 7 at 5:48










    • Both options work great! Thank you
      – EDubman
      Mar 7 at 21:49















    This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
    – EDubman
    Mar 7 at 5:48




    This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
    – EDubman
    Mar 7 at 5:48












    Both options work great! Thank you
    – EDubman
    Mar 7 at 21:49




    Both options work great! Thank you
    – EDubman
    Mar 7 at 21:49












    up vote
    1
    down vote













    The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.



    join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 
    <( sed 's/|/@/' file1 | sort )
    <( sed 's/|/@/' file2 | sort ) |
    tr '@' '|'


    In the output field specification (-o), a zero represents the join field and column 3 in either file is actually column 4 from the original data.



    For the given input files, this generates



    1111|AAA|50|10
    1111|BBB|30|0
    1111|CCC|0|20
    2222|BBB|10|0
    3333|AAA|0|40





    share|improve this answer




















    • Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
      – don_crissti
      Mar 7 at 11:22














    up vote
    1
    down vote













    The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.



    join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 
    <( sed 's/|/@/' file1 | sort )
    <( sed 's/|/@/' file2 | sort ) |
    tr '@' '|'


    In the output field specification (-o), a zero represents the join field and column 3 in either file is actually column 4 from the original data.



    For the given input files, this generates



    1111|AAA|50|10
    1111|BBB|30|0
    1111|CCC|0|20
    2222|BBB|10|0
    3333|AAA|0|40





    share|improve this answer




















    • Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
      – don_crissti
      Mar 7 at 11:22












    up vote
    1
    down vote










    up vote
    1
    down vote









    The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.



    join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 
    <( sed 's/|/@/' file1 | sort )
    <( sed 's/|/@/' file2 | sort ) |
    tr '@' '|'


    In the output field specification (-o), a zero represents the join field and column 3 in either file is actually column 4 from the original data.



    For the given input files, this generates



    1111|AAA|50|10
    1111|BBB|30|0
    1111|CCC|0|20
    2222|BBB|10|0
    3333|AAA|0|40





    share|improve this answer












    The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.



    join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 
    <( sed 's/|/@/' file1 | sort )
    <( sed 's/|/@/' file2 | sort ) |
    tr '@' '|'


    In the output field specification (-o), a zero represents the join field and column 3 in either file is actually column 4 from the original data.



    For the given input files, this generates



    1111|AAA|50|10
    1111|BBB|30|0
    1111|CCC|0|20
    2222|BBB|10|0
    3333|AAA|0|40






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Mar 7 at 6:40









    Kusalananda

    103k13202318




    103k13202318











    • Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
      – don_crissti
      Mar 7 at 11:22
















    • Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
      – don_crissti
      Mar 7 at 11:22















    Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
    – don_crissti
    Mar 7 at 11:22




    Yup; with unknown input I usually resort to low ascii chars which are almost guaranteed to never occur in a text file but that is on gnu setups...
    – don_crissti
    Mar 7 at 11:22










    up vote
    1
    down vote













    Another awk approach:



    awk -F'|' 'NR==FNRf1[$1FS$2]=$NF;next f2[$1FS$2]=$NF 
    ENDfor (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x];
    for (y in f2) print y, 0, f2[y]
    ' file[12] OFS='|'


    Explanation:




    • NR==FNRf1[$1FS$2]=$NF;next, this will run only for file1 and with the key combination of $1FS$2 will store last column value $NF in array called f1 (FS will substitute with | as awk's Field Seperator).


    • f2[$1FS$2]=$NF, same as above but this will run only for file2


    • for (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x], loop within array f1 and print key (x), its value in file1 f1[x] and if there same file1 key in file2, then print it as well, else print 0 (Used ternary condition f2[x]?f2[x]:0), after that we are also deleting record of same key from file2 with delete f2[x].


    • for (y in f2) print y, 0, f2[y], now array f2 has records which exist in file2 only, so we are printing their key (y), 0 because doesn't not exist in file1 and their value in file2 f2[y].





    share|improve this answer


























      up vote
      1
      down vote













      Another awk approach:



      awk -F'|' 'NR==FNRf1[$1FS$2]=$NF;next f2[$1FS$2]=$NF 
      ENDfor (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x];
      for (y in f2) print y, 0, f2[y]
      ' file[12] OFS='|'


      Explanation:




      • NR==FNRf1[$1FS$2]=$NF;next, this will run only for file1 and with the key combination of $1FS$2 will store last column value $NF in array called f1 (FS will substitute with | as awk's Field Seperator).


      • f2[$1FS$2]=$NF, same as above but this will run only for file2


      • for (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x], loop within array f1 and print key (x), its value in file1 f1[x] and if there same file1 key in file2, then print it as well, else print 0 (Used ternary condition f2[x]?f2[x]:0), after that we are also deleting record of same key from file2 with delete f2[x].


      • for (y in f2) print y, 0, f2[y], now array f2 has records which exist in file2 only, so we are printing their key (y), 0 because doesn't not exist in file1 and their value in file2 f2[y].





      share|improve this answer
























        up vote
        1
        down vote










        up vote
        1
        down vote









        Another awk approach:



        awk -F'|' 'NR==FNRf1[$1FS$2]=$NF;next f2[$1FS$2]=$NF 
        ENDfor (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x];
        for (y in f2) print y, 0, f2[y]
        ' file[12] OFS='|'


        Explanation:




        • NR==FNRf1[$1FS$2]=$NF;next, this will run only for file1 and with the key combination of $1FS$2 will store last column value $NF in array called f1 (FS will substitute with | as awk's Field Seperator).


        • f2[$1FS$2]=$NF, same as above but this will run only for file2


        • for (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x], loop within array f1 and print key (x), its value in file1 f1[x] and if there same file1 key in file2, then print it as well, else print 0 (Used ternary condition f2[x]?f2[x]:0), after that we are also deleting record of same key from file2 with delete f2[x].


        • for (y in f2) print y, 0, f2[y], now array f2 has records which exist in file2 only, so we are printing their key (y), 0 because doesn't not exist in file1 and their value in file2 f2[y].





        share|improve this answer














        Another awk approach:



        awk -F'|' 'NR==FNRf1[$1FS$2]=$NF;next f2[$1FS$2]=$NF 
        ENDfor (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x];
        for (y in f2) print y, 0, f2[y]
        ' file[12] OFS='|'


        Explanation:




        • NR==FNRf1[$1FS$2]=$NF;next, this will run only for file1 and with the key combination of $1FS$2 will store last column value $NF in array called f1 (FS will substitute with | as awk's Field Seperator).


        • f2[$1FS$2]=$NF, same as above but this will run only for file2


        • for (x in f1)print x,f1[x],f2[x]?f2[x]:0; delete f2[x], loop within array f1 and print key (x), its value in file1 f1[x] and if there same file1 key in file2, then print it as well, else print 0 (Used ternary condition f2[x]?f2[x]:0), after that we are also deleting record of same key from file2 with delete f2[x].


        • for (y in f2) print y, 0, f2[y], now array f2 has records which exist in file2 only, so we are printing their key (y), 0 because doesn't not exist in file1 and their value in file2 f2[y].






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Mar 7 at 7:22

























        answered Mar 7 at 6:18









        αғsнιη

        14.9k82462




        14.9k82462






















             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f428633%2fjoining-entries-based-off-of-column-using-awk-join%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            How to check contact read email or not when send email to Individual?

            How many registers does an x86_64 CPU actually have?

            Nur Jahan