Joining entries based off of column using awk/join
I have two pipe-delimited files which may have column1+column2 matches in both, or one file may have an entry while the other does not. Assume the match key I am going off of is $1"-"$2, using a pipe '|' as the FS.
file1
1111|AAA|foo|50
1111|BBB|foo|30
2222|BBB|foo|10
file2
1111|AAA|bar|10
1111|CCC|bar|20
3333|AAA|bar|40
The desired output would be the following for the first entry (I have this working)
1111|AAA|50|10
For the second entry of file1 (if a column1+column2 key is not present in both files, use 0 for the file that is missing it, and the other way around):
1111|BBB|30|0
And for an entry key (column1+column2) in file2 but not in file1 (this is the expected output for entry 3 of file2):
3333|AAA|0|40
So the overall desired output lists ALL unique keys (column1+column2) found in EITHER file, with the 3rd output column holding the file1 column-4 value (or 0 if the key doesn't exist in file1) and the 4th output column holding the file2 column-4 value (or 0 if the key doesn't exist in file2).
I have done a lot of research and tried many things, but values are not output when the column1+column2 pair exists in file2 but not in file1, using the following:
join -t"|" -e0 -a1 -a2 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F"|" '{print $1"-"$2"|"$0}' | sort -k1,1) <(<file2 awk -F"|" '{print $1"-"$2"|"$0}' | sort -k1,1)
The above gives me the expected output if there is a column1+column2 match in file1 but not file2, and appends a 0 for the missing match... How can I get this to work for ALL scenarios?
The above command uses process substitution to prepend a key (column1-column2) as column 1 of both files, and then joins on that new key. -e0 will add a 0 if the key exists in file1 but not file2. How can I get it to cover the case where a key (column1-column2) exists in file2 but NOT in file1?
text-processing awk join
It would be a lot easier to do this with sqlite.
– Red Cricket
Mar 7 at 6:33
asked Mar 6 at 22:30
EDubman
3 Answers
With your approach you have to use join twice (or change your approach to do it with a single join invocation):
- print the common lines and the unpairable lines from file1 with
join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
- print the unpairable lines from file2 with
join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1"|"$2 and then, in the END block, iterating over the array indices, comparing them and printing accordingly:
awk -F'|' 'NR==FNR{z[$1"|"$2]=$4;next}{x[$1"|"$2]=$4}
END{for (j in x){if (!(j in z))print j, "0", x[j]};
for (i in z){if (i in x)print i, z[i], x[i]; else print i, z[i], "0"}}
' OFS="|" file1 file2
edited Mar 7 at 11:22, answered Mar 6 at 23:24 – don_crissti (accepted)
This looks great! I had attempted using a single awk expression but just could not get it down correctly. I will be verifying this answer tomorrow and update accordingly. Much appreciated
– EDubman
Mar 7 at 5:48
Both options work great! Thank you
– EDubman
Mar 7 at 21:49
The following replaces the first | in both files with @ (use a character that does not occur elsewhere in the file), performs the join and then changes the @ back to the original |. This way, we create a new |-delimited join field consisting of columns 1 and 2 from the original files.
join -t'|' -e0 -a1 -a2 -o0,1.3,2.3 \
    <( sed 's/|/@/' file1 | sort ) \
    <( sed 's/|/@/' file2 | sort ) |
tr '@' '|'
In the output field specification (-o), a zero represents the join field, and column 3 in either file is actually column 4 from the original data.
For the given input files, this generates
1111|AAA|50|10
1111|BBB|30|0
1111|CCC|0|20
2222|BBB|10|0
3333|AAA|0|40
answered Mar 7 at 6:40 – Kusalananda
Yup; with unknown input I usually resort to low ASCII chars, which are almost guaranteed to never occur in a text file, but that is on GNU setups...
– don_crissti
Mar 7 at 11:22
Another awk approach:
awk -F'|' 'NR==FNR{f1[$1FS$2]=$NF;next}{f2[$1FS$2]=$NF}
END{for (x in f1){print x, f1[x], f2[x]?f2[x]:0; delete f2[x]};
for (y in f2) print y, 0, f2[y]}
' file[12] OFS='|'
Explanation:
- NR==FNR{f1[$1FS$2]=$NF;next} — this runs only for file1; with the key combination $1FS$2 it stores the last column value $NF in an array called f1 (FS is substituted with |, awk's field separator).
- f2[$1FS$2]=$NF — same as above, but this runs only for file2.
- for (x in f1){print x, f1[x], f2[x]?f2[x]:0; delete f2[x]} — loop over array f1 and print the key (x), its file1 value f1[x], and, if the same key exists in file2, that value as well, otherwise 0 (using the ternary condition f2[x]?f2[x]:0); after that, the record with the same key is deleted from file2's array with delete f2[x].
- for (y in f2) print y, 0, f2[y] — array f2 now holds only the records that exist solely in file2, so we print their key (y), 0 (because the key does not exist in file1) and their file2 value f2[y].
edited Mar 7 at 7:22, answered Mar 7 at 6:18