Comparing csv files to lookup column 1 then check values in column 2

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
0
down vote

favorite












OK, I will try and explain what I need to do as best as possible.
Basically I have two CSV files, as per the examples below:



File 1:



Column 1, Column 2
abc , 123
def , 234
adf , 567


File 2



Column 1, Column 2
abc , 123
def , 234
adf , 578


I need to write either a shell script or simple command that will do the following:



  1. Sort both files by column 1

  2. Row by row, do the following:

    • Using column 1 in file 1, search for this value in column 1 in file 2.

      1. if found, compare the value in column 2 in file 1 against the value in column 2 of file 2

      2. if it matches, write column 1, column 2 and "Validated" in column 3 to a separate file

      3. if it does not match, write column 1, column 2 and "Failed" to a separate file



This results in two output files: the first with everything that was found in column 1 and column 2 matches, and a second file containing either column 1 lookups that failed or where column 1 was found, where column 2 did not match, so, essentially, using column 1 as the key to check column 2.










share|improve this question























  • Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
    – G-Man
    Mar 21 '17 at 18:10










  • Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
    – Veyron
    Mar 22 '17 at 14:24















up vote
0
down vote

favorite












OK, I will try and explain what I need to do as best as possible.
Basically I have two CSV files, as per the examples below:



File 1:



Column 1, Column 2
abc , 123
def , 234
adf , 567


File 2



Column 1, Column 2
abc , 123
def , 234
adf , 578


I need to write either a shell script or simple command that will do the following:



  1. Sort both files by column 1

  2. Row by row, do the following:

    • Using column 1 in file 1, search for this value in column 1 in file 2.

      1. if found, compare the value in column 2 in file 1 against the value in column 2 of file 2

      2. if it matches, write column 1, column 2 and "Validated" in column 3 to a separate file

      3. if it does not match, write column 1, column 2 and "Failed" to a separate file



This results in two output files: the first with everything that was found in column 1 and column 2 matches, and a second file containing either column 1 lookups that failed or where column 1 was found, where column 2 did not match, so, essentially, using column 1 as the key to check column 2.










share|improve this question























  • Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
    – G-Man
    Mar 21 '17 at 18:10










  • Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
    – Veyron
    Mar 22 '17 at 14:24













up vote
0
down vote

favorite









up vote
0
down vote

favorite











OK, I will try and explain what I need to do as best as possible.
Basically I have two CSV files, as per the examples below:



File 1:



Column 1, Column 2
abc , 123
def , 234
adf , 567


File 2



Column 1, Column 2
abc , 123
def , 234
adf , 578


I need to write either a shell script or simple command that will do the following:



  1. Sort both files by column 1

  2. Row by row, do the following:

    • Using column 1 in file 1, search for this value in column 1 in file 2.

      1. if found, compare the value in column 2 in file 1 against the value in column 2 of file 2

      2. if it matches, write column 1, column 2 and "Validated" in column 3 to a separate file

      3. if it does not match, write column 1, column 2 and "Failed" to a separate file



This results in two output files: the first with everything that was found in column 1 and column 2 matches, and a second file containing either column 1 lookups that failed or where column 1 was found, where column 2 did not match, so, essentially, using column 1 as the key to check column 2.










share|improve this question















OK, I will try and explain what I need to do as best as possible.
Basically I have two CSV files, as per the examples below:



File 1:



Column 1, Column 2
abc , 123
def , 234
adf , 567


File 2



Column 1, Column 2
abc , 123
def , 234
adf , 578


I need to write either a shell script or simple command that will do the following:



  1. Sort both files by column 1

  2. Row by row, do the following:

    • Using column 1 in file 1, search for this value in column 1 in file 2.

      1. if found, compare the value in column 2 in file 1 against the value in column 2 of file 2

      2. if it matches, write column 1, column 2 and "Validated" in column 3 to a separate file

      3. if it does not match, write column 1, column 2 and "Failed" to a separate file



This results in two output files: the first with everything that was found in column 1 and column 2 matches, and a second file containing either column 1 lookups that failed or where column 1 was found, where column 2 did not match, so, essentially, using column 1 as the key to check column 2.







shell-script text-processing columns file-comparison






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 21 '17 at 18:08









G-Man

11.8k92658




11.8k92658










asked Mar 21 '17 at 17:08









Veyron

11




11











  • Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
    – G-Man
    Mar 21 '17 at 18:10










  • Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
    – Veyron
    Mar 22 '17 at 14:24

















  • Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
    – G-Man
    Mar 21 '17 at 18:10










  • Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
    – Veyron
    Mar 22 '17 at 14:24
















Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
– G-Man
Mar 21 '17 at 18:10




Your question would be clearer if you (1) showed exactly what you want the output files to look like for your example input, and (2) described (and illustrated) what you want to happen if a value in column 1 of file 1 is not present in file 2. Also, have you made any attempt to solve this yourself? Done any research? What have you found? What have you tried?
– G-Man
Mar 21 '17 at 18:10












Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
– Veyron
Mar 22 '17 at 14:24





Hi, apologies yes I have been looking into this extensively. I first tried the below but this did not give the desired result so had to play around with the columns etc awk -F, 'NR==FNRa[$1,$3]++;next (a[$1,$2])' file1.txt file2.txt I have been also looking at whether I should be doing this in SQL but that means importing on eof the files into SQL and running innerjoins which with 150 million lines can take a while. I am still investigating and will obviously share anything I find that works on this post.
– Veyron
Mar 22 '17 at 14:24











2 Answers
2






active

oldest

votes

















up vote
0
down vote













Given the following input files:



$ cat in1 in2
Column 1, Column 2
abc , 123
def , 234
adf , 567
Column 1, Column 2
abc , 123
def , 234
adf , 578


First, we sort them; we can then stitch them together into one file:



$ sort in1 > in1.sorted; sort in2 > in2.sorted; paste in1,2.sorted
Column 1, Column 2 Column 1, Column 2
abc , 123 abc , 123
adf , 567 adf , 578
def , 234 def , 234


awk will help us here, but the commas get in our way; we can get rid of them with sed first:



$ paste in1,2.sorted | sed s/,//g
Column 1 Column 2 Column 1 Column 2
abc 123 abc 123
adf 567 adf 578
def 234 def 234


And then we can dump that through a quick awk:



$ paste in1,2.sorted | sed s/,//g | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
Column Failed
abc Validated
adf Failed
def Validated


This can also be done with raw awk, with the advantage of being able to strip out the header rows and not rely on the same data being in the same order, thus removing the need for sorting:



$ awk 'FNR != 1 && NR == FNR data[$1]=$3 FNR != 1 && NR != FNR if(data[$1]==$3) print $1, "Validated" else print $1, "Failed" ' in1,2
abc Validated
adf Failed
def Validated


This relies on a few magic awk built-in variables and tricks related to them:




  • NR - the total number of records processed


  • FNR - the total number of records in the current file processed


  • FNR != 1 - skips the first row of each file (does not treat the headers as data)


  • NR != FNR - runs only after the first file has been completely read and we have started reading subsequent files. This allows us to prepopulate the data array for testing once we start chewing on the second file.





share|improve this answer






















  • Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
    – hschou
    Mar 21 '17 at 22:23










  • Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
    – Veyron
    Mar 22 '17 at 14:28










  • OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
    – Veyron
    Mar 22 '17 at 15:48

















up vote
0
down vote













I think I have this sorted out now with the following, in case any else reads this and needs this. Thanks again.



FNR == NR
for (i = 2; i <= NF; i++) a[i,$1] = $i
b[$1];
next;

($1 in b) # check if row in file2 existed in file1
for (i = 2; i <= NF; i++)
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Validn", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Failuren", $1, i-1, a[i,$1], $i);

delete b[$1]; # delete entries which are processed



END
for (left in b) # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equaln", left, i-1, a[i,left])






share|improve this answer




















  • Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
    – Veyron
    Mar 27 '17 at 4:28










Your Answer







StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f352877%2fcomparing-csv-files-to-lookup-column-1-then-check-values-in-column-2%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote













Given the following input files:



$ cat in1 in2
Column 1, Column 2
abc , 123
def , 234
adf , 567
Column 1, Column 2
abc , 123
def , 234
adf , 578


First, we sort them; we can then stitch them together into one file:



$ sort in1 > in1.sorted; sort in2 > in2.sorted; paste in1,2.sorted
Column 1, Column 2 Column 1, Column 2
abc , 123 abc , 123
adf , 567 adf , 578
def , 234 def , 234


awk will help us here, but the commas get in our way; we can get rid of them with sed first:



$ paste in1,2.sorted | sed s/,//g
Column 1 Column 2 Column 1 Column 2
abc 123 abc 123
adf 567 adf 578
def 234 def 234


And then we can dump that through a quick awk:



$ paste in1,2.sorted | sed s/,//g | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
Column Failed
abc Validated
adf Failed
def Validated


This can also be done with raw awk, with the advantage of being able to strip out the header rows and not rely on the same data being in the same order, thus removing the need for sorting:



$ awk 'FNR != 1 && NR == FNR data[$1]=$3 FNR != 1 && NR != FNR if(data[$1]==$3) print $1, "Validated" else print $1, "Failed" ' in1,2
abc Validated
adf Failed
def Validated


This relies on a few magic awk built-in variables and tricks related to them:




  • NR - the total number of records processed


  • FNR - the total number of records in the current file processed


  • FNR != 1 - skips the first row of each file (does not treat the headers as data)


  • NR != FNR - runs only after the first file has been completely read and we have started reading subsequent files. This allows us to prepopulate the data array for testing once we start chewing on the second file.





share|improve this answer






















  • Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
    – hschou
    Mar 21 '17 at 22:23










  • Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
    – Veyron
    Mar 22 '17 at 14:28










  • OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
    – Veyron
    Mar 22 '17 at 15:48














up vote
0
down vote













Given the following input files:



$ cat in1 in2
Column 1, Column 2
abc , 123
def , 234
adf , 567
Column 1, Column 2
abc , 123
def , 234
adf , 578


First, we sort them; we can then stitch them together into one file:



$ sort in1 > in1.sorted; sort in2 > in2.sorted; paste in1,2.sorted
Column 1, Column 2 Column 1, Column 2
abc , 123 abc , 123
adf , 567 adf , 578
def , 234 def , 234


awk will help us here, but the commas get in our way; we can get rid of them with sed first:



$ paste in1,2.sorted | sed s/,//g
Column 1 Column 2 Column 1 Column 2
abc 123 abc 123
adf 567 adf 578
def 234 def 234


And then we can dump that through a quick awk:



$ paste in1,2.sorted | sed s/,//g | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
Column Failed
abc Validated
adf Failed
def Validated


This can also be done with raw awk, with the advantage of being able to strip out the header rows and not rely on the same data being in the same order, thus removing the need for sorting:



$ awk 'FNR != 1 && NR == FNR data[$1]=$3 FNR != 1 && NR != FNR if(data[$1]==$3) print $1, "Validated" else print $1, "Failed" ' in1,2
abc Validated
adf Failed
def Validated


This relies on a few magic awk built-in variables and tricks related to them:




  • NR - the total number of records processed


  • FNR - the total number of records in the current file processed


  • FNR != 1 - skips the first row of each file (does not treat the headers as data)


  • NR != FNR - runs only after the first file has been completely read and we have started reading subsequent files. This allows us to prepopulate the data array for testing once we start chewing on the second file.





share|improve this answer






















  • Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
    – hschou
    Mar 21 '17 at 22:23










  • Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
    – Veyron
    Mar 22 '17 at 14:28










  • OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
    – Veyron
    Mar 22 '17 at 15:48












up vote
0
down vote










up vote
0
down vote









Given the following input files:



$ cat in1 in2
Column 1, Column 2
abc , 123
def , 234
adf , 567
Column 1, Column 2
abc , 123
def , 234
adf , 578


First, we sort them; we can then stitch them together into one file:



$ sort in1 > in1.sorted; sort in2 > in2.sorted; paste in1,2.sorted
Column 1, Column 2 Column 1, Column 2
abc , 123 abc , 123
adf , 567 adf , 578
def , 234 def , 234


awk will help us here, but the commas get in our way; we can get rid of them with sed first:



$ paste in1,2.sorted | sed s/,//g
Column 1 Column 2 Column 1 Column 2
abc 123 abc 123
adf 567 adf 578
def 234 def 234


And then we can dump that through a quick awk:



$ paste in1,2.sorted | sed s/,//g | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
Column Failed
abc Validated
adf Failed
def Validated


This can also be done with raw awk, with the advantage of being able to strip out the header rows and not rely on the same data being in the same order, thus removing the need for sorting:



$ awk 'FNR != 1 && NR == FNR data[$1]=$3 FNR != 1 && NR != FNR if(data[$1]==$3) print $1, "Validated" else print $1, "Failed" ' in1,2
abc Validated
adf Failed
def Validated


This relies on a few magic awk built-in variables and tricks related to them:




  • NR - the total number of records processed


  • FNR - the total number of records in the current file processed


  • FNR != 1 - skips the first row of each file (does not treat the headers as data)


  • NR != FNR - runs only after the first file has been completely read and we have started reading subsequent files. This allows us to prepopulate the data array for testing once we start chewing on the second file.





share|improve this answer














Given the following input files:



$ cat in1 in2
Column 1, Column 2
abc , 123
def , 234
adf , 567
Column 1, Column 2
abc , 123
def , 234
adf , 578


First, we sort them; we can then stitch them together into one file:



$ sort in1 > in1.sorted; sort in2 > in2.sorted; paste in1,2.sorted
Column 1, Column 2 Column 1, Column 2
abc , 123 abc , 123
adf , 567 adf , 578
def , 234 def , 234


awk will help us here, but the commas get in our way; we can get rid of them with sed first:



$ paste in1,2.sorted | sed s/,//g
Column 1 Column 2 Column 1 Column 2
abc 123 abc 123
adf 567 adf 578
def 234 def 234


And then we can dump that through a quick awk:



$ paste in1,2.sorted | sed s/,//g | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
Column Failed
abc Validated
adf Failed
def Validated


This can also be done with raw awk, with the advantage of being able to strip out the header rows and not rely on the same data being in the same order, thus removing the need for sorting:



$ awk 'FNR != 1 && NR == FNR data[$1]=$3 FNR != 1 && NR != FNR if(data[$1]==$3) print $1, "Validated" else print $1, "Failed" ' in1,2
abc Validated
adf Failed
def Validated


This relies on a few magic awk built-in variables and tricks related to them:




  • NR - the total number of records processed


  • FNR - the total number of records in the current file processed


  • FNR != 1 - skips the first row of each file (does not treat the headers as data)


  • NR != FNR - runs only after the first file has been completely read and we have started reading subsequent files. This allows us to prepopulate the data array for testing once we start chewing on the second file.






share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 21 '17 at 22:50

























answered Mar 21 '17 at 17:25









DopeGhoti

41k55080




41k55080











  • Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
    – hschou
    Mar 21 '17 at 22:23










  • Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
    – Veyron
    Mar 22 '17 at 14:28










  • OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
    – Veyron
    Mar 22 '17 at 15:48
















  • Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
    – hschou
    Mar 21 '17 at 22:23










  • Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
    – Veyron
    Mar 22 '17 at 14:28










  • OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
    – Veyron
    Mar 22 '17 at 15:48















Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
– hschou
Mar 21 '17 at 22:23




Here is a modified version as a one-liner: paste <(sort in1) <(sort in2) | tr -d , | awk '$2 == $4 print $1,"Validated"; $2 != $4 print $1,"Failed"'
– hschou
Mar 21 '17 at 22:23












Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
– Veyron
Mar 22 '17 at 14:28




Thank you, I am working on the final lists to compare and will give this a try and will let you know the end results and if I need to tweak anything. Will update you on progress.
– Veyron
Mar 22 '17 at 14:28












OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
– Veyron
Mar 22 '17 at 15:48




OK, so did a test with this. The first file is the complete source list of 15 million lines, the second file is a subset of what has been check, ~22,000 lines, end result is that only about 50 lines were validated. I am looking into another method and will post here shortly.
– Veyron
Mar 22 '17 at 15:48












up vote
0
down vote













I think I have this sorted out now with the following, in case any else reads this and needs this. Thanks again.



FNR == NR
for (i = 2; i <= NF; i++) a[i,$1] = $i
b[$1];
next;

($1 in b) # check if row in file2 existed in file1
for (i = 2; i <= NF; i++)
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Validn", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Failuren", $1, i-1, a[i,$1], $i);

delete b[$1]; # delete entries which are processed



END
for (left in b) # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equaln", left, i-1, a[i,left])






share|improve this answer




















  • Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
    – Veyron
    Mar 27 '17 at 4:28














up vote
0
down vote













I think I have this sorted out now with the following, in case any else reads this and needs this. Thanks again.



FNR == NR
for (i = 2; i <= NF; i++) a[i,$1] = $i
b[$1];
next;

($1 in b) # check if row in file2 existed in file1
for (i = 2; i <= NF; i++)
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Validn", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Failuren", $1, i-1, a[i,$1], $i);

delete b[$1]; # delete entries which are processed



END
for (left in b) # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equaln", left, i-1, a[i,left])






share|improve this answer




















  • Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
    – Veyron
    Mar 27 '17 at 4:28












up vote
0
down vote










up vote
0
down vote









I think I have this sorted out now with the following, in case any else reads this and needs this. Thanks again.



FNR == NR
for (i = 2; i <= NF; i++) a[i,$1] = $i
b[$1];
next;

($1 in b) # check if row in file2 existed in file1
for (i = 2; i <= NF; i++)
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Validn", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Failuren", $1, i-1, a[i,$1], $i);

delete b[$1]; # delete entries which are processed



END
for (left in b) # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equaln", left, i-1, a[i,left])






share|improve this answer












I think I have this sorted out now with the following, in case any else reads this and needs this. Thanks again.



FNR == NR
for (i = 2; i <= NF; i++) a[i,$1] = $i
b[$1];
next;

($1 in b) # check if row in file2 existed in file1
for (i = 2; i <= NF; i++)
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Validn", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Failuren", $1, i-1, a[i,$1], $i);

delete b[$1]; # delete entries which are processed



END
for (left in b) # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equaln", left, i-1, a[i,left])







share|improve this answer












share|improve this answer



share|improve this answer










answered Mar 23 '17 at 11:31









Veyron

11




11











  • Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
    – Veyron
    Mar 27 '17 at 4:28
















  • Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
    – Veyron
    Mar 27 '17 at 4:28















Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
– Veyron
Mar 27 '17 at 4:28




Further to my email, although I thought I had an answer, I have tested this with a couple of CSR files but it does not work. I have spent hours going through this but cannot seem to work out where the awk statement is wrong. Any help would be appreciated and thanks in advance.
– Veyron
Mar 27 '17 at 4:28

















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f352877%2fcomparing-csv-files-to-lookup-column-1-then-check-values-in-column-2%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

How to check contact read email or not when send email to Individual?

Displaying single band from multi-band raster using QGIS

How many registers does an x86_64 CPU actually have?