how to copy a column from another file when the ID's matches
Clash Royale CLAN TAG#URR8PPP
up vote
0
down vote
favorite
I have 2 files,
head file1
1 115258827 12 HG00099
1 115258827 5 HG00100
1 115258827 8 HG00101
1 115258827 6 HG00103
1 115258827 4 HG00108
1 115258827 3 HG00110
1 115258827 4 HG00111
1 115258827 2 HG00114
1 115258827 8 HG00115
1 115258827 5 HG00116
and another file
head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0
I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like
head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
text-processing awk
add a comment |Â
up vote
0
down vote
favorite
I have 2 files,
head file1
1 115258827 12 HG00099
1 115258827 5 HG00100
1 115258827 8 HG00101
1 115258827 6 HG00103
1 115258827 4 HG00108
1 115258827 3 HG00110
1 115258827 4 HG00111
1 115258827 2 HG00114
1 115258827 8 HG00115
1 115258827 5 HG00116
and another file
head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0
I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like
head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
text-processing awk
add a comment |Â
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have 2 files,
head file1
1 115258827 12 HG00099
1 115258827 5 HG00100
1 115258827 8 HG00101
1 115258827 6 HG00103
1 115258827 4 HG00108
1 115258827 3 HG00110
1 115258827 4 HG00111
1 115258827 2 HG00114
1 115258827 8 HG00115
1 115258827 5 HG00116
and another file
head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0
I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like
head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
text-processing awk
I have 2 files,
head file1
1 115258827 12 HG00099
1 115258827 5 HG00100
1 115258827 8 HG00101
1 115258827 6 HG00103
1 115258827 4 HG00108
1 115258827 3 HG00110
1 115258827 4 HG00111
1 115258827 2 HG00114
1 115258827 8 HG00115
1 115258827 5 HG00116
and another file
head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0
I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like
head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
text-processing awk
asked May 3 at 19:11
Anna1364
421110
421110
add a comment |Â
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
accepted
Try:
awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1
With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR
(always true for first input file when there are multiple input files to read), NR
will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR
will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.
The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.
add a comment |Â
up vote
4
down vote
Assuming that the field that you want to merge/join the data on is sorted:
$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
This joins the two files on the fourth column of file1
(specified using -1 4
, since it's not the first column in that file) and the first column of file2
(these are the columns that the files needs to be sorted on).
The -o
flag tells join
what fields we'd like to see in the output, and from what file they are to be taken (1.3
means "the third field from the first file", for example, and 0
means the join field).
The benefit of using join
is that neither file has to be read into memory completely.
If the files are not sorted, you may either pre-sort them once and for all using
sort -k4 -o file1 file1
sort -o file2 file2
or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using
join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )
It's the sorted data that allows join
to only keep a few lines of each file in memory at a time.
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
accepted
Try:
awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1
With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR
(always true for first input file when there are multiple input files to read), NR
will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR
will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.
The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.
add a comment |Â
up vote
2
down vote
accepted
Try:
awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1
With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR
(always true for first input file when there are multiple input files to read), NR
will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR
will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.
The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.
add a comment |Â
up vote
2
down vote
accepted
up vote
2
down vote
accepted
Try:
awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1
With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR
(always true for first input file when there are multiple input files to read), NR
will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR
will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.
The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.
Try:
awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1
With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR
(always true for first input file when there are multiple input files to read), NR
will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR
will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.
The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.
edited May 3 at 19:32
answered May 3 at 19:16
ñÃÂsýù÷
14.8k82462
14.8k82462
add a comment |Â
add a comment |Â
up vote
4
down vote
Assuming that the field that you want to merge/join the data on is sorted:
$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
This joins the two files on the fourth column of file1
(specified using -1 4
, since it's not the first column in that file) and the first column of file2
(these are the columns that the files needs to be sorted on).
The -o
flag tells join
what fields we'd like to see in the output, and from what file they are to be taken (1.3
means "the third field from the first file", for example, and 0
means the join field).
The benefit of using join
is that neither file has to be read into memory completely.
If the files are not sorted, you may either pre-sort them once and for all using
sort -k4 -o file1 file1
sort -o file2 file2
or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using
join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )
It's the sorted data that allows join
to only keep a few lines of each file in memory at a time.
add a comment |Â
up vote
4
down vote
Assuming that the field that you want to merge/join the data on is sorted:
$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
This joins the two files on the fourth column of file1
(specified using -1 4
, since it's not the first column in that file) and the first column of file2
(these are the columns that the files needs to be sorted on).
The -o
flag tells join
what fields we'd like to see in the output, and from what file they are to be taken (1.3
means "the third field from the first file", for example, and 0
means the join field).
The benefit of using join
is that neither file has to be read into memory completely.
If the files are not sorted, you may either pre-sort them once and for all using
sort -k4 -o file1 file1
sort -o file2 file2
or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using
join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )
It's the sorted data that allows join
to only keep a few lines of each file in memory at a time.
add a comment |Â
up vote
4
down vote
up vote
4
down vote
Assuming that the field that you want to merge/join the data on is sorted:
$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
This joins the two files on the fourth column of file1
(specified using -1 4
, since it's not the first column in that file) and the first column of file2
(these are the columns that the files needs to be sorted on).
The -o
flag tells join
what fields we'd like to see in the output, and from what file they are to be taken (1.3
means "the third field from the first file", for example, and 0
means the join field).
The benefit of using join
is that neither file has to be read into memory completely.
If the files are not sorted, you may either pre-sort them once and for all using
sort -k4 -o file1 file1
sort -o file2 file2
or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using
join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )
It's the sorted data that allows join
to only keep a few lines of each file in memory at a time.
Assuming that the field that you want to merge/join the data on is sorted:
$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0
This joins the two files on the fourth column of file1
(specified using -1 4
, since it's not the first column in that file) and the first column of file2
(these are the columns that the files needs to be sorted on).
The -o
flag tells join
what fields we'd like to see in the output, and from what file they are to be taken (1.3
means "the third field from the first file", for example, and 0
means the join field).
The benefit of using join
is that neither file has to be read into memory completely.
If the files are not sorted, you may either pre-sort them once and for all using
sort -k4 -o file1 file1
sort -o file2 file2
or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using
join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )
It's the sorted data that allows join
to only keep a few lines of each file in memory at a time.
edited May 3 at 19:42
answered May 3 at 19:31
Kusalananda
102k13199316
102k13199316
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f441638%2fhow-to-copy-a-column-from-another-file-when-the-ids-matches%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password