how to copy a column from another file when the ID's matches

up vote
0
down vote

favorite

I have 2 files,

head file1

 1 115258827 12 HG00099
 1 115258827 5 HG00100
 1 115258827 8 HG00101
 1 115258827 6 HG00103
 1 115258827 4 HG00108
 1 115258827 3 HG00110
 1 115258827 4 HG00111
 1 115258827 2 HG00114
 1 115258827 8 HG00115
 1 115258827 5 HG00116

and another file

head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0

I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like

head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

asked May 3 at 19:11

Anna1364

421110

add a commentÂ |Â

up vote
0
down vote

favorite

I have 2 files,

head file1

 1 115258827 12 HG00099
 1 115258827 5 HG00100
 1 115258827 8 HG00101
 1 115258827 6 HG00103
 1 115258827 4 HG00108
 1 115258827 3 HG00110
 1 115258827 4 HG00111
 1 115258827 2 HG00114
 1 115258827 8 HG00115
 1 115258827 5 HG00116

and another file

head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0

I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like

head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

asked May 3 at 19:11

Anna1364

421110

add a commentÂ |Â

up vote
0
down vote

favorite

I have 2 files,

head file1

 1 115258827 12 HG00099
 1 115258827 5 HG00100
 1 115258827 8 HG00101
 1 115258827 6 HG00103
 1 115258827 4 HG00108
 1 115258827 3 HG00110
 1 115258827 4 HG00111
 1 115258827 2 HG00114
 1 115258827 8 HG00115
 1 115258827 5 HG00116

and another file

head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0

I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like

head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

asked May 3 at 19:11

Anna1364

421110

I have 2 files,

head file1

 1 115258827 12 HG00099
 1 115258827 5 HG00100
 1 115258827 8 HG00101
 1 115258827 6 HG00103
 1 115258827 4 HG00108
 1 115258827 3 HG00110
 1 115258827 4 HG00111
 1 115258827 2 HG00114
 1 115258827 8 HG00115
 1 115258827 5 HG00116

and another file

head file2
HG00096 0|0
HG00097 0|0
HG00099 0|0
HG00100 0|1
HG00101 0|0
HG00102 0|0
HG00103 0|0
HG00105 0|0
HG00106 0|0
HG00107 0|0

I want to check if the last column of file1 matches the first column of file2, add the second column of file 2 to file1, ending up with something like

head desired
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

asked May 3 at 19:11

Anna1364

421110

asked May 3 at 19:11

Anna1364

421110

asked May 3 at 19:11

Anna1364

421110

asked May 3 at 19:11

Anna1364

421110

add a commentÂ |Â

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

Try:

awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1

With the key of column$1 save the corresponding value of column$2 into an associated array called seen when awk reads only from file2 where NR==FNR (always true for first input file when there are multiple input files to read), NR will set to 1 on first record/line read by awk and incrementing until all records/lines read either if single input file or multiple files; FNR will set to 1 on first record/line read by awk and incrementing until all records/lines read in current input file and will reset back to 1 for the next file.

The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

add a commentÂ |Â

up vote
4
down vote

Assuming that the field that you want to merge/join the data on is sorted:

$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

This joins the two files on the fourth column of file1 (specified using -1 4, since it's not the first column in that file) and the first column of file2 (these are the columns that the files needs to be sorted on).

The -o flag tells join what fields we'd like to see in the output, and from what file they are to be taken (1.3 means "the third field from the first file", for example, and 0 means the join field).

The benefit of using join is that neither file has to be read into memory completely.

If the files are not sorted, you may either pre-sort them once and for all using

sort -k4 -o file1 file1
sort -o file2 file2

or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using

join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )

It's the sorted data that allows join to only keep a few lines of each file in memory at a time.

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

add a commentÂ |Â

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f441638%2fhow-to-copy-a-column-from-another-file-when-the-ids-matches%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

Try:

awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1

The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

add a commentÂ |Â

up vote
2
down vote

accepted

Try:

awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1

The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

add a commentÂ |Â

up vote
2
down vote

accepted

Try:

awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1

The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

Try:

awk 'FNR==NRseen[$1]=$2; next seen[$NF]print $0, seen[$NF]' file2 file1

The next block if the value of last column matched with the same key value in array seen, then print the entire line from file1 and value of same key in array.

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

edited May 3 at 19:32

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

answered May 3 at 19:16

ÃŽÂ±Ã’Â“sÃÂ½ÃŽÂ¹ÃŽÂ·

14.8k82462

add a commentÂ |Â

up vote
4
down vote

Assuming that the field that you want to merge/join the data on is sorted:

$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

The benefit of using join is that neither file has to be read into memory completely.

If the files are not sorted, you may either pre-sort them once and for all using

sort -k4 -o file1 file1
sort -o file2 file2

or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using

join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )

It's the sorted data that allows join to only keep a few lines of each file in memory at a time.

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

add a commentÂ |Â

up vote
4
down vote

Assuming that the field that you want to merge/join the data on is sorted:

$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

The benefit of using join is that neither file has to be read into memory completely.

If the files are not sorted, you may either pre-sort them once and for all using

sort -k4 -o file1 file1
sort -o file2 file2

or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using

join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )

It's the sorted data that allows join to only keep a few lines of each file in memory at a time.

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

add a commentÂ |Â

up vote
4
down vote

Assuming that the field that you want to merge/join the data on is sorted:

$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

The benefit of using join is that neither file has to be read into memory completely.

If the files are not sorted, you may either pre-sort them once and for all using

sort -k4 -o file1 file1
sort -o file2 file2

or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using

join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )

It's the sorted data that allows join to only keep a few lines of each file in memory at a time.

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

Assuming that the field that you want to merge/join the data on is sorted:

$ join -1 4 -o1.1,1.2,1.3,0,2.2 file1 file2
1 115258827 12 HG00099 0|0
1 115258827 5 HG00100 0|1
1 115258827 8 HG00101 0|0
1 115258827 6 HG00103 0|0

The benefit of using join is that neither file has to be read into memory completely.

If the files are not sorted, you may either pre-sort them once and for all using

sort -k4 -o file1 file1
sort -o file2 file2

or, if you're using a shell that understands process substitution, you may sort them at the same time as you do the join using

join -1 4 -o1.1,1.2,1.3,0,2.2 <( sort -k4 file1 ) <( sort file2 )

It's the sorted data that allows join to only keep a few lines of each file in memory at a time.

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

edited May 3 at 19:42

answered May 3 at 19:31

Kusalananda

102k13199316

answered May 3 at 19:31

Kusalananda

102k13199316

answered May 3 at 19:31

Kusalananda

102k13199316

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

mjhjmtu