How can I merge the lines of two files by having common headers?

I want to merge two files based on the header lines they have in common.

Here is an example:

File 1

>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g

File 2

>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r

And here's the kind of output I want:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r

I have tried some awk and sed, but without success. How can I do this?

edited Jan 4 at 9:52 by αғsнιη

asked Jan 4 at 8:00 by Namrata Patel

2 Answers

accepted

Awk solution:

awk '/^>/{ k=$1 FS $2 }
     NR==FNR{
         if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
     }
     k in a{
         print $0 ORS a[k]; delete a[k]; next
     }1' file1 file2

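/^>/{ k=$1 FS $2 } - on a header line (i.e. >Feature ...), compose the key k from the 1st ($1) and 2nd ($2) fields

NR==FNR{ ... } - while processing the 1st input file (file1):
- if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0 - append each non-header line to array a under the current key k
- next - skip to the next record

k in a{ ... } - while processing file2, if the current key is in array a (built from the file1 records):
- print $0 ORS a[k] - print the header followed by the lines stored from file1
- delete a[k] - delete the processed entry, so the file2 lines that follow are not matched again
- next - skip to the next record

1 - print every remaining file2 line unchanged (its data lines and any header not seen in file1)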
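
To try the command end to end, the following sketch recreates the two sample inputs from the question in a scratch directory and runs the awk program from this answer on them, with its action braces written out. The mktemp directory and the merged variable are scaffolding assumed here for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Scratch directory for the sample inputs (illustrative scaffolding only).
dir=$(mktemp -d) || exit 1

cat > "$dir/file1" <<'EOF'
>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g
EOF

cat > "$dir/file2" <<'EOF'
>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r
EOF

# Buffer file1's data lines per header, then emit them right after the
# matching header while reading file2; other file2 lines pass through.
merged=$(awk '/^>/{ k=$1 FS $2 }
     NR==FNR{
         if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
     }
     k in a{
         print $0 ORS a[k]; delete a[k]; next
     }1' "$dir/file1" "$dir/file2")

printf '%s\n' "$merged"
rm -rf "$dir"
```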

The output:

>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r

edited Jan 4 at 15:40

answered Jan 4 at 8:30 by RomanPerekhrest

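Another, simpler approach:

grep -v '^scaffold' <(awk -v RS='>Feature ' \
 'NF{s[$1]=s[$1]$0} END{for (x in s)print RS""s[x]}' file[12])

This needs a shell with process substitution (such as bash) and an awk that accepts a multi-character RS (such as gawk or mawk). The order of for (x in s) is unspecified, so the groups are not guaranteed to come out in input order.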
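
If the groups should come out in the order their headers first appear, and a header present in only one of the files should still be kept, a variant in plain POSIX awk may help. This is only a sketch, not either answerer's method: merge_groups is a hypothetical helper name, and the whole header line is used as the key, which assumes headers are spelled identically in both files.

```shell
#!/bin/sh
# merge_groups FILE...  (hypothetical helper, introduced for illustration)
# Collects the data lines under each ">" header across all input files and
# prints every group once, in the order its header was first seen.
merge_groups() {
    awk '
        /^>/ {                      # header line: switch the current group
            k = $0
            if (!(k in seen)) {     # remember first-seen order of headers
                seen[k] = 1
                order[++n] = k
            }
            next
        }
        k != "" {                   # data line: append it to the current group
            body[k] = body[k] $0 ORS
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s%s%s", order[i], ORS, body[order[i]]
        }
    ' "$@"
}
```

Called as merge_groups file1 file2, it reads both files in a single awk pass and needs neither grep nor process substitution. Lines that appear before the first header are ignored.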

answered Jan 4 at 9:46 by αғsнιη
