How can I merge the lines of two files by having common headers?
I want to merge two files based on the header lines they have in common.
Here is an example.
File 1:
>Feature scaffold1
1 100 g
101 200 g
201 300 g
>Feature scaffold2
1 100 g
01 500 g
>Feature scaffold3
10 500 g
>Feature scaffold4
10 300 g
File 2:
>Feature scaffold1
500 500 r
900 1000 r
>Feature scaffold2
200 300 r
>Feature scaffold3
100 200 r
>Feature scaffold4
500 600 r
>Feature scaffold5
1 1000 r
And here's the kind of output I want:
>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r
I have tried some awk and sed commands without success. How can I do this?
text-processing awk sed
edited Jan 4 at 9:52
αғsнιη
asked Jan 4 at 8:00
Namrata Patel
2 Answers
Accepted answer (7 votes)
Awk solution:
awk '/^>/ { k = $1 FS $2 }
NR == FNR {
    if (!/^>/) a[k] = (a[k] != "") ? a[k] ORS $0 : $0
    next
}
k in a { print $0 ORS a[k]; delete a[k]; next }
1' file1 file2
/^>/ { k = $1 FS $2 } - on encountering a header line (i.e. >Feature ...), compose a key k from the 1st ($1) and 2nd ($2) fields
NR == FNR { ... } - while processing the 1st input file (file1):
    if (!/^>/) a[k] = ... - accumulate non-header lines into array a under the current key k
    next - jump to the next record
k in a { ... } - if the current key, taken from a file2 header, is in array a (filled from file1):
    print $0 ORS a[k] - print the header followed by the file1 lines stored for it
    delete a[k] - delete the processed item, so the file2 lines that follow fall through to the last rule
1 - print every other record (file2 data lines and headers found only in file2) unchanged
The output:
>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r
edited Jan 4 at 15:40
answered Jan 4 at 8:30
RomanPerekhrest
Answer (4 votes)
Another, simpler approach:
grep -v '^scaffold' <(awk -v RS='>Feature ' \
    'NF { s[$1] = s[$1] $0 } END { for (x in s) print RS "" s[x] }' file[12])
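Because for (x in s) visits array elements in an unspecified order, the groups may not come out in input order, and a multi-character RS is not guaranteed by POSIX awk. Here is a sketch of a line-by-line variant that keeps the order in which headers are first seen, assuming every header line starts with >. The names order, n, and ordered, and the sample files, are illustrative only.

```shell
#!/bin/sh
# Illustrative sample data with headers deliberately out of numeric order.
dir=$(mktemp -d)
printf '%s\n' '>Feature scaffold2' '1 100 g' '>Feature scaffold1' '10 500 g' > "$dir/file1"
printf '%s\n' '>Feature scaffold1' '100 200 r' '>Feature scaffold3' '1 1000 r' > "$dir/file2"

ordered=$(awk '
/^>/ {                                # header: start or resume a group
    k = $0
    if (!(k in s)) { s[k] = ""; order[++n] = k }
    next
}
{ s[k] = s[k] ORS $0 }                # data line: append to the current group
END { for (i = 1; i <= n; i++) print order[i] s[order[i]] }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$ordered"
rm -rf "$dir"
```

Since both files are read in a single pass, this also keeps groups that appear in only one of the files, at the cost of holding all data lines in memory until the END block.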
answered Jan 4 at 9:46
αғsнιη