Collecting specific genome data from a file under a single title

I have genome data in a file, genomes-seq.txt. Each sequence title begins with > followed by the genome name:



>genome.1
atcg
atcg
atcggtc

>genome.2
atct
tgcgtgctt
attttt

>genome.
sdkf
sdf;ksdf
sdlfkjdslc
edsfsfv

>genome.3
as;ldkhaskjd
asdkljdsl
asdkljasdk;l

>genome.4
ekjfhdhsa
dsfkjskajd
asdknasd


>genome.1
iruuwi
sdkljbh
sdfljnsdl

>genome.234
efijhusidh
siduhygfhuji

>genome.1
ljhdcj
sdljhsdil
fweusfhygc


I want to collect all of the data for genome.1 in one file, under a single title, so that it looks like this:



>genome.1
atcg
atcggtc

iruuwi
sdkljbh
sdfljnsdl
ljhdcj
sdljhsdil
fweusfhygc


But every time I try it with sed, I get:



>genome.1
atcg
atcg
atcggtc

>genome.1
iruuwi
sdkljbh
sdfljnsdl

>genome.1
ljhdcj
sdljhsdil
fweusfhygc


That is, the >genome.1 title appears multiple times. How can I do this correctly, so that on a large data set I don't have to remove all the repeated titles afterwards?










Tags: bash






edited 5 hours ago by Peter Mortensen
asked 10 hours ago by paul




  • Hi @paul, what is your sed command that you used?
    – Goro, 10 hours ago

  • I tried but it didn't work
    – paul, 9 hours ago

  • Show what you tried and we can help fix your errors.
    – glenn jackman, 9 hours ago

  • Re "...every time I do it using sed...": You ought to include the sed line in the question. Otherwise, this amounts to a work order (using this site as a script-writing service).
    – Peter Mortensen, 7 hours ago

  • The file format is FASTA.
    – Peter Mortensen, 7 hours ago






















3 Answers

















Accepted answer
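$ sed -n '/^>genome\.1$/,/^$/p' file | sed '2,${/^>genome\.1$/d;}'

>genome.1
atcg
atcg
atcggtc

iruuwi
sdkljbh
sdfljnsdl

ljhdcj
sdljhsdil
fweusfhygc


Here genome.1 is the key: the first sed prints every block from a >genome.1 title down to the next blank line, and the second deletes the repeated titles from line 2 onward. Change the name in both commands to collect a different genome, keeping the dot escaped as \.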
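A minimal sketch of how the key could be passed as an argument instead of being edited into both sed commands. The function name collect_genome is hypothetical, and it assumes the only regex metacharacter that can appear in a genome name is the dot, which it escapes as recommended in the comments on this answer.

```shell
# collect_genome NAME [FILE...]
# Hypothetical helper wrapping the two sed commands from this answer.
# Assumption: '.' is the only regex metacharacter that can occur in NAME.
collect_genome() (
    name=$1
    shift
    # Escape each literal dot, e.g. genome.1 -> genome\.1
    re=$(printf '%s\n' "$name" | sed 's/\./\\./g')
    # 1st sed: print each block from an exactly matching title to the next blank line.
    # 2nd sed: from line 2 on, delete the repeated titles.
    sed -n "/^>$re\$/,/^\$/p" "$@" | sed "2,\${/^>$re\$/d;}"
)
```

For example, collect_genome genome.1 genomes-seq.txt > genome1.txt. Because the title is anchored with ^ and $, genome.10 and genome.234 are not picked up.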






edited 9 hours ago
answered 9 hours ago by Goro











  • Hi Goro, can I be your sed friend? Where can I find such sed knowledge?
    – schweik, 8 hours ago

  • 1) \> matches end-of-word, not the literal character >. 2) Please use \. to escape the literal dot in the regular expression that is supposed to match >genome.1. 3) You should also anchor it at the line start and end to avoid false matches: /^>genome\.1$/. 4) The -r flag on the first sed command is not required but harmless since you don't use any characters affected by it. 5) As a rule of thumb you should escape sed commands provided through a shell to avoid easily overlooked issues.
    – David Foerster, 6 hours ago

















With perl:

perl -00 -ne 'if (/^>genome\.1\n/) { s/// if $seen++; print }' file





answered 9 hours ago by glenn jackman




















With Awk:

{
    if (/^>/) {
        in_section = 0;
        if ($0 == ">genome.1") {
            in_section = 1;
            if (!section_count++)
                print;
        }
    } else if (in_section)
        print;
}

Usage:

awk '{ if (/^>/) { in_section = 0; if ($0 == ">genome.1") { in_section = 1; if (!section_count++) print; } } else if (in_section) print; }' genome.txt
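The script above tracks one title at a time. If every genome needs collecting, not just genome.1, the same idea can group them all in a single pass. This is a sketch under stated assumptions: group_all_genomes is a hypothetical name, blank lines are treated purely as separators and dropped, and all sequence data is buffered in memory until the end of input, which may not suit very large files.

```shell
# group_all_genomes [FILE...]
# Hypothetical helper: prints each title once, followed by all of its
# sequence lines in order of appearance. Lines before the first title are
# ignored. Everything is buffered in memory until the end of input.
group_all_genomes() {
    awk '
        /^>/ {
            title = $0
            # Remember each title the first time it is seen.
            if (!(title in seen)) {
                seen[title] = 1
                order[++n] = title
            }
            next
        }
        # Blank (or whitespace-only) lines only separate records; drop them.
        NF == 0 { next }
        # Any other line is sequence data for the current title.
        { data[title] = data[title] $0 "\n" }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\n%s", order[i], data[order[i]]
        }
    ' "$@"
}
```

For example, group_all_genomes genomes-seq.txt > grouped.txt. Comparing whole title lines as strings, as the script above does, avoids any regex escaping of the dot.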





answered 6 hours ago by David Foerster



















