Text Processing - Get 2 lines with exact text between them

I have a file with an unknown number of blocks of text. Each block consists of a starting line with the keyword "Start", an ending line with the keyword "End", and optional lines between them, each containing the keyword "Disk". I need to get rid of the blocks that have nothing between the Start and End lines; see the example.



I am processing input like this:



Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End


My desired output is this:



Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End


I know that I can use awk or sed to find text between two lines, but I do not know what to do when there are multiple occurrences of these two lines, or when there is no text between them.



I am running Ubuntu 17.10.



Looking forward to any help.



Edit: I deleted the post the first time because I thought I could do it with sed -e '/Start/,/End/d', but this actually removes everything.







asked Jan 14 at 18:19 by mikro45
edited Jan 14 at 21:46 by don_crissti




















2 Answers

















Accepted answer (score 2)






To delete back-to-back Start and End lines, this should do in GNU sed:

$ sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d' < input

If we see Start, load the next line with N, then check whether the contents of the buffer are just Somename:Start\nSomename:End, with Somename the same on both lines (\n is a newline). If so, delete it. Here, \1 is a back-reference to the first group within \(..\), and matches the same string that was matched there. .* just means any number (*) of any characters (.).

Using sed -e '/Start/,/End/d' would indeed delete every single line, since a range matches all lines from the starting pattern through the ending pattern. Every line in the input falls within some Start..End range, so everything is deleted.
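The difference between the two commands can be checked against the sample input from the question. This is only a sketch, assuming GNU sed; the function name drop_empty_blocks is made up for illustration and is not part of the answer.

```shell
#!/bin/sh
# Sample input copied from the question.
sample='Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End'

# Hypothetical helper wrapping the N-based command from the answer.
# Assumes GNU sed, where \n in the regex matches the newline added by N.
drop_empty_blocks() {
    sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d'
}

# Range form from the question: each line falls inside a Start..End range,
# so nothing survives.
printf '%s\n' "$sample" | sed -e '/Start/,/End/d'

# N-based form: only the back-to-back Server1 pair is removed.
printf '%s\n' "$sample" | drop_empty_blocks
```

The first pipeline prints nothing, while the second prints the Server2 and Server3 blocks unchanged.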














answered Jan 14 at 20:39 by ilkkachu, edited Jan 14 at 21:43











• Actually it isn't literally "Server2" and "Server3". There are actual unknown URLs, and the file is much longer than that. Pardon me for not specifying this part. – mikro45, Jan 14 at 20:44

• @mikro45, argh, no, pardon me for missing the sentence with the actual point of the question. – ilkkachu, Jan 14 at 21:01

• It isn't working for the whole file the way you wrote it. But when I changed it to sed -e '/Start/ N; /^\(.*\):Start\n\(.*\):End$/d' < input, replacing "\1" with "\(.*\)", it works now. Can you explain what "\1" and "\(.*\)" do? – mikro45, Jan 14 at 21:33

• @mikro45, yeah, I assumed the label would repeat identically, as it seemed to in the example. That was mostly to avoid the possibility that the pattern would match something else, if :End could appear on some other line. /^.*:Start\n.*:End$/ would relax that, and just match any strings before :Start and :End. (Without the \1, the \( and \) are also unnecessary, since they only act as a place for the \1 to point to.) – ilkkachu, Jan 14 at 21:44
















Answer (score 1)









Another solution, as I like trying to do these in awk.

BEGIN {
    RS = "End\n"
    ORS = "End\n"
}
NF > 2

Using the built-in RS (record separator) variable, awk treats the text between each End\n as one record. Presuming that servername:Start and servername:End are both single words, it is just a case of printing the records with more than 2 fields via the NF > 2 line: an empty block has exactly two fields, servername:Start and the servername: left in front of End. If the condition is true, the whole record is printed, with End\n used as the output record separator (ORS).

$ echo '
Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
' | awk 'BEGIN { RS="End\n"; ORS="End\n" } NF > 2'
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
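POSIX leaves the behaviour of a multi-character RS unspecified, although gawk and mawk treat it as a regular expression. Where that matters, a line-by-line variant is possible. This is a different technique from the RS approach above, sketched under the assumption that block lines end in :Start and :End; the function name keep_nonempty_blocks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: hold each Start line until the following line
# shows whether the block is empty.
keep_nonempty_blocks() {
    awk '
        # A new Start line: release any Start still held, then hold this one.
        /:Start$/ { if (holding) print held; held = $0; holding = 1; next }
        # End directly after Start: the block is empty, drop both lines.
        holding && /:End$/ { holding = 0; next }
        # Anything else after Start: the block has content, release the Start line.
        holding { print held; holding = 0 }
        { print }
        # Input ended right after a Start line: print it rather than lose it.
        END { if (holding) print held }
    '
}

# Sample input copied from the question.
sample='Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End'

printf '%s\n' "$sample" | keep_nonempty_blocks
```

This version only looks at one line at a time, so it does not depend on how many whitespace-separated words the lines contain.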















answered Jan 17 at 23:23 by Guy






















                     
