Removing (possibly nested) text quotes in command line

I need to parse large amounts of text on the command line and replace all (possibly nested) text quotes with spaces. Quotes are marked with a specific syntax: [quote=username]quoted text[/quote].



Example input with nested quotes could be something like:



text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3


And expected output would be:



text part 1 text part 2 text part 3


With help of this question I got it to work somewhat (it produces the output above) with sed ':b; s/\[quote=[^]]*\][^[\/]*\[\/quote\]/ /g; t b', but the middle part ([^[\/]*) is problematic, since quotes can contain characters like [ or ].



That being said, my sed command doesn't work if the input is e.g.:



text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3


One problem is that sed doesn't seem to support a non-greedy quantifier and thus always catches the longest possible match from the input. That makes it hard to deal with a) usernames and b) quoted texts in general.



I also guess that sed is not the best tool for this, and it might not even be capable of doing things like that. Maybe e.g. perl or awk could work better?



So the final question is: what would be the best and most efficient way to solve this?










      bash text-processing sed regular-expression






      asked Mar 1 at 11:19









      pipo




















          4 Answers
          If you know the input doesn't contain < or > characters, you could do:



          sed '
            # replace opening quote with <
            s|\[quote=[^]]*\]|<|g
            # and closing quotes with >
            s|\[/quote\]|>|g
            :1
            # work our way from the inner quotes
            s|<[^<>]*>||g
            t1'
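As a minimal sketch of how the script above might be used, it can be wrapped in a shell function and fed the second example input from the question. The function name strip_quotes is only an illustrative assumption; the sed commands are the ones from the script, with the literal brackets backslash-escaped.

```shell
# strip_quotes is a hypothetical helper name used for illustration only.
strip_quotes() {
  sed '
    # replace opening quote tags with <
    s|\[quote=[^]]*\]|<|g
    # and closing quote tags with >
    s|\[/quote\]|>|g
    :1
    # delete innermost <...> pairs, repeating until none are left
    s|<[^<>]*>||g
    t1'
}

printf '%s\n' 'text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3' |
  strip_quotes
```

On that input this should print the three text parts, with two spaces left wherever a quote was removed (the spaces that surrounded the quote).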


          If it may contain < or > characters, you can escape them using a scheme like:



          sed '
          # escape < and > (and the escaping character _ itself)
          s/_/_u/g; s/</_l/g; s/>/_r/g

          <code-above>

          # undo escaping after the work has been done
          s/_r/>/g; s/_l/</g; s/_u/_/g'
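With the placeholder filled in, the whole escaping scheme might look like the sketch below. The function name strip_quotes_escaped and the sample input are assumptions made for illustration; the commands themselves are the escape, mark, delete and unescape steps described above.

```shell
# strip_quotes_escaped is a hypothetical helper name used for illustration only.
strip_quotes_escaped() {
  sed '
    # escape _ first, then < and >, so the markers used below are unambiguous
    s/_/_u/g; s/</_l/g; s/>/_r/g
    # mark the quote tags and delete from the innermost pair outwards
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
    s|<[^<>]*>||g
    t1
    # undo the escaping in reverse order
    s/_r/>/g; s/_l/</g; s/_u/_/g'
}

printf '%s\n' 'if a_b < c [quote=foo] x > y [/quote] then d > e' | strip_quotes_escaped
```

The underscore has to be escaped before < and >, and restored last, so that an underscore already present in the input cannot be mistaken for part of an escape sequence.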


          With perl, using recursive regexps:



          perl -pe 's@(\[quote=[^\]]*\](?:(?1)|.)*?\[/quote\])@@g'


          Or even, as you mention:



          perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'


          With perl, you can handle multiline input by adding the -0777 option (together with the s flag, so that . also matches newline characters). With sed, you'd need to prefix the code with:



          :0
          $!{
            N;b0
          }



          So as to load the whole input into the pattern space.
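Combined with the first sed script, the prefix might be used as in the following sketch. The function name strip_quotes_multiline and the two-line sample input are assumptions for illustration. Note that the whole input is held in memory, which may matter for very large files.

```shell
# strip_quotes_multiline is a hypothetical helper name used for illustration only.
strip_quotes_multiline() {
  sed '
    # append every following line to the pattern space until the last line
    :0
    $!{
      N;b0
    }
    # then proceed as before; [^<>] also matches the embedded newlines
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
    s|<[^<>]*>||g
    t1'
}

printf '%s\n' 'text part 1 [quote=foo] first line' 'second line [/quote] text part 2' |
  strip_quotes_multiline
```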


























          • Thanks, your perl solution here looks clean and simple and seems to work nicely. I replaced [^\]]* with .*?, since perl's non-greedy quantifier solves the issue I was trying to tackle with my original version. So I ended up with perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'

            – pipo
            Mar 1 at 12:45












          • The sed script outputs "< <" with input "[quote=foo] [quote [/quote]".

            – Freddy
            Mar 1 at 12:56











          • @Freddy, that doesn't appear to be valid input as per the OP's description of its format. The perl one would also have problems with [quote=foo] [quote= [/quote] and would struggle for mismatched quotes.

            – Stéphane Chazelas
            Mar 1 at 13:01












          • @StéphaneChazelas OP said "... quotes can contain characters like [ or ]" and since the example text contains [foo] I can see no reason why [quote should be invalid input.

            – Freddy
            Mar 1 at 14:06







          • @Freddy, but then at some point we need to decide where we stop. Is [quote=x] [quot= [/quote] valid for instance? Is [quote=some [quote] user] valid? Does the format have a way to escape [s or [quote?... Anyway, I've added the = in the sed regexp so [quote=foo] [quote [/quote] would no longer be a problem. [quote=foo] [quote= [/quote] would still be.

            – Stéphane Chazelas
            Mar 1 at 14:50

































          I checked this one and it worked for me. You might want to choose another temporary pattern instead of foobar. Without it, sed deleted everything between the first and last tags, leaving just text part 1 text part 3. Note that the 3 selects the third closing tag, so the command is tied to the structure of the example input.



          sed -e 's/\/quote\]/foobar]/3' -e 's/\[.*\/quote\]//' -e 's/\[.*foobar\]//' testfile


          Instead of testfile, you may just pipe the input in with cat.




















            A little script that increments a counter variable on each start-quote and decrements it on each end-quote. If the counter variable is greater than 0, text snippets are skipped.



            #!/bin/bash

            # disable pathname expansion
            set -f
            cnt=0
            # the command substitution is left unquoted on purpose: split into words
            for i in $(<"$1"); do
                # start quote: word begins with [quote= and ends with ]
                if [ "${i##\[quote=}" != "$i" ] && [ "${i: -1}" = "]" ]; then
                    ((++cnt))
                elif [ "$i" = "[/quote]" ]; then
                    ((--cnt))
                elif [ "$cnt" -eq 0 ]; then
                    echo -n "$i "
                fi
            done
            echo


            Output:



            $ cat q1
            text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
            $ ./parse.sh q1
            text part 1 text part 2 text part 3
            $ cat q2
            text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
            $ ./parse.sh q2
            text part 1 text part 2 text part 3
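Since the question also asks about awk, the same counter idea could be sketched in awk as below. This is only an illustration under assumptions, not part of the answer above: the function name strip_quotes_awk is made up, and a tag is assumed to be either [quote=...] with no ] in the name, or [/quote]. Unlike the word-based loop, it does not need whitespace around the tags and it preserves the original spacing of the kept text.

```shell
# strip_quotes_awk is a hypothetical helper name used for illustration only.
strip_quotes_awk() {
  awk '
  {
    out = ""
    rest = $0
    # find the next opening or closing tag in the remainder of the line
    while (match(rest, /\[quote=[^]]*\]|\[\/quote\]/)) {
      # text before the tag is kept only when we are outside every quote
      if (depth == 0) out = out substr(rest, 1, RSTART - 1)
      tag = substr(rest, RSTART, RLENGTH)
      if (tag == "[/quote]") {
        # ignore a stray closing tag instead of going negative
        if (depth > 0) depth--
      } else {
        depth++
      }
      rest = substr(rest, RSTART + RLENGTH)
    }
    # whatever follows the last tag is kept if no quote is still open
    if (depth == 0) out = out rest
    print out
  }'
}

strip_quotes_awk < /dev/null
```

The depth counter is not reset between lines, so a quote that spans several lines is skipped as well; a line lying entirely inside a quote is printed as an empty line. Ambiguous input such as [quote=foo] [quote= [/quote] is not handled any better than by the other approaches.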






























            • Leaving that $(<$1) unquoted invokes the split+glob operator in bash. [quote=foo] happens to be a glob (it expands to the file names in the current directory that are one of q, u, o, t, e, = or f). So, for instance, if there were files named f and o in the current directory, [quote=foo] would be expanded to the two words f and o. It would be worse if there were * words in the input, for instance.

              – Stéphane Chazelas
              Mar 1 at 13:09











            • Good point, thanks! Added "set -f" to fix that.

              – Freddy
              Mar 1 at 13:33
































You can do this with POSIX sed as detailed here. Note that this solution applies to both kinds of input you have shown. The limitation is that the input must not be multiline, as we use newlines as markers to effect the required transformation.



$ sed -e '
:top
/\[\/quote]/!b
s//\
&/
s/\[quote=/\
\
&/

:loop
s/\(\n\n\)\(\[quote=.*\)\(\[quote=.*\n\)/\2\1\3/
tloop

s/\n\n.*\n\[\/quote]//
btop
' input.txt





              The answer above using sed with < and > markers and perl recursive regexps: answered Mar 1 at 12:27, edited Mar 1 at 14:53, by Stéphane Chazelas







                  The answer above using the temporary foobar pattern: answered Mar 1 at 12:20, edited Mar 1 at 12:32, by Igor Voltaic





















                      0














                      A small script that increments a counter on each opening quote tag and decrements it on each closing tag. While the counter is greater than 0, words are skipped. The input is processed word by word, so the tags must be separated from the surrounding text by whitespace.



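                      #!/bin/bash

                      # Disable pathname expansion so that words such as [quote=foo] or *
                      # in the input are not expanded as globs by the unquoted $(<$1).
                      set -f
                      cnt=0
                      for i in $(<$1); do
                        # start quote: the word begins with "[quote=" and ends with "]"
                        if [ "${i##\[quote=}" != "$i" ] && [ "${i: -1}" = "]" ]; then
                          ((++cnt))
                        # end quote
                        elif [ "$i" = "[/quote]" ]; then
                          ((--cnt))
                        # outside of any quote: print the word
                        elif [ $cnt -eq 0 ]; then
                          echo -n "$i "
                        fi
                      done
                      echo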


                      Output:



                      $ cat q1
                      text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
                      $ ./parse.sh q1
                      text part 1 text part 2 text part 3
                      $ cat q2
                      text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
                      $ ./parse.sh q2
                      text part 1 text part 2 text part 3
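
                      The same counter idea can also be sketched in awk, which removes the need for whitespace around the tags. This is only an illustration, not part of the script above: the function name strip_quotes is made up, and it assumes that a quote never spans more than one line (the depth is reset for every line) and that a user name never contains "]". A stray closing tag with no matching opening tag is simply dropped.

```shell
# strip_quotes: read text on stdin, replace each top-level
# [quote=...]...[/quote] block (including nested ones) with one space.
strip_quotes() {
  awk '
  {
    out = ""; depth = 0; rest = $0
    # Jump from tag to tag, wherever the tags appear in the line.
    while (match(rest, /\[quote=[^]]*\]|\[\/quote\]/)) {
      # Text in front of the tag is kept only outside of any quote.
      if (depth == 0) out = out substr(rest, 1, RSTART - 1)
      tag = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
      if (tag == "[/quote]") {
        if (depth > 0) {
          depth--
          # The outermost quote just ended: leave a single space behind.
          if (depth == 0) out = out " "
        }
      } else {
        depth++
      }
    }
    # Whatever follows the last tag is kept if no quote is still open.
    if (depth == 0) out = out rest
    print out
  }'
}
```

                      Usage would be something like strip_quotes < q2. Bracketed text such as [foo] inside a quote is not a tag, so it neither opens nor closes anything and is removed together with the quote that contains it.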






























                      • Leaving that $(<$1) unquoted invokes the split+glob operator in bash. [quote=foo] happens to be a glob (it expands to the file names in the current directory that consist of one of the characters q, u, o, t, e, = or f). So, for instance, if there were files named f and o in the current directory, [quote=foo] would be expanded to the two words f and o. It would be worse if the input contained * words, for instance.

                        – Stéphane Chazelas
                        Mar 1 at 13:09











                      • Good point, thanks! Added "set -f" to fix that.

                        – Freddy
                        Mar 1 at 13:33
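
                      One way to keep the effect of set -f local is to run the word loop in a subshell, so that pathname expansion stays enabled for the rest of the calling shell. The following is a sketch of that idea in plain POSIX sh, not the answer's script; the function name strip_quotes_words is hypothetical. Like the script, it expects the tags to be whitespace-separated words and joins all output on one line.

```shell
# strip_quotes_words: read text on stdin, drop every word that lies
# inside a [quote=...] ... [/quote] pair, print the remaining words.
# The body is a subshell ( ... ), so "set -f" does not leak out.
strip_quotes_words() (
  set -f                 # split on whitespace only, no glob expansion
  cnt=0
  out=
  for word in $(cat); do
    case $word in
      '[quote='*']')     # opening tag
        cnt=$((cnt + 1)) ;;
      '[/quote]')        # closing tag (ignored if nothing is open)
        if [ "$cnt" -gt 0 ]; then cnt=$((cnt - 1)); fi ;;
      *)                 # ordinary word: keep it only outside quotes
        if [ "$cnt" -eq 0 ]; then out="$out${out:+ }$word"; fi ;;
    esac
  done
  printf '%s\n' "$out"
)
```

                      With this, a word such as * or [quote=foo] in the input is passed through the loop literally, whatever files exist in the current directory.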























                      answered Mar 1 at 12:19 (edited Mar 1 at 13:31)

                      Freddy


























                      You can do this with POSIX sed as shown below. This solution handles both kinds of input shown in the question. The limitation is that the input must not be multiline, because newlines are used as markers to carry out the required transformation.



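                      $ sed -e '
                      :top
                      # nothing left to do once no closing tag remains
                      /\[\/quote]/!b
                      # put a newline in front of the first closing tag
                      s//\
                      &/
                      # put two newlines in front of the first opening tag
                      s/\[quote=/\
                      \
                      &/

                      # move the two-newline marker to the innermost opening tag
                      :loop
                      s/\(\n\n\)\(\[quote=.*\)\(\[quote=.*\n\)/\2\1\3/
                      tloop

                      # delete the innermost quote, then start over
                      s/\n\n.*\n\[\/quote]//
                      btop
                      ' input.txt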
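
                      A related marker trick, sketched here as a variant rather than as the method of the answer above: turn every opening and closing tag into a single sentinel character first. "Innermost quote" can then be written as a bracket expression that excludes both sentinels, which is exactly what [^[/]* could not express. The function name strip_quotes_sed is made up, and the sketch assumes that the bytes 0x01 and 0x02 never occur in the input and that a quote does not span lines.

```shell
# strip_quotes_sed: read text on stdin, replace each quote block
# (innermost first, so nesting is handled) with one space.
strip_quotes_sed() (
  o=$(printf '\001')    # sentinel standing in for "[quote="
  c=$(printf '\002')    # sentinel standing in for "[/quote]"
  sed -e "s/\[quote=/${o}/g" \
      -e "s/\[\/quote]/${c}/g" \
      -e ':again' \
      -e "s/${o}[^${o}${c}]*${c}/ /" \
      -e 't again' \
      -e "s/${o}/[quote=/g" \
      -e "s/${c}/[\/quote]/g"
)
```

                      The last two expressions restore any unbalanced tags to their original text, so input such as a lone [/quote] passes through unchanged. Usage would be something like strip_quotes_sed < input.txt.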















                          answered Mar 3 at 4:24

                          Rakesh Sharma


























