Removing (possibly nested) text quotes in command line

I need to parse large amounts of text on the command line and replace all (possibly nested) text quotes with spaces. Quotes are marked with a specific syntax: [quote=username]quoted text[/quote].

Example input with nested quotes could be something like:

text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3

And expected output would be:

text part 1 text part 2 text part 3

With the help of this question I got it to work somewhat (it produces the output above) with sed ':b; s/\[quote=[^]]*\][^[\/]*\[\/quote\]/ /g; t b', but the middle part ([^[\/]*) is problematic since quotes can contain characters like [ or ].

That being said, my sed command doesn't work if the input is, e.g.:

text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3

One problem is that sed doesn't seem to support non-greedy quantifiers and thus always catches the longest possible match in the input. That makes it hard to deal with a) usernames and b) quoted text in general.

I also suspect that sed is not the best tool for this and might not even be capable of it. Maybe perl or awk, for example, would work better?

What would be the best and most efficient way to solve this?

asked Mar 1 at 11:19

pipo

1133

bash text-processing sed regular-expression


4 Answers
If you know the input doesn't contain < or > characters, you could do:

sed '
  # replace opening quote with <
  s|\[quote=[^]]*\]|<|g
  # and closing quotes with >
  s|\[/quote\]|>|g
  :1
  # work our way from the inner quotes
  s|<[^<>]*>||g
  t1'

If it may contain < or > characters, you can escape them using a scheme like:

sed '
 # escape < and > (and the escaping character _ itself)
 s/_/_u/g; s/</_l/g; s/>/_r/g

 <code-above>

 # undo escaping after the work has been done
 s/_r/>/g; s/_l/</g; s/_u/_/g'
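For reference, the two pieces above can be assembled into one script, with the earlier commands substituted for the <code-above> placeholder. This is only a sketch: the wrapper name strip_quotes_escaped is hypothetical, and it assumes one record per line and well-formed (balanced) quotes.

```shell
# Hypothetical wrapper name; reads stdin, writes stdout.
strip_quotes_escaped() {
  sed '
    # escape the escape character first, then < and >
    s/_/_u/g; s/</_l/g; s/>/_r/g
    # turn quote tags into single-character markers
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
    # remove innermost <...> pairs until none are left
    s|<[^<>]*>||g
    t 1
    # undo the escaping in reverse order
    s/_r/>/g; s/_l/</g; s/_u/_/g'
}

printf '%s\n' 'a_b <x> [quote=u] q <y> [quote=v] in [/quote] [/quote] c' |
  strip_quotes_escaped
```

Escaping _ first and restoring it last matters: a literal _l in the input becomes _ul, which the s/_l/</g step cannot mistake for an escaped <.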

With perl, using recursive regexps:

perl -pe 's@(\[quote=[^]]*\](?:(?1)|.)*?\[/quote\])@@g'

Or even, as you mention:

perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'

With perl, you can handle multiline input by adding the -0777 option. With sed, you'd need to prefix the code with:

:0
$!{
  N;b0
}

So as to load the whole input into the pattern space.

edited Mar 1 at 14:53

answered Mar 1 at 12:27

Stéphane Chazelas

312k57589946

Thanks, your perl solution here looks clean and simple and seems to work nicely. I replaced [^]]* with .*? since perl's non-greedy quantifier solves the issue I was trying to tackle with the original version. So I ended up with perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'

– pipo
Mar 1 at 12:45

The sed script outputs "< <" with input "[quote=foo] [quote [/quote]".

– Freddy
Mar 1 at 12:56

@Freddy, that doesn't appear to be valid input as per the OP's description of its format. The perl one would also have problems with [quote=foo] [quote= [/quote] and would struggle for mismatched quotes.

– Stéphane Chazelas
Mar 1 at 13:01

@StéphaneChazelas OP said "... quotes can contain characters like [ or ]" and since the example text contains [foo] I can see no reason why [quote should be invalid input.

– Freddy
Mar 1 at 14:06

@Freddy, but then at some point we need to decide where we stop. Is [quote=x] [quot= [/quote] valid for instance? Is [quote=some [quote] user] valid? Does the format have a way to escape [s or [quote?... Anyway, I've added the = in the sed regexp so [quote=foo] [quote [/quote] would no longer be a problem. [quote=foo] [quote= [/quote] would still be.

– Stéphane Chazelas
Mar 1 at 14:50


I checked this one and it worked for me. You might want to choose another temporary pattern instead of foobar. Without it sed deleted everything between tags leaving just text part 1 text part 3

sed -e 's/\/quote]/foobar]/3' -e 's/\[.*\/quote]//' -e 's/\[.*foobar]//' testfile

Instead of naming testfile, you can also pipe the input in, for example with cat.

edited Mar 1 at 12:32

answered Mar 1 at 12:20

Igor Voltaic


A little script that increments a counter variable on each start-quote and decrements it on each end-quote. While the counter is greater than 0, text snippets are skipped.

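#!/bin/bash

# disable pathname expansion
set -f
cnt=0
# word-split the file contents (whitespace runs collapse to single separators)
for i in $(<"$1"); do
  # start quote: word begins with [quote= and ends with ]
  if [ "${i##\[quote=}" != "$i" ] && [ "${i: -1}" = "]" ]; then
    ((++cnt))
  elif [ "$i" = "[/quote]" ]; then
    ((--cnt))
  elif [ $cnt -eq 0 ]; then
    echo -n "$i "
  fi
done
echo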

Output:

$ cat q1
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q1
text part 1 text part 2 text part 3
$ cat q2
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q2
text part 1 text part 2 text part 3

edited Mar 1 at 13:31

answered Mar 1 at 12:19

Freddy

1,414210

Leaving that $(<$1) unquoted invokes the split+glob operator in bash. [quote=foo] happens to be a glob (it expands to the filenames in the current directory that are either q, u, o, t, e, = or f). So, for instance, if there were files named f and o in the current directory, [quote=foo] would be expanded to the two words f and o. It would be worse if there were * words in the input, for instance.

– Stéphane Chazelas
Mar 1 at 13:09

Good point, thanks! Added "set -f" to fix that.

– Freddy
Mar 1 at 13:33
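The globbing effect discussed in these comments can be reproduced in a scratch directory. This is only a demonstration sketch: the function name glob_demo and its noglob argument are made up, and it relies on mktemp -d being available.

```shell
# Hypothetical demo function; pass "noglob" to apply set -f first.
glob_demo() {
  tmp=$(mktemp -d) || return 1
  (
    cd "$tmp" || exit 1
    touch f o                      # two files whose names match the bracket glob
    if [ "$1" = "noglob" ]; then
      set -f                       # disable pathname expansion
    fi
    word='[quote=foo]'
    printf '%s\n' $word            # deliberately unquoted: split+glob applies
  )
  rm -rf "$tmp"
}

glob_demo          # the word is replaced by the matching filenames f and o
glob_demo noglob   # the word is printed unchanged
```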


You can do this with POSIX sed as shown below. Note that this solution handles both kinds of input you showed. The limitation is that the input must not be multiline, since newlines are used as markers to carry out the required transformation.

$ sed -e '
   :top
   /\[\/quote]/!b
   s//\
&/
   s/\[quote=/\
\
&/

   :loop
      s/\(\n\n\)\(\[quote=.*\)\(\[quote=.*\n\)/\2\1\3/
   tloop

   s/\n\n.*\n\[\/quote]//
   btop
' input.txt

answered Mar 3 at 4:24

Rakesh Sharma

392115
