Removing (possibly nested) text quotes in command line
I need to parse large amounts of text on the command line and replace all (possibly nested) text quotes with spaces. Quotes are marked with a specific syntax: [quote=username]quoted text[/quote].
Example input with nested quotes could be something like:
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
And the expected output would be:
text part 1 text part 2 text part 3
With the help of this question I got it to work somewhat (it produces the output above) with:
sed ':b; s/\[quote=[^]]*\][^[\/]*\[\/quote\]/ /g; t b'
but the middle part ([^[\/]*) is problematic, since quotes can contain characters like [ or ].
That being said, my sed command doesn't work if the input is, e.g.:
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
One problem is that sed doesn't seem to support non-greedy quantifiers and thus always catches the longest possible match in the input. That makes it hard to deal with a) usernames and b) quoted texts in general.
I also guess that sed is not the best tool for this, and it might not even be capable of doing things like that. Maybe perl or awk, for example, would work better?
So the final question is: what would be the best and most efficient way to solve this?
bash text-processing sed regular-expression
asked Mar 1 at 11:19 by pipo
4 Answers
If you know the input doesn't contain < or > characters, you could do:
sed '
  # replace opening quote with <
  s|\[quote=[^]]*\]|<|g
  # and closing quotes with >
  s|\[/quote\]|>|g
  :1
    # work our way from the inner quotes
    s|<[^<>]*>||g
  t1'
If it may contain < or > characters, you can escape them using a scheme like:
sed '
  # escape < and > (and the escaping character _ itself)
  s/_/_u/g; s/</_l/g; s/>/_r/g
  <code-above>
  # undo escaping after the work has been done
  s/_r/>/g; s/_l/</g; s/_u/_/g'
With perl, using recursive regexps:
perl -pe 's@(\[quote=[^]]*\](?:(?1)|.)*?\[/quote\])@@g'
Or even, as you mention:
perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'
With perl, you can handle multiline input by adding the -0777 option (together with the s flag on the substitution, so that . also matches newline characters). With sed, you'd need to prefix the code with:
:0
$!{
  N;b0
}
so as to load the whole input into the pattern space.
answered Mar 1 at 12:27, edited Mar 1 at 14:53 by Stéphane Chazelas
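The sed fragments in the answer above (the escaping scheme and the <code-above> placeholder) can be put together into a single script. What follows is only a sketch of that combination, assuming every quote opens and closes on the same line; the function name strip_quotes is made up for illustration and is not part of the answer.

```shell
# strip_quotes (hypothetical helper): read text on stdin, write it to stdout
# with every [quote=...]...[/quote] block deleted.
strip_quotes() {
  sed '
    # escape the escaping character _ first, then < and >
    s/_/_u/g; s/</_l/g; s/>/_r/g
    # mark opening tags with < and closing tags with >
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
      # delete innermost <...> pairs; repeat while something was deleted
      s|<[^<>]*>||g
    t 1
    # undo the escaping, in reverse order
    s/_r/>/g; s/_l/</g; s/_u/_/g'
}

# Example run on input that contains literal <, > and _ characters
printf '%s\n' 'a < b [quote=x] c > d [/quote] e_f' | strip_quotes
```

Like the answer's script, this deletes each block rather than replacing it with a space, so the spaces on both sides of a removed block remain in the output.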
Thanks, your perl solution here looks clean and simple and seems to work nicely. I replaced [^]]* with .*?, since perl's non-greedy quantifier solves the issue I was trying to tackle with the original version. So I ended up with perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'
– pipo, Mar 1 at 12:45
The sed script outputs "< <" with input "[quote=foo] [quote [/quote]".
– Freddy, Mar 1 at 12:56
@Freddy, that doesn't appear to be valid input as per the OP's description of its format. The perl one would also have problems with [quote=foo] [quote= [/quote] and would struggle with mismatched quotes.
– Stéphane Chazelas, Mar 1 at 13:01
@StéphaneChazelas OP said "... quotes can contain characters like [ or ]" and since the example text contains [foo] I can see no reason why [quote should be invalid input.
– Freddy, Mar 1 at 14:06
@Freddy, but then at some point we need to decide where we stop. Is [quote=x] [quot= [/quote] valid, for instance? Is [quote=some [quote] user] valid? Does the format have a way to escape [s or [quote?... Anyway, I've added the = in the sed regexp so [quote=foo] [quote [/quote] would no longer be a problem. [quote=foo] [quote= [/quote] would still be.
– Stéphane Chazelas, Mar 1 at 14:50
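Since the comments above turn on what counts as valid input, a cheap sanity check before stripping anything is to count the opening and closing tags as the regexps in the answer see them. This is only a sketch: count_quote_tags is a hypothetical name, and equal counts do not prove the tags are properly nested.

```shell
# count_quote_tags (hypothetical helper): print "<opening> <closing>" tag
# counts for the text on stdin, using the same tag patterns as the sed answer.
count_quote_tags() {
  awk '
    {
      line = $0
      # gsub returns the number of replacements, i.e. the number of tags found
      opening += gsub(/\[quote=[^]]*\]/, "", line)
      closing += gsub(/\[\/quote\]/, "", line)
    }
    END { print opening + 0, closing + 0 }'
}

# The problematic input from the comments: the second "[quote=" swallows the
# closing tag, so the two counts differ.
printf '%s\n' '[quote=foo] [quote= [/quote]' | count_quote_tags
```

If the two numbers differ, the input contains something the tag patterns misread, and the output of any of the approaches here should be checked by hand.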
I checked this one and it worked for me. You might want to choose another temporary pattern instead of foobar. Without it, sed deleted everything between the tags, leaving just text part 1 text part 3.
sed -e 's/\/quote\]/foobar\]/3' -e 's/\[.*\/quote\]//' -e 's/\[.*foobar\]//' testfile
Instead of testfile you may just pipe the input in, for example with cat.
answered Mar 1 at 12:20, edited Mar 1 at 12:32 by Igor Voltaic
A little script that increments a counter variable on each start-quote and decrements it on each end-quote. If the counter variable is greater than 0, text snippets are skipped.
#!/bin/bash
# disable pathname expansion
set -f
cnt=0
# split the file contents into whitespace-separated words
for i in $(<"$1"); do
    # start quote: word begins with [quote= and ends with ]
    if [ "${i##\[quote=}" != "$i" ] && [ "${i: -1}" = "]" ]; then
        ((++cnt))
    # end quote
    elif [ "$i" = "[/quote]" ]; then
        ((--cnt))
    # outside of any quote: print the word
    elif [ $cnt -eq 0 ]; then
        echo -n "$i "
    fi
done
echo
Output:
$ cat q1
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q1
text part 1 text part 2 text part 3
$ cat q2
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q2
text part 1 text part 2 text part 3
answered Mar 1 at 12:19, edited Mar 1 at 13:31 by Freddy
Leaving that $(<$1) unquoted is the split+glob operator in bash. [quote=foo] happens to be a glob (it expands to the filenames in the current directory that are either q, u, o, t, e, = or f). So, for instance, if there were f and o files in the current directory, [quote=foo] would be expanded to the two words f and o. It would be worse if there were * words in the input, for instance.
– Stéphane Chazelas, Mar 1 at 13:09
Good point, thanks! Added "set -f" to fix that.
– Freddy, Mar 1 at 13:33
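The same counter idea can be applied to tag positions within each line instead of to whitespace-separated words, which avoids the split+glob step entirely and also copes with tags that touch the surrounding text. The awk sketch below assumes that a username never contains ]; the name strip_quotes_awk is hypothetical, and an unmatched closing tag is simply dropped.

```shell
# strip_quotes_awk (hypothetical helper): read stdin, print each line with the
# text inside [quote=...]...[/quote] blocks removed.
strip_quotes_awk() {
  awk '
    BEGIN { depth = 0 }   # nesting depth, carried over from line to line
    {
      out = ""
      rest = $0
      # locate the next opening or closing tag, whichever comes first
      while (match(rest, /\[quote=[^]]*\]|\[\/quote\]/)) {
        tag = substr(rest, RSTART, RLENGTH)
        # text before the tag is kept only when we are outside every quote
        if (depth == 0) out = out substr(rest, 1, RSTART - 1)
        if (tag == "[/quote]") { if (depth > 0) depth-- } else depth++
        rest = substr(rest, RSTART + RLENGTH)
      }
      # whatever follows the last tag on this line
      if (depth == 0) out = out rest
      print out
    }'
}

# Example: tags without surrounding whitespace
printf '%s\n' 'a[quote=x]b[/quote]c' | strip_quotes_awk
```

Because the depth is kept across lines, a quote that opens on one line and closes on a later one is handled as well, with the line breaks themselves preserved.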
You can do this with POSIX sed as detailed here. Note that this solution applies to both kinds of input shown by you. The limitation is that the input must not be multiline, as we make use of newlines as markers to effect the required transformation.
$ sed -e '
   :top
   /\[\/quote]/!b
   s//\
&/
   s/\[quote=/\
\
&/
   :loop
      s/\(\n\n\)\(\[quote=.*\)\(\[quote=.*\n\)/\2\1\3/
   tloop
   s/\n\n.*\n\[\/quote]//
   btop
' input.txt
answered Mar 3 at 4:24 by Rakesh Sharma