How to ensure that string interpolated into `sed` substitution escapes all metachars
I have a script that reads a text stream and generates a file of sed commands that is later run with `sed -f`. The generated sed commands are like:

    s/cid:image002\.gif@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1922/g
    s/cid:image003\.gif@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1923/g
    s/cid:image004\.jpg@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1924/g

Assume the script which generates the sed commands is something like:

    while read cid fileid
    do
        cidpat="$(echo $cid | sed -e s/\\./\\\\./g)"
        echo 's/'"$cidpat"'/https:\/\/mysite.com\/files\/'"$fileid"'/g' >> sedscr
    done

How can I improve the script to ensure all regex metacharacters in the cid string are escaped and interpolated properly?
sed quoting
edited May 12 '14 at 23:12 by Gilles
asked May 12 '14 at 14:26 by dan
1 Answer (accepted)
To escape variables to be used on the left hand side and right hand side of an `s` command in `sed` (here `$lhs` and `$rhs` respectively), you'd do:

    escaped_lhs=$(printf '%s\n' "$lhs" | sed 's:[][\\/.^$*]:\\&:g')
    escaped_rhs=$(printf '%s\n' "$rhs" | sed 's:[\\/&]:\\&:g;$!s/$/\\/')

    sed "s/$escaped_lhs/$escaped_rhs/"
Note that `$lhs` cannot contain a newline character.

That is, on the LHS, escape all the regexp operators (`][.^$*`), the escaping character itself (`\`), and the separator (`/`).

On the RHS, you only need to escape `&`, the separator, backslash and the newline character (which you do by inserting a backslash at the end of each line except the last one (`$!s/$/\\/`)).
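
Applied to the generator loop from the question, the two escaping steps above could be wrapped roughly as follows. This is only a sketch: the function names `escape_lhs`, `escape_rhs` and `gen_sed_script` are made up for illustration, the input is assumed to be whitespace-separated `cid fileid` pairs on standard input, and the base URL is the placeholder used in the question.

```shell
#!/bin/sh
# Escape a string for use as the pattern (LHS) of a basic-RE s/// command.
escape_lhs() {
    printf '%s\n' "$1" | sed 's:[][\\/.^$*]:\\&:g'
}

# Escape a string for use as the replacement (RHS) of an s/// command.
escape_rhs() {
    printf '%s\n' "$1" | sed 's:[\\/&]:\\&:g;$!s/$/\\/'
}

# Read "cid fileid" pairs on stdin and print one s command per pair.
# The base URL is the placeholder from the question, not a real endpoint.
gen_sed_script() {
    base_url='https://mysite.com/files/'
    while read -r cid fileid; do
        printf 's/%s/%s/g\n' \
            "$(escape_lhs "$cid")" \
            "$(escape_rhs "$base_url$fileid")"
    done
}

# Possible usage (pairs.txt and input.html are hypothetical file names):
#   gen_sed_script < pairs.txt > sedscr
#   sed -f sedscr input.html
```

Using `printf` with `%s` keeps the backslashes produced by the escaping step intact, which `echo` does not guarantee across shells.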

That assumes you use `/` as a separator in your `sed` `s` commands and that you don't enable Extended REs with `-r` (GNU `sed`/`ssed`/`ast`/busybox `sed`) or `-E` (BSDs, `ast`, recent GNU, recent busybox) or PCREs with `-R` (`ssed`) or Augmented REs with `-A`/`-X` (`ast`), which all have extra RE operators.
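
If Extended REs are enabled with `-E`, the set of characters to escape on the LHS grows. The following is a sketch under my own assumption about which operators POSIX EREs add (`+`, `?`, `(`, `)`, `{`, `|`), not something taken from the answer; the name `escape_lhs_ere` is hypothetical, and PCREs or Augmented REs would need yet another set.

```shell
#!/bin/sh
# Escape a string for use as the pattern of s/// when sed runs with -E.
# Assumption: a lone "}" stays literal once the opening "{" is escaped,
# so it is deliberately left out of the bracket expression.
escape_lhs_ere() {
    printf '%s\n' "$1" | sed 's:[][\\/.^$*+?(){|]:\\&:g'
}

# Demo: the pattern is matched literally despite the ERE operators in it.
pat=$(escape_lhs_ere 'a+b(c)|d{2}?')
demo=$(printf '%s\n' 'x a+b(c)|d{2}? y' | sed -E "s/$pat/OK/")
printf '%s\n' "$demo"
```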

A few ground rules when dealing with arbitrary data:
- Don't use `echo`
- quote your variables
- consider the impact of the locale (especially its character set: it's important that the escaping `sed` commands are run in the same locale as the `sed` command using the escaped strings (and with the same `sed` command), for instance)
- don't forget about the newline character (here you may want to check if `$lhs` contains any and take action).
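
One way to act on the last rule is to reject a multi-line pattern before escaping it. This is a minimal sketch; `has_newline` and `safe_escape_lhs` are hypothetical helper names, and failing with an error is only one possible "action".

```shell
#!/bin/sh
# Succeed if the argument contains a newline character.
has_newline() {
    case $1 in
        *'
'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Escape a pattern for a basic-RE s/// command, refusing multi-line input.
safe_escape_lhs() {
    if has_newline "$1"; then
        printf 'error: pattern contains a newline\n' >&2
        return 1
    fi
    # printf rather than echo, and the variable is quoted.
    printf '%s\n' "$1" | sed 's:[][\\/.^$*]:\\&:g'
}
```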

Another option is to use `perl` instead of `sed`, pass the strings in the environment, and use the `\Q`/`\E` `perl` regexp operators for taking strings literally:

    A="$lhs" B="$rhs" perl -pe 's/\Q$ENV{A}\E/$ENV{B}/g'

`perl` (by default) will not be affected by the locale's character set as, in the above, it only considers the strings as arrays of bytes without caring about what characters (if any) they may represent for the user. With `sed`, you could achieve the same by fixing the locale to C with `LC_ALL=C` for all `sed` commands (though that will also affect the language of error messages, if any).
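
To keep the escaping and the substitution in the same locale, every `sed` call can go through a small wrapper. This is a sketch only; `c_sed` is a hypothetical name and the sample strings are made up.

```shell
#!/bin/sh
# Run sed under the C locale so that input is handled byte by byte.
c_sed() {
    LC_ALL=C sed "$@"
}

lhs='a.b'   # sample pattern containing a regexp operator
rhs='x&y'   # sample replacement containing a special character

# Both escaping steps and the final substitution use the same wrapper.
escaped_lhs=$(printf '%s\n' "$lhs" | c_sed 's:[][\\/.^$*]:\\&:g')
escaped_rhs=$(printf '%s\n' "$rhs" | c_sed 's:[\\/&]:\\&:g;$!s/$/\\/')
result=$(printf '%s\n' 'a.b axb' | c_sed "s/$escaped_lhs/$escaped_rhs/g")
printf '%s\n' "$result"
```

Only the literal `a.b` is replaced; `axb` is left alone because the dot was escaped.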

What if I need to escape double quotes? – Menon, May 8 '15 at 7:31
@Menon, double quotes are not special to `sed`, you don't need to escape them. – Stéphane Chazelas, May 8 '15 at 8:26
This cannot be used for pattern matching using wildcard, can it? – Menon, May 13 '15 at 7:16
@Menon, no, wildcard pattern matching as with `find`'s `-name` is different from regular expressions. There you only need to escape `?`, `*`, backslash and `[`. – Stéphane Chazelas, May 13 '15 at 7:35
Answer edited Jun 11 at 14:57
Answered May 12 '14 at 14:46 by Stéphane Chazelas