How to ensure that string interpolated into `sed` substitution escapes all metachars

I have a script that reads a text stream and generates a file of sed commands that is later run with sed -f. The generated sed commands are like:



s/cid:image002\.gif@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1922/g
s/cid:image003\.gif@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1923/g
s/cid:image004\.jpg@01CC3D46\.926E77E0/https:\/\/mysite.com\/files\/1924/g


Assume the script which generates the sed commands is something like:



while read cid fileid
do
    cidpat="$(echo $cid | sed -e s/\\./\\\\./g)"
    echo 's/'"$cidpat"'/https:\/\/mysite.com\/files\/'"$fileid"'/g' >> sedscr
done


How can I improve the script to ensure all regex metacharacters in the cid string are escaped and interpolated properly?







asked May 12 '14 at 14:26 by dan; edited May 12 '14 at 23:12 by Gilles




















          1 Answer (accepted)










          To escape variables to be used on the left-hand side and right-hand side of an s command in sed (here $lhs and $rhs respectively), you'd do:



          escaped_lhs=$(printf '%s\n' "$lhs" | sed 's:[][\\/.^$*]:\\&:g')
          escaped_rhs=$(printf '%s\n' "$rhs" | sed 's:[\\/&]:\\&:g;$!s/$/\\/')

          sed "s/$escaped_lhs/$escaped_rhs/"


          Note that $lhs cannot contain a newline character.



          That is, on the LHS, escape all the regexp operators (][.^$*), the escaping character itself (\), and the separator (/).



          On the RHS, you only need to escape &, the separator, backslash and the newline character (which you do by inserting a backslash at the end of each line except the last one ($!s/$/\\/)).
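As a minimal sketch of the two rules above, the escaping can be wrapped in two helper functions. The names escape_lhs and escape_rhs and the sample strings are assumptions made for illustration; basic regular expressions and / as the separator are assumed.

```shell
#!/bin/sh
# Hypothetical helpers (the names are not from the original answer).

# Escape a string for use as the pattern (LHS) of s/.../.../ in BRE mode.
escape_lhs() {
    printf '%s\n' "$1" | sed 's:[][\\/.^$*]:\\&:g'
}

# Escape a string for use as the replacement (RHS) of s/.../.../.
escape_rhs() {
    printf '%s\n' "$1" | sed 's:[\\/&]:\\&:g;$!s/$/\\/'
}

# Illustrative sample data.
lhs='cid:image002.gif@01CC3D46.926E77E0'
rhs='https://mysite.com/files/1922?a=1&b=2'

escaped_lhs=$(escape_lhs "$lhs")
escaped_rhs=$(escape_rhs "$rhs")

# Once escaped, the dots in the cid only match literal dots,
# so the second input line is left alone.
result=$(printf '%s\n' "<img src=\"$lhs\">" 'cid:image002Xgif@01CC3D46.926E77E0' |
    sed "s/$escaped_lhs/$escaped_rhs/g")
printf '%s\n' "$result"
```

The & in the sample replacement would otherwise be expanded to the matched text, which is why it is escaped along with / and backslash.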



          That assumes you use / as the separator in your sed s commands and that you don't enable Extended REs with -r (GNU sed/ssed/ast/busybox sed) or -E (BSDs, ast, recent GNU, recent busybox), PCREs with -R (ssed), or Augmented REs with -A/-X (ast), all of which have extra RE operators.
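If extended regular expressions are enabled, more characters need escaping on the LHS. The sketch below assumes sed -E and adds + ? ( ) { } | to the bracket expression; the helper name and that extended character set are assumptions for illustration, not something stated in the answer, and other RE flavours (PCRE, augmented REs) would need their own list.

```shell
#!/bin/sh
# Hypothetical helper for patterns used with sed -E (extended REs).
# Assumed character set: the BRE set plus + ? ( ) { } |
escape_lhs_ere() {
    printf '%s\n' "$1" | sed 's:[][\\/.^$*+?(){}|]:\\&:g'
}

# Illustrative string full of ERE operators.
lhs='a+b(c)|d{2}?'
escaped_lhs=$(escape_lhs_ere "$lhs")
printf '%s\n' "$escaped_lhs"

# With -E, the escaped pattern matches the string literally.
result=$(printf '%s\n' 'x a+b(c)|d{2}? y' | sed -E "s/$escaped_lhs/OK/")
printf '%s\n' "$result"
```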



          A few ground rules when dealing with arbitrary data:



          • Don't use echo.

          • Quote your variables.

          • Consider the impact of the locale, especially its character set: the escaping sed commands must be run in the same locale, and with the same sed implementation, as the sed command that uses the escaped strings.

          • Don't forget about the newline character (here you may want to check whether $lhs contains any and take action).
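Applied to the generator loop from the question, these rules might look like the following sketch. The helper names, the sample input and the temporary file are assumptions for illustration. The loop reads one cid and one file id per line, so neither can contain a newline here.

```shell
#!/bin/sh
# Sketch only: escape_lhs/escape_rhs are hypothetical helper names.
escape_lhs() { printf '%s\n' "$1" | sed 's:[][\\/.^$*]:\\&:g'; }
escape_rhs() { printf '%s\n' "$1" | sed 's:[\\/&]:\\&:g;$!s/$/\\/'; }

# Use the same locale for the escaping sed and for the sed that runs the script.
LC_ALL=C
export LC_ALL

sedscr=$(mktemp)

# The here-document stands in for the real text stream: "<cid> <fileid>" per line.
# read -r keeps any backslashes in the input intact.
while read -r cid fileid
do
    pat=$(escape_lhs "$cid")
    repl=$(escape_rhs "https://mysite.com/files/$fileid")
    # printf instead of echo, and every expansion quoted.
    printf 's/%s/%s/g\n' "$pat" "$repl"
done > "$sedscr" <<'EOF'
cid:image002.gif@01CC3D46.926E77E0 1922
cid:image003.gif@01CC3D46.926E77E0 1923
EOF

output=$(printf '%s\n' 'src="cid:image003.gif@01CC3D46.926E77E0"' | sed -f "$sedscr")
printf '%s\n' "$output"
rm -f "$sedscr"
```

Writing the whole loop's output to the script file in one redirection also avoids reopening the file for every line.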

          Another option is to use perl instead of sed, pass the strings in the environment, and use the \Q/\E perl regexp operators to take strings literally:



          A="$lhs" B="$rhs" perl -pe 's/\Q$ENV{A}\E/$ENV{B}/g'


          perl (by default) will not be affected by the locale's character set as, in the above, it only considers the strings as arrays of bytes without caring about what characters (if any) they may represent for the user. With sed, you could achieve the same by fixing the locale to C with LC_ALL=C for all sed commands (though that will also affect the language of error messages, if any).






          • What if I need to escape double quotes?
            – Menon
            May 8 '15 at 7:31











          • @Menon, double quotes are not special to sed, you don't need to escape them.
            – Stéphane Chazelas
            May 8 '15 at 8:26










          • This cannot be used for pattern matching using wildcard, can it?
            – Menon
            May 13 '15 at 7:16










          • @Menon, no, wildcard pattern matching as with find's -name is different from regular expressions. There you only need to escape ?, *, backslash and [.
            – Stéphane Chazelas
            May 13 '15 at 7:35
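For the wildcard case mentioned in the last comment, a sketch along the same lines might escape only ?, *, backslash and [ before handing the string to find -name. The helper name escape_glob, the file names and the temporary directory are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: escape the characters special in find -name patterns.
escape_glob() {
    printf '%s\n' "$1" | sed 's/[?*[\\]/\\&/g'
}

# Two illustrative files: one whose name contains wildcard characters,
# and one that the unescaped pattern would match instead.
dir=$(mktemp -d)
touch "$dir/a*b[1].txt" "$dir/axxb1.txt"

name='a*b[1].txt'
pattern=$(escape_glob "$name")

# Only the file literally named a*b[1].txt should be found.
found=$(find "$dir" -type f -name "$pattern")
printf '%s\n' "$found"
rm -rf "$dir"
```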










          answered May 12 '14 at 14:46 by Stéphane Chazelas; edited Jun 11 at 14:57










