How to improve this 'sed' search & replace command?
















How can I search and replace recursively through multiple files in a directory, using only the essential tools installed on almost any Debian/Ubuntu machine?



There are multiple answers to this question on the Stack* sites, such as here or here, but all of them are lacking in essential ways: they provide a correct solution only for some "easy" subset of possible inputs.



After some searching and careful study of manpages for grep, xargs and sed, this is the best "search and replace" command I've been able to build for Bash:



grep -ErlIZ -- '<OldPattern>' . | xargs -0rL1 sed -ri 's/<OldPattern>/<NewPattern>/g'


(Note: I want to be able to use helpful, advanced shell features where possible, so I'm not too worried yet about POSIX compliance or portability, and I don't care much about the mostly outdated versions of GNU tools on Mac.)



This one-liner has multiple features:



  • Explicitly ignores binary files, for safety (not sure if this is really needed, though)

  • Uses grep | xargs to filter out candidate files and provide good performance in huge directories

  • Accepts patterns that start with a dash (-)

  • Accepts paths with spaces

  • Accepts regex capture groups in the search patterns

However, the sed regex engine is always greedy, and there is no option to disable this behavior (only ugly workarounds). This means that, at least in some cases, a single pass makes only one substitution per line (see the example in the EDIT below).



Resorting to a while loop makes the command run as many times as needed to cover all possible substitutions:



while FILES="$(grep -ErlI -- '<OldPattern>' .)"; do
    echo "$FILES" | xargs -rL1 sed -ri 's/<OldPattern>/<NewPattern>/g'
done


However, a Bash variable cannot store null bytes, so the grep -Z and xargs -0 options had to be dropped. I believe this loses compatibility with paths that contain spaces.



  • Is it possible to combine the while loop solution with the -Z, -0 options to support paths with spaces?


  • Or is there some other, better way to build a robust and reliable search-and-replace command? (Succinctness is a feature, so as close to a one-liner as possible.)



EDIT: Adding an example where the greedy regexp in sed is a problem for the non-loop version.



With this input line:



set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")


The pattern (gst.*)1.5 would match this:



set(requires "[gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5] libjsonrpc")


Because it is greedy, it matches everything from the first gst to the last 1.5. Say the replacement is \1AAA: the \1 keeps the text of the capture group, and AAA is printed literally in place of the original 1.5. The result will be:



set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-AAA libjsonrpc")


So this command would need to be run 3 times in total to substitute all possible matches in this line. The while loop version simply runs everything over and over until the search pattern can no longer be found, which is when the replacement work is actually finished.
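The behaviour described in this EDIT can be reproduced on a single line of input without touching any files. This is only a minimal sketch, assuming GNU sed and grep; the variable names (line, greedy, cur, passes) are made up for illustration.

```shell
#!/usr/bin/env bash
# Input line from the example above.
line='set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")'

# Greedy match: (gst.*) swallows everything up to the last "1.5",
# so a single pass only rewrites the final occurrence.
greedy=$(printf '%s\n' "$line" | sed -r 's/(gst.*)1.5/\1AAA/g')
printf '%s\n' "$greedy"

# Repeat the same substitution until the pattern no longer matches,
# counting how many passes that takes.
cur=$line
passes=0
while printf '%s\n' "$cur" | grep -Eq 'gst.*1.5'; do
    cur=$(printf '%s\n' "$cur" | sed -r 's/(gst.*)1.5/\1AAA/g')
    passes=$((passes + 1))
done
printf 'passes needed: %s\n' "$passes"
```

Each pass rewrites only the rightmost remaining 1.5, which is why the number of passes equals the number of occurrences on the line.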





























  • 1





    @j1elo the null delimiter (-d '') is much more significant than the IFS= (which AFAIK only helps with edge cases, for example where the null-delimited tokens have leading or trailing whitespace)

    – steeldriver
    Feb 15 at 18:47
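The -d '' and IFS= mentioned in this comment presumably refer to a Bash read loop over null-delimited file names. A hedged sketch of that idea follows, assuming Bash plus GNU grep and sed; the scratch directory, the file name and the 1\.5 to AAA substitution are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with a file whose name contains spaces.
dir=$(mktemp -d)
printf 'version 1.5\n' > "$dir/my file.txt"

# grep -Z ends each file name with a null byte; read -d '' consumes one
# null-terminated name per iteration, so names are never split on spaces.
# IFS= keeps leading and trailing whitespace in the name intact.
while IFS= read -r -d '' file; do
    sed -ri 's/1\.5/AAA/g' -- "$file"
done < <(grep -ErlIZ -- '1\.5' "$dir")

result=$(cat "$dir/my file.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

The file names travel through a pipe-like stream and are never stored together in one variable, so the null bytes survive.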






  • 1





    bash not being able to store null bytes in a variable doesn't mean it can't send them through a pipeline. find -print0 | xargs -0 works perfectly fine. That said, this looks at the end of the day to be a sed question; the main way I work around greediness is with a smart pattern that eschews .*_ in favor of [^_]*_ (presuming you're looking for a nongreedy match up to and excluding a _). In your example case, I'd use a pattern like (gst[-a-zA-Z]*)1.5, for instance.

    – DopeGhoti
    Feb 15 at 19:08
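Both points from this comment can be tried together: null bytes passed straight through a pipeline, and a narrower pattern in place of .* to sidestep greediness. The following is a sketch under assumptions, not a definitive solution: it presumes GNU grep, xargs and sed, uses a throwaway directory and file name invented for the test, and escapes the dot as 1\.5, which the comment as shown does not.

```shell
#!/usr/bin/env bash
# Hypothetical scratch tree; the file name contains a space on purpose.
dir=$(mktemp -d)
printf '%s\n' 'set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")' \
    > "$dir/CMake Lists.txt"

# [-a-zA-Z]* cannot run past a digit or a space, so each gst...1.5 token
# matches on its own and the g flag rewrites all of them in one pass.
# -Z and -0 keep file names null-delimited from grep to xargs.
grep -ErlIZ -- 'gst[-a-zA-Z]*1\.5' "$dir" |
    xargs -0r sed -ri 's/(gst[-a-zA-Z]*)1\.5/\1AAA/g'

result=$(cat "$dir/CMake Lists.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

With a pattern that cannot span several tokens, no while loop is needed, so the file list never has to be stored in a variable.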







  • 1





    Fundamentally what's confusing me here is why you believe that how you pass file arguments to sed has any effect on how a regular expression is matched within those files (greedy or otherwise) - can you give an example where that happens?

    – steeldriver
    Feb 15 at 19:44






  • 2





    Regarding greediness, the only regex flavour that makes non-greediness anything close to simple is perl (or pcre): have you thought about using perl in place of sed (or in place of the whole pipeline?)

    – glenn jackman
    Feb 15 at 20:11






  • 1





    Possible duplicate of Non-greedy match with SED regex (emulate perl's .*?)

    – DopeGhoti
    Feb 15 at 21:56















sed grep regular-expression xargs

edited Feb 15 at 18:03
j1elo

asked Feb 15 at 17:36
j1elo








