How to improve this 'sed' search & replace command?
How can I search and replace recursively through multiple files in a directory, using only the essential tools installed on almost any Debian/Ubuntu machine?
There are multiple answers to this question across the Stack* sites, such as here or here, but all of them are lacking in essential ways: they only provide a correct solution for some "easy" subset of possible inputs.
After some searching and careful study of the man pages for grep, xargs and sed, this is the best "search and replace" command I've been able to build for Bash:
grep -ErlIZ -- '<OldPattern>' . | xargs -0rL1 sed -ri 's/<OldPattern>/<NewPattern>/g'
(Note: I want to be able to use helpful, advanced shell features where possible, so I'm not too worried yet about POSIX or portability, and I don't care much either about the mostly outdated versions of GNU tools on Mac.)
This one-liner has multiple features:
- Explicitly ignores binary files, for safety (not sure if this is really needed, though)
- Uses grep | xargs to filter out candidate files and provide good performance in huge directories
- Accepts patterns that start with a dash (-)
- Accepts paths with spaces
- Accepts regex capture groups in the search patterns
But due to deficiencies in the sed feature set, the regex engine is always greedy and there is no option to disable this behavior (only ugly workarounds). This means only one substitution can be done per line, at least for some cases (I can show some examples if requested).
Resorting to a while loop makes it run as many times as needed to really cover all possible substitutions:
while FILES="$(grep -ErlI -- '<OldPattern>' .)"; do
echo "$FILES" | xargs -rL1 sed -ri 's/<OldPattern>/<NewPattern>/g'
done
But Bash variables cannot store null bytes, so the grep -Z and xargs -0 options had to be dropped. I believe this drops compatibility with paths that contain spaces.
Is it possible to combine the while loop solution with the -Z and -0 options to support paths with spaces? Or maybe... is there any other, different but better way to build a robust and reliable search-and-replace command? (Succinctness is a feature, so as close to a one-liner as possible.)
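One possible way to keep -Z and -0 inside the loop is to never put the file list in a variable at all: let grep -q decide whether another pass is needed, and run the NUL-delimited pipeline as the loop body. The following is only a sketch under assumptions: it relies on GNU grep, xargs and sed, and the temporary directory, file name and the old/new variables are made up for the demonstration. Note that the loop would never terminate if the replacement text itself matched the search pattern.

```shell
#!/bin/sh
# Sketch: repeat the substitution until nothing matches any more, while the
# file names travel NUL-delimited through the pipe (never through a variable).
# Assumes GNU grep, xargs and sed; all paths and names here are illustrative.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir with spaces"
demo="$tmp/dir with spaces/CMake Lists.txt"
printf '%s\n' 'set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")' > "$demo"

old='(gst.*)1\.5'   # greedy pattern from the question, with the dot escaped
new='\1AAA'

passes=0
# grep -q only answers "does anything still match?"
while grep -ErqI -- "$old" "$tmp"; do
    # The actual work: same pipeline as the one-liner, -Z and -0 included.
    grep -ErlIZ -- "$old" "$tmp" | xargs -0r sed -ri "s/$old/$new/g"
    passes=$((passes + 1))
done

result=$(cat "$demo")
printf '%s passes: %s\n' "$passes" "$result"
rm -rf "$tmp"
```

The price is that grep walks the tree twice per pass; in exchange, file names with spaces or newlines are handled the same way as in the non-loop one-liner.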
EDIT: Adding an example where the greedy regexp in sed is a problem for the non-loop version.
With this input line:
set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")
The pattern (gst.*)1.5 would match this:
set(requires "[gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5] libjsonrpc")
Because it is greedy, it matches from the first gst to the last 1.5. Say the substitution is \1AAA: the \1 will keep the text matched by the capture group, and the AAA will just print these letters instead of the original 1.5. The result will be:
set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-AAA libjsonrpc")
So this command would need to be run 3 times in total to actually substitute all possible matches in this line. The while loop version just runs everything over and over until the search pattern cannot be found any more, which is when the replacement work has actually finished.
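The behaviour described in this example can be reproduced on a single line of input, without touching any files. This is a small sketch assuming GNU sed and grep; the dot in 1.5 is escaped here (an addition to the pattern as written above) so that it only matches a literal dot, and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: count how many passes the greedy pattern needs on the example line.
line='set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")'

passes=0
while printf '%s\n' "$line" | grep -Eq '(gst.*)1\.5'; do
    # Each pass only rewrites the LAST 1.5, because .* swallows the others.
    line=$(printf '%s\n' "$line" | sed -r 's/(gst.*)1\.5/\1AAA/g')
    passes=$((passes + 1))
    printf 'pass %d: %s\n' "$passes" "$line"
done
```

The g flag does not help here: after the single greedy match there is nothing left on the line for a second match to start from.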
sed grep regular-expression xargs
1
@j1elo the null delimiter (-d '') is much more significant than the IFS= (which AFAIK only helps with edge cases, for example where the null-delimited tokens have leading or trailing whitespace)
– steeldriver
Feb 15 at 18:47
1
bash not being able to store null bytes in a variable doesn't mean it can't send them through a pipeline. find -print0 | xargs -0 works perfectly fine. That said, this looks at the end of the day to be a sed question; the main way I work around greediness is with a smart pattern that eschews .*_ in favor of [^_]*_ (presuming you're looking for a nongreedy match up to and excluding a _). In your example case, I'd use a pattern like (gst[-a-zA-Z]*)1.5, for instance.
– DopeGhoti
Feb 15 at 19:08
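The character-class pattern suggested in this comment can be checked against the example line from the question. A minimal sketch, assuming GNU sed; the dot is escaped here and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: a restricted character class instead of .* lets one pass with the
# g flag replace every occurrence.
line='set(requires "gstreamer-1.5 gstreamer-base-1.5 gstreamer-sdp-1.5 libjsonrpc")'

# [-a-zA-Z]* cannot cross a digit or a space, so each match stays inside
# one gstreamer-* token.
fixed=$(printf '%s\n' "$line" | sed -r 's/(gst[-a-zA-Z]*)1\.5/\1AAA/g')
printf '%s\n' "$fixed"
```

With a pattern like this, the non-loop one-liner from the question would need only a single run on this input.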
1
Fundamentally what's confusing me here is why you believe that how you pass file arguments to sed has any effect on how a regular expression is matched within those files (greedy or otherwise) - can you give an example where that happens?
– steeldriver
Feb 15 at 19:44
2
Regarding greediness, the only regex flavour that makes non-greediness anything close to simple is perl (or pcre): have you thought about using perl in place of sed (or in place of the whole pipeline?)
– glenn jackman
Feb 15 at 20:11
1
Possible duplicate of Non-greedy match with SED regex (emulate perl's .*?)
– DopeGhoti
Feb 15 at 21:56
asked Feb 15 at 17:36 by j1elo
edited Feb 15 at 18:03