Nailing down vim regex substitution
I started using Vimwiki to document weekly progress on my projects last year. As the year rolled on, the format of my list of links to weekly wikis changed a bit. At the end of the year I decided to quickly go through my index page and unify the format of the bullet points that I made, but I just can't get the regex correct.
A raw example of the bullets I want to update looks like the following, with the upper, more recent entries, in the format that I want.
* [[2018_Week_25|Week 25, 17th through the 23rd June]]
* [[2018_Week_24|Week 24, 10th through 16th June]]
* [[2018_Week_23|Week 23, 3rd through 9th June]]
* [[2018 Week 22|Week 22, 27th May through 2nd June]]
* [[2018 Week 21]], 20th through 26th May
* [[2018_Week_20]]
* [[2018_Week_19]]
* [[2018_Week_18]], 29th April through 5th May
* [[2018_Week_17]], 22nd through 28th April
* [[2018_Week_16]], 15th through 21st April
* [[2018_Week_15]], 8th through 14th April
* [[2018_Week_14]], 1st through 7th April
* [[2018_Week_13]], 25th through 31st March
I started out by doing a simple match:
/\[\[\d+[_\s]Week[_\s]\d+\]\],\s\d+\w+.*/g
Which matches the appropriate parts of lines 5, 8-13. Then I tried to throw in some pattern variables and substitutions, and everything fell apart. Using the following substitution line, Vim suddenly decided that the pattern that it was finding before was no longer to be found.
:1,13s/\(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)/\[\[\1|\1\2\]\]/g
E486: Pattern not found: \(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)
I've actually tried quite a few subtle variants of this, but I am beginning to believe that I have simply overlooked something glaringly obvious. Does anyone have any suggestions?
regular-expression vim
asked Jan 3 at 12:25 by martshal
edited Jan 3 at 14:36 by Anthony Geoghegan
I’ve edited this question to use code formatting to make it easier to read the actual search terms. The first regex was
/\[\[\d+[_\s]Week[_\s]\d+\]\],\s\d+\w+.*/g
If I'm correctly guessing what you intend, the plus signs should actually be escaped with a backslash, so I think the regex should be:
/\[\[\d\+[_\s]Week[_\s]\d\+\]\],\s\d\+\w\+.*
Also, the second regex,
\(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)
(which should probably be
\(2018[_\s]Week[_\s]\d\d\),\s\(\d\+\w\+\)
) is quite different from the previous search term, so I’d suggest that you further edit to clarify this.
– Anthony Geoghegan
Jan 3 at 14:36
Thanks for your attempt to clarify, Anthony.
– martshal
Jan 3 at 18:40
3 Answers
Vim's regex syntax is nonstandard, so you can just use the industry-leading Perl from within Vim instead (with $RE standing for the actual expression):
:%!perl -pe '$RE'
You can test it outside of Vim too:
> echo "[[2018_Week_18]], 29th April through 5th May" \
| perl -pe 's/[_ ](Week)[_ ](\d+)]](, .*)?/_$1_$2|$1 $2$3]]/g'
[[2018_Week_18|Week 18, 29th April through 5th May]]
Apart from the Perl REs being about half the length of the Vim REs, Perl REs are largely copy/paste compatible with many other tools (grep/rename/sed/awk/etc.)
I like your answer, but it's been quite a while since I've used perl (where I was actually introduced to regex). I'll have to think about this for a bit to decide if this represents more or less of a learning/relearning curve for me.
– martshal
Jan 3 at 18:45
:%s/\(\[\[\d\+[_ ]\+Week\([_ ]\+\)\(\d\+\)\)\]\],\(.*\)/\1|Week\2\3,\4]]/
You can still improve this expression by padding with \s* where appropriate, to better catch inconsistencies that invariably occur in manually typed text.
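To sanity-check the logic of this substitution outside Vim, here is a minimal Python sketch. It assumes a direct translation of the Vim pattern into Python's re syntax (groups and quantifiers unescaped); the function name normalize_bullet and the sample lines are illustrative and not part of the original answer.

```python
import re

# Python translation of the Vim pattern (assumption: same grouping, Python syntax).
# Group 1: "[[2018_Week_18", group 2: separator after "Week",
# group 3: week number, group 4: everything after the comma.
PATTERN = re.compile(r"(\[\[\d+[_ ]+Week([_ ]+)(\d+))\]\],(.*)")


def normalize_bullet(line: str) -> str:
    """Rewrite '[[link]], dates' into '[[link|Week<sep><n>, dates]]'."""
    return PATTERN.sub(r"\1|Week\2\3,\4]]", line)


if __name__ == "__main__":
    print(normalize_bullet("* [[2018_Week_18]], 29th April through 5th May"))
    print(normalize_bullet("* [[2018_Week_20]]"))  # no comma: left unchanged
```

Note that the description reuses the separator captured in group 2, so an underscore-separated link yields Week_18 in the visible text. If a space is wanted there regardless of the link's separator, replacing \2 with a literal space in the replacement should do it.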
Some issues with your proposed solution:
The regular expression:
\(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)
does not match, because:
- Backslash-escaped predefined character classes cannot be used in user-defined character classes delimited by [] . In Vim, [_\s] matches either an underscore, a backslash, or an s character. You can use _\|\s instead in these situations.
- The + character needs to be escaped (written \+) for its special meaning as the "1 or more" quantifier to be active. Otherwise, it matches a literal + sign.
- The ,\s\(\d+\w+\) part is preceded by a sequence matching ]] in the text to be matched, but ]] is missing from the pattern.
Not considering the issue with backslashes in the substitution string, you are trying to terminate the resulting string with ]] , but you only matched up to the part that specifies the day after the comma, using \d+\w+ . This means that if the substitution succeeded, your lines would end in text that looks like:
29th]] April through 5th May
with the ]] sequence that was supposed to terminate the line somewhere in the middle.
The substitution string:
\[\[\1|\1\2\]\]
is not a regular expression; therefore, characters like [ and ] need not be escaped.
Also, \d+\w+ , although not erroneous, is redundant, since \w already covers everything \d does, and given the context you specified with the preceding part of the expression, it always matches text like 9th and never matches anything bad.
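The point about the closing brackets landing mid-line can be illustrated with a small Python sketch. It assumes the question's pattern translated to Python's re syntax, with the missing ]] added so that it matches at all, and a simplified replacement; unlike Vim, Python does accept \s inside a bracketed class. The variable names are illustrative.

```python
import re

line = "* [[2018_Week_18]], 29th April through 5th May"

# Question's pattern in Python syntax, with "]]" added so that it matches.
# It stops after the first day token ("29th"); the rest of the line is not consumed.
partial = re.compile(r"(2018[_\s]Week[_\s]\d\d)\]\],\s(\d+\w+)")
broken = partial.sub(r"\1|\2]]", line)

# Capturing through the end of the line puts the closing brackets at the end.
full = re.compile(r"(2018[_\s]Week[_\s]\d\d)\]\],\s(.*)")
fixed = full.sub(r"\1|\2]]", line)

print(broken)  # closing brackets end up right after "29th"
print(fixed)   # closing brackets end up at the end of the line
```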
EDIT: A very good suggestion from @user1133275 is (with some alterations) to move the comma into the capture group that follows it in the original solution and make that group optional, so that lines where no day interval was specified (i.e. no "xth through yth") are changed as well:
:%s/\(\[\[\d\+[_ ]\+Week\([_ ]\+\)\(\d\+\)\)\]\]\(,.*\)\?/\1|Week\2\3\4]]/
@user1133275 didn't provide an answer, so I put the results of our discussion in the comment section of this answer here.
If they decide to put it in an answer and I am notified, I'll remove this edit, so the credit can go to the author of the base idea.
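A minimal Python sketch of the optional-group variant, again assuming a straight translation of the Vim pattern into Python's re syntax (the function name is hypothetical). It relies on Python 3.5 or later, where an unmatched optional group is substituted as an empty string.

```python
import re

# Python translation of the edited pattern (assumption: same grouping).
# Group 4 now includes the comma and is optional, so bare links match too.
PATTERN_OPTIONAL = re.compile(r"(\[\[\d+[_ ]+Week([_ ]+)(\d+))\]\](,.*)?")


def normalize_bullet_optional(line: str) -> str:
    """Add a description to the link, with or without a trailing date range."""
    # When group 4 did not participate in the match, \4 expands to "".
    return PATTERN_OPTIONAL.sub(r"\1|Week\2\3\4]]", line)


if __name__ == "__main__":
    print(normalize_bullet_optional("* [[2018_Week_20]]"))
    print(normalize_bullet_optional("* [[2018_Week_18]], 29th April through 5th May"))
```

Lines that already contain a | description are left alone, because the pattern requires ]] directly after the week number.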
answered Jan 3 at 16:08 by Larry
edited Jan 4 at 12:50
Phenomenally complete answer. Thank you much.
– martshal
Jan 3 at 18:36
@martshal this answer won't work for the 6th example; ,\(.*\) should be \(,.*\)?
– user1133275
Jan 3 at 19:47
Yeah, I tangentially touched on that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.
– martshal
Jan 3 at 20:06
@user1133275 No, your example is missing a backslash before the ? , and would duplicate the , as well if it worked. It can be made to work, however, if the substitution string is changed accordingly by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications, is really good.
– Larry
Jan 3 at 20:13
Sorry, by "no day interval info" I meant that, of course, only those output lines would be missing this information where it was not present in the input either. This means that everything would work as in the previous solution, but lines like the 6th would not have the comma and day interval added.
– Larry
Jan 3 at 21:06
add a comment |
Phenomenally complete answer. Thank you much.
– martshal
Jan 3 at 18:36
@martshal this answer won't work for the 6th example,(.*)
should be(,.*)?
– user1133275
Jan 3 at 19:47
Yeah, I tangentially touched that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.
– martshal
Jan 3 at 20:06
@user1133275 No, your example is missing a backslash before the?
, and would duplicate the,
as well if it worked. It can be made to work however, if the substitution string is changed accordingly, by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications is really good.
– Larry
Jan 3 at 20:13
Sorry, I meant the above thing with "no day interval info" in such a way that of course, only those output lines would be missing this information where it was not present in the input either. This means that everything would work as in the previous solution, but lines like the 6-th would not have the comma and day interval added.
– Larry
Jan 3 at 21:06
Phenomenally complete answer. Thank you much.
– martshal
Jan 3 at 18:36
Phenomenally complete answer. Thank you much.
– martshal
Jan 3 at 18:36
@martshal this answer won't work for the 6th example
,(.*)
should be (,.*)?
– user1133275
Jan 3 at 19:47
@martshal this answer won't work for the 6th example
,(.*)
should be (,.*)?
– user1133275
Jan 3 at 19:47
Yeah, I tangentially touched that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.
– martshal
Jan 3 at 20:06
Yeah, I tangentially touched that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.
– martshal
Jan 3 at 20:06
@user1133275 No, your example is missing a backslash before the
?
, and would duplicate the ,
as well if it worked. It can be made to work however, if the substitution string is changed accordingly, by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications is really good.– Larry
Jan 3 at 20:13
@user1133275 No, your example is missing a backslash before the
?
, and would duplicate the ,
as well if it worked. It can be made to work however, if the substitution string is changed accordingly, by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications is really good.– Larry
Jan 3 at 20:13
Sorry, I meant the above thing with "no day interval info" in such a way that of course, only those output lines would be missing this information where it was not present in the input either. This means that everything would work as in the previous solution, but lines like the 6-th would not have the comma and day interval added.
– Larry
Jan 3 at 21:06
If I've understood your question correctly, the following substitution should do what you want:
%s/\[\[\(\d\+\)\([_ ]\)Week\([_ ]\)\(\d\+\)\]\],\(\s\d\+\w\+.*\)/[[\1\2Week\3\4|Week \4,\5]]/
Note: the \([_ ]\) capture groups preserve the separator (space or underscore) for the components that appear before the | (the separator is a space for line 5, while underscores are used in lines 8-13).
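To check the logic of this substitution on a few sample lines before running it on the real wiki index, something like the following sketch may help. It is not the Vim command itself: it assumes `sed -E` is available and rewrites the pattern in extended-regex syntax, and the function name `add_week_label` is made up for illustration.

```shell
#!/bin/sh
# add_week_label is a hypothetical helper name used only for this sketch.
# sed -E (extended regex) stands in for Vim here: groups are written ( )
# rather than \( \), and POSIX classes replace \d, \s and \w.
add_week_label() {
    sed -E 's/\[\[([0-9]+)([_ ])Week([_ ])([0-9]+)]],([[:space:]][0-9]+[[:alnum:]_]+.*)/[[\1\2Week\3\4|Week \4,\5]]/'
}

# Sample lines from the question: a link target with spaces, one with
# underscores, and one with no date range after the link.
printf '%s\n' \
    '* [[2018 Week 21]], 20th through 26th May' \
    '* [[2018_Week_18]], 29th April through 5th May' \
    '* [[2018_Week_20]]' |
    add_week_label
```

The separator groups are copied back unchanged, so a link target written with spaces keeps its spaces. A line with no comma and date range after the link does not match and passes through untouched.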
edited Jan 3 at 14:53
answered Jan 3 at 14:48
Anthony Geoghegan
Vim's regex syntax is nonstandard, so you can filter the buffer through Perl from within Vim instead, where $RE stands for the substitution expression:
:%!perl -pe '$RE'
You can test it outside of Vim too:
> echo "[[2018_Week_18]], 29th April through 5th May" | perl -pe 's/[_ ](Week)[_ ](\d+)\]\](, .*)?/_$1_$2|$1 $2$3]]/g'
[[2018_Week_18|Week 18, 29th April through 5th May]]
Apart from the Perl REs being about half the length of the Vim REs, Perl REs can be copied and pasted into many other tools (grep/rename/vim/sed/awk/etc).
I like your answer, but it's been quite a while since I've used perl (where I was actually introduced to regex). I'll have to think about this for a bit to decide if this represents more or less of a learning/relearning curve for me.
– martshal
Jan 3 at 18:45
edited Jan 3 at 21:04
answered Jan 3 at 14:29
user1133275
I've edited this question to use code formatting to make it easier to read the actual search terms. The first regex was /\[\[\d+[_\s]Week[_\s]\d+\]\],\s\d+\w+.*/g. If I'm correctly guessing what you intend, the plus signs should actually be escaped with a backslash, so I think the regex should be: /\[\[\d\+[_\s]Week[_\s]\d\+\]\],\s\d\+\w\+.*
Also, the second regex, \(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\) (which should probably be \(2018[_\s]Week[_\s]\d\d\),\s\(\d\+\w\+\)), is quite different from the previous search term, so I'd suggest that you further edit to clarify this.
– Anthony Geoghegan
Jan 3 at 14:36
Thanks for your attempt to clarify, Anthony.
– martshal
Jan 3 at 18:40