Nailing down vim regex substitution


I started using Vimwiki to document weekly progress on my projects last year. As the year rolled on, the format of my list of links to weekly wikis changed a bit. At the end of the year I decided to quickly go through my index page and unify the format of the bullet points that I made, but I just can't get the regex correct.



A raw example of the bullets I want to update looks like the following, with the upper, more recent entries, in the format that I want.



* [[2018_Week_25|Week 25, 17th through the 23rd June]]
* [[2018_Week_24|Week 24, 10th through 16th June]]
* [[2018_Week_23|Week 23, 3rd through 9th June]]
* [[2018 Week 22|Week 22, 27th May through 2nd June]]
* [[2018 Week 21]], 20th through 26th May
* [[2018_Week_20]]
* [[2018_Week_19]]
* [[2018_Week_18]], 29th April through 5th May
* [[2018_Week_17]], 22nd through 28th April
* [[2018_Week_16]], 15th through 21st April
* [[2018_Week_15]], 8th through 14th April
* [[2018_Week_14]], 1st through 7th April
* [[2018_Week_13]], 25th through 31st March


I started out by doing a simple match:



/\[\[\d+[_\s]Week[_\s]\d+\]\],\s\d+\w+.*/g


Which matches the appropriate parts of lines 5, 8-13. Then I tried to throw in some pattern variables and substitutions, and everything fell apart. Using the following substitution line, Vim suddenly decided that the pattern that it was finding before was no longer to be found.



:1,13s/\(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)/\[\[\1|\1\2\]\]/g



E486: Pattern not found: \(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\)



I've actually tried quite a few subtle variants of this, but I am beginning to believe that I have simply overlooked something glaringly obvious. Does anyone have any suggestions?










  • I’ve edited this question to use code formatting to make it easier to read the actual search terms. The first regex was /\[\[\d+[_\s]Week[_\s]\d+\]\],\s\d+\w+.*/g. If I'm guessing your intent correctly, the plus signs should be escaped with a backslash, so I think the regex should be: /\[\[\d\+[_\s]Week[_\s]\d\+\]\],\s\d\+\w\+.* Also, the second regex, \(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\) (which should probably be \(2018[_\s]Week[_\s]\d\d\),\s\(\d\+\w\+\)), is quite different from the previous search term, so I’d suggest that you edit further to clarify this.

    – Anthony Geoghegan
    Jan 3 at 14:36







  • Thanks for your attempt to clarify, Anthony.

    – martshal
    Jan 3 at 18:40















Tags: regular-expression, vim

asked Jan 3 at 12:25 by martshal
edited Jan 3 at 14:36 by Anthony Geoghegan












3 Answers

:%s/\(\[\[\d\+[_ ]\+Week\([_ ]\+\)\(\d\+\)\)\]\],\(.*\)/\1|Week\2\3,\4]]/



You can still improve this expression by padding it with \s* where appropriate, to better catch the inconsistencies that invariably occur in manually typed text.
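
To check the group layout outside Vim, the same substitution can be sketched in Python. This is only a translation under stated assumptions: Python's re module is not part of the answer, the name unify is made up for illustration, and Python does not need Vim's backslashes before +, ( and ).

```python
import re

# Python rendering of the Vim pattern above (an assumption of this sketch;
# the answer itself only uses Vim). Groups: 1 = link target, 2 = separator
# after "Week", 3 = week number, 4 = everything after the comma.
PATTERN = re.compile(r"(\[\[\d+[_ ]+Week([_ ]+)(\d+))\]\],(.*)")
# Same layout as the Vim replacement \1|Week\2\3,\4]]
REPLACEMENT = r"\1|Week\2\3,\4]]"


def unify(line: str) -> str:
    """Rewrite one index line; lines that do not match come back unchanged."""
    return PATTERN.sub(REPLACEMENT, line, count=1)


if __name__ == "__main__":
    for sample in (
        "* [[2018_Week_18]], 29th April through 5th May",
        "* [[2018 Week 21]], 20th through 26th May",
        "* [[2018_Week_20]]",
    ):
        print(unify(sample))
```

Because group 2 reuses whatever separator followed Week, entries written with underscores come out as Week_18 in the display text rather than Week 18. If the display text should always use a space, replacing \2 with a literal space in the replacement should do it.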



Some issues with your proposed solution:




  • The regular expression \(2018[_\s]Week[_\s]\d\d\),\s\(\d+\w+\) does not match, because:



    • Backslash-escaped predefined character classes cannot be used inside user-defined character classes delimited by [ and ]. In Vim, [_\s] matches an underscore, a backslash, or a literal s character.
      You can use _\|\s instead in these situations (wrapped in \( \) when it sits inside a longer pattern).

    • The + character needs to be escaped (\+) for its special meaning as the "1 or more" quantifier to be active. Otherwise, it matches a literal + sign.

    • In the text to be matched, the part covered by ,\s\(\d+\w+\) is preceded by ]], but ]] is missing from the pattern.


  • Leaving aside the backslashes in the substitution string, you are trying to terminate the result with ]], but the match only extends as far as the first day after the comma (the \d+\w+ part). If the substitution succeeded, your lines would end in text like 29th]] April through 5th May, with the ]] that was supposed to close the line stranded in the middle.


  • The substitution string \[\[\1|\1\2\]\] is not a regular expression, so characters like [ and ] need not be escaped.


  • Also, \d+\w+, although not erroneous, is redundant: \w already covers everything \d does, and given the context set by the preceding part of the expression, it always matches things like 9th and never anything unwanted.
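
Two of the points above can be reproduced in a short Python sketch. It is an illustration only: Python's re syntax differs from Vim's (most notably, Python does honour \s inside [...]), and the "repaired" pattern below is a hypothetical fix-up of the question's attempt, not something taken from the thread.

```python
import re

line = "* [[2018_Week_18]], 29th April through 5th May"

# Vim reads \d+ as "one digit followed by a literal +". The closest Python
# spelling is \d\+, which cannot match "29th" because the text has no "+".
assert re.search(r"\d\+\w\+", line) is None

# Hypothetical repair of the question's pattern: ]] added and the
# quantifiers active. Unlike Vim, Python treats \s inside [...] as
# whitespace, so [_\s] works here.
pattern = re.compile(r"\[\[(2018[_\s]Week[_\s]\d\d)\]\],\s(\d+\w+)")

# The match stops after the first day ("29th"), so the closing ]] from the
# replacement is emitted before the rest of the date range.
result = pattern.sub(r"[[\1|\1\2]]", line)
print(result)
```

The printed line has ]] directly after 29th, with the remainder of the date range trailing outside the link, which is the stranded-bracket problem described above.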


EDIT: A very good suggestion from @user1133275 is (with some alterations) to move the comma into the capture group that follows it in the original solution and make that group optional, so that lines where no day interval was specified (i.e. no "xth to yth") are changed as well:



:%s/\(\[\[\d\+[_ ]\+Week\([_ ]\+\)\(\d\+\)\)\]\]\(,.*\)\?/\1|Week\2\3\4]]/



@user1133275 didn't provide an answer, so I put the results of our discussion in the comment section of this answer here.
If they decide to put it in an answer and I am notified, I'll remove this edit, so the credits can go to the author of the base idea.
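
The edited command, with the comma moved into an optional last group, can be sketched in Python as follows. The function name unify_optional is invented for illustration, and the sketch relies on Python 3.5 or later, where a group that did not take part in the match expands to an empty string in the replacement.

```python
import re

# Python rendering of the edited pattern: the comma now lives inside the
# final group, and the trailing "?" makes that group optional.
PATTERN = re.compile(r"(\[\[\d+[_ ]+Week([_ ]+)(\d+))\]\](,.*)?")
# Since Python 3.5 an unmatched group expands to "" in the replacement,
# so \4 simply vanishes for lines without a day interval.
REPLACEMENT = r"\1|Week\2\3\4]]"


def unify_optional(line: str) -> str:
    """Rewrite one index line, with or without a trailing day interval."""
    return PATTERN.sub(REPLACEMENT, line, count=1)


if __name__ == "__main__":
    print(unify_optional("* [[2018_Week_20]]"))
    print(unify_optional("* [[2018_Week_18]], 29th April through 5th May"))
```

Lines that already carry a | inside the link are left alone, because the pattern requires ]] directly after the week number.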






  • Phenomenally complete answer. Thank you much.

    – martshal
    Jan 3 at 18:36











  • @martshal this answer won't work for the 6th example: ,\(.*\) should be \(,.*\)?

    – user1133275
    Jan 3 at 19:47











  • Yeah, I tangentially touched that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.

    – martshal
    Jan 3 at 20:06











  • @user1133275 No, your example is missing a backslash before the ?, and would duplicate the , as well if it worked. It can be made to work however, if the substitution string is changed accordingly, by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications is really good.

    – Larry
    Jan 3 at 20:13











  • Sorry, I meant the above thing with "no day interval info" in such a way that of course, only those output lines would be missing this information where it was not present in the input either. This means that everything would work as in the previous solution, but lines like the 6-th would not have the comma and day interval added.

    – Larry
    Jan 3 at 21:06


















If I've understood your question correctly, the following substitution should do what you want:



%s/\[\[\(\d\+\)\([_ ]\)Week\([_ ]\)\(\d\+\)\]\],\(\s\d\+\w\+.*\)/[[\1\2Week\3\4|Week \4,\5]]/


Note: the \([_ ]\) capture groups preserve the separator (space or underscore) for the components that appear before the | (the separator is a space in line 5, while underscores are used in lines 8-13).
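
As a rough cross-check, here is the same substitution in Python, run over a few of the sample lines in one go. The names INDEX and convert are made up for this sketch, and the pattern is a translation into Python's re syntax rather than the Vim command itself.

```python
import re

# A few lines from the question's index, including one that is already in
# the target format and one with no day interval.
INDEX = (
    "* [[2018 Week 22|Week 22, 27th May through 2nd June]]\n"
    "* [[2018 Week 21]], 20th through 26th May\n"
    "* [[2018_Week_20]]\n"
    "* [[2018_Week_18]], 29th April through 5th May\n"
)

# Groups: 1 = year, 2 and 3 = separators around "Week", 4 = week number,
# 5 = the day interval including its leading space.
PATTERN = re.compile(r"\[\[(\d+)([_ ])Week([_ ])(\d+)\]\],(\s\d+\w+.*)")
REPLACEMENT = r"[[\1\2Week\3\4|Week \4,\5]]"


def convert(text: str) -> str:
    """Apply the substitution to every matching line of the text."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(convert(INDEX), end="")
```

Here the link target keeps its original separator while the display text always reads Week followed by a space. Lines without a day interval are left untouched, since the pattern requires the comma.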






Vim regex is nonstandard, so just use the industry-leading Perl from within Vim instead (with $RE standing in for the substitution expression):



:%!perl -pe '$RE'


You can also test it outside of Vim:



> echo "[[2018_Week_18]], 29th April through 5th May" \
| perl -pe 's/[_ ](Week)[_ ](\d+)\]\](, .*)?/_$1_$2|$1 $2$3]]/g'
[[2018_Week_18|Week 18, 29th April through 5th May]]


Apart from the Perl REs being about half the length of the Vim REs, the Perl REs can often be copied, with little or no change, into many other tools (grep/rename/vim/sed/awk/etc).
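
As one data point for the portability claim, the Perl expression carries over to Python almost unchanged; only the replacement syntax differs ($1 becomes \1). This is a sketch, with convert_perl_style as a made-up name, and it says nothing about how well the expression ports to the other tools listed.

```python
import re

# The Perl pattern from the answer, reused as-is in Python.
PERL_STYLE = re.compile(r"[_ ](Week)[_ ](\d+)\]\](, .*)?")
# Perl's _$1_$2|$1 $2$3]] written with Python backreferences.
REPLACEMENT = r"_\1_\2|\1 \2\3]]"


def convert_perl_style(line: str) -> str:
    """Apply the Perl-style substitution to one line."""
    return PERL_STYLE.sub(REPLACEMENT, line)


if __name__ == "__main__":
    print(convert_perl_style("[[2018_Week_18]], 29th April through 5th May"))
    print(convert_perl_style("[[2018 Week 21]], 20th through 26th May"))
    print(convert_perl_style("[[2018_Week_20]]"))
```

Note that this expression always writes underscores into the link target, so an entry such as [[2018 Week 21]] becomes [[2018_Week_21|...]]. If the wiki page is actually named with spaces, the link may then point at a differently named page.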






    • I like your answer, but it's been quite a while since I've used perl (where I was actually introduced to regex). I'll have to think about this for a bit to decide if this represents more or less of a learning/relearning curve for me.

      – martshal
      Jan 3 at 18:45










First answer: answered Jan 3 at 16:08 by Larry, edited Jan 4 at 12:50












    • Phenomenally complete answer. Thank you much.

      – martshal
      Jan 3 at 18:36











    • @martshal this answer won't work for the 6th example ,(.*) should be (,.*)?

      – user1133275
      Jan 3 at 19:47











    • Yeah, I tangentially touched that in the problem explanation, but there are few enough of those in the list that I can easily do that manually.

      – martshal
      Jan 3 at 20:06











    • @user1133275 No, your example is missing a backslash before the ?, and would duplicate the , as well if it worked. It can be made to work however, if the substitution string is changed accordingly, by removing the comma. If @martshal wants to have no comma and no day interval info in the output (since there is none in the mentioned example), then your suggestion, combined with the modifications is really good.

      – Larry
      Jan 3 at 20:13











    • Sorry, I meant the above thing with "no day interval info" in such a way that of course, only those output lines would be missing this information where it was not present in the input either. This means that everything would work as in the previous solution, but lines like the 6-th would not have the comma and day interval added.

      – Larry
      Jan 3 at 21:06












































    If I've understood your question correctly, the following substitution should do what you want:

    %s/\[\[\(\d\+\)\([_ ]\)Week\([_ ]\)\(\d\+\)\]\],\(\s\d\+\w\+.*\)/[[\1\2Week\3\4|Week \4,\5]]/

    Note: the \([_ ]\) capture groups preserve the separator (space or underscore) for the components that appear before the | (the separator is a space for line 5, while underscores are used in lines 8-13).
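For anyone who wants to check the capture-group logic outside Vim, here is a rough Python equivalent of the substitution above. It is only a sketch: Python's re syntax differs from Vim's (no backslashes before the grouping parentheses or the +), and the helper name unify_bullet is made up for illustration.

```python
import re

# Python counterpart of the Vim pattern: year, separator, "Week",
# separator, week number, then the mandatory ", <dates>" tail.
PATTERN = re.compile(r"\[\[(\d+)([_ ])Week([_ ])(\d+)\]\],(\s\d+\w+.*)")


def unify_bullet(line):
    """Move the trailing date range inside the link as its description."""
    # \2 and \3 re-emit whichever separator the line used originally.
    return PATTERN.sub(r"[[\1\2Week\3\4|Week \4,\5]]", line)


if __name__ == "__main__":
    for sample in (
        "* [[2018_Week_18]], 29th April through 5th May",
        "* [[2018 Week 21]], 20th through 26th May",
        "* [[2018_Week_20]]",
    ):
        print(unify_bullet(sample))
```

Because the comma is part of the pattern, a bullet with no date range (such as line 6 of the question) does not match and is left unchanged.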














        edited Jan 3 at 14:53

























        answered Jan 3 at 14:48









        Anthony Geoghegan



































            Vim's regex syntax is nonstandard, so just use the industry-leading Perl from within Vim instead:

            :%!perl -pe '$RE'

            You can test it outside of Vim too:

            > echo "[[2018_Week_18]], 29th April through 5th May" \
            | perl -pe 's/[_ ](Week)[_ ](\d+)\]\](, .*)?/_$1_$2|$1 $2$3]]/g'
            [[2018_Week_18|Week 18, 29th April through 5th May]]

            Apart from the Perl REs being about half the length of the Vim REs, Perl-style REs are largely copy/paste compatible with many other tools (grep -P, rename, and, with minor changes, sed -E and awk).
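The same Perl-style pattern can also be tried in Python, whose re module accepts it unchanged. The sketch below is an illustration under that assumption and is not part of the original answer; rewrite is a hypothetical helper name. Because the trailing group is optional, bullets without a date range are converted too.

```python
import re

# Same idea as the Perl one-liner: the ", <dates>" group is optional.
PATTERN = re.compile(r"[_ ](Week)[_ ](\d+)\]\](, .*)?")


def rewrite(line):
    """Rewrite a '[[2018_Week_NN]], dates' bullet into '[[2018_Week_NN|Week NN, dates]]'."""

    def repl(m):
        # group(3) is None when no date range follows the link.
        tail = m.group(3) or ""
        return f"_{m.group(1)}_{m.group(2)}|{m.group(1)} {m.group(2)}{tail}]]"

    return PATTERN.sub(repl, line)


if __name__ == "__main__":
    print(rewrite("* [[2018_Week_18]], 29th April through 5th May"))
    print(rewrite("* [[2018_Week_20]]"))
```

Note that, like the Perl version, this always writes underscores before the |, so a link such as [[2018 Week 21]] ends up pointing at 2018_Week_21. Bullets already in the target format do not match and are left alone.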































            • I like your answer, but it's been quite a while since I've used perl (where I was actually introduced to regex). I'll have to think about this for a bit to decide if this represents more or less of a learning/relearning curve for me.

              – martshal
              Jan 3 at 18:45























            edited Jan 3 at 21:04

























            answered Jan 3 at 14:29









            user1133275











