regexpr syntax in R

I am trying the following pattern, which should capture everything between "productUrl":"// and the next ? character:

(?<="productUrl":"\/\/)(.*?)(?=\?)

The above works on https://regexr.com/

I then tried to escape the backslashes to fit that string into the grep function, but with no luck. What is the proper way of doing it?

See this example: link to example

I actually need to extract the substrings that match my pattern, so grep may have to be used in conjunction with another function.










r regex

asked Jan 7 at 8:25 by Chapo, edited Jan 18 at 10:31 by Mark Rotteveel

1 Answer














Note you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.

You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:

grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)

The [^?]* negated character class matches any 0 or more chars other than ?.
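As a rough illustration of the quoting point, the sketch below builds the same pattern once with a single-quoted and once with a double-quoted R literal and checks that they are identical. The two-element vector x is a made-up sample, not the data from the question.

```r
# Hypothetical sample input, loosely modelled on the text in the question.
x <- c('"productUrl":"//www.example.com/item.html?search=1"',
       '"image":"https://www.example.com/a.jpg"')

# Single-quoted literal: the double quotes and slashes need no escaping.
p_single <- '(?<="productUrl":"//)[^?]*'

# Double-quoted literal: each " inside the string needs one backslash.
p_double <- "(?<=\"productUrl\":\"//)[^?]*"

# Both literals produce exactly the same pattern string.
same_pattern <- identical(p_single, p_double)

# grepl() reports, per element, whether the pattern matches anywhere in it.
hits <- grepl(p_single, x, perl = TRUE)

print(same_pattern)
print(hits)
```

Printing a pattern with writeLines() is a handy way to see the string the regex engine actually receives, after R has processed the escapes.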



If the string you are checking against has no double quotes, remove them from the lookbehind:

grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)

Instead of the lookbehind, you may also use \K to omit the text matched so far from the match value (the backslash must be doubled inside an R string literal):

grep('productUrl://\\K[^?]*', x, perl=TRUE)
                   ^^^

Actually, you do not even need the capturing group in your pattern.
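To make the comparison concrete, here is a small sketch that runs the lookbehind form and the \K form against the same string and compares the extracted text. The input x is an invented example without double quotes, and regmatches/regexpr are used only so that the matched text is visible.

```r
# Hypothetical input without double quotes around the key.
x <- 'productUrl://www.example.com/item.html?search=1'

# Lookbehind version: the prefix is asserted but never part of the match.
p_lookbehind <- '(?<=productUrl://)[^?]*'

# \K version: the prefix is consumed, then dropped from the reported match.
# The backslash is doubled because this is an R string literal.
p_keep_out <- 'productUrl://\\K[^?]*'

m1 <- regmatches(x, regexpr(p_lookbehind, x, perl = TRUE))
m2 <- regmatches(x, regexpr(p_keep_out, x, perl = TRUE))

print(m1)
print(identical(m1, m2))
```

Neither pattern uses a capturing group: the whole match is already the wanted substring.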



Solving the actual task

You cannot extract substrings with grep in R: it only finds the elements of a character vector that contain a match, returning their indices or, with value=TRUE, the whole elements. To extract substrings, use base R regmatches, stringr str_extract/str_extract_all, or similar match functions.
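The difference can be seen in a short sketch, assuming a made-up three-element character vector x: grep returns positions or whole elements, while regmatches combined with gregexpr returns only the matched substrings.

```r
# Hypothetical character vector; only elements 1 and 3 contain a productUrl.
x <- c('"productUrl":"//a.example.com/one.html?x=1"',
       '"sku":"ABC123"',
       '"productUrl":"//b.example.com/two.html?y=2"')

pattern <- '"productUrl":"\\K[^?"]*'

# grep() answers "which elements match?", never "what text matched?".
idx      <- grep(pattern, x, perl = TRUE)                # indices of matching elements
elements <- grep(pattern, x, perl = TRUE, value = TRUE)  # the whole matching elements

# gregexpr() locates every match; regmatches() cuts the substrings out.
urls <- unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))

print(idx)
print(elements)
print(urls)
```

This is likely why a call such as grep(..., value=TRUE) appears to return too much: it hands back the entire element that contains a match.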



Example with base R:

> x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":["name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":["domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0],\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

With stringr:

> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"





• Thanks for the answer, but unfortunately your solution doesn't yield the expected result. I might have several instances of the pattern, so I do need the capturing group, no?
  – Chapo, Jan 7 at 8:49

• regexr.com/45ug5 is with my example
  – Chapo, Jan 7 at 8:49

• I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE,value=TRUE) following your advice
  – Chapo, Jan 7 at 9:02

• What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you were trying to use it in the grep command, which is why I used it in the answer.
  – Wiktor Stribiżew, Jan 7 at 9:02

• I've put an input example in the link in my previous comment: regexr.com/45ug5
  – Chapo, Jan 7 at 9:03










answered Jan 7 at 8:30 by Wiktor Stribiżew, edited Jan 7 at 9:12











