regexpr syntax in R

I am trying the following pattern, which should match everything between "productUrl":"// and the next ? character:

(?<="productUrl":"\/\/)(.*?)(?=\?)

The above works on https://regexr.com/

I then tried to escape the backslashes so that the pattern fits into a string passed to the grep function, but with no luck. What is the proper way of doing it?

See this example: link to example

I actually need to extract the substrings that match my pattern so grep may be used in conjunction with another function.

edited Jan 18 at 10:31 by Mark Rotteveel

asked Jan 7 at 8:25 by Chapo

r regex


1 Answer
Note that you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
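A minimal sketch of the escaping rules described above. The variable names are made up for illustration; the point is only that both literals spell the same pattern, and that a regex escape such as \? needs a doubled backslash inside an ordinary R string literal.

```r
# The same lookbehind pattern written with a double-quoted and a single-quoted literal.
# Inside "..." every literal double quote needs one backslash; inside '...' it needs none.
p_double <- "(?<=\"productUrl\":\"//)[^?]*"
p_single <- '(?<="productUrl":"//)[^?]*'
same <- identical(p_double, p_single)

# A regex-level escape (here \?) must be typed with two backslashes in the R source,
# because the string parser consumes the first one.
q_escaped <- "\\?"
n_chars <- nchar(q_escaped)  # the string holds a backslash followed by ?

cat(p_single, "\n")
cat(q_escaped, "\n")
```

Printing a pattern with cat() shows the characters the regex engine actually receives, which is a quick way to check an escaping attempt.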

You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:

grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)

The [^?]* negated character class matches any 0 or more chars other than ?.
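As a rough check that the negated character class is a drop-in replacement for the lazy match plus lookahead, the sketch below runs both patterns on a made-up sample string (the string `s` and the example.com URL are assumptions, not data from the question).

```r
# Hypothetical sample input containing one productUrl entry
s <- '"productUrl":"//example.com/item.html?search=1"'

# Original idea: lazy match up to a lookahead for a literal ?
lazy <- regmatches(s, regexpr('(?<="productUrl":"//).*?(?=\\?)', s, perl = TRUE))

# Suggested form: consume everything that is not a ?
negated <- regmatches(s, regexpr('(?<="productUrl":"//)[^?]*', s, perl = TRUE))

print(lazy)
print(negated)
```

One difference worth knowing: if no ? follows, the lookahead version fails to match, while [^?]* simply runs to the end of the string.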

If the string you are checking against has no double quotes, remove them from the lookbehind:

grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)

Instead of the lookbehind, you may also use \K to omit the part of text matched so far:

grep('productUrl://\\K[^?]*', x, perl=TRUE)
                   ^^^

Actually, you do not even need the capturing group in your pattern.
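To illustrate why no capturing group is needed, here is a small sketch with an invented input string: \K discards the prefix from the reported match, so the whole match is already the wanted substring.

```r
# Hypothetical input without double quotes around the key
s <- 'productUrl://example.com/a.html?x=1'

# \K (typed as \\K in the R literal) resets the start of the reported match,
# so only the text after "productUrl://" and before "?" is returned.
m <- regmatches(s, regexpr('productUrl://\\K[^?]*', s, perl = TRUE))
print(m)
```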

Solving the actual task

You cannot extract substrings with grep in R; grep only identifies which elements of a character vector contain a match (with value=TRUE it returns those whole elements, not the matched parts). To extract substrings, you need to use base R regmatches or stringr str_extract/str_extract_all or similar match functions.
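The difference can be seen on a tiny, made-up character vector (the vector `v` and its URL are assumptions for illustration only):

```r
# Two hypothetical elements: one contains a productUrl entry, one does not
v <- c('"productUrl":"//a.example/p1.html?x=1"', 'no url here')
pat <- '(?<="productUrl":")[^?"]*'

idx <- grep(pat, v, perl = TRUE)                    # positions of matching elements
whole <- grep(pat, v, perl = TRUE, value = TRUE)    # the matching elements, unchanged
sub_match <- regmatches(v, regexpr(pat, v, perl = TRUE))  # only the matched substring

print(idx)
print(whole)
print(sub_match)
```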

Example with base R:

> x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":["name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":["domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0],n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
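If the input is a character vector rather than a single string, the same call can be wrapped in a small helper. This is only a sketch: the function name `extract_product_urls` and the sample `docs` vector are hypothetical and not part of the original question.

```r
# Hypothetical helper: returns every productUrl value found in a character vector.
extract_product_urls <- function(txt) {
  # gregexpr finds all matches per element; \\K drops the "productUrl":" prefix
  hits <- gregexpr('"productUrl":"\\K[^?"]*', txt, perl = TRUE)
  # regmatches returns a list with one entry per element; flatten it to a vector
  unlist(regmatches(txt, hits))
}

# Made-up input: the first element holds two URLs, the second holds none
docs <- c('{"productUrl":"//shop.example/a.html?search=1","productUrl":"//shop.example/b.html"}',
          '{"name":"no url"}')
urls <- extract_product_urls(docs)
print(urls)
```

Elements without a match contribute nothing to the result, so the output length equals the total number of matches rather than the length of the input.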

With stringr:

> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

edited Jan 7 at 9:12

answered Jan 7 at 8:30 by Wiktor Stribiżew

Thanks for the answer, but unfortunately your solution doesn't yield the expected result. I might have several instances of the pattern, so don't I need the capturing group?

– Chapo
Jan 7 at 8:49

regexr.com/45ug5 is with my example

– Chapo
Jan 7 at 8:49

I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE,value=TRUE) following your advice

– Chapo
Jan 7 at 9:02

What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you are trying to use it in grep command, that is why I used it in the answer.

– Wiktor Stribiżew
Jan 7 at 9:02

I've put an input example in the link in previous comment : regexr.com/45ug5

– Chapo
Jan 7 at 9:03
