regexpr syntax in R
I am trying the following, which should get me everything between "productUrl":"// and the next ?
(?<="productUrl":"\/\/)(.*?)(?=\?)
The above works on https://regexr.com/
I am then trying to escape the backslashes to fit that string into the grep
function, but with no luck. What is the proper way of doing it?
See this example: link to example
I actually need to extract the substrings that match my pattern so grep
may be used in conjunction with another function.
r regex
edited Jan 18 at 10:31 by Mark Rotteveel
asked Jan 7 at 8:25 by Chapo
1 Answer
Note you do not need to escape / in R regex patterns, as they are defined with string literals and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
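To make the escaping point concrete, here is a minimal sketch. The sample string x and the names p_plain and p_escaped are made up for illustration and are not from the question. It suggests that an escaped slash is harmless but unnecessary, and that a double-quoted literal only needs the inner quotes escaped.

```r
# Hypothetical sample input, shortened from the kind of data in the question.
x <- 'a "productUrl":"//example.com/item.html?search=1" b'

# Double-quoted literal: only the inner " needs a single backslash.
p_plain <- "\"productUrl\":\"//"

# Escaping / is allowed, but each regex backslash must be doubled in the
# R string literal, so \/ in the regex is written as \\/ in R.
p_escaped <- "\"productUrl\":\"\\/\\/"

# cat() prints the pattern as the regex engine receives it.
cat(p_plain, "\n")
cat(p_escaped, "\n")

res_plain   <- grepl(p_plain, x, perl = TRUE)
res_escaped <- grepl(p_escaped, x, perl = TRUE)
print(c(res_plain, res_escaped))
```

Both patterns find the same text, so the plain / version is the easier one to read and maintain.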
You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:
grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)
The [^?]* negated character class matches any 0 or more chars other than a literal ?
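As a rough check of the two spellings, the sketch below runs the lazy-dot-plus-lookahead form and the negated-class form against a made-up sample string (x, lazy and negated are illustrative names). It uses regmatches, which the answer covers further down, only to show what each pattern matches.

```r
# Hypothetical sample input, not the asker's real data.
x <- 'a "productUrl":"//example.com/item.html?search=1" b'

# Lazy dot plus lookahead: the ? must be escaped, and the backslash doubled.
lazy <- '(?<="productUrl":"//).*?(?=\\?)'

# Negated character class: no backslashes needed at all.
negated <- '(?<="productUrl":"//)[^?]*'

m_lazy    <- regmatches(x, regexpr(lazy, x, perl = TRUE))
m_negated <- regmatches(x, regexpr(negated, x, perl = TRUE))

print(m_lazy)
print(m_negated)
print(identical(m_lazy, m_negated))
```

One difference to keep in mind: the lookahead version requires a ? to be present after the URL, while the negated class will also match up to the end of the string when there is no ?.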
If the string you are checking against has no double quotes, remove them from the lookbehind:
grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)
Instead of the lookbehind, you may also use \K to omit the text matched so far from the result (note the doubled backslash inside the R string literal):
grep('productUrl://\\K[^?]*', x, perl=TRUE)
                   ^^^
Actually, you do not even need the capturing group in your pattern.
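A small sketch comparing the lookbehind and \K variants, again on a made-up sample string (x, with_lookbehind and with_K are illustrative names). Under these assumptions both should return the same substring.

```r
# Hypothetical sample input, not the asker's real data.
x <- 'a "productUrl":"//example.com/item.html?search=1" b'

# Lookbehind: asserts the prefix without consuming it.
with_lookbehind <- '(?<="productUrl":"//)[^?]*'

# \K: consumes the prefix, then resets the start of the reported match.
# The backslash is doubled because this is an R string literal.
with_K <- '"productUrl":"//\\K[^?]*'

m_lookbehind <- regmatches(x, regexpr(with_lookbehind, x, perl = TRUE))
m_K          <- regmatches(x, regexpr(with_K, x, perl = TRUE))

print(m_lookbehind)
print(m_K)
```

Both forms require perl=TRUE, since neither lookbehind nor \K is supported by R's default regex engine.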
Solving the actual task
You cannot extract substrings with grep in R; you can only find/identify elements to fetch from a character vector using grep. To extract substrings, you need to use base R regmatches, or stringr str_extract/str_extract_all, or similar match functions.
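The difference can be sketched on a small character vector. The vector v and the pattern name pat are invented for illustration: grep reports which elements match (or returns those whole elements with value=TRUE), while regmatches with gregexpr pulls out the matched substrings themselves.

```r
# Hypothetical three-element character vector; only elements 1 and 3 match.
v <- c('{"productUrl":"//example.com/a.html?x=1"}',
       '{"image":"https://example.com/a.jpg"}',
       '{"productUrl":"//example.com/b.html?y=2"}')

pat <- '"productUrl":"//\\K[^?"]*'

# grep: positions of the matching elements.
idx <- grep(pat, v, perl = TRUE)

# grep with value = TRUE: the whole matching elements, not the substrings.
whole <- grep(pat, v, perl = TRUE, value = TRUE)

# regmatches + gregexpr: the matched substrings, one list entry per element.
urls <- unlist(regmatches(v, gregexpr(pat, v, perl = TRUE)))

print(idx)
print(whole)
print(urls)
```

Here unlist flattens the per-element list into a single character vector; keep the list instead if you need to know which input element each match came from.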
Example with base R:
> x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":["name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":["domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0],\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
With stringr:
> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
Thanks for the answer, but your solution doesn't yield the expected result, unfortunately. I might have several instances of the pattern, so I do need the capturing group, no?
– Chapo
Jan 7 at 8:49
regexr.com/45ug5 is with my example
– Chapo
Jan 7 at 8:49
I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE,value=TRUE) following your advice
– Chapo
Jan 7 at 9:02
What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you are trying to use it in the grep command, that is why I used it in the answer.
– Wiktor Stribiżew
Jan 7 at 9:02
I've put an input example in the link in previous comment : regexr.com/45ug5
– Chapo
Jan 7 at 9:03
edited Jan 7 at 9:12
answered Jan 7 at 8:30 by Wiktor Stribiżew