Subset string by counting specific characters

I have the following strings:

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")

I want to cut off each string as soon as the number of occurrences of A, G and N reaches a certain value, say 3. In that case, the result should be:

some_function(strings)

c("ABBSDGN", "AABSDG", "AGN", "GGG")

I tried stringi, stringr and regular expressions, but I can't figure it out.

asked Dec 27 '18 at 19:53 by Nivel
edited Dec 27 '18 at 20:22 by PoGibas

r regex gsub stringr stringi


5 Answers

You can accomplish your task with a simple call to str_extract from the stringr package:

library(stringr)

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")

str_extract(strings, '([^AGN]*[AGN]){3}')
# [1] "ABBSDGN" "AABSDG" "AGN" "GGG"

The [^AGN]*[AGN] portion of the regex pattern says to look for zero or more consecutive characters that are not A, G, or N, followed by one instance of A, G, or N. The additional wrapping with parentheses and braces, like this ([^AGN]*[AGN]){3}, means look for that pattern three times consecutively. You can change the number of occurrences of A, G, N that you are looking for by changing the integer in the curly braces:

str_extract(strings, '([^AGN]*[AGN]){4}')
# [1] "ABBSDGNHN" NA "AGNA" "GGGDSRTYHG"

There are a couple ways to accomplish your task using base R functions. One is to use regexpr followed by regmatches:

m <- regexpr('([^AGN]*[AGN]){3}', strings)
regmatches(strings, m)
# [1] "ABBSDGN" "AABSDG" "AGN" "GGG"

Alternatively, you can use sub:

sub('(([^AGN]*[AGN]){3}).*', '\\1', strings)
# [1] "ABBSDGN" "AABSDG" "AGN" "GGG"
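One thing to watch with the regexpr/regmatches route: regmatches drops elements that have no match, so the result can be shorter than the input. A minimal sketch of a wrapper that keeps the output aligned with the input follows. The name cut_after_n and its arguments are hypothetical, and it assumes the counted characters have no special meaning inside a regex character class and that the input contains no NA values.

```r
# Hypothetical helper: cut each string after the n-th occurrence of any
# character in `chars`; strings with fewer than n occurrences give NA.
cut_after_n <- function(x, chars = c("A", "G", "N"), n = 3) {
  set <- paste(chars, collapse = "")
  # Builds e.g. "([^AGN]*[AGN]){3}"
  pattern <- sprintf("([^%s]*[%s]){%d}", set, set, n)
  m <- regexpr(pattern, x)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  # regmatches returns only the matched elements, so fill those slots
  out[hit] <- regmatches(x, m)
  out
}

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
cut_after_n(strings, n = 3)
# [1] "ABBSDGN" "AABSDG" "AGN" "GGG"
cut_after_n(strings, n = 4)
# [1] "ABBSDGNHN" NA "AGNA" "GGGDSRTYHG"
```

With n = 4 the second string has only three counted characters, so it yields NA instead of silently disappearing from the result.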

answered Dec 27 '18 at 23:41 by clbieganek
edited Dec 28 '18 at 1:12

I don't think it can get much better than the one-liner str_extract(strings, '([^AGN]*[AGN]){3}'). Nice one!
– Marian Minar, Dec 28 '18 at 3:03

Here is a base R option using strsplit

sapply(strsplit(strings, ""), function(x)
 paste(x[1:which.max(cumsum(x %in% c("A", "G", "N")) == 3)], collapse = ""))
#[1] "ABBSDGN" "AABSDG" "AGN" "GGG"

Or in the tidyverse

library(tidyverse)
map_chr(str_split(strings, ""), 
 ~str_c(.x[1:which.max(cumsum(.x %in% c("A", "G", "N")) == 3)], collapse = ""))
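One caveat with the which.max approach: if a string contains fewer than three of the counted characters, the comparison is FALSE everywhere and which.max returns 1, so only the first character is kept. A possible guard, sketched here with match(TRUE, ...), returns NA instead. The helper name cut_by_cumsum and its arguments are hypothetical.

```r
# Hypothetical variant of the strsplit approach that returns NA when a
# string has fewer than n counted characters.
cut_by_cumsum <- function(x, chars = c("A", "G", "N"), n = 3) {
  vapply(strsplit(x, ""), function(ch) {
    # Index of the first position where the running count reaches n,
    # or NA if it never does
    idx <- match(TRUE, cumsum(ch %in% chars) == n)
    if (is.na(idx)) NA_character_ else paste(ch[seq_len(idx)], collapse = "")
  }, character(1))
}

cut_by_cumsum(c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG"))
# [1] "ABBSDGN" "AABSDG" "AGN" "GGG"
cut_by_cumsum("XYZ")
# [1] NA
```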

answered Dec 27 '18 at 20:16 by Maurits Evers

Identify the positions of the pattern using gregexpr, then extract the n-th position (3) and take the substring from 1 to this n-th position using substr.

nChars <- 3
pattern <- "A|G|N"
# Using sapply to iterate over strings vector
sapply(strings, function(x) substr(x, 1, gregexpr(pattern, x)[[1]][nChars]))

PS:

If there's a string that doesn't have 3 matches it will generate NA, so you just need to use na.omit on the final result.
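Rather than dropping the NA entries, it can be useful to keep them so the result stays aligned with the input. A base R sketch of a vectorised form is below. The variable names pos and result are illustrative, and the extra string "XYZ" is added here only to show the no-match case.

```r
strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG", "XYZ")
nChars <- 3
pattern <- "A|G|N"

# Position of the nChars-th match in each string; NA if there are fewer matches
pos <- vapply(gregexpr(pattern, strings), function(m) m[nChars], integer(1))

# substr is vectorised over both the strings and the stop positions
result <- substr(strings, 1, pos)
result
```

The last element of result is NA because "XYZ" contains none of the counted characters. Filtering afterwards with result[!is.na(result)] is then an explicit choice rather than a side effect.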

answered Dec 27 '18 at 20:19 by PoGibas
edited Dec 27 '18 at 20:57

Nice! substr is vectorized, so I would simplify your last line like this: substr(strings, 1, map_int(gregexpr(pattern, strings), nChars)), where map_int from purrr is used.
– clbieganek, Dec 27 '18 at 22:15

This is just a variant of Maurits Evers' neat solution that avoids strsplit.

sapply(strings,
  function(x) {
    # Split the string into single characters via its raw bytes
    raw <- rawToChar(charToRaw(x), multiple = TRUE)
    idx <- which.max(cumsum(raw %in% c("A", "G", "N")) == 3)
    paste(raw[1:idx], collapse = "")
  })
## ABBSDGNHNGA AABSDGDRY AGNAFG GGGDSRTYHG 
## "ABBSDGN" "AABSDG" "AGN" "GGG"

Or, slightly different, without strsplit and paste:

test <- charToRaw("AGN")
sapply(strings,
  function(x) {
    # Work on the raw bytes directly and convert back at the end
    raw <- charToRaw(x)
    idx <- which.max(cumsum(raw %in% test) == 3)
    rawToChar(raw[1:idx])
  })

answered Dec 27 '18 at 21:21 by Valentin
edited Dec 27 '18 at 23:09

Interesting problem. I created a function (see below) that solves your problem. It assumes that your strings contain only letters and no special characters; in particular they must not contain "!", which is used as a placeholder.

reduce_strings = function(str, chars, cnt) {

  # Replacing chars in str with "!"
  chars = paste0(chars, collapse = "")
  replacement = paste0(rep("!", nchar(chars)), collapse = "")
  str_alias = chartr(chars, replacement, str)

  # Obtain indices with ! for each string
  idx = stringr::str_locate_all(pattern = '!', str_alias)

  # Reduce each string in str
  reduce = function(i) substr(str[i], start = 1, stop = idx[[i]][cnt, 1])
  result = vapply(seq_along(str), reduce, character(1))
  return(result)
}


# Example call
str = c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG") 
chars = c("A", "G", "N") # Characters that are counted
cnt = 3 # Count of the characters, at which the strings are cut off
reduce_strings(str, chars, cnt) # "ABBSDGN" "AABSDG" "AGN" "GGG"

answered Dec 27 '18 at 20:48 by jollyplatypus
