Subset string by counting specific characters

15














I have the following strings:



strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG") 


I want to cut off each string as soon as the combined number of occurrences of A, G and N reaches a certain value, say 3. In that case, the result should be:



some_function(strings)

c("ABBSDGN", "AABSDG", "AGN", "GGG")


I tried using stringi, stringr and regular expressions, but I can't figure it out.










Tags: r, regex, gsub, stringr, stringi






asked Dec 27 '18 at 19:53 by Nivel
edited Dec 27 '18 at 20:22 by PoGibas






















          5 Answers


















          9














          You can accomplish your task with a simple call to str_extract from the stringr package:



          library(stringr)

          strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")

          str_extract(strings, '([^AGN]*[AGN]){3}')
          # [1] "ABBSDGN" "AABSDG" "AGN" "GGG"


          The [^AGN]*[AGN] portion of the regex pattern matches zero or more consecutive characters that are not A, G, or N, followed by one instance of A, G, or N. Wrapping it in parentheses and adding a quantifier in curly braces, as in ([^AGN]*[AGN]){3}, matches that pattern three times consecutively. You can change the number of occurrences of A, G, and N that you are looking for by changing the integer in the curly braces:



          str_extract(strings, '([^AGN]*[AGN]){4}')
          # [1] "ABBSDGNHN" NA "AGNA" "GGGDSRTYHG"
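The count in the curly braces can also be passed as a parameter. The following is a minimal base R sketch, not part of the original answer: the helper name `cut_at_count` and its arguments are hypothetical, and it assumes `chars` contains no characters that are special inside a bracket expression (such as `]`, `^`, `-` or `\`).

```r
# Hypothetical helper: cut each string after the n-th occurrence of any
# character listed in `chars` (a single string of literal characters).
cut_at_count <- function(x, chars = "AGN", n = 3) {
  # Builds e.g. "^(?:[^AGN]*[AGN]){3}" from the arguments.
  pattern <- sprintf("^(?:[^%s]*[%s]){%d}", chars, chars, as.integer(n))
  m <- regexpr(pattern, x, perl = TRUE)
  pos <- as.integer(m)               # match start positions, -1 if no match
  hit <- !is.na(pos) & pos != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)       # regmatches() returns only the matches
  out
}

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
cut_at_count(strings)         # "ABBSDGN" "AABSDG" "AGN" "GGG"
cut_at_count(strings, n = 4)  # "ABBSDGNHN" NA "AGNA" "GGGDSRTYHG"
```

Filling a pre-allocated vector of NA keeps the result the same length as the input, which mirrors what str_extract returns for strings with too few matches.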


          There are a couple ways to accomplish your task using base R functions. One is to use regexpr followed by regmatches:



          m <- regexpr('([^AGN]*[AGN]){3}', strings)
          regmatches(strings, m)
          # [1] "ABBSDGN" "AABSDG" "AGN" "GGG"


          Alternatively, you can use sub:



          sub('(([^AGN]*[AGN]){3}).*', '\\1', strings)
          # [1] "ABBSDGN" "AABSDG" "AGN" "GGG"
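The two base R variants behave differently when a string contains fewer target characters than requested. The sketch below illustrates this with a count of 4, which the second example string cannot satisfy; the variable names `pattern4`, `via_regmatches` and `via_sub` are only illustrative.

```r
strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
pattern4 <- "([^AGN]*[AGN]){4}"  # "AABSDGDRY" has only three of A/G/N

# regmatches() drops elements that did not match, so the result is shorter
# than the input and no longer lines up with `strings`.
via_regmatches <- regmatches(strings, regexpr(pattern4, strings))
length(via_regmatches)  # 3

# sub() returns non-matching elements unchanged instead of dropping them.
via_sub <- sub(paste0("(", pattern4, ").*"), "\\1", strings)
via_sub  # "ABBSDGNHN" "AABSDGDRY" "AGNA" "GGGDSRTYHG"
```

Which behaviour is preferable depends on whether short strings should be dropped, kept whole, or flagged; neither variant returns NA for them the way str_extract does.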





          • I don't think it can get much better than the one-liner str_extract(strings, '([^AGN]*[AGN]){3}'). Nice one!
            – Marian Minar
            Dec 28 '18 at 3:03


















          9














          Here is a base R option using strsplit



          sapply(strsplit(strings, ""), function(x)
          paste(x[1:which.max(cumsum(x %in% c("A", "G", "N")) == 3)], collapse = ""))
          #[1] "ABBSDGN" "AABSDG" "AGN" "GGG"


          Or in the tidyverse



          library(tidyverse)
          map_chr(str_split(strings, ""),
          ~str_c(.x[1:which.max(cumsum(.x %in% c("A", "G", "N")) == 3)], collapse = ""))
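One thing to watch with the which.max approach: when a string has fewer than 3 of the target characters, the comparison is FALSE everywhere and which.max returns 1, so the first character is returned silently. A possible variant, sketched here under the assumption that NA is the desired result in that case, uses match instead; the function name `cut_by_cumsum` is hypothetical.

```r
# Hypothetical variant that returns NA when a string has too few matches.
cut_by_cumsum <- function(x, chars = c("A", "G", "N"), n = 3) {
  vapply(strsplit(x, ""), function(ch) {
    # First position at which the running count of target characters equals n
    idx <- match(n, cumsum(ch %in% chars))
    if (is.na(idx)) NA_character_ else paste(ch[seq_len(idx)], collapse = "")
  }, character(1))
}

which.max(c(FALSE, FALSE))  # 1, which is why the original returns "A" for "AB"

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
cut_by_cumsum(strings)  # "ABBSDGN" "AABSDG" "AGN" "GGG"
cut_by_cumsum("AB")     # NA
```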





            6














            Identify the positions of the pattern using gregexpr, then extract the n-th position (here the 3rd) and take the substring from 1 to that position using substr.



            nChars <- 3
            pattern <- "A|G|N"
            # Using sapply to iterate over strings vector
            sapply(strings, function(x) substr(x, 1, gregexpr(pattern, x)[[1]][nChars]))


            PS:



            If there's a string that doesn't have 3 matches it will generate NA, so you just need to use na.omit on the final result.
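Because substr is vectorized over its arguments, the cut positions can be computed first and passed in one call. This is a base R sketch of that idea; the extra input "AB" and the names `ends` and `res` are added here purely for illustration, and the pattern is written as the equivalent character class "[AGN]".

```r
strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG", "AB")
nChars <- 3
pattern <- "[AGN]"

# Position of the nChars-th match in each string (NA if there are fewer)
ends <- vapply(gregexpr(pattern, strings), function(p) p[nChars], integer(1))

res <- substr(strings, 1, ends)
res  # "ABBSDGN" "AABSDG" "AGN" "GGG" NA
```

Unlike the sapply version, the result is an unnamed vector, and the NA for "AB" stays in place so the output still lines up with the input.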






            • 1




              Nice! substr is vectorized, so I would simplify your last line like this: substr(strings, 1, map_int(gregexpr(pattern, strings), nChars)), where map_int from purrr is used.
              – clbieganek
              Dec 27 '18 at 22:15



















            2














            This is just a version of Maurits Evers' neat solution without strsplit.



            sapply(strings,
                   function(x) {
                     raw <- rawToChar(charToRaw(x), multiple = TRUE)
                     idx <- which.max(cumsum(raw %in% c("A", "G", "N")) == 3)
                     paste(raw[1:idx], collapse = "")
                   })
            ## ABBSDGNHNGA AABSDGDRY AGNAFG GGGDSRTYHG
            ## "ABBSDGN" "AABSDG" "AGN" "GGG"


            Or, slightly different, without strsplit and paste:



            test <- charToRaw("AGN")
            sapply(strings,
                   function(x) {
                     raw <- charToRaw(x)
                     idx <- which.max(cumsum(raw %in% test) == 3)
                     rawToChar(raw[1:idx])
                   })
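charToRaw works on bytes, which is fine for ASCII targets such as A, G and N. If the characters being counted were themselves non-ASCII, byte-wise matching could miscount, so a code-point variant may be safer. The sketch below is one possible way to do that with utf8ToInt and intToUtf8; the function name `cut_by_codepoint` is hypothetical, and it assumes the inputs are valid UTF-8 (or ASCII) strings.

```r
# Hypothetical variant that compares Unicode code points instead of bytes.
cut_by_codepoint <- function(x, chars, n = 3) {
  targets <- utf8ToInt(chars)
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                        # one integer per character
    idx <- match(n, cumsum(cp %in% targets))  # position of the n-th target
    if (is.na(idx)) NA_character_ else intToUtf8(cp[seq_len(idx)])
  }, character(1), USE.NAMES = FALSE)
}

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
cut_by_codepoint(strings, "AGN")  # "ABBSDGN" "AABSDG" "AGN" "GGG"
```

As in the other sketches, strings with fewer than n target characters yield NA rather than a truncated result.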





              0














              Interesting problem. I created a function (see below) that solves your problem. It assumes that your strings contain only letters and no special characters (in particular no "!", which is used as a placeholder).



              reduce_strings = function(str, chars, cnt) {

                # Replacing chars in str with "!"
                chars = paste0(chars, collapse = "")
                replacement = paste0(rep("!", nchar(chars)), collapse = "")
                str_alias = chartr(chars, replacement, str)

                # Obtain indices with ! for each string
                idx = stringr::str_locate_all(pattern = '!', str_alias)

                # Reduce each string in str
                reduce = function(i) substr(str[i], start = 1, stop = idx[[i]][cnt, 1])
                result = vapply(seq_along(str), reduce, character(1))
                return(result)
              }


              # Example call
              str = c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
              chars = c("A", "G", "N") # Characters that are counted
              cnt = 3 # Count of the characters, at which the strings are cut off
              reduce_strings(str, chars, cnt) # "ABBSDGN" "AABSDG" "AGN" "GGG"






The str_extract / regexpr / sub answer: answered Dec 27 '18 at 23:41 by clbieganek, edited Dec 28 '18 at 1:12.


































The strsplit / tidyverse answer: answered Dec 27 '18 at 20:16 by Maurits Evers.





























The gregexpr / substr answer: answered Dec 27 '18 at 20:19 by PoGibas, edited Dec 27 '18 at 20:57.



























The charToRaw answer: answered Dec 27 '18 at 21:21 by Valentin, edited Dec 27 '18 at 23:09.





















                                0














Interesting problem. The function below solves it. It assumes that your strings contain only letters: in particular, none may contain "!", which the function uses as a placeholder. It also assumes that every string contains at least cnt of the counted characters; otherwise the row lookup in idx fails with a subscript error.

    reduce_strings = function(str, chars, cnt) {
      # Replace every counted character in str with "!"
      chars = paste0(chars, collapse = "")
      replacement = paste0(rep("!", nchar(chars)), collapse = "")
      str_alias = chartr(chars, replacement, str)

      # Locate every "!" in each string (one matrix of positions per string)
      idx = stringr::str_locate_all(pattern = '!', str_alias)

      # Cut each string at the position of its cnt-th counted character
      reduce = function(i) substr(str[i], start = 1, stop = idx[[i]][cnt, 1])
      result = vapply(seq_along(str), reduce, character(1))
      return(result)
    }

    # Example call
    str = c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
    chars = c("A", "G", "N") # Characters that are counted
    cnt = 3 # Count of the characters at which the strings are cut off
    reduce_strings(str, chars, cnt) # "ABBSDGN" "AABSDG" "AGN" "GGG"
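The "!" placeholder step can be avoided by comparing characters directly, which removes the letters-only assumption. Here is a minimal base R sketch; `reduce_strings_safe` is a hypothetical name, and returning strings with fewer than cnt matches unchanged is an assumption about the desired behaviour, not something the question specifies.

```r
# Hypothetical variant: no placeholder character and no stringr dependency.
reduce_strings_safe <- function(str, chars, cnt) {
  vapply(str, function(s) {
    # Split the string into single characters
    parts <- strsplit(s, "", fixed = TRUE)[[1]]
    # Positions of the counted characters
    hits <- which(parts %in% chars)
    # Assumption: with fewer than cnt matches, keep the string as is
    if (length(hits) < cnt) return(s)
    substr(s, 1, hits[cnt])
  }, character(1), USE.NAMES = FALSE)
}

str <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")
reduce_strings_safe(str, c("A", "G", "N"), 3)
# "ABBSDGN" "AABSDG" "AGN" "GGG"

reduce_strings_safe("A!G!N!", c("A", "G", "N"), 3)
# "A!G!N"
```

In the second call, literal "!" characters in the input are not counted, whereas the placeholder approach would treat them as matches.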















                                    answered Dec 27 '18 at 20:48









jollyplatypus



























