Find a duplicated string in a text where the two copies are near each other

I need to find a 13-character string in a text file that has a duplicate close to it.

This refers to the mutation 13 of the genomes.



For example:



ACGAATTGCAGCCACAGTACGAATCGCAGCC.



It starts and ends with ACGAATTGCAGCC, with random characters of unknown length in between.



What I've come up with so far is:



grep -Eo '((.){13}).{1,100}\1'


I have to find it in this sequence:




GTACCATAACTAACAACCTGAAAAGTCACAAAAACATATACAATAAAAGAACTAGATTTCGCATAGGATATATATTAATAAAGTGAACAAAAAAAAAATAACAACAACAACAACGAATGAAGAAAGGAAAAGGAATGATAAAAAAACGAGTAATAATTGAAAACAATTATAAAGTAAGAAAACCGCAACGGCCCAAGTAAGCAAAGCAAGGATAGGAAATTGATCGACACAACTCCATAAAATTTACAACTAGTACTCAGAAAAAATAACTAAGCTATATCCATATCTACTCTAAAAAAGAAAAGGAATAACGGAACACCCACAAAGAAACTCAATTAGCAAAAACCACAGATAATACAAACCAGAGAAGACCACATAAAAAAATGAACGAGTTACCCTTCAAATTAAAATAAATCTACCAGTAAGCATAAAAACAACAAAGTTACAAAACCAAAGACCAAAAGTAGAAATCAGAACAAGGGACATAAACGTTCACCAAATGAATGAAACAACACAATTTAGAAACAAAAAAGAGGAATAAAAAGCCAGAACAGGAGTACGAACATAATTAATTATGAAAGTGACCTACAAATAAGAAGGAAACACAAACAGAAAACAACTAACCACAAAAAAGACATAATAGTAAACAAAAAAAAAAAACTTACTCATACGAGGACTAATAAAAGATTCAAAACAATACAATTGACGAAAACTCAACGAGGAAAGCTAGAAAACCACCAGAGAAACTCAAAACACAAATAGAGATAAAAAAAAAAACCATAAAGAAAAATTCTTACATCGTCACAGCCAAGGAAAAAAAGAAATCGTTAAAATGGAACGCAGTCGAACACAAAAAGACAACACAGAACAAAAAAGGCAAACAGCGTAGAAACAAATACACTCGCGTAGCAAAGGGGCGGCGTCACGCTTGAAACATAAAAATAACCACTGTATATCACGACAATCAACAAAGTCTACATCAAGAAAATCAAAAAAATAC











Tags: grep, scripting, bioinformatics

asked Nov 30 at 23:21 by Jan Villapaz, edited Dec 1 at 21:18 by fra-san

1 Answer

You were quite close; the problem was the upper bound of 100, which is too narrow. You may also want to consider Perl-compatible regular expressions (grep -P) instead of POSIX extended ones (grep -E). The performance difference is quite noticeable.



grep -Po '((.){13}).{1,1000}?\1' genom
          AACAAAAAAAAAATAACAACAACAACAACGAATGAAGAAAGGAAAAGGAATGATAAAAAAACGAGTAATAATTGAAAACAATTATAAAGTAAGAAAACCGCAACGGCCCAAGTAAGCAAAGCAAGGATAGGAAATTGATCGACACAACTCCATAAAATTTACAACTAGTACTCAGAAAAAATAACTAAGCTATATCCATATCTACTCTAAAAAAGAAAAGGAATAACGGAACACCCACAAAGAAACTCAATTAGCAAAAACCACAGATAATACAAACCAGAGAAGACCACATAAAAAAATGAACGAGTTACCCTTCAAATTAAAATAAATCTACCAGTAAGCATAAAAACAACAAAGTTACAAAACCAAAGACCAAAAGTAGAAATCAGAACAAGGGACATAAACGTTCACCAAATGAATGAAACAACACAATTTAGAAACAAAAAAGAGGAATAAAAAGCCAGAACAGGAGTACGAACATAATTAATTATGAAAGTGACCTACAAATAAGAAGGAAACACAAACAGAAAACAACTAACCACAAAAAAGACATAATAGTAAACAAAAAAAAAA
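
To see what the pattern does without the full genome file, here is a minimal sketch on a made-up input. The variable names and the toy sequence are assumptions for illustration only, and it assumes a grep built with -P support (GNU grep, for instance).

```shell
# Toy input (hypothetical): the 13-mer ACGAATTGCAGCC occurs twice, 5 bases apart.
dna='GGACGAATTGCAGCCTTTTTACGAATTGCAGCCGG'

# ((.){13})   capture any 13 characters as group 1
# .{1,1000}?  lazily skip between 1 and 1000 characters
# \1          require the captured 13 characters to appear again
match=$(printf '%s\n' "$dna" | grep -Po '((.){13}).{1,1000}?\1')

printf '%s\n' "$match"
```

With -o, grep prints only the matched span, here everything from the first copy of the 13-mer through the end of the second copy.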


Timing comparison on my machine:

POSIX (-E): 0m4.816s
Perl (-P):  0m0.011s

• You may want to make the repetition non-greedy, .{1,1000}?. Even if it may not matter in this exact case, you only want to include text up to the nearest repetition. That horrible slowness with -E looks like a bug (LC_ALL=C cuts the time to about 1/3, but that is still orders of magnitude slower than -P).
  – mosvy
  Dec 1 at 0:55
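
The difference between the greedy and the non-greedy form only shows up when more than two copies are in range. A small sketch, using a hypothetical input with three copies of the same 13-mer (the names unit, dna, greedy and lazy are made up for the example, and grep -P support is assumed):

```shell
# Hypothetical input: three copies of the same 13-mer, each separated by "TT".
unit='ACGAATTGCAGCC'
dna="${unit}TT${unit}TT${unit}"

# Greedy: the gap grows as far as it can, so the match runs to the last copy.
greedy=$(printf '%s\n' "$dna" | grep -Po '((.){13}).{1,1000}\1')

# Lazy: the gap stays as short as possible, so the match stops at the nearest copy.
lazy=$(printf '%s\n' "$dna" | grep -Po '((.){13}).{1,1000}?\1')

printf 'greedy: %s\nlazy:   %s\n' "$greedy" "$lazy"
```

The greedy pattern returns the whole line, while the lazy one returns only the first two copies and the two bases between them.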











• Good suggestion! Thanks - will incorporate :)
  – tink
  Dec 1 at 1:23










• P.S.: @mosvy, nice catch with LC_ALL=C; that brings the -E runtime down to 0m0.714s, almost sevenfold faster. Still a good deal slower than -P. :)
  – tink
  Dec 1 at 3:23
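
For anyone who wants to repeat the comparison, the three variants could be run side by side roughly like this. It is only a sketch: the temporary file and its toy content stand in for the real input, the variable names are made up, and it assumes a grep that accepts back-references with -E and supports -P (GNU grep does both). The greedy form is used throughout so the same pattern works in both syntaxes.

```shell
# Stand-in for the real input file (hypothetical content), so the sketch runs on its own.
input=$(mktemp)
printf '%s\n' 'GGACGAATTGCAGCCTTTTTACGAATTGCAGCCGG' > "$input"

# ERE in the current locale, ERE in the C locale, then PCRE.
# In an interactive shell, prefix each grep with `time` to compare run times.
ere=$(grep -Eo '((.){13}).{1,1000}\1' "$input")
ere_c=$(LC_ALL=C grep -Eo '((.){13}).{1,1000}\1' "$input")
pcre=$(grep -Po '((.){13}).{1,1000}\1' "$input")

rm -f "$input"
printf '%s\n' "$ere" "$ere_c" "$pcre"
```

On an input this small all three finish instantly and print the same span; the timing gap only becomes visible on long lines like the one in the question.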










• Note that POSIX EREs don't have back-references (POSIX BREs do). The OP is probably using an implementation that offers them as an extension.
  – Stéphane Chazelas
  Dec 1 at 22:04
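
Following that remark, the same search can be written as a basic regular expression, where \1 is part of the standard syntax. This is a sketch on a hypothetical toy input; note that the -o option used here is itself an extension rather than something POSIX requires of grep, and the nested group of the original pattern is simplified to a single one.

```shell
# Toy input (hypothetical): the same 13-mer twice, 5 bases apart.
dna='GGACGAATTGCAGCCTTTTTACGAATTGCAGCCGG'

# In a BRE, groups and intervals are written with backslashes: \( \) and \{ \}.
bre=$(printf '%s\n' "$dna" | grep -o '\(.\{13\}\).\{1,1000\}\1')

printf '%s\n' "$bre"
```

BREs have no lazy quantifier, so this form behaves like the greedy variant when more than two copies are within range.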










Answer by tink: answered Dec 1 at 0:03, edited Dec 1 at 1:24