Split file lines by regex delimiter

      I want to split each line of the input file on the non-alphanumeric regex \W and print all the resulting chunks to the output file, like so:



      Input file:



      www.wifi.in.ua
      YI-HondBrychka


      Output file:



      www
      wifi
      in
      ua
      YI
      HondBrychka






      regular-expression






      asked Mar 10 at 19:19 by dizcza




















          2 Answers
          Try grep's -o flag, which prints only the matching parts of each line (combined here with -P for Perl-compatible regular expressions), e.g.



          $ cat <<HEREDOC | grep -Po '\w+'
          www.wifi.in.ua
          YI-HondBrychka
          HEREDOC

          www
          wifi
          in
          ua
          YI
          HondBrychka
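
For systems where grep lacks -P, a rough equivalent can be sketched with POSIX character classes. This assumes a grep that supports -o (GNU and BSD grep do). The function name split_words and the file names input.txt and output.txt are made up for illustration, and [[:alnum:]_] is only an approximation of \w that may behave differently on non-ASCII text depending on the locale.

```shell
# Hypothetical helper: print each run of word characters on its own line.
# [[:alnum:]_]+ approximates the PCRE \w+, so plain -E is enough (no -P).
split_words() {
    grep -oE '[[:alnum:]_]+' "$@"
}

# With made-up file names, the question's input/output files would be handled as:
#   split_words input.txt > output.txt

# Demonstration on the sample data from the question, read from standard input.
printf 'www.wifi.in.ua\nYI-HondBrychka\n' | split_words
```

Because only matches are printed, consecutive delimiters never produce empty lines with this approach.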






          answered Mar 10 at 19:29 by igal












          • Did you mean grep -Po '\w+' HEREDOC? This works, thank you.

            – dizcza
            Mar 10 at 19:33











          • No, I meant what I wrote there - it works for me if I copy-paste it. In practice, I imagine you would use a file instead of a heredoc string, and it would look like grep -Po '\w+' /path/to/file.

            – igal
            Mar 10 at 19:55











          • With HEREDOC replaced by the path to my file, cat <<HEREDOC | grep -Po '\w+' doesn't work for me, while cat HEREDOC | grep -Po '\w+' (or better, grep -Po '\w+' HEREDOC) works fine.

            – dizcza
            Mar 10 at 19:59











          • It sounds to me like you're confused about what a heredoc is - it's not supposed to represent a file path. You can read more about "here documents" in the Bash manual: gnu.org/software/bash/manual/bashref.html#Here-Documents

            – igal
            Mar 10 at 20:03











          • Oh, I see, I didn't know that. Thank you.

            – dizcza
            Mar 10 at 20:07
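
The difference discussed in these comments can be sketched side by side. This is a minimal illustration, not part of the original answer: the marker word EOF, the variable names, and the temporary file are assumptions, and grep -oE with [[:alnum:]_]+ stands in for grep -Po '\w+' so the sketch does not depend on PCRE support.

```shell
# A here-document: the lines up to the closing EOF marker become grep's
# standard input. The marker word is an arbitrary label, not a file name.
from_heredoc=$(grep -oE '[[:alnum:]_]+' <<EOF
www.wifi.in.ua
YI-HondBrychka
EOF
)

# The same data stored in a real (temporary) file and passed to grep by path.
tmpfile=$(mktemp)
printf 'www.wifi.in.ua\nYI-HondBrychka\n' > "$tmpfile"
from_file=$(grep -oE '[[:alnum:]_]+' "$tmpfile")
rm -f "$tmpfile"

# Both invocations see identical input, so they produce identical output.
[ "$from_heredoc" = "$from_file" ] && echo "same output"
```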

















          Replacing all matches of \W with newlines, using Perl (from which the \W expression originated):



          $ perl -pe '$_ =~ s/\W/\n/g' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          Or, more in line with the actual wording of the question:



          $ perl -pe '$_ = join("\n", split(/\W/)) . "\n"' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          Expressing the PCRE \W as the ERE [^[:alnum:]] (which, unlike \W, also matches the underscore) and using GNU awk:



          awk -v RS='[^[:alnum:]]' 1 file


          The 1 is short for '{ print }', and RS sets the input record separator to any single character matching [^[:alnum:]]. The records are then printed on individual lines.
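
A regular expression as RS is an extension that not every awk provides. As a hedged alternative, the same splitting can be sketched with the standard split() function. The function name split_alnum is hypothetical, and the bracket expression [^A-Za-z0-9] assumes ASCII input (it is used instead of [^[:alnum:]] because some older awk implementations lack POSIX character classes).

```shell
# Hypothetical helper: split every input line on runs of non-alphanumeric
# ASCII characters and print each non-empty chunk on its own line.
split_alnum() {
    awk '{
        n = split($0, parts, /[^A-Za-z0-9]+/)
        for (i = 1; i <= n; i++)
            if (parts[i] != "")      # skip empty chunks at line edges
                print parts[i]
    }' "$@"
}

# Demonstration on the question's sample data plus the name from the comments.
printf 'www.wifi.in.ua\nYI-HondBrychka\nZhenek_Lebed98\n' | split_alnum
```

Unlike the RS approach, this variant treats a run of delimiters as one separator, so no empty lines are printed.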



          Or with GNU sed:



          sed 's/[^[:alnum:]]/\n/g' file


          With tr, it becomes



          $ tr -c '[:alnum:]' '\n' <file
          www
          wifi
          in
          ua
          YI
          HondBrychka


          where -c makes it replace each character that is not an [:alnum:] with a newline.







          answered Mar 10 at 19:37 by Kusalananda (edited Mar 10 at 19:53)












          • The last solution produces empty lines as an artifact. Compare tr -c '[:alpha:]' '\n' < HEREDOC with grep -Po '[a-zA-Z]+' HEREDOC if we add "Zhenek_Lebed98" to the input file HEREDOC.

            – dizcza
            Mar 10 at 19:57











          • @dizcza Add -s to the command line. It makes tr squeeze each run of newlines into a single one. But on the other hand, you do have an empty word between 9 and 8.

            – Kusalananda
            Mar 10 at 20:03












          • Now it works as well.

            – dizcza
            Mar 10 at 20:04
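
The -s suggestion from this thread can be sketched as follows, using the same [:alpha:] set as the comment. The function name split_alpha is hypothetical. Note that a delimiter at the very start of the input would still leave one leading empty line, since squeezing reduces a run of newlines to one rather than to none.

```shell
# Hypothetical helper: -c complements the set (everything that is NOT a letter),
# and -s squeezes each run of the replacement newline into a single newline.
split_alpha() {
    tr -cs '[:alpha:]' '\n'
}

# "_" and "98" plus the line's own newline collapse into one line break.
printf 'www.wifi.in.ua\nZhenek_Lebed98\n' | split_alpha
```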
















