split file lines by regex delimiter


I want to split each line of an input file on the non-alphanumeric regex \W and print all the resulting chunks to an output file, like so:

Input file:

www.wifi.in.ua
YI-HondBrychka

Output file:

www
wifi
in
ua
YI
HondBrychka

asked Mar 10 at 19:19

dizcza

regular-expression


2 Answers

Try using grep's -o flag to print only the matching strings, e.g.

$ cat <<HEREDOC | grep -Po '\w+'
www.wifi.in.ua
YI-HondBrychka
HEREDOC

www
wifi
in
ua
YI
HondBrychka

answered Mar 10 at 19:29

igal


Did you mean grep -Po '\w+' HEREDOC? This works, thank you.

– dizcza
Mar 10 at 19:33

No, I meant what I wrote there - it works for me if I copy-paste it. In practice, I imagine you would use a file instead of a heredoc string, and it would look like grep -Po '\w+' /path/to/file.

– igal
Mar 10 at 19:55

Substituting HEREDOC for the path to the file, cat <<HEREDOC | grep -Po '\w+' doesn't work for me, while cat HEREDOC | grep -Po '\w+' (or better, grep -Po '\w+' HEREDOC) works fine.

– dizcza
Mar 10 at 19:59

It sounds to me like you're confused about what a heredoc is - it's not supposed to represent a file path. You can read more about "here documents" in the Bash manual: gnu.org/software/bash/manual/bashref.html#Here-Documents

– igal
Mar 10 at 20:03

Oh, I see, I didn't know that. Thank you.

– dizcza
Mar 10 at 20:07
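
The difference discussed in these comments can be sketched as follows. This is only an illustration: the scratch file created with mktemp is an assumption, and grep -oE '[[:alnum:]_]+' is used as a rough ERE spelling of the answer's grep -Po '\w+' so that the sketch does not depend on PCRE support. With a here-document, the lines between the two HEREDOC markers become grep's standard input; with a file, grep opens the named path itself.

```shell
# Hypothetical scratch file holding the question's sample input.
input=$(mktemp)
printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka' > "$input"

# 1. Here-document: the text between the HEREDOC markers is fed to
#    grep on standard input. HEREDOC is a delimiter, not a file name.
from_heredoc=$(grep -oE '[[:alnum:]_]+' <<HEREDOC
www.wifi.in.ua
YI-HondBrychka
HEREDOC
)

# 2. File: grep reads the named file directly.
from_file=$(grep -oE '[[:alnum:]_]+' "$input")

printf '%s\n' "$from_file"
rm -f "$input"
```

Both variables end up holding the same six words, one per line.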


Replacing all matches of \W with newlines, using Perl (from which the \W expression originated):

$ perl -pe '$_ =~ s/\W/\n/g' <file
www
wifi
in
ua
YI
HondBrychka

Or, more in line with the actual wording of the question:

$ perl -pe '$_ = join("\n", split(/\W/)) . "\n"' <file
www
wifi
in
ua
YI
HondBrychka

Expressing the PCRE \W as the ERE [^[:alnum:]] (a close approximation: unlike \W, it also matches the underscore) and using GNU awk:

awk -v RS='[^[:alnum:]]' 1 file

The 1 is short for { print }, and RS sets the input record separator to any such non-alphanumeric character. The records are then printed on individual lines.
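
As a small sketch of that shorthand, the two invocations below should behave identically. It assumes an awk that accepts a regular expression as RS and understands POSIX character classes (GNU awk does); the variable names are made up for the illustration.

```shell
# The question's sample input.
sample=$(printf '%s\n' 'www.wifi.in.ua' 'YI-HondBrychka')

# Pattern-only program: 1 is always true, so the default action
# (print the current record) runs for every record.
short=$(printf '%s\n' "$sample" | awk -v RS='[^[:alnum:]]' 1)

# The same program with the action written out explicitly.
long=$(printf '%s\n' "$sample" | awk -v RS='[^[:alnum:]]' '{ print }')

printf '%s\n' "$short"
```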

Or with GNU sed:

sed 's/[^[:alnum:]]/\n/g' file

With tr, it becomes

$ tr -c '[:alnum:]' '\n' <file
www
wifi
in
ua
YI
HondBrychka

where -c makes it replace each character that is not an [:alnum:] with a newline.

edited Mar 10 at 19:53

answered Mar 10 at 19:37

Kusalananda♦


The last solution leaves empty lines as an artifact. Compare tr -c '[:alpha:]' '\n' < HEREDOC with grep -Po '[a-zA-Z]+' HEREDOC after adding "Zhenek_Lebed98" to the input file HEREDOC.

– dizcza
Mar 10 at 19:57

@dizcza Add -s to the command line. It will make it squeeze multiple newlines into one. But on the other hand, you do have an empty word between 9 and 8.

– Kusalananda♦
Mar 10 at 20:03

Now it works as well.

– dizcza
Mar 10 at 20:04
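
The effect of -s described in these comments can be sketched like this. The sample input and variable names are assumptions chosen for the illustration; the line from the comment is placed first so that the run of non-letters ("98" plus the line's own newline) falls in the middle of the output.

```shell
# Sample input: the line from the comment, then one from the question.
sample=$(printf '%s\n' 'Zhenek_Lebed98' 'www.wifi.in.ua')

# Without -s: every non-letter becomes its own newline, so the run
# "98" + newline leaves two empty lines after "Lebed".
unsqueezed=$(printf '%s\n' "$sample" | tr -c '[:alpha:]' '\n')

# With -s: runs of newlines in the output are squeezed into one.
squeezed=$(printf '%s\n' "$sample" | tr -cs '[:alpha:]' '\n')

printf '%s\n' "$squeezed"
```

With -s the output has one word per line and no empty lines, which matches what grep -o prints for the same input.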
