Text Processing - Get 2 lines with exact text between them
I have a file with an unknown number of blocks of text. Each block consists of a starting keyword "Start", an ending keyword "End", and optional text between them, with the exact keyword "Disk" on every line. I need to get rid of the blocks where there is nothing between the two keywords; see the example.
I am processing input like this:
Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
My desired output is this:
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
I know that I can use 'awk' or 'sed' to find text between 2 lines, but I do not know what to do if there are multiple occurrences of these 2 lines, or if there is no text between them.
I am running Ubuntu 17.10.
Looking forward to any help.
edit: I deleted the post the first time because I thought I could do it using sed -e '/Start/,/End/d', but this actually removes everything.
text-processing awk sed
edited Jan 14 at 21:46 by don_crissti
asked Jan 14 at 18:19 by mikro45
2 Answers
Accepted answer (score 2)
To delete back-to-back Start and End lines, this should do in GNU sed:
$ sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d ' < input
If we see Start, load the next line with N, then check whether the contents of the buffer are just Somename:Start\nSomename:End, with Somename the same on both lines (\n is a newline). If so, delete it. Here, \1 is a reference to the first group within \(..\), and matches the same string that was encountered there. .* just means any number (*) of any characters (.).
Using sed -e '/Start/,/End/d' would indeed delete every single line, since the range matches all lines between the starting and ending patterns, including the Start and End lines themselves. Everything in the input is between Start and End, so everything is deleted.
answered Jan 14 at 20:39 by ilkkachu
edited Jan 14 at 21:43
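As a quick check, here is a minimal sketch that runs both commands on the sample input from the question. It assumes GNU sed (for \n inside the regex); the variable names input, kept and range_out are only for illustration.

```shell
# Sample input from the question.
input='Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End'

# On a Start line, append the next line (N); delete the pair when the second
# line is the matching End. Requires GNU sed for \n in the regex.
kept=$(printf '%s\n' "$input" | sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d')

# The range form deletes every line from a Start through the next End, inclusive.
range_out=$(printf '%s\n' "$input" | sed -e '/Start/,/End/d')

printf '%s\n' "$kept"
printf 'range form printed %d characters\n' "${#range_out}"
```

The first command should print only the Server2 and Server3 blocks, while the range form should print nothing, because every line of this input lies inside some Start..End range.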
Actually it isn't literally "Server2" and "Server3". There are actual unknown URLs and the file is much longer than that. Pardon me for not specifying this part.
– mikro45, Jan 14 at 20:44
@mikro45, argh, no, pardon me for missing the sentence with the actual point of the question.
– ilkkachu, Jan 14 at 21:01
It isn't working for the whole file the way you wrote it. But when I changed it to sed -e '/Start/ N; /^\(.*\):Start\n\(.*\):End$/d ' < input, changing "\1" to "\(.*\)", it works now. Can you explain what "\1" and "\(.*\)" do?
– mikro45, Jan 14 at 21:33
@mikro45, yeah, I assumed the label would repeat identically, as it seemed to in the example. That was mostly to avoid the possibility that the pattern would match something else, if :End could appear on some other line. /^.*:Start\n.*:End$/ would relax that, and just match any strings before :Start and :End. (Without the \1, the \( and \) are also unnecessary, since they only act as a place for the \1 to point to.)
– ilkkachu, Jan 14 at 21:44
Answer (score 1)
Another solution, as I like trying to do these in awk.
BEGIN {
    RS="End\n"
    ORS="End\n"
}
NF > 2
Using the built-in RS (record separator) variable, awk will treat the text between each End\n as a record. Presuming that servername:Start and servername:End are both single words, it's just a case of printing the records with more than 2 fields via the NF > 2 line. If this is true, the whole record will be printed, with End\n used as the output record separator (ORS).
~$> echo '
Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
' | awk 'BEGIN { RS="End\n"; ORS="End\n" } NF > 2'
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
answered Jan 17 at 23:23 by Guy
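For comparison, here is a line-by-line sketch that does not rely on a multi-character RS (POSIX leaves that behaviour unspecified, although gawk and mawk support it). This is not the method from either answer: it holds each Start line back until it is known whether the block has a body. The function name filter_blocks and the assumption that the marker lines end in :Start and :End are mine.

```shell
# Hypothetical helper: drop blocks whose End line directly follows the Start line.
filter_blocks() {
    awk '
        /:Start$/ {                       # hold the Start line back for now
            if (pending) print header     # previous Start never saw an End: keep it
            header = $0; pending = 1; next
        }
        /:End$/ && pending {              # End right after Start: drop both lines
            pending = 0; next
        }
        pending {                         # first body line: release the Start line
            print header; pending = 0
        }
        { print }
        END { if (pending) print header } # unterminated Start at end of input
    ' "$@"
}

input='Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End'

result=$(printf '%s\n' "$input" | filter_blocks)
printf '%s\n' "$result"
```

Because it buffers at most one line, this should behave the same on long files, and it can be called as filter_blocks somefile instead of reading standard input.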