Text Processing - Get 2 lines with exact text between them

I have a file with an unknown number of text blocks. Each block consists of a starting line with the keyword "Start", an ending line with the keyword "End", and optional lines between them, each containing the keyword "Disk". I need to get rid of the blocks that have nothing between the Start and End lines; see the example.

I am processing input like this:

Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End

and my desired output is this:

Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End

I know that I can use 'awk' or 'sed' to find text between 2 lines, but I do not know what to do if there are multiple occurrences of these 2 lines, or if there is no text between them.

I am running Ubuntu 17.10.

Looking forward to any help.

Edit: I deleted the post the first time because I thought I could do it using sed -e '/Start/,/End/d', but this actually removes everything.

edited Jan 14 at 21:46

don_crissti

asked Jan 14 at 18:19

mikro45

2 Answers

accepted

To delete back-to-back Start and End lines, this should do in GNU sed:

$ sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d ' < input

If we see Start, load the next line with N, then check whether the buffer contains just Somename:Start\nSomename:End, with Somename the same on both lines (\n is a newline). If so, delete it. Here, \1 is a back-reference to the first group within \(..\), and matches the same string that was matched there. .* just means any number (*) of any characters (.).

Using sed -e '/Start/,/End/d' would indeed delete every single line, since the range matches all lines between the starting and ending patterns. Everything in the input is between Start and End, so everything is deleted.
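
To see the difference between the two commands side by side, here is a minimal sketch. The function names (range_delete, pair_delete, sample) are hypothetical and only wrap the sed expressions quoted above; the sample data mirrors the input from the question.

```shell
# Hypothetical helper names; the sed expressions are the ones quoted above.

# Range delete: removes every Start..End range, including the lines in between.
range_delete() {
  sed -e '/Start/,/End/d' "$@"
}

# Pair delete: removes a Start line only when the very next line is its End line
# with the same label in front of it.
pair_delete() {
  sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d' "$@"
}

# Sample input mirroring the question.
sample() {
  printf '%s\n' \
    'Server1:Start' 'Server1:End' \
    'Server2:Start' 'Disk1' 'Disk2' 'Server2:End' \
    'Server3:Start' 'Disk1' 'Server3:End'
}

echo '--- range delete ---'
sample | range_delete
echo '--- pair delete ---'
sample | pair_delete
```

With this input, the range delete should print nothing after its header, while the pair delete should drop only the Server1 pair.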

edited Jan 14 at 21:43

answered Jan 14 at 20:39

ilkkachu

Actually it isn't literally "Server2" and "Server3". They are actual, unknown URLs, and the file is much longer than that. Pardon me for not specifying this part.
– mikro45
Jan 14 at 20:44

@mikro45, argh, no, pardon me for missing the sentence with the actual point of the question.
– ilkkachu
Jan 14 at 21:01

It isn't working for the whole file the way you wrote it. But when I changed it to sed -e '/Start/ N; /^\(.*\):Start\n\(.*\):End$/d ' < input, changing "\1" to "\(.*\)", it works. Can you explain what "\1" and "\(.*\)" do?
– mikro45
Jan 14 at 21:33

@mikro45, yeah, I assumed the label would repeat identically, as it seemed to in the example. That was mostly to avoid the possibility that the pattern would match something else, if :End could appear on some other line. /^.*:Start\n.*:End$/ would relax that, and just match any strings before :Start and :End. (Without the \1, the \( and \) are also unnecessary since they only act as a place for the \1 to point to.)
– ilkkachu
Jan 14 at 21:44
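
The strict and relaxed patterns discussed in these comments can be compared with a small sketch. The labels siteA, siteB and siteC and the function names are made up for illustration; the assumption here is an empty block whose Start and End labels differ, which is one case where the two patterns behave differently.

```shell
# Hypothetical helper names and sample labels, for illustration only.

# Strict: the label before :Start must repeat before :End (back-reference \1).
drop_strict() {
  sed -e '/Start/ N; /^\(.*\):Start\n\1:End$/d' "$@"
}

# Relaxed: any text may precede :Start and :End.
drop_relaxed() {
  sed -e '/Start/ N; /^.*:Start\n.*:End$/d' "$@"
}

# An empty block whose two labels differ, followed by a non-empty block.
mixed() {
  printf '%s\n' 'siteA:Start' 'siteB:End' 'siteC:Start' 'Disk1' 'siteC:End'
}

echo '--- strict ---'
mixed | drop_strict
echo '--- relaxed ---'
mixed | drop_relaxed
```

The strict version should keep the siteA/siteB pair because the labels do not match, while the relaxed version should remove it.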

Another solution, as I like trying to do these in awk.

BEGIN {
 RS="End\n"
 ORS="End\n"
}
NF > 2

Using the built-in RS (record separator) variable, awk will treat the text up to each End\n as one record. Presuming that servername:Start and servername:End are both single words, it's just a case of printing the records with more than 2 fields via the NF > 2 line. If this is true, the whole record will be printed, with End\n used as the output record separator (ORS).

~$>echo '
Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
' | awk 'BEGIN { RS="End\n"; ORS="End\n" } NF > 2'
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End
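
A multi-character RS is treated as a regular expression by gawk and mawk, but POSIX leaves its behaviour unspecified, so the record-separator approach may not carry over to every awk. As a hedged alternative, here is a line-by-line sketch that holds each Start line back until the following line is seen. The function name drop_empty_blocks and the variables held and have are hypothetical, and it assumes the marker lines end in :Start and :End as in the example.

```shell
# Hypothetical helper; assumes marker lines end with ":Start" and ":End".
drop_empty_blocks() {
  awk '
    # A Start line is held back instead of printed; flush any earlier held one.
    /:Start$/ { if (have) print held; held = $0; have = 1; next }
    # An End line directly after a held Start line: drop both.
    have && /:End$/ { have = 0; next }
    # Any other line after a held Start line: the block is not empty, flush it.
    have { print held; have = 0 }
    { print }
    # A Start line at the very end of the input is still printed.
    END { if (have) print held }
  ' "$@"
}

printf '%s\n' \
  'Server1:Start' 'Server1:End' \
  'Server2:Start' 'Disk1' 'Disk2' 'Server2:End' \
  'Server3:Start' 'Disk1' 'Server3:End' | drop_empty_blocks
```

Unlike the NF > 2 test, this version does not count fields, so it does not depend on the labels being single words.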

answered Jan 17 at 23:23

Guy
