Text Processing - How to get pattern A matching line until first occurrence of pattern B matching line?

I want to extract blocks of lines by searching backwards: starting at each line that matches pattern A, move up to the first line that matches pattern B, and keep both of those lines together with everything between them.

UPDATED: example_file.txt

ISA*00* *00* *ZZ*SIX-SIX6 *12*666666666666 *66666666*6666*U*666666666*6666666666*0*P*
GS*FA*SIX-SIX-SIX*666666666*6666666*6666*6666*X*66666
ST*666*666
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*6666666
AK5*A
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
AK9*P*20*20*19
SE*69*6969
GE*1*6767
IEA*1*0000000000

What I want is to find, working from the bottom up, every AK5 line that has R after it, like this:

Pattern A: AK5*R

and get all the lines going up until the first occurrence of pattern B is matched. e.g.:

Pattern B: AK2

Desired output:

The block for the first pattern A match will be called E1:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

UPDATED: The block for the second pattern A match will be called E2:

AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

and so on if there is more than one pattern A match.

EDIT: I know sed can do this, but I still haven't managed to get the lines from each pattern A match up to its first pattern B match and store them in a temporary text file for further processing.

This is my example sed command, which picks up every pattern B line in example_file.txt:

sed -ne '/AK2\*/,/AK5\*R/p' example_file.txt

Example command logical scenario:

A="AK5*R"
B="AK2"

find the first $A < example_file.txt; # AK5*R
move to previous line until first occurrence of $B line; # AK2*any_number*any_number
get all lines from first $A to its first occurrence of $B and store in a text file; # result > e1.txt
# The same way goes to the second occurrence of pattern A.

(NOTE: "First occurrence of $B" means: starting from each $A line, take the $A line and the lines before it, up to the very first $B line encountered. For example, if the first $A line is line 50 of a 100-line file, move up from there, line by line, until the first $B line is found.) See the example below.

example_file2.txt

ISA*00* *00* *ZZ*SIX-SIX6 *12*666666666666 *66666666*6666*U*666666666*6666666666*0*P*
GS*FA*SIX-SIX-SIX*666666666*6666666*6666*6666*X*66666
ST*666*666
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK5*A
AK2*777*888888
AK5*A
AK2*777*7777777
AK5*A
AK2*777*5555555
AK5*A
AK2*777*7777777
AK5*A
AK2*777*4545435
AK5*A
AK2*777*7777777
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A
AK2*777*0987654
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A

Output:

AK2*777*0987654
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

edited Feb 9 at 2:19

asked Feb 8 at 10:07

WashichawbachaW


You say "don't have any luck in getting the line from pattern A to B" but you apparently want all the lines from pattern B to pattern A; at least that's what you show in the desired output. Your first language is probably not English, but please try to edit your question to make it clearer what you want.
– wurtel
Feb 8 at 10:17

There's no Pattern B: B=AK2 in your input content. Update your question.
– RomanPerekhrest
Feb 8 at 10:18

@wurtel, there are two pattern B lines (AK2) in example_file.txt. I don't want to print all the lines from pattern B to A. As you can see, I separated them in my desired output. I want a command that finds the first pattern A and then moves to the previous lines until the first match of pattern B is found. In example_file.txt the first match of pattern A is on line 12. From that point it moves up until the first occurrence of pattern B is matched, which is on line 7. The same goes for the second pattern A match, where pattern B is on line 15.
– WashichawbachaW
Feb 9 at 0:12

@RomanPerekhrest, there is: line 5: AK2*777*6666666, line 7: AK2*777*7777777, line 13: AK2*777*6666666, and line 15: AK2*777*7777777. Sorry, I think you read B=AK2 literally as the whole pattern. Only AK2 is the pattern. I just put it in a variable B to represent the pattern I want to find. Anyway, I'm going to correct that section to prevent confusion. Thanks.
– WashichawbachaW
Feb 9 at 0:20

Yes, sed could extract the ranges: tac ../infile | sed -ne '/^AK5\*R/,/AK2\*/p' | tac. What it could not do is redirect each range to a separate file.
– Isaac
Feb 9 at 2:48
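The tac and sed pipeline from the comment above can be tried on a small input. This is only a sketch: it assumes GNU tac is installed, the file names sample_tac.txt and ranges.txt are made up for illustration, and the input is a shortened stand-in for example_file.txt. The asterisk is escaped because an unescaped `5*` in a sed pattern means "zero or more 5s" rather than a literal asterisk.

```shell
# Hypothetical, shortened stand-in for example_file.txt.
cat > sample_tac.txt <<'EOF'
ST*666*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK5*R*5
AK2*777*6666666
AK5*A
AK2*777*69696969
AK4*3*6969*4
AK5*R*5
SE*69*6969
EOF

# Reverse the file, print from each AK5*R line to the next AK2 line,
# then reverse again to restore the original line order.
tac sample_tac.txt | sed -n '/^AK5\*R/,/^AK2/p' | tac > ranges.txt
cat ranges.txt
```

All matching blocks end up in one stream, which is the limitation the comment points out: sed alone does not send each range to its own file.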


2 Answers

accepted

Reading your description again, I understand that you want to start at each match of pattern A and go up to the first match of pattern B, while keeping each resulting section in the order it has in the file.

That requires a fair amount of logic. The following shell script does it all. It places each section, in its original internal order, in a file named E plus a number: the first file (E1) holds the first match from the top, and the last file holds the last matching section.

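#!/bin/bash

# Remove leftovers from earlier runs (note: E* matches every name starting with E).
rm -rf resE* E*

# Read the file bottom-up; p is 1 while inside a section that began at an AK5*R line.
tac ../example_file.txt |
    awk 'BEGIN{i=1}
         /^AK5\*R.*/{p=1}
         {if(p==1){f="resE" i;print($0)>>f;close(f)}}
         /^AK2.*/{if(p==1){i++};p=0}
        '
# resE1 holds the last section of the file, so number the results in reverse
# and flip each section back into its original line order.
set -- resE*
c=$#
for (( i=1;i<=$c;i++)); do
    pos=$(($c-$i+1))
    [ -f "$1" ] && tac "$1" > "E$pos"
    shift
done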

The resulting ranges will be:

$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

$ cat E2
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
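For comparison, the same blocks can be collected in a single forward pass, without tac. The following is a sketch and not part of the script above: it assumes that buffering the lines since the most recent AK2 is acceptable, and the names sample_awk.txt and e1.txt, e2.txt, ... are chosen here for illustration only.

```shell
# Hypothetical, shortened stand-in for example_file.txt.
cat > sample_awk.txt <<'EOF'
ST*666*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK5*R*5
AK2*777*6666666
AK5*A
AK2*777*69696969
AK4*3*6969*4
AK5*R*5
SE*69*6969
EOF

awk '
/^AK2/ { buf = ""; seen = 1 }      # a new block starts: forget the previous one
seen   { buf = buf $0 "\n" }       # remember every line since the latest AK2
/^AK5\*R/ && seen {                # pattern A found: the buffer is the wanted block
    out = "e" (++n) ".txt"
    printf "%s", buf > out
    close(out)
}
' sample_awk.txt

cat e1.txt e2.txt
```

Because the buffer is reset at every AK2 line, each AK5*R line is paired with the nearest AK2 above it, and the output files are numbered in file order without a renumbering step.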

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac


I'm trying to run a command that first finds a pattern A match (AK5*R) and, from that line, moves to the previous lines until the very first pattern B (AK2) is found. (Note: not the very first pattern B line in the file, which is line 5: AK2*777*6666666, but the first AK2 line found going up from each pattern A line AK5*R.) See my updated output.
– WashichawbachaW
Feb 9 at 1:49

I will run a series of tests. So far, it prints the way I want it to. I will mark it as accepted after I'm done with some files I have here.
– WashichawbachaW
Feb 9 at 2:44

POSIX `ex` to the rescue again!

ex is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.

The following one-liner works perfectly on your example_file2.txt:

printf '%s\n' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt

On your example_file.txt, it also works, but because the global command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

However, this is easy enough to handle with another POSIX tool, csplit, which is designed to split files according to a "context."

Portable POSIX solution:

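patA='AK5[*]R'
patB='AK2'

# ex prints every block; csplit cuts the stream before each $patB line.
printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f my_unique_prefix_ -n 1 -s -k - "/$patB/" '{999}'

# Rename my_unique_prefix_N to eN.txt.
for f in my_unique_prefix_*; do
  mv "$f" "e${f##my_unique_prefix_}.txt";
done

# Piece 0 is the (empty) part before the first $patB line.
rm e0.txt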

There is one final element to make this a perfect solution, which is to renumber the files in reverse order. I haven't done this portion.

If you don't care about the file numbering being in the same order as the file, and if you don't mind if the extension .txt is omitted, and if you don't mind if the files are numbered from e01 rather than from e1, and if you don't mind a diagnostic message being printed about how many lines were put in each file, then we can simplify:

patA='AK5[*]R'
patB='AK2'

printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f e -k - "/$patB/" '{999}'

rm e00
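If the csplit diagnostics or the leftover empty first piece are unwanted, the merged stream can also be split with awk. This is a sketch under assumptions, not part of the answer above: merged.txt is a hypothetical file holding the merged output shown earlier, and the e1.txt, e2.txt, ... names follow the numbering the question asks for.

```shell
# Hypothetical file containing the merged output of the ex command.
cat > merged.txt <<'EOF'
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
EOF

patB='AK2'

awk -v patB="$patB" '
$0 ~ patB {                        # each pattern B line opens a new output file
    if (out) close(out)
    out = "e" (++n) ".txt"
}
out { print > out }                # lines before the first match are skipped
' merged.txt

ls e1.txt e2.txt
```

No repetition count is needed here, so nothing is reported when the input runs out of matches.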

edited Feb 9 at 3:58

answered Feb 9 at 3:39

Wildcard


csplit: '/AK2/': match not found. This is what happens.
– WashichawbachaW
Feb 13 at 0:41

@WashichawbachaW, I get, for example: csplit: '/AK2/': match not found on repetition 16. But that doesn't matter. The -k option means that csplit will leave the files it already created in place, even though it subsequently encountered an error (because you don't have 999 instances of AK2 in your input file). If you're using the version of the command at the very end of my answer, check with ls -l e?? and you should see all the files desired.
– Wildcard
Feb 13 at 0:52

add a commentÂ |Â

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f422760%2ftext-processing-how-to-get-pattern-a-matching-line-until-first-occurrence-of-p%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
1
down vote

accepted

#!/bin/bash

rm -rf resE* E*

tac ../example_file.txt |
 awk 'BEGINi=1
 /^AK5*R.*/p=1
 if(p==1)f="resE" i;print($0)>>f;close(f)
 /^AK2.*/if(p==1)i++;p=0
 '
set -- resE* 
c=$#
for (( i=1;i<=$c;i++)); do
 pos=$(($c-$i+1))
 [ -f "$1" ] && tac "$1" > "E$pos"
 shift
done

The resulting ranges will be:

$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

$ cat E2
AK2*777*7777777
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac

6,6381734

I'm trying to execute a command to first get the pattern A matched AK5*R and from that line, it moves to previous line until the very first pattern B AK2 is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666 from the file but the very first AK2 matching line starting from each pattern A AK5*R ). See my updated output.
â€“Â WashichawbachaW
Feb 9 at 1:49

I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â€“Â WashichawbachaW
Feb 9 at 2:44

add a commentÂ |Â

up vote
1
down vote

accepted

#!/bin/bash

rm -rf resE* E*

tac ../example_file.txt |
 awk 'BEGINi=1
 /^AK5*R.*/p=1
 if(p==1)f="resE" i;print($0)>>f;close(f)
 /^AK2.*/if(p==1)i++;p=0
 '
set -- resE* 
c=$#
for (( i=1;i<=$c;i++)); do
 pos=$(($c-$i+1))
 [ -f "$1" ] && tac "$1" > "E$pos"
 shift
done

The resulting ranges will be:

$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

$ cat E2
AK2*777*7777777
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac

6,6381734

I'm trying to execute a command to first get the pattern A matched AK5*R and from that line, it moves to previous line until the very first pattern B AK2 is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666 from the file but the very first AK2 matching line starting from each pattern A AK5*R ). See my updated output.
â€“Â WashichawbachaW
Feb 9 at 1:49

I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â€“Â WashichawbachaW
Feb 9 at 2:44

add a commentÂ |Â

up vote
1
down vote

accepted

#!/bin/bash

rm -rf resE* E*

tac ../example_file.txt |
 awk 'BEGINi=1
 /^AK5*R.*/p=1
 if(p==1)f="resE" i;print($0)>>f;close(f)
 /^AK2.*/if(p==1)i++;p=0
 '
set -- resE* 
c=$#
for (( i=1;i<=$c;i++)); do
 pos=$(($c-$i+1))
 [ -f "$1" ] && tac "$1" > "E$pos"
 shift
done

The resulting ranges will be:

$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

$ cat E2
AK2*777*7777777
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac

6,6381734

#!/bin/bash

rm -rf resE* E*

tac ../example_file.txt |
 awk 'BEGINi=1
 /^AK5*R.*/p=1
 if(p==1)f="resE" i;print($0)>>f;close(f)
 /^AK2.*/if(p==1)i++;p=0
 '
set -- resE* 
c=$#
for (( i=1;i<=$c;i++)); do
 pos=$(($c-$i+1))
 [ -f "$1" ] && tac "$1" > "E$pos"
 shift
done

The resulting ranges will be:

$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5

$ cat E2
AK2*777*7777777
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac

6,6381734

edited Feb 9 at 2:42

answered Feb 8 at 18:11

Isaac

6,6381734

answered Feb 8 at 18:11

Isaac

6,6381734

answered Feb 8 at 18:11

Isaac

6,6381734

I'm trying to execute a command to first get the pattern A matched AK5*R and from that line, it moves to previous line until the very first pattern B AK2 is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666 from the file but the very first AK2 matching line starting from each pattern A AK5*R ). See my updated output.
â€“Â WashichawbachaW
Feb 9 at 1:49

I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â€“Â WashichawbachaW
Feb 9 at 2:44

add a commentÂ |Â

I'm trying to execute a command to first get the pattern A matched AK5*R and from that line, it moves to previous line until the very first pattern B AK2 is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666 from the file but the very first AK2 matching line starting from each pattern A AK5*R ). See my updated output.
â€“Â WashichawbachaW
Feb 9 at 1:49

I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â€“Â WashichawbachaW
Feb 9 at 2:44

I'm trying to execute a command to first get the pattern A matched AK5*R and from that line, it moves to previous line until the very first pattern B AK2 is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666 from the file but the very first AK2 matching line starting from each pattern A AK5*R ). See my updated output.
â€“Â WashichawbachaW
Feb 9 at 1:49

I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â€“Â WashichawbachaW
Feb 9 at 2:44

add a commentÂ |Â

up vote
1
down vote

POSIX `ex` to the rescue again!

ex is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.

The following one-liner works perfectly on your example_file2.txt:

printf '%sn' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt

On your example_file.txt, it also works, but because the global command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

However, this is easy enough to handleÃ¢Â€Â”with another POSIX tool, csplit, which is designed to split files according to a "context."

Portable POSIX solution:

patA='AK5[*]R'
patB='AK2'

printf '%sn' "g/$patA/?$patB?,.p" |
 ex example_file.txt |
 csplit -f my_unique_prefix_ -n 1 -s -k - "/$patB/" '999'

for f in my_unique_prefix_*; do
 mv "$f" "e$f##my_unique_prefix_.txt";
done

rm e0.txt

There is one final element to make this a perfect solution, which is to renumber the files in reverse order. I haven't done this portion.

patA='AK5[*]R'
patB='AK2'

printf '%sn' "g/$patA/?$patB?,.p" |
 ex example_file.txt |
 csplit -f e -k - "/$patB/" '999'

rm e00

edited Feb 9 at 3:58

answered Feb 9 at 3:39

Wildcard

22k855153

csplit: /AK2/': match not found` This what happens.
â€“Â WashichawbachaW
Feb 13 at 0:41

@WashichawbachaW, I get, for example: csplit: '/AK2/': match not found on repetition 16. But that doesn't matter. The -k option means that csplit will leave the files it already created in place, even though it subsequently encountered an error (because you don't have 999 instances of AK2 in your input file). If you're using the version of the command at the very end of my answer, check with ls -l e?? and you should see all the files desired.
â€“Â Wildcard
Feb 13 at 0:52

add a commentÂ |Â
