Text Processing - How to get pattern A matching line until first occurrence of pattern B matching line?
For each line that matches pattern A, I want to get that line plus the lines above it, going backwards up to and including the first line that matches pattern B.
UPDATED: example_file.txt
ISA*00* *00* *ZZ*SIX-SIX6 *12*666666666666 *66666666*6666*U*666666666*6666666666*0*P*
GS*FA*SIX-SIX-SIX*666666666*6666666*6666*6666*X*66666
ST*666*666
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*6666666
AK5*A
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
AK9*P*20*20*19
SE*69*6969
GE*1*6767
IEA*1*0000000000
What I want is to find, from the bottom up, every AK5 line that has R after it, like this:
Pattern A: AK5*R
and get all the lines going up until the first occurrence of pattern B is matched. e.g.:
Pattern B: AK2
Desired output:
First Pattern A matched will be called E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
UPDATED: Second Pattern A matched will be called E2
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
and so on if more than one line matches pattern A.
EDIT: I know sed can do this, but I have not had any luck getting the lines from each pattern A match back to its first pattern B match and storing them in a temporary text file to be processed further.
This is my example sed command that gets all available pattern B in the example_file.txt:
sed -ne '/AK2*/,/AK5*R/p' example_file.txt
Example command logical scenario:
A="AK5*R"
B="AK2"
find the first $A < example_file.txt; # AK5*R
move to previous line until first occurrence of $B line; # AK2*any_number*any_number
get all lines from first $A to its first occurrence of $B and store in a text file; # result > e1.txt
# The same way goes to the second occurrence of pattern A.
(NOTE: "first occurrence of $B" means: starting from each $A line, take the $A line and the lines before it, up to the very first $B line encountered. For example, if the first $A line is line 50 of a 100-line file, the command moves up from line 50 until it meets the first $B line.) See the example below.
example_file2.txt
ISA*00* *00* *ZZ*SIX-SIX6 *12*666666666666 *66666666*6666*U*666666666*6666666666*0*P*
GS*FA*SIX-SIX-SIX*666666666*6666666*6666*6666*X*66666
ST*666*666
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK5*A
AK2*777*888888
AK5*A
AK2*777*7777777
AK5*A
AK2*777*5555555
AK5*A
AK2*777*7777777
AK5*A
AK2*777*4545435
AK5*A
AK2*777*7777777
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A
AK2*777*0987654
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*A
Output:
AK2*777*0987654
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
text-processing sed
You say "don't have any luck in getting the line from pattern A to B", but you apparently want all the lines from pattern B to pattern A; at least that's what you show in the desired output. Your first language is probably not English, but please try to edit your question to make it clearer what you want.
- wurtel
Feb 8 at 10:17
There's no Pattern B: B=AK2 in your input content. Update your question.
- RomanPerekhrest
Feb 8 at 10:18
@wurtel, There are two pattern B lines, the AK2 lines, in the example_text.file. I don't want to print all the lines from pattern B to A. As you can see, I separated them in my desired output. I want a command that finds the first pattern A and then moves to the previous lines until the first match of pattern B is found. In the example_file.txt the first match of pattern A is in line number 12. So from that point it moves up until the first occurrence of pattern B is matched, which is in line number 7. The same goes for the 2nd pattern A match, where pattern B is in line number 15.
- WashichawbachaW
Feb 9 at 0:12
@RomanPerekhrest, There is, in line number 5: AK2*777*6666666, line number 7: AK2*777*7777777, line number 13: AK2*777*6666666, and line number 15: AK2*777*7777777. Sorry, I think you have literally seen B=AK2 as the whole pattern. Only AK2 is the pattern. I just put it in a variable B to represent the consistent pattern I want to find. Anyway, I'm going to correct this section to prevent confusion. Thanks.
- WashichawbachaW
Feb 9 at 0:20
Yes, sed could extract the ranges: tac ../infile | sed -ne '/^AK5\*R/,/AK2\*/p' | tac. What it could not do is redirect each range to a separate file.
- Isaac
Feb 9 at 2:48
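To illustrate the last comment, here is a minimal sketch of the tac, sed, tac pipeline on a shortened, made-up sample. The file names demo_ranges_input.txt and ranges.txt are hypothetical. The literal asterisk is written as [*] (writing \* would do the same), because an unescaped * in a sed regular expression is a quantifier rather than a literal character.

```shell
# Hypothetical sample input: a shortened version of example_file.txt.
cat > demo_ranges_input.txt <<'EOF'
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK5*R*5
AK9*P*20*20*19
EOF

# Reverse the file, print each range from a pattern A line (AK5*R)
# to the next pattern B line (AK2), then restore the original order.
tac demo_ranges_input.txt |
    sed -n '/^AK5[*]R/,/^AK2/p' |
    tac > ranges.txt

cat ranges.txt
```

All ranges end up in one stream, which matches the limitation the comment mentions: splitting them into separate files needs a further step.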
edited Feb 9 at 2:19
asked Feb 8 at 10:07
WashichawbachaW
2 Answers
accepted
Reading your description again, I understand that you want, going from the bottom up, each match of pattern A together with the lines above it up to the first match of pattern B, but with the resulting sections in the same order as in the file.
That requires a lot of logic. The following shell script does it all. It places the results, in the correct internal order, in files named E plus a number: the first file (E1) holds the first match from the top, and the last file holds the last matching section.
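#!/bin/bash
# Caution: this deletes every file in the current directory whose name starts with resE or E.
rm -rf resE* E*
# Read the file bottom-up so that each pattern A line (AK5*R) comes before its pattern B line (AK2).
tac ../example_file.txt |
awk 'BEGIN{i=1}
/^AK5\*R.*/{p=1}
{if(p==1){f="resE" i;print($0)>>f;close(f)}}
/^AK2.*/{if(p==1){i++};p=0}
'
# resE1 holds the last section of the file (reversed); restore line order and renumber from the top.
set -- resE*
c=$#
for (( i=1;i<=$c;i++)); do
    pos=$(($c-$i+1))
    [ -f "$1" ] && tac "$1" > "E$pos"
    shift
done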
The resulting ranges will be:
$ cat E1
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
$ cat E2
AK2*777*7777777
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
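For comparison, the same sections can probably be produced without reversing the file. The sketch below is a single-pass alternative, not the script above: awk restarts a buffer on every pattern B line (AK2) and writes the buffer to a new file when it reaches a pattern A line (AK5*R). The input name demo_input.txt and its shortened contents are made up for illustration; the output names e1.txt, e2.txt follow the naming used in the question.

```shell
# Hypothetical sample input: a shortened version of example_file.txt.
cat > demo_input.txt <<'EOF'
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK5*R*5
AK9*P*20*20*19
EOF

awk '
/^AK2/ { buf = ""; inblock = 1 }     # pattern B: start a fresh buffer
inblock { buf = buf $0 "\n" }         # collect the lines of the current block
/^AK5[*]R/ && inblock {               # pattern A: write the block to its own file
    out = "e" (++n) ".txt"
    printf "%s", buf > out
    close(out)
    inblock = 0
}
' demo_input.txt
```

Because the buffer is reset on every AK2 line, a block that ends in AK5*A is simply discarded, and the files come out numbered in file order without a renumbering step.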
I'm trying to execute a command that first finds the pattern A match AK5*R and, from that line, moves to the previous lines until the very first pattern B AK2 is found (Note: not the very first pattern B line in the file, which is line number 5: AK2*777*6666666, but the very first AK2 line going up from each pattern A AK5*R). See my updated output.
- WashichawbachaW
Feb 9 at 1:49
I will run a series of tests. But so far, it prints the way I want. I will mark it as accepted after I'm done with some files I have here.
- WashichawbachaW
Feb 9 at 2:44
POSIX ex to the rescue again!
ex is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.
The following one-liner works perfectly on your example_file2.txt:
printf '%s\n' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt
On your example_file.txt, it also works, but because the global (g) command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
However, this is easy enough to handle with another POSIX tool, csplit, which is designed to split files according to a "context."
Portable POSIX solution:
patA='AK5[*]R'
patB='AK2'
printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f my_unique_prefix_ -n 1 -s -k - "/$patB/" '{999}'
for f in my_unique_prefix_*; do
  mv "$f" "e${f##my_unique_prefix_}.txt";
done
rm e0.txt
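The rename loop relies on the shell's prefix-stripping parameter expansion. As a small sketch of just that step, the block below creates stand-in files in a scratch directory instead of running ex and csplit; the directory name rename_demo and the file contents are made up for illustration.

```shell
# Hypothetical stand-ins for the pieces csplit would leave behind.
mkdir -p rename_demo
(
    cd rename_demo || exit 1
    : > my_unique_prefix_0
    printf 'AK2*777*7777777\nAK5*R*5\n' > my_unique_prefix_1
    printf 'AK2*777*69696969\nAK5*R*5\n' > my_unique_prefix_2
    for f in my_unique_prefix_*; do
        # ${f##my_unique_prefix_} strips the prefix and leaves only the number.
        mv "$f" "e${f##my_unique_prefix_}.txt"
    done
    rm e0.txt    # drop piece 0, as the answer does
    ls
)
```

After the loop, the scratch directory should contain only e1.txt and e2.txt.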
There is one final element to make this a perfect solution, which is to renumber the files in reverse order. I haven't done this portion.
If you don't care about the file numbering being in the same order as the file, and if you don't mind if the extension .txt is omitted, and if you don't mind if the files are numbered from e01 rather than from e1, and if you don't mind a diagnostic message being printed about the size in bytes of each file, then we can simplify:
patA='AK5[*]R'
patB='AK2'
printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f e -k - "/$patB/" '{999}'
rm e00
csplit: '/AK2/': match not found. This is what happens.
- WashichawbachaW
Feb 13 at 0:41
@WashichawbachaW, I get, for example: csplit: '/AK2/': match not found on repetition 16. But that doesn't matter. The -k option means that csplit will leave the files it already created in place, even though it subsequently encountered an error (because you don't have 999 instances of AK2 in your input file). If you're using the version of the command at the very end of my answer, check with ls -l e?? and you should see all the files desired.
- Wildcard
Feb 13 at 0:52
Accepted answer: edited Feb 9 at 2:42
answered Feb 8 at 18:11
Isaac
I'm trying to execute a command to first get the pattern A matchedAK5*R
and from that line, it moves to previous line until the very first pattern BAK2
is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666
from the file but the very firstAK2
matching line starting from each pattern AAK5*R
). See my updated output.
â WashichawbachaW
Feb 9 at 1:49
I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â WashichawbachaW
Feb 9 at 2:44
add a comment |Â
I'm trying to execute a command to first get the pattern A matchedAK5*R
and from that line, it moves to previous line until the very first pattern BAK2
is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666
from the file but the very firstAK2
matching line starting from each pattern AAK5*R
). See my updated output.
â WashichawbachaW
Feb 9 at 1:49
I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â WashichawbachaW
Feb 9 at 2:44
I'm trying to execute a command to first get the pattern A matched
AK5*R
and from that line, it moves to previous line until the very first pattern B AK2
is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666
from the file but the very first AK2
matching line starting from each pattern A AK5*R
). See my updated output.â WashichawbachaW
Feb 9 at 1:49
I'm trying to execute a command to first get the pattern A matched
AK5*R
and from that line, it moves to previous line until the very first pattern B AK2
is found (Note: Not the very first pattern B matching line which is line number 5:AK2*777*6666666
from the file but the very first AK2
matching line starting from each pattern A AK5*R
). See my updated output.â WashichawbachaW
Feb 9 at 1:49
I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â WashichawbachaW
Feb 9 at 2:44
I will run some series of test. But so far, it prints the way I want it to be. I will mark it check after I'm done with some files I have here.
â WashichawbachaW
Feb 9 at 2:44
add a comment |Â
up vote
1
down vote
POSIX ex
to the rescue again!
ex
is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.
The following one-liner works perfectly on your example_file2.txt
:
printf '%sn' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt
On your example_file.txt
, it also works, but because the g
lobal command in ex
can't write to a separate destination for each range acted upon, the desired two output files are merged like so:
AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5
However, this is easy enough to handle with another POSIX tool, csplit, which is designed to split files according to a "context."
Portable POSIX solution:
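patA='AK5[*]R'
patB='AK2'
# Print every block from the nearest preceding $patB line through each $patA line,
# then split that merged stream into one file per block.
printf '%s\n' "g/$patA/?$patB?,.p" |
ex example_file.txt |
csplit -f my_unique_prefix_ -n 1 -s -k - "/$patB/" '{999}'
# Rename my_unique_prefix_N to eN.txt.
for f in my_unique_prefix_*; do
    mv "$f" "e${f##my_unique_prefix_}.txt"
done
# The first piece is empty because the stream starts with a $patB line.
rm e0.txt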
There is one final element to make this a perfect solution, which is to renumber the files in reverse order. I haven't done this portion.
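For the renumbering step left open above, here is a minimal sketch, not part of the original answer. It assumes the files are named e1.txt through eN.txt with no gaps; the function name reverse_numbering and the rev_tmp_ prefix are made up for illustration, and mktemp -d is used only for the throwaway demo.

```shell
#!/bin/sh
# reverse_numbering DIR: hypothetical helper that swaps e1.txt <-> eN.txt,
# e2.txt <-> e(N-1).txt, and so on. Assumes no gaps in the numbering.
reverse_numbering() {
    dir=$1

    # Count how many consecutively numbered files exist.
    n=0
    while [ -e "$dir/e$((n + 1)).txt" ]; do
        n=$((n + 1))
    done

    # First pass: move each file to a temporary name carrying its new number,
    # so no rename can overwrite a file that has not been moved yet.
    i=1
    while [ "$i" -le "$n" ]; do
        mv "$dir/e$i.txt" "$dir/rev_tmp_$((n - i + 1))"
        i=$((i + 1))
    done

    # Second pass: give the files their final names.
    i=1
    while [ "$i" -le "$n" ]; do
        mv "$dir/rev_tmp_$i" "$dir/e$i.txt"
        i=$((i + 1))
    done
}

# Demo on three throwaway files (mktemp -d is common but not POSIX).
demo_dir=$(mktemp -d)
for k in 1 2 3; do
    printf 'block %s\n' "$k" > "$demo_dir/e$k.txt"
done
reverse_numbering "$demo_dir"
cat "$demo_dir/e1.txt"   # prints: block 3
```

The two passes are there only to avoid clobbering; with a single pass, renaming e1.txt to e3.txt would destroy the original e3.txt before it had been moved.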
If you don't care about the file numbering being in the same order as the file, and if you don't mind if the extension .txt is omitted, and if you don't mind if the files are numbered from e01 rather than from e1, and if you don't mind a diagnostic message being printed about how many lines were put in each file, then we can simplify:
patA='AK5[*]R'
patB='AK2'
printf '%s\n' "g/$patA/?$patB?,.p" |
ex example_file.txt |
csplit -f e -k - "/$patB/" '{999}'
# The first piece is empty because the stream starts with a $patB line.
rm e00
csplit: '/AK2/': match not found. This is what happens.
– WashichawbachaW
Feb 13 at 0:41
@WashichawbachaW, I get, for example: csplit: '/AK2/': match not found on repetition 16. But that doesn't matter. The -k option means that csplit will leave the files it already created in place, even though it subsequently encountered an error (because you don't have 999 instances of AK2 in your input file). If you're using the version of the command at the very end of my answer, check with ls -l e?? and you should see all the files desired.
– Wildcard
Feb 13 at 0:52
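To see the -k behaviour described in that comment on a small scale, here is a rough sketch. The file name merged.txt and its four lines are invented stand-ins for the merged ex output, and the behaviour noted in the comments is what I would expect from GNU csplit; other implementations may differ in detail.

```shell
#!/bin/sh
# Throwaway demo; mktemp -d is widely available but not required by POSIX.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Stand-in for the merged ex output: two blocks, each starting with AK2.
printf '%s\n' 'AK2*1' 'AK5*R*5' 'AK2*2' 'AK5*R*5' > merged.txt

# Ask for far more repetitions than there are AK2 lines. csplit runs out of
# matches and reports an error (silenced here), but -k keeps the pieces
# that were already written.
csplit -s -k -f e merged.txt '/AK2/' '{999}' 2>/dev/null
status=$?

echo "csplit exit status: $status"
ls e??
```

The non-zero exit status is expected here and is not a sign that the split failed; e00 is the empty leading piece that the answer removes with rm.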
edited Feb 9 at 3:58
answered Feb 9 at 3:39
Wildcard
You say "don't have any luck in getting the line from pattern A to B" but you apparently want all the lines from pattern B to pattern A, at least that's what you show in the desired output. Your first language is probably not English, but please try to edit your question to make it clearer what you want.
– wurtel
Feb 8 at 10:17
There's no Pattern B: B=AK2 in your input content. Update your question.
– RomanPerekhrest
Feb 8 at 10:18
@wurtel, there are two pattern B lines (AK2) in the example_text.file. I don't want to print all the lines from pattern B to A. As you can see, I separated them in my desired output. I want a command that finds the first pattern A and then moves to the previous lines until the first match of pattern B is found. In the example_file.txt the first match of pattern A is in line number 12. So from that point it moves up until the first occurrence of pattern B is matched, which is in line number 7. The same goes for the 2nd pattern A match, where pattern B is in line number 15.
– WashichawbachaW
Feb 9 at 0:12
@RomanPerekhrest, there is, in line number 5: AK2*777*6666666, line number 7: AK2*777*7777777, line number 13: AK2*777*6666666, and line number 15: AK2*777*7777777. Sorry, I think you read B=AK2 literally as the whole pattern. Only AK2 is the pattern. I just put it in a variable B to represent the consistent pattern I want to find. Anyway, I'm going to correct this section to prevent confusion. Thanks.
– WashichawbachaW
Feb 9 at 0:20
Yes, sed could extract the ranges: tac ../infile | sed -ne '/^AK5\*R/,/^AK2\*/p' | tac. What it could not do is redirect each range to a separate file.
– Isaac
Feb 9 at 2:48
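On the point about writing each range to a separate file: a single forward pass in awk can do it, because "the first AK2 above each AK5*R" is the same as "the most recent AK2 seen so far". This is only a sketch of that idea, not something posted in the thread; sample.txt is a shortened, made-up version of the question's file, and the e1.txt, e2.txt naming follows the question.

```shell
#!/bin/sh
# Throwaway demo; mktemp -d is widely available but not required by POSIX.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

cat > sample.txt <<'EOF'
AK1*SX*666
AK2*777*6666666
AK5*A
AK2*777*7777777
AK3*S6*5**3
AK4*3*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK5*R*5
AK9*P*20*20*19
EOF

awk '
    # A pattern B line discards whatever was buffered and starts a new block.
    /^AK2[*]/ { buf = $0; active = 1; next }

    # While a block is open, keep appending lines to it.
    active { buf = buf "\n" $0 }

    # A pattern A line closes the block and writes it to its own file.
    active && /^AK5[*]R/ {
        out = "e" (++count) ".txt"
        print buf > out
        close(out)
        active = 0
    }
' sample.txt

cat e1.txt
```

Blocks that end in AK5*A are dropped simply because the next AK2 line overwrites the buffer before anything is written.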