Extracting 2 parts of a string using awk [closed]

Votes: 0
I would like to extract and print two parts of a string.
My file has hundreds of lines of text, but here are two of them. Each line has the markers F1, F2, F3, F4, F5, F6, and F7, and each marker is followed by 4 characters.
F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314
F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315
I would like to extract the 4 characters after the pattern "F2" and the 4 characters after the pattern "F6", so that the output is
A309 A313
B309 B313
To clarify further: I need help extracting only the characters following F2 and F6.
Tags: awk, string
edited Jul 13 at 17:44
asked Jul 12 at 18:33
Allan GItobu
closed as unclear what you're asking by αғsнιη, Jeff Schaller, slm♦ Jul 13 at 3:51
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
1
Do you really want a pattern-based match - or just the last four characters of the first two whitespace-separated fields?
- steeldriver
Jul 12 at 18:58
Looks like column 2 is dropped (it has an "F4", not an F2 or F6).
- Jeff Schaller
Jul 12 at 19:16
2
Note also that field 3 in both cases has two "F6"s -- do you want only the last F6?
- Jeff Schaller
Jul 12 at 19:20
@steeldriver It is pattern based. The F# markers could be anywhere in the string.
- Allan GItobu
Jul 12 at 20:17
I have updated the text. I did not notice I had F6 twice on the second line.
- Allan GItobu
Jul 12 at 20:18
3 Answers
Votes: 1 (accepted answer)
The following awk script is an approximation of what I think your requirement is:
{
    # examine every whitespace-separated field of the current line
    for (i = 1; i <= NF; i++) {
        if (match($i, "F2....$") > 0)
            printf "%s ", substr($i, RSTART + 2, 4)
        if (match($i, "F6....$") > 0)
            printf "%s", substr($i, RSTART + 2, 4)
    }
    print ""
}
It loops through each line, then through each field of that line. For each field, if its tail end has "F2" or "F6" followed by 4 characters, it prints those 4 characters (followed by a space in the F2 case). Once it has finished looping over a line, it prints a newline.
The output, based on the original version of your input, was:
A309 A314
B309 B315
An updated version of the awk script, to handle the markers appearing anywhere within their field, only needs the $ anchoring removed:
{
    # same loop, but the marker may sit anywhere inside the field
    for (i = 1; i <= NF; i++) {
        if (match($i, "F2....") > 0)
            printf "%s ", substr($i, RSTART + 2, 4)
        if (match($i, "F6....") > 0)
            printf "%s", substr($i, RSTART + 2, 4)
    }
    print ""
}
edited Jul 14 at 1:03
answered Jul 12 at 19:32
Jeff Schaller
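Since the markers may appear anywhere on the line, one variation is to match against the whole record ($0) instead of looping over fields. The following is only a minimal sketch and not part of the original answer: the function name extract_f2_f6 and the file name sample.txt are made up for illustration, and it assumes each marker occurs at most once per line (only the first occurrence is used).

```shell
# Hypothetical helper: for each input line, print the 4 characters that
# follow F2 and the 4 characters that follow F6, separated by a space.
extract_f2_f6() {
    awk '{
        out = ""
        # match() sets RSTART to the start of the first match within $0
        if (match($0, /F2..../))
            out = substr($0, RSTART + 2, 4)
        if (match($0, /F6..../)) {
            sep = (out == "") ? "" : " "
            out = out sep substr($0, RSTART + 2, 4)
        }
        print out
    }' "$@"
}

# Sample data copied from the question (sample.txt is an assumed file name).
printf '%s\n' \
    'F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314' \
    'F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315' > sample.txt

extract_f2_f6 sample.txt
```

Note that each "." also matches a space, so a marker sitting fewer than 4 characters before a blank would pull that blank into the result.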
Please note that the first version of this answer aligned with the first version of the question / input. It requires the F2/F6 text to be at the end of a field. The new version of the question / input has the text in the middle of a field, which would require at least the $ anchor to be removed.
- Jeff Schaller
Jul 12 at 23:25
Thanks - but let me bug you a little more. Where does the file name go? In my case the file name is test_awk2.txt, in the same folder as the script: #!/bin/awk -f BEGIN print "test_awk2.txt" for(i=1;i<=NF;i++) if (match($i, "F2....$") > 0) printf "%s ", substr($i, RSTART + 2, 4); if (match($i, "F6....$") > 0) printf "% s", substr($i, RSTART + 2, 4); print ""
- Allan GItobu
Jul 13 at 17:32
You'd call that awk script with a parameter or with redirected input: ./test_awk2.txt < input-file or as ./test_awk2.txt input-file.
- Jeff Schaller
Jul 14 at 1:01
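To make the invocation concrete: in a standalone awk script the input file name is not written inside the script, it is passed on the command line. Here is a minimal sketch, assuming the program is saved as extract.awk (a made-up name) and the data lives in test_awk2.txt as described in the comment above.

```shell
# Save the awk program in its own file (extract.awk is a hypothetical name).
cat > extract.awk <<'EOF'
{
    for (i = 1; i <= NF; i++) {
        if (match($i, "F2....") > 0)
            printf "%s ", substr($i, RSTART + 2, 4)
        if (match($i, "F6....") > 0)
            printf "%s", substr($i, RSTART + 2, 4)
    }
    print ""
}
EOF

# Sample data from the question, stored under the file name used in the comment.
printf '%s\n' \
    'F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314' \
    'F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315' > test_awk2.txt

# Either form works: file name as an argument, or input redirected to stdin.
awk -f extract.awk test_awk2.txt
awk -f extract.awk < test_awk2.txt
```

If the script starts with a #! line naming the system's awk (the path varies between systems) and is made executable with chmod +x, it can also be run directly as ./extract.awk test_awk2.txt.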
Votes: 1
With Perl, using a lookbehind for the anchor characters:
$ perl -lne 'print join " ", /(?<=F2|F6)(.{4})/g' file
A309 A313
B309 B313
answered Jul 12 at 22:18
steeldriver
Votes: 0
How about this:
echo 'str' | egrep -o '(F2|F6)....' | egrep -o '....$' | xargs -n2
answered Jul 12 at 18:54
user48452
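A sketch of the same pipeline applied to a file rather than a single echoed string. It uses grep -E (newer GNU grep releases warn that egrep is obsolescent) and cut in place of the second grep. It assumes every line contains exactly one F2 and one F6 marker, because xargs -n2 simply pairs consecutive matches. sample.txt is a made-up file name, and -o is a GNU/BSD grep extension rather than POSIX.

```shell
# Sample data copied from the question (sample.txt is an assumed file name).
printf '%s\n' \
    'F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314' \
    'F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315' > sample.txt

# 1. grep -o prints each "F2xxxx" / "F6xxxx" match on its own line
# 2. cut keeps characters 3-6, i.e. drops the two-character marker
# 3. xargs -n2 joins the results two per output line
grep -Eo '(F2|F6)....' sample.txt | cut -c3-6 | xargs -n2
```

If a line may lack one of the markers, the pairing drifts from that line on, so a per-line tool such as awk is the safer choice in that case.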