Extracting 2 parts of a string using awk [closed]

I would like to extract and print two parts of a string, each identified by a pattern.

My file has hundreds of lines of text; here are two of them. Each line contains the markers F1, F2, F3, F4, F5, F6, and F7, and each marker is followed by 4 characters.

F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314
F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315

I would like to extract the 4 characters after the pattern "F2" and the 4 characters after the pattern "F6", so that the output is:

A309 A313
B309 B313

To clarify further: I need help extracting only the characters following F2 and F6.









asked Jul 12 at 18:33 by Allan GItobu, edited Jul 13 at 17:44




closed as unclear what you're asking by αғsнιη, Jeff Schaller, slm♦ Jul 13 at 3:51


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.















  • Do you really want a pattern-based match, or just the last four characters of the first two whitespace-separated fields?
    – steeldriver, Jul 12 at 18:58

  • Looks like column 2 is dropped (it has an "F4", not an F2 or F6).
    – Jeff Schaller, Jul 12 at 19:16

  • Note also that field 3 in both cases has two "F6"s. Do you want only the last F6?
    – Jeff Schaller, Jul 12 at 19:20

  • @steeldriver It is pattern-based. The F# markers could be anywhere in the string.
    – Allan GItobu, Jul 12 at 20:17

  • I have updated the text. I did not notice I had F6 twice on the second line.
    – Allan GItobu, Jul 12 at 20:18






















3 Answers
      accepted






The following awk script is an approximation of what I think your requirement is:

    {
        # Examine each whitespace-separated field of the line
        for (i = 1; i <= NF; i++) {
            if (match($i, "F2....$") > 0)
                printf "%s ", substr($i, RSTART + 2, 4)
            if (match($i, "F6....$") > 0)
                printf "%s", substr($i, RSTART + 2, 4)
        }
        print ""
    }

For each line, it loops through each field of that line. If the tail end of a field has "F2" or "F6" followed by 4 characters, it prints those 4 characters (followed by a space in the F2 case). Once it is done with the fields of a line, it prints a newline.

The output, based on the first version of your input (where the F2 and F6 text sat at the end of a field), was:

    A309 A314
    B309 B315

An updated version of the awk script, to handle the markers appearing anywhere within their field, only needs the $ anchoring removed:

    {
        for (i = 1; i <= NF; i++) {
            if (match($i, "F2....") > 0)
                printf "%s ", substr($i, RSTART + 2, 4)
            if (match($i, "F6....") > 0)
                printf "%s", substr($i, RSTART + 2, 4)
        }
        print ""
    }

With the current input, this version prints A309 A313 and B309 B313.
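Since the markers may appear anywhere on the line, a variant that searches the whole record rather than each field may be simpler. The following is only a minimal sketch, not the answer's original code: the function name extract_f2_f6 is made up for illustration, and it assumes each line holds at most one F2 and one F6 marker and that the 4-character values never contain "F2" or "F6" themselves.

```shell
# Hypothetical helper: for each input line, print the 4 characters
# that follow F2 and the 4 characters that follow F6.
extract_f2_f6() {
    awk '
    {
        f2 = ""; f6 = ""
        # match() searches the whole record ($0), so field boundaries do not matter
        if (match($0, /F2..../)) f2 = substr($0, RSTART + 2, 4)
        if (match($0, /F6..../)) f6 = substr($0, RSTART + 2, 4)
        print f2, f6
    }' "$@"
}

# Sample data from the question, piped in on standard input
printf '%s\n' \
    'F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314' \
    'F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315' |
    extract_f2_f6
```

On the two sample lines this prints A309 A313 and B309 B313. Any arguments are passed through to awk, so a file name can be given instead of piping the data in, for example extract_f2_f6 test_awk2.txt.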













answered Jul 12 at 19:32 by Jeff Schaller, edited Jul 14 at 1:03











  • Please note that the first version of this answer aligned with the first version of the question and its input. It requires the F2/F6 text to be at the end of a field. The new version of the question has that text in the middle of a field, which requires at least the $ anchor to be removed.
    – Jeff Schaller, Jul 12 at 23:25

  • Thanks, but let me bug you a little more. Where does the file name go? In my case the file name is test_awk2.txt, in the same folder as the script: #!/bin/awk -f BEGIN print "test_awk2.txt" for(i=1;i<=NF;i++) if (match($i, "F2....$") > 0) printf "%s ", substr($i, RSTART + 2, 4); if (match($i, "F6....$") > 0) printf "% s", substr($i, RSTART + 2, 4); print ""
    – Allan GItobu, Jul 13 at 17:32

  • You'd call that awk script with the input file as a parameter or with redirected input: ./test_awk2.txt input-file or ./test_awk2.txt < input-file.
    – Jeff Schaller, Jul 14 at 1:01
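To make the two ways of supplying the input concrete, here is a sketch under a few assumptions: the awk program is saved as extract.awk (a made-up name), the data is in test_awk2.txt as in the comment above, and awk is on the PATH. It runs in a scratch directory so that no existing file is overwritten.

```shell
# Work in a scratch directory so nothing existing is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question, saved under the asker's file name
cat > test_awk2.txt <<'EOF'
F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314
F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315
EOF

# The awk program lives in its own file (extract.awk is a hypothetical name)
cat > extract.awk <<'EOF'
{
    for (i = 1; i <= NF; i++) {
        if (match($i, "F2....") > 0) printf "%s ", substr($i, RSTART + 2, 4)
        if (match($i, "F6....") > 0) printf "%s", substr($i, RSTART + 2, 4)
    }
    print ""
}
EOF

# The data file is an argument to awk; it is not named inside the script
result_arg=$(awk -f extract.awk test_awk2.txt)

# Redirecting the file to standard input gives the same result
result_stdin=$(awk -f extract.awk < test_awk2.txt)

printf '%s\n' "$result_arg"
```

The point is that the script itself never mentions the data file, so no BEGIN block printing the file name is needed. If the script starts with a #! line pointing at the local awk binary (the path varies between systems) and is made executable with chmod +x, it can also be called directly, as in ./extract.awk test_awk2.txt.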
















With Perl, using a lookbehind for the anchor characters:

    $ perl -lne 'print join " ", /(?<=F2|F6)(.{4})/g' file
    A309 A313
    B309 B313






answered Jul 12 at 22:18 by steeldriver




















How about this, with 'str' standing in for the input text:

    echo 'str' | egrep -o '(F2|F6)....' | egrep -o '....$' | xargs -n2
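The pipeline above relies on xargs -n2, so it assumes that every input line yields exactly two matches. A sketch that handles one line at a time avoids that assumption. It uses grep -E in place of egrep and assumes a grep that supports the -o option, as GNU and BSD grep do; extract_with_grep is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper: run the grep pipeline once per input line,
# so each output line holds only the matches found on that input line.
extract_with_grep() {
    while IFS= read -r line; do
        # grep -oE prints each marker plus its 4 characters on a separate line,
        # cut drops the 2-character marker, and paste joins the results with spaces.
        printf '%s\n' "$line" |
            grep -oE '(F2|F6).{4}' |
            cut -c3- |
            paste -sd ' ' -
    done
}

# Sample data from the question
printf '%s\n' \
    'F1A308F2A309 F3A310F4A311 F5A312F6A313F7A314' \
    'F1B308F2B309 F3B310F4B317 F5B312F6B313F7B315' |
    extract_with_grep
```

On the sample lines this prints A309 A313 and B309 B313. Starting several processes per line is slow for large files, so for hundreds of thousands of lines a single awk or Perl pass is likely the better choice.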






answered Jul 12 at 18:54 by user48452