Awk: Count occurrences of a string in one column, over a range of lines starting 3 lines below a pattern and ending at a condition

I have an input file with several thousand lines. I am interested in one section of it: the block that starts at the first instance of the pattern /m.o./ in the file. I need to find this pattern, run my code on the lines that follow it, and stop before any later m.o. blocks or unrelated lines.



It looks something like this:



 >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
1 1 -20.63710689 -561.5697 2.0000
2 1 -20.58909944 -560.2634 2.0000
3 1 -11.45645851 -311.7491 2.0000
4 1 -11.29965696 -307.4823 2.0000
5 1 -11.29203148 -307.2748 2.0000
6 1 -1.44555716 -39.3360 2.0000
7 1 -1.35738379 -36.9367 2.0000
8 1 -1.07586111 -29.2760 2.0000
9 1 -0.91591305 -24.9235 2.0000
10 1 -0.75492584 -20.5428 2.0000
11 1 -0.71126523 -19.3547 2.0000
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
14 1 -0.61775719 -16.8102 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
17 1 -0.43731872 -11.9002 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000


Goals



  1. Begin 3 lines below the line matching /m.o./ (i.e. at the line where $1 == 1).

  2. Count how many times $2 is not equal to "1" (in other files, $2 can also be 3 or 4, so I need to count with $2 != 1).

  3. Count only in the range of lines where $3 is negative, i.e. up to the second-to-last line.

I can't use the pattern /====/ as the start marker, because it also appears earlier in the document.



  • The expected output is 3: in the range of lines where $3 is negative, there are 3 lines where $2 is not equal to 1 (the lines numbered 13, 15 and 18).


Attempt



Other answers on the web have given me pieces of the code. Examples:




  • Define my starting line as the pattern line plus 3 (source):



    awk '/m.o./{n=NR+3} n && NR>=n' file



  • Between the starting and final line, count how many times $2 != "1" (source):



    awk '$2!="1"{++count} END{print count}' file



  • Define my final line as follows:



    awk '{if ($3 > 0) {print count; exit}}' file


But I don't know how to put all of this together. Importantly, I must avoid counting the 2 in $2 on the bottom line (the one where $3 is positive).



I'm of course open to rewriting the above code. I just wanted to supply examples for clarity.



Thank you.


































  • Moved my deleted answer to a comment, since you've decided to answer the question on your own - awk 'BEGIN { flag = 1 } flag && /m.o/ { flag = 0; for (i=1; i<=3; i++) getline } $0 !~ />>>>>/ && $2 != "1" && $3+0 < 0 { count++ } END { print count }' file

    – Inian
    Jan 11 at 9:01















bash shell-script awk






asked Jan 9 at 16:34 by Blaise (edited Jan 9 at 18:22)












3 Answers














There are many ways to do this, but the simplest to understand might be as follows:



You can either create a complex conditional to select rows to count:



awk 'BEGIN { total=0 } NR > 3 && $2 != 1 && $3 < 0 { total++ } END { print total }' file


Or you can put the conditional inside the code block:



awk 'BEGIN { total=0 } NR > 3 { if ( $2 != 1 && $3 < 0 ) total++ } END { print total }' file





answered Jan 9 at 16:54 by couling























  • Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

    – Blaise
    Jan 9 at 17:27
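A minimal sketch of what this comment asks for, assuming the section starts at the first line whose $1 is exactly m.o. and ends at the first data row with a positive $3. Instead of a fixed NR > 3, the start line is computed from the line number of the match, and exit stops reading at the end condition. The helper names sample_input and count_section are hypothetical, and the input is a trimmed copy of the question's sample plus one made-up trailing >>>>> line standing in for the rest of the file.

```shell
#!/bin/sh
# Trimmed copy of the question's sample. The final >>>>> line is a made-up
# stand-in for later parts of the file that a plain "NR > 3" filter would count.
sample_input() {
  cat <<'EOF'
 >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
1 1 -20.63710689 -561.5697 2.0000
13 2 -0.62802299 -17.0895 2.0000
14 1 -0.61775719 -16.8102 2.0000
15 2 -0.50208166 -13.6625 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
 >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
EOF
}

# Count rows with $2 != 1 and a negative $3, starting 3 lines below the
# first "m.o." header and stopping at the first row whose $3 is positive.
count_section() {
  awk '
    !start && $1 == "m.o." { start = NR + 3 }   # remember the first data line
    start && NR >= start {
      if ($3 + 0 > 0) exit                      # end of the table: stop reading, END still runs
      if ($2 != 1 && $3 + 0 < 0) total++
    }
    END { print total + 0 }
  '
}

sample_input | count_section
```

In awk, exit inside a main rule skips the rest of the input but still runs the END block, which is why the count is printed and the trailing >>>>> line is never examined.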

































You can try this awk:



awk '$1=="m.o."{if(l)exit;l++;next}l&&l<3{l++;next}l{if($3<0&&$2!=1)c++}END{print c}' infile
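For readability, here is the same one-liner spread over several lines with comments; the logic is unchanged, while the function name count_mo and the trimmed sample are hypothetical. The variable l acts as a small state counter: 0 before the first m.o. header, 1 and 2 while skipping the two header lines, 3 inside the table.

```shell
#!/bin/sh
# Hypothetical helper holding a trimmed copy of the question's sample rows.
sample_input() {
  cat <<'EOF'
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
1 1 -20.63710689 -561.5697 2.0000
13 2 -0.62802299 -17.0895 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
EOF
}

# The one-liner above, one rule per line.
count_mo() {
  awk '
    $1 == "m.o." { if (l) exit; l++; next }   # second header: stop; first header: l = 1
    l && l < 3   { l++; next }                # skip the two lines under the header
    l            { if ($3 < 0 && $2 != 1) c++ }
    END          { print c }
  '
}

sample_input | count_mo
```

Nothing in this version stops at the first positive $3: it keeps counting until the next line whose $1 is m.o. (or the end of the file), so later rows with a negative $3 and a $2 other than 1, such as the >>>>> lines, would also be counted.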





answered Jan 9 at 18:39 by ctac_ (edited Jan 9 at 19:44)














































    Whew, I finally figured it out with the following line:



     awk '$1 ~ /m.o./ {n=NR+3} n && $3+0 > 0 {n=0} { if ( n != 0 && $2 != "1" && $3+0 < 0 ) count++ } END { print count }' input


    The problem before was that awk applies every rule to the entire document independently, so I couldn't restrict a condition to a range of lines, and it counted many lines I didn't want counted. I kept getting values greater than the correct answer of 3.



    For example, with flags (a common solution on the web for this problem), the flags didn't seem to switch on and off at the right lines, or the counting happened outside the range the flags allowed. It counted lines that weren't even part of my section. Inian's code excluded the lines with the >>>>> pattern (they matched because their $2 is not 1 and their $3 is negative; the counting rule there does not test the flag, so it applies to every line of the file), but other lines elsewhere were also being matched, and it wasn't practical to find them all in a 20k-line document.



    This is what finally worked for me.



     $1 ~ /m.o./ {n=NR+3}


    This makes the range begin at the first line where $1 contains "m.o.". I needed to test $1 specifically to avoid the second occurrence of m.o. in the file. Fortunately that second instance was in $2, so matching only on $1 skips it. I don't know how to avoid it if both were in the same column.



    At the match, n is set to the line number (NR) plus 3. In practice n only works as an on/off switch: the later rules only test whether it is non-zero. The m.o. line and the two header lines below it are not counted because their $3 is text (or empty), so $3+0 is 0 and never negative; the +3 itself plays no part.



     n && $3+0 > 0 {n=0}


    This ends the range on a condition rather than a matched pattern (many other solutions on the web end the range by matching a fixed string with /pattern/, which I could not figure out how to adapt here).



    The n at the start of this rule is not carried over from the previous rule; it is part of this rule's own pattern. The rule fires on the first line after the match where $3 is positive, and resets n to zero.
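Two details behind the rules above can be checked directly. In /m.o./ each dot matches any character, so an escaped, anchored regex or a plain string comparison on $1 is stricter; and a field that is not a number becomes 0 under +0, which is why the header lines are never counted. The helper check and the example word smooth are illustrative only.

```shell
#!/bin/sh
# check is a hypothetical helper: for one word it prints three 0/1 flags,
#   $1 ~ /m.o./       (unescaped dots: each "." matches any character)
#   $1 ~ /^m\.o\.$/   (escaped dots, anchored: only the literal text m.o.)
#   $1 == "m.o."      (plain string comparison)
check() {
  printf '%s\n' "$1" | awk '{
    loose = ($1 ~ /m.o./)
    strict = ($1 ~ /^m\.o\.$/)
    exact = ($1 == "m.o.")
    print loose, strict, exact
  }'
}

check 'm.o.'     # the header token from the question: 1 1 1
check 'smooth'   # example word: its "moot" fits m.o. with any characters at the dots: 1 0 0

# A field that is not a number converts to 0 in arithmetic, so a header
# line can never satisfy $3+0 < 0 (here $3 is the word "energy").
printf '%s\n' 'energy (a.u.) energy (e.v.) occupancy' | awk '{ print $3 + 0 }'
```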



    Finally, I have a way to bind the starting and ending lines.



    I can now apply my desired function within that range, which is to count lines according to a condition.



     { if ( n != 0 && $2 != "1" && $3+0 < 0 ) count++ }


    The first term, n != 0, keeps me inside my line range, because n is non-zero only between the start match and the end condition. Within this range, the rule picks lines where $2 is not 1 and $3 is negative, and increases count by 1 for each one.



     END { print count }' input


    After the last input line, the END block prints the final value of count.



    Hope that helps someone eventually. Thanks to @Inian and the linked sources for the help getting here.
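A sketch of the same rules laid out one per line, assuming only the first section should ever be counted: the hypothetical done flag stops a later line whose $1 matches /m.o./ from restarting the count, which would also cover the case where both occurrences sit in the same column. The function name count_first_section and the duplicated sample are made up for illustration.

```shell
#!/bin/sh
# The rules from the answer, plus a hypothetical "done" guard so that a later
# line whose $1 matches /m.o./ cannot restart the count.
count_first_section() {
  awk '
    !done && $1 ~ /m.o./ { n = NR + 3 }          # start: n becomes non-zero
    n && $3 + 0 > 0      { n = 0; done = 1 }    # end: first positive $3 after the start
    { if (n != 0 && $2 != "1" && $3 + 0 < 0) count++ }
    END { print count + 0 }
  '
}

# A trimmed section from the question; the input below repeats it twice,
# and only the first copy should be counted.
section='m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
13 2 -0.62802299 -17.0895 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000'

printf '%s\n%s\n' "$section" "$section" | count_first_section
```

Without the done guard, the second copy would restart the range and the result would be 6 instead of 3.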






    share|improve this answer






















      Your Answer








      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "106"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f493500%2fawk-count-occurrences-of-a-string-in-one-column-between-a-range-of-lines-start%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      There's many ways to do this but the simplest to understand might be as follows:



      You can either create a complex conditional to select rows to count:



      awk 'BEGIN total=0 NR > 3 && $2 != 1 && $3 < 0 total++ END print total ' 


      Or you can put the conditional inside the code block:



      awk 'BEGIN total=0 NR > 3 if ( $2 != 1 && $3 < 0 ) total++ END print total ' 





      share|improve this answer























      • Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

        – Blaise
        Jan 9 at 17:27
















      0














      There's many ways to do this but the simplest to understand might be as follows:



      You can either create a complex conditional to select rows to count:



      awk 'BEGIN total=0 NR > 3 && $2 != 1 && $3 < 0 total++ END print total ' 


      Or you can put the conditional inside the code block:



      awk 'BEGIN total=0 NR > 3 if ( $2 != 1 && $3 < 0 ) total++ END print total ' 





      share|improve this answer























      • Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

        – Blaise
        Jan 9 at 17:27














      0












      0








      0







      There's many ways to do this but the simplest to understand might be as follows:



      You can either create a complex conditional to select rows to count:



      awk 'BEGIN total=0 NR > 3 && $2 != 1 && $3 < 0 total++ END print total ' 


      Or you can put the conditional inside the code block:



      awk 'BEGIN total=0 NR > 3 if ( $2 != 1 && $3 < 0 ) total++ END print total ' 





      share|improve this answer













      There's many ways to do this but the simplest to understand might be as follows:



      You can either create a complex conditional to select rows to count:



      awk 'BEGIN total=0 NR > 3 && $2 != 1 && $3 < 0 total++ END print total ' 


      Or you can put the conditional inside the code block:



      awk 'BEGIN total=0 NR > 3 if ( $2 != 1 && $3 < 0 ) total++ END print total ' 






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Jan 9 at 16:54









      coulingcouling

      435311




      435311












      • Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

        – Blaise
        Jan 9 at 17:27


















      • Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

        – Blaise
        Jan 9 at 17:27

















      Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

      – Blaise
      Jan 9 at 17:27






      Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!

      – Blaise
      Jan 9 at 17:27














      0














      You can try this awk :



      awk '$1=="m.o."if(l)exit;l++;nextl&&l<3l++;nextlif($3<0&&$2!=1)c++ENDprint c' infile





      share|improve this answer





























        0














        You can try this awk :



        awk '$1=="m.o."if(l)exit;l++;nextl&&l<3l++;nextlif($3<0&&$2!=1)c++ENDprint c' infile





        share|improve this answer



























          0












          0








          0







          You can try this awk :



          awk '$1=="m.o."if(l)exit;l++;nextl&&l<3l++;nextlif($3<0&&$2!=1)c++ENDprint c' infile





          share|improve this answer















          You can try this awk :



          awk '$1=="m.o."if(l)exit;l++;nextl&&l<3l++;nextlif($3<0&&$2!=1)c++ENDprint c' infile






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jan 9 at 19:44

























          answered Jan 9 at 18:39









          ctac_ctac_

          1,3891210




          1,3891210





















              0














              whew I finally figured it out with the following line:



               awk '$1 ~ /m.o./ n=NR+3n && $3+0 > 0 n=0 if ( n != 0 && $2 != "1" && $3+0 < 0) count++; END print count ' input


              The problem before is that each statement seemed to be acting on the entire document independently, so I couldn't enforce the condition to work only within a range of lines, which led it to counting many other lines that I didn't want to be counted. I kept getting values greater than the correct answer of 3.



              For example, using flags -- which seemed to be a common solution on the web for this problem -- the flags didn't seem to activate at the appropriate lines, or the counting was happening outside of the line range permitted by the flags. It was counting lines that weren't even part of my pattern. Inian coded to exclude lines with the >>>> pattern (which were returning a count match for whatever reason), but there were other patterns being mismatched, and it wasn't reasonable to find them all with 20k lines in the document.



              This is what finally worked for me.



               $1 ~ /m.o./ n=NR+3n


              this set the script to begin at the first instance where $1 contained "m.o.". I needed to specify $1 in order to avoid the second pattern occurrence of m.o. in the script. Fortunately the second instance was in $2, so I avoided it by matching only for $1. I don't know how to avoid it if both were in the same column.



              At the point of match, n is defined as the line number (NR) plus 3 in the brackets, then recorded somehow by adding it again outside the bracket. In this way, I seem to be able to use awk to start at a pattern plus an arbitrary number of lines.



               && $3+0 > 0 n=0


              This allows me to end the range of lines according to a variable condition rather than matching a pattern (many other solutions on the web match a defined string pattern using /pattern/ to define the end of the line range, which I could not figure out how to adapt here).



              The && I believe maintains the pattern match from previously to bind the starting point, then for any point afterward in the document where $3 > 0 (my condition), n becomes zero.



              Finally, I have a way to bind the starting and ending lines.



              I can now apply my desired function within that range, which is to count lines according to a condition.



               if ( n != 0 && $2 != "1" && $3+0 < 0) count++; 


              I remain within my line range by invoking the first term: If n is not zero, which is only the case between my pattern match and the conditions I set. Within this line range, the script pulls lines where $2 is not 1 and $3 is negative. It increases my count variable by 1 for each instance.



               END print count ' input


              At the end of the script, it prints the summed up variable count for my input file.



              Hope that helps someone later eventually. Thanks to @Inian and the linked sources for the help getting here.






              share|improve this answer



























                0














                whew I finally figured it out with the following line:



                 awk '$1 ~ /m.o./ n=NR+3n && $3+0 > 0 n=0 if ( n != 0 && $2 != "1" && $3+0 < 0) count++; END print count ' input


                The problem before is that each statement seemed to be acting on the entire document independently, so I couldn't enforce the condition to work only within a range of lines, which led it to counting many other lines that I didn't want to be counted. I kept getting values greater than the correct answer of 3.



                For example, using flags -- which seemed to be a common solution on the web for this problem -- the flags didn't seem to activate at the appropriate lines, or the counting was happening outside of the line range permitted by the flags. It was counting lines that weren't even part of my pattern. Inian coded to exclude lines with the >>>> pattern (which were returning a count match for whatever reason), but there were other patterns being mismatched, and it wasn't reasonable to find them all with 20k lines in the document.



                This is what finally worked for me.



                 $1 ~ /m.o./ n=NR+3n


                this set the script to begin at the first instance where $1 contained "m.o.". I needed to specify $1 in order to avoid the second pattern occurrence of m.o. in the script. Fortunately the second instance was in $2, so I avoided it by matching only for $1. I don't know how to avoid it if both were in the same column.



                At the point of match, n is defined as the line number (NR) plus 3 in the brackets, then recorded somehow by adding it again outside the bracket. In this way, I seem to be able to use awk to start at a pattern plus an arbitrary number of lines.



                 && $3+0 > 0 n=0


                This allows me to end the range of lines according to a variable condition rather than matching a pattern (many other solutions on the web match a defined string pattern using /pattern/ to define the end of the line range, which I could not figure out how to adapt here).



                The && I believe maintains the pattern match from previously to bind the starting point, then for any point afterward in the document where $3 > 0 (my condition), n becomes zero.



                Finally, I have a way to bind the starting and ending lines.



                I can now apply my desired function within that range, which is to count lines according to a condition.



                 if ( n != 0 && $2 != "1" && $3+0 < 0) count++; 


                I remain within my line range by invoking the first term: If n is not zero, which is only the case between my pattern match and the conditions I set. Within this line range, the script pulls lines where $2 is not 1 and $3 is negative. It increases my count variable by 1 for each instance.



                 END print count ' input


                At the end of the script, it prints the summed up variable count for my input file.



                Hope that helps someone later eventually. Thanks to @Inian and the linked sources for the help getting here.






                share|improve this answer

























                  0












                  0








                  0







                  whew I finally figured it out with the following line:



                   awk '$1 ~ /m.o./ n=NR+3n && $3+0 > 0 n=0 if ( n != 0 && $2 != "1" && $3+0 < 0) count++; END print count ' input


                  The problem before is that each statement seemed to be acting on the entire document independently, so I couldn't enforce the condition to work only within a range of lines, which led it to counting many other lines that I didn't want to be counted. I kept getting values greater than the correct answer of 3.



                  For example, using flags -- which seemed to be a common solution on the web for this problem -- the flags didn't seem to activate at the appropriate lines, or the counting was happening outside of the line range permitted by the flags. It was counting lines that weren't even part of my pattern. Inian coded to exclude lines with the >>>> pattern (which were returning a count match for whatever reason), but there were other patterns being mismatched, and it wasn't reasonable to find them all with 20k lines in the document.



                  This is what finally worked for me.



                   $1 ~ /m.o./ n=NR+3n


                  this set the script to begin at the first instance where $1 contained "m.o.". I needed to specify $1 in order to avoid the second pattern occurrence of m.o. in the script. Fortunately the second instance was in $2, so I avoided it by matching only for $1. I don't know how to avoid it if both were in the same column.



                  At the point of match, n is defined as the line number (NR) plus 3 in the brackets, then recorded somehow by adding it again outside the bracket. In this way, I seem to be able to use awk to start at a pattern plus an arbitrary number of lines.



                   n && $3+0 > 0 {n=0}


                  This ends the range on a condition rather than on a matched string. (Many other solutions on the web mark the end of the range with a /pattern/, which I could not figure out how to adapt here.)



                  The n && part means the rule can only fire while n is nonzero, that is, after the start pattern has matched. On the first such line where $3 is positive (my end condition), n is reset to zero and the range switches off.
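awk's range pattern (expr1, expr2) is not limited to /regex/ patterns: both ends can be any expression, so the end of the range can be the condition $3+0 > 0 itself. Below is a sketch of the same count written that way; the helper names sample and count_range are hypothetical. A range includes both the line that opens it and the line that closes it, and it reopens if a later line's $1 matches /m.o./ again, so the $3+0 < 0 test is still needed to keep the header lines and the closing positive line out of the count.

```shell
# Hypothetical helper: excerpt of the sample input from the question.
sample() {
cat <<'EOF'
 >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
14 1 -0.61775719 -16.8102 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
17 1 -0.43731872 -11.9002 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
EOF
}

# Range: from a line whose $1 matches m.o. through the next line with a
# positive $3 (both inclusive); only rows with a negative $3 are counted.
count_range() {
  awk '$1 ~ /m.o./, $3+0 > 0 { if ($2 != "1" && $3+0 < 0) count++ } END { print count }'
}

sample | count_range   # prints 3
```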



                  Together, these two rules give me both a start and an end for the range.



                  I can now apply my desired function within that range, which is to count lines according to a condition.



                   {if (n != 0 && $2 != "1" && $3+0 < 0) count++}


                  The first test, n != 0, keeps the counting inside my range, because n is nonzero only between the m.o. match and the first line with a positive $3. Within that range, a line is counted when $2 is not 1 and $3 is negative, and each such line increments count by 1.
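The if inside the last set of braces can also be moved into the rule's pattern, which is the more usual awk style. This sketch is meant to behave the same as the one-liner in this answer, just spread over several lines with comments; the helper names sample and count_pattern are hypothetical.

```shell
# Hypothetical helper: excerpt of the sample input from the question.
sample() {
cat <<'EOF'
 >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
14 1 -0.61775719 -16.8102 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
17 1 -0.43731872 -11.9002 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
EOF
}

count_pattern() {
  awk '
    $1 ~ /m.o./                { n = NR + 3 }   # switch the range on
    n && $3+0 > 0              { n = 0 }        # first positive $3: switch it off
    n && $2 != "1" && $3+0 < 0 { count++ }      # count qualifying rows in the range
    END                        { print count }
  '
}

sample | count_pattern   # prints 3
```

Rule order still matters here: on every line, awk tries the rules top to bottom, so the reset rule runs before the counting rule.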



                   END {print count}' input


                  After the last line of the input has been read, the END block prints the final value of count.
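One caveat: if nothing is counted, count is never assigned, so print count writes an empty line rather than 0. Printing count+0 forces a numeric value. A sketch using only the >>>>> line from the sample, so the input contains no m.o. section; the helper names count_plain and count_zero are hypothetical.

```shell
# Only the first line of the sample: there is no m.o. section to count in.
line=' >>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C'

# As in this answer: prints an empty line when nothing was counted.
count_plain() {
  awk '$1 ~ /m.o./ {n=NR+3} n && $3+0 > 0 {n=0} {if (n != 0 && $2 != "1" && $3+0 < 0) count++} END {print count}'
}

# Same rules, but count+0 turns an unset count into 0.
count_zero() {
  awk '$1 ~ /m.o./ {n=NR+3} n && $3+0 > 0 {n=0} {if (n != 0 && $2 != "1" && $3+0 < 0) count++} END {print count+0}'
}

printf '%s\n' "$line" | count_plain   # prints an empty line
printf '%s\n' "$line" | count_zero    # prints 0
```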



                  I hope this helps someone later. Thanks to @Inian and the linked sources for the help getting here.






                  answered Jan 10 at 13:58









                  Blaise
