Awk: Count occurrences of a string in one column, between a range of lines starting 2 lines below pattern 1 and ending with a condition
I have an input file with several thousand lines. I am interested in one section of it, which contains the first instance of pattern /m.o./ in the file. I need to search for this pattern, run my code, then stop the code before any other patterns of m.o. or other lines.
It looks something like this:
>>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
1 1 -20.63710689 -561.5697 2.0000
2 1 -20.58909944 -560.2634 2.0000
3 1 -11.45645851 -311.7491 2.0000
4 1 -11.29965696 -307.4823 2.0000
5 1 -11.29203148 -307.2748 2.0000
6 1 -1.44555716 -39.3360 2.0000
7 1 -1.35738379 -36.9367 2.0000
8 1 -1.07586111 -29.2760 2.0000
9 1 -0.91591305 -24.9235 2.0000
10 1 -0.75492584 -20.5428 2.0000
11 1 -0.71126523 -19.3547 2.0000
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
14 1 -0.61775719 -16.8102 2.0000
15 2 -0.50208166 -13.6625 2.0000
16 1 -0.49193707 -13.3864 2.0000
17 1 -0.43731872 -11.9002 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
Goals
- Begin 3 lines below the pattern /m.o./ (where $1=1).
- Count how many times $2 is not equal to "1" (in other files, $2 can also be 3 or 4, so I need to count by $2 != 1).
- This counting must be in the range of lines where $3 is negative, i.e. up until the second line from the bottom.
The pattern /====/ cannot be used because it also appears earlier in the document.
- The output should be 3. On the range of lines where $3 is negative, there are 3 lines where $2 is not equal to 1.
Attempt
I've searched other answers on the web, which have provided me parts of the code to use. Examples:
define my starting line as pattern plus 3 (source):
awk '/m.o./{n=NR+3} n'
in between the starting and final line, count how many times $2 != "1" (source)
awk '$2!="1"{++count}'
define my final line as follows:
awk '{if ($3 > 0) {print count; exit}}'
but I don't know how to put all of this together. Importantly, I must somehow avoid counting the extra 2 in $2 on the bottom line.
I'm of course open to rewriting the above code. I just wanted to supply examples for clarity.
Thank you.
bash shell-script awk
edited Jan 9 at 18:22
asked Jan 9 at 16:34 by Blaise
Moved my deleted answer to a comment, since you've decided to answer the question on your own:
awk 'BEGIN { flag = 1 } flag && /m.o/ { flag = 0; for (i=1; i<=3; i++) getline; } $0 !~ />>>>>/ && $2 != "1" && $3+0 < 0 { count++; } END { print count }' file
– Inian
Jan 11 at 9:01
3 Answers
There are many ways to do this, but the simplest to understand might be as follows.
You can either create a compound condition to select the rows to count:
awk 'BEGIN { total=0 } NR > 3 && $2 != 1 && $3 < 0 { total++ } END { print total }'
Or you can put the condition inside the code block:
awk 'BEGIN { total=0 } NR > 3 { if ( $2 != 1 && $3 < 0 ) total++ } END { print total }'
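The two forms above should count the same rows. A minimal sketch to check that, assuming an input that contains only a three-line header followed by data rows (a shortened excerpt of the table in the question; the temporary file and the variable names `in_pattern` and `in_action` are made up for illustration):

```shell
# Shortened excerpt of the question's table: 3 header lines, then data rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
15 2 -0.50208166 -13.6625 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
EOF

# Form 1: the whole condition is the pattern of the rule.
in_pattern=$(awk 'NR > 3 && $2 != 1 && $3 < 0 { total++ } END { print total + 0 }' "$sample")

# Form 2: the same test written as an if statement inside the action.
in_action=$(awk 'NR > 3 { if ($2 != 1 && $3 < 0) total++ } END { print total + 0 }' "$sample")

echo "$in_pattern $in_action"
rm -f "$sample"
```

Both commands rely on `NR > 3`, so they only fit input where the header sits on the first three lines of the file; `total + 0` just forces a numeric `0` when nothing matches.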
answered Jan 9 at 16:54 by couling
Hello, I think I wasn't clear in my post, sorry about this. Is there a way to define my starting line NR by the pattern /m.o./ + 3 lines? Then exit awk when $3>0? My input file is thousands of lines long, so if I don't isolate this section by the beginning and end, I will count other lines elsewhere in the file. For example, your first code returns 546 on my input file. Thank you!
– Blaise
Jan 9 at 17:27
You can try this awk:
awk '$1=="m.o."{if(l)exit;l++;next}l&&l<3{l++;next}l{if($3<0&&$2!=1)c++}END{print c}' infile
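Spread over several lines, the one-liner reads as three rules plus an END block. The following is a sketch of that reading, assuming the brace placement shown in the comments; the variable `seen` stands in for `l`, and the sample is a shortened excerpt of the question's table written to a temporary file:

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
>>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
15 2 -0.50208166 -13.6625 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
EOF

count=$(awk '
    # Header line: first field is exactly "m.o.".
    $1 == "m.o." {
        if (seen) exit      # a second header ends the section
        seen++              # seen = 1 marks the header itself
        next
    }
    # Skip the two lines after the header (seen goes 1 -> 2 -> 3).
    seen && seen < 3 { seen++; next }
    # From the third line below the header onward, count matching rows.
    seen { if ($3 < 0 && $2 != 1) c++ }
    END { print c + 0 }
' "$sample")

echo "$count"
rm -f "$sample"
```

Note that this logic keeps counting until a second line with `m.o.` in $1 or the end of the file, so it depends on no other line after the table having a negative $3 and a $2 other than 1.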
edited Jan 9 at 19:44
answered Jan 9 at 18:39 by ctac_
Whew, I finally figured it out with the following line:
awk '$1 ~ /m.o./ {n=NR+3} n && $3+0 > 0 {n=0} {if ( n != 0 && $2 != "1" && $3+0 < 0) count++;} END {print count}' input
The problem before was that each statement seemed to act on the entire document independently, so I couldn't restrict a condition to a range of lines. As a result, many lines I didn't want were counted, and I kept getting values greater than the correct answer of 3.
For example, with flags (which seemed to be a common solution on the web for this problem), the flags didn't seem to activate at the appropriate lines, or the counting happened outside the line range the flags were supposed to permit. Lines that weren't even part of my section were counted. Inian's code excluded lines with the >>>>> pattern (which were returning a count match for whatever reason), but other lines were being matched as well, and it wasn't reasonable to find them all in a 20k-line document.
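The behaviour described here matches how awk evaluates a program: every rule is tested against every input line, and rules only influence each other through shared variables. A minimal sketch on made-up input (the START/STOP markers and the names `inside` and `total` are placeholders, not taken from the question's file):

```shell
out=$(printf 'a\nSTART\nb\nc\nSTOP\nd\n' | awk '
    { total++ }                          # no pattern: runs on every line
    /STOP/  { inside = 0 }               # close the range
    inside  { print "in range:", $0 }    # runs only while the flag is set
    /START/ { inside = 1 }               # open the range for the next lines
    END { print "total lines:", total }
')
echo "$out"
```

Only `b` and `c` are printed as in range, while the first rule still sees all six lines. The order of the rules matters: the flag is cleared before the printing rule and set after it, so neither marker line is printed.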
This is what finally worked for me.
$1 ~ /m.o./ {n=NR+3}
This makes the script begin at the first line where $1 contains "m.o.". I needed to specify $1 in order to avoid the second occurrence of m.o. in the file. Fortunately the second instance was in $2, so I avoided it by matching only on $1. I don't know how to avoid it if both were in the same column.
At the point of the match, n is set inside the braces to the line number (NR) plus 3. The n that follows the closing brace does not record anything; it is the start of the next rule's pattern. Because nothing later compares n with NR, n really acts as a non-zero "inside the section" flag. The header lines right after the match are not counted only because their $3 is not a negative number.
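If the goal is to start exactly three lines below the match, one option is to compare NR with the stored line number. A minimal sketch on made-up input (HEAD, skip1, row1 and the variable `start` are placeholders for illustration):

```shell
out=$(printf 'x\nHEAD\nskip1\nskip2\nrow1\nrow2\n' | awk '
    # Remember where the data begins, for the first match only.
    /HEAD/ && !start { start = NR + 3 }
    # Act only on lines at or beyond the remembered line number.
    start && NR >= start { print NR ": " $0 }
')
echo "$out"
```

HEAD is on line 2, so `start` becomes 5 and only lines 5 and 6 are printed. The `!start` guard keeps a later HEAD from moving the starting point.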
n && $3+0 > 0 {n=0}
This allows me to end the range of lines according to a condition on a field rather than by matching a pattern (many other solutions on the web use /pattern/ to define the end of the line range, which I could not figure out how to adapt here).
The && combines two tests: n is non-zero only after the starting pattern has matched, and $3+0 > 0 is my end condition. On the first line after the start where $3 > 0, n becomes zero.
Finally, I have a way to bind the starting and ending lines.
I can now apply my desired function within that range, which is to count lines according to a condition.
{if ( n != 0 && $2 != "1" && $3+0 < 0) count++;}
I stay within my line range through the first term: n is non-zero only between my pattern match and the end condition I set. Within this line range, the script selects lines where $2 is not 1 and $3 is negative, and increases my count variable by 1 for each one.
END {print count}' input
At the end of the script, it prints the summed up variable count for my input file.
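A variant that stops reading at the end of the first section might look like the sketch below. It is not the command above: it swaps in an exact comparison `$1 == "m.o."` for the regular expression, compares NR with the stored start line, and uses `exit` so that later lines are never examined. The sample is a shortened excerpt of the question's table, and the final row (99 ...) is invented to show that lines after the section are ignored.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
>>>>> -0.2834320000 -0.9672660000 0.0000000000 6.0 C
m.o. irrep orbital orbital orbital
energy (a.u.) energy (e.v.) occupancy
========================================================
12 1 -0.70828880 -19.2737 2.0000
13 2 -0.62802299 -17.0895 2.0000
15 2 -0.50208166 -13.6625 2.0000
18 2 -0.43546575 -11.8497 2.0000
19 2 0.07335689 1.9962 0.0000
99 3 -1.00000000 -27.2114 0.0000
EOF

count=$(awk '
    # First line whose first field is "m.o.": data starts three lines below.
    !start && $1 == "m.o." { start = NR + 3; next }
    # Ignore everything before the first data row.
    !start || NR < start { next }
    # The first non-negative $3 closes the section; exit jumps to END.
    $3 + 0 >= 0 { exit }
    # Inside the section: count rows whose second field is not 1.
    $2 != 1 { count++ }
    END { print count + 0 }
' "$sample")

echo "$count"
rm -f "$sample"
```

Because `exit` runs the END block and then stops, a second `m.o.` header further down the file can no longer reset anything, even if it appears in the same column.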
Hope that helps someone later eventually. Thanks to @Inian and the linked sources for the help getting here.
answered Jan 10 at 13:58 by Blaise