Calculating Average of a Column only if two Conditions from Separate Columns are Met using Awk [closed]

Clash Royale CLAN TAG#URR8PPP
I am analyzing my experimental data where I have column 2 = parent colony (there are 3 total colonies: A, C, and E) and column 4 = pH treatment. I am trying to calculate the average carbonate production from column 14 (ie. DaytimeDeltaMass) for each colony at each pH treatment using the awk function. Does anybody know how to write an awk function that requires the conditions of colony type and pH treatment to be met, and then calculating the average carbonate production from there (ie. average DaytimeDeltaMass for Colony C at pH 7.75, average DaytimeDeltaMass for Colony C at pH 7.85, etc)? Attached is a subset of the data.
Sample_ID Parent_Colony Tank pH_Treatment DaytimeDeltaMass
E12 E 2 7.75 11.97
E20 E 4 7.85 8.32
E17 E 2 7.75 19.25
A15 A 3 7.85 4.52
A29 A 2 7.75 14.26
A14 A 3 7.85 13.29
C14 C 6 8.05 48.23
C12 C 6 8.05 29.42
C42 C 2 7.75 13.42
awk
closed as unclear what you're asking by G-Man, RalfFriedl, Stephen Harris, Thomas, jimmij Dec 12 at 2:08
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
|
show 4 more comments
I am analyzing my experimental data where I have column 2 = parent colony (there are 3 total colonies: A, C, and E) and column 4 = pH treatment. I am trying to calculate the average carbonate production from column 14 (ie. DaytimeDeltaMass) for each colony at each pH treatment using the awk function. Does anybody know how to write an awk function that requires the conditions of colony type and pH treatment to be met, and then calculating the average carbonate production from there (ie. average DaytimeDeltaMass for Colony C at pH 7.75, average DaytimeDeltaMass for Colony C at pH 7.85, etc)? Attached is a subset of the data.
Sample_ID Parent_Colony Tank pH_Treatment DaytimeDeltaMass
E12 E 2 7.75 11.97
E20 E 4 7.85 8.32
E17 E 2 7.75 19.25
A15 A 3 7.85 4.52
A29 A 2 7.75 14.26
A14 A 3 7.85 13.29
C14 C 6 8.05 48.23
C12 C 6 8.05 29.42
C42 C 2 7.75 13.42
awk
closed as unclear what you're asking by G-Man, RalfFriedl, Stephen Harris, Thomas, jimmij Dec 12 at 2:08
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specifyawk? Would you be happy with a solution usinggrepor other tools? Does this even have to run on a command line or have you tried software likeR?
– trs
Dec 10 at 20:10
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13
|
show 4 more comments
I am analyzing my experimental data where I have column 2 = parent colony (there are 3 total colonies: A, C, and E) and column 4 = pH treatment. I am trying to calculate the average carbonate production from column 14 (ie. DaytimeDeltaMass) for each colony at each pH treatment using the awk function. Does anybody know how to write an awk function that requires the conditions of colony type and pH treatment to be met, and then calculating the average carbonate production from there (ie. average DaytimeDeltaMass for Colony C at pH 7.75, average DaytimeDeltaMass for Colony C at pH 7.85, etc)? Attached is a subset of the data.
Sample_ID Parent_Colony Tank pH_Treatment DaytimeDeltaMass
E12 E 2 7.75 11.97
E20 E 4 7.85 8.32
E17 E 2 7.75 19.25
A15 A 3 7.85 4.52
A29 A 2 7.75 14.26
A14 A 3 7.85 13.29
C14 C 6 8.05 48.23
C12 C 6 8.05 29.42
C42 C 2 7.75 13.42
awk
I am analyzing my experimental data where I have column 2 = parent colony (there are 3 total colonies: A, C, and E) and column 4 = pH treatment. I am trying to calculate the average carbonate production from column 14 (ie. DaytimeDeltaMass) for each colony at each pH treatment using the awk function. Does anybody know how to write an awk function that requires the conditions of colony type and pH treatment to be met, and then calculating the average carbonate production from there (ie. average DaytimeDeltaMass for Colony C at pH 7.75, average DaytimeDeltaMass for Colony C at pH 7.85, etc)? Attached is a subset of the data.
Sample_ID Parent_Colony Tank pH_Treatment DaytimeDeltaMass
E12 E 2 7.75 11.97
E20 E 4 7.85 8.32
E17 E 2 7.75 19.25
A15 A 3 7.85 4.52
A29 A 2 7.75 14.26
A14 A 3 7.85 13.29
C14 C 6 8.05 48.23
C12 C 6 8.05 29.42
C42 C 2 7.75 13.42
awk
awk
edited Dec 10 at 20:55
asked Dec 10 at 18:37
John
11
11
closed as unclear what you're asking by G-Man, RalfFriedl, Stephen Harris, Thomas, jimmij Dec 12 at 2:08
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
closed as unclear what you're asking by G-Man, RalfFriedl, Stephen Harris, Thomas, jimmij Dec 12 at 2:08
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specifyawk? Would you be happy with a solution usinggrepor other tools? Does this even have to run on a command line or have you tried software likeR?
– trs
Dec 10 at 20:10
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13
|
show 4 more comments
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specifyawk? Would you be happy with a solution usinggrepor other tools? Does this even have to run on a command line or have you tried software likeR?
– trs
Dec 10 at 20:10
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specify
awk? Would you be happy with a solution using grep or other tools? Does this even have to run on a command line or have you tried software like R ?– trs
Dec 10 at 20:10
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specify
awk? Would you be happy with a solution using grep or other tools? Does this even have to run on a command line or have you tried software like R ?– trs
Dec 10 at 20:10
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13
|
show 4 more comments
2 Answers
2
active
oldest
votes
This has been answered on SO. https://stackoverflow.com/a/29911907/2442434
Change $2 == 0 and $3 to match your condition / columns.
awk '$2 == 0 total += $3; count++; END print total/count; ' CLN_Tapes_LON
Also check out the more detailed answer (not copied) https://stackoverflow.com/a/29914126/2442434
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
add a comment |
You might want to try
awk '
NR > 2 TOT[$2,$4] += $5 # sum values into array indexed by both columns'' contents
CNT[$2,$4]++ # count occurrences
END for (t in TOT) split (t, C, SUBSEP) # split indices into colonies and pH''es
print C[1], C[2], TOT[t]/CNT[t] # print colony, pH, average
' OFS="t" file | sort # define output filed sparator and input file
A 7.75 14.26
A 7.85 8.905
C 7.75 13.42
C 8.05 38.825
E 7.75 15.61
E 7.85 8.32
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
This has been answered on SO. https://stackoverflow.com/a/29911907/2442434
Change $2 == 0 and $3 to match your condition / columns.
awk '$2 == 0 total += $3; count++; END print total/count; ' CLN_Tapes_LON
Also check out the more detailed answer (not copied) https://stackoverflow.com/a/29914126/2442434
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
add a comment |
This has been answered on SO. https://stackoverflow.com/a/29911907/2442434
Change $2 == 0 and $3 to match your condition / columns.
awk '$2 == 0 total += $3; count++; END print total/count; ' CLN_Tapes_LON
Also check out the more detailed answer (not copied) https://stackoverflow.com/a/29914126/2442434
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
add a comment |
This has been answered on SO. https://stackoverflow.com/a/29911907/2442434
Change $2 == 0 and $3 to match your condition / columns.
awk '$2 == 0 total += $3; count++; END print total/count; ' CLN_Tapes_LON
Also check out the more detailed answer (not copied) https://stackoverflow.com/a/29914126/2442434
This has been answered on SO. https://stackoverflow.com/a/29911907/2442434
Change $2 == 0 and $3 to match your condition / columns.
awk '$2 == 0 total += $3; count++; END print total/count; ' CLN_Tapes_LON
Also check out the more detailed answer (not copied) https://stackoverflow.com/a/29914126/2442434
answered Dec 10 at 20:22
trs
1012
1012
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
add a comment |
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
I believe that this is not a correct answer to the question. However, since the question is unclear, I might be misunderstanding it and you might be right. So let's wait and see what the OP says.
– G-Man
Dec 10 at 20:36
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
This doesn't fully answer my question because I need two conditions to be met (ie. Parent Colony and pH Treatment). I am trying to use the awk function to calculate the average DaytimeDeltaMass for each parent colony at each pH treatment
– John
Dec 10 at 20:40
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
@trs if I want to rework that code to account for 2 conditions, does it make sense to start the code with: awk '$2 == E && $4==7.75
– John
Dec 10 at 21:29
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
The code provided was a direct quote from the other question. I believe SO / SE do not provide a coding service but help point to the right solution, so I did not see the need to change column numbers or number of conditions to match the example given in the question. You may want to add double quotes to "E" then it should work would be my guess.
– trs
Dec 11 at 20:37
add a comment |
You might want to try
awk '
NR > 2 TOT[$2,$4] += $5 # sum values into array indexed by both columns'' contents
CNT[$2,$4]++ # count occurrences
END for (t in TOT) split (t, C, SUBSEP) # split indices into colonies and pH''es
print C[1], C[2], TOT[t]/CNT[t] # print colony, pH, average
' OFS="t" file | sort # define output filed sparator and input file
A 7.75 14.26
A 7.85 8.905
C 7.75 13.42
C 8.05 38.825
E 7.75 15.61
E 7.85 8.32
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
add a comment |
You might want to try
awk '
NR > 2 TOT[$2,$4] += $5 # sum values into array indexed by both columns'' contents
CNT[$2,$4]++ # count occurrences
END for (t in TOT) split (t, C, SUBSEP) # split indices into colonies and pH''es
print C[1], C[2], TOT[t]/CNT[t] # print colony, pH, average
' OFS="t" file | sort # define output filed sparator and input file
A 7.75 14.26
A 7.85 8.905
C 7.75 13.42
C 8.05 38.825
E 7.75 15.61
E 7.85 8.32
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
add a comment |
You might want to try
awk '
NR > 2 TOT[$2,$4] += $5 # sum values into array indexed by both columns'' contents
CNT[$2,$4]++ # count occurrences
END for (t in TOT) split (t, C, SUBSEP) # split indices into colonies and pH''es
print C[1], C[2], TOT[t]/CNT[t] # print colony, pH, average
' OFS="t" file | sort # define output filed sparator and input file
A 7.75 14.26
A 7.85 8.905
C 7.75 13.42
C 8.05 38.825
E 7.75 15.61
E 7.85 8.32
You might want to try
awk '
NR > 2 TOT[$2,$4] += $5 # sum values into array indexed by both columns'' contents
CNT[$2,$4]++ # count occurrences
END for (t in TOT) split (t, C, SUBSEP) # split indices into colonies and pH''es
print C[1], C[2], TOT[t]/CNT[t] # print colony, pH, average
' OFS="t" file | sort # define output filed sparator and input file
A 7.75 14.26
A 7.85 8.905
C 7.75 13.42
C 8.05 38.825
E 7.75 15.61
E 7.85 8.32
answered Dec 10 at 22:45
RudiC
3,9541312
3,9541312
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
add a comment |
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
that code was excellent. Worked like a charm!
– John
Dec 11 at 14:59
add a comment |
some sample data (input & output) would go a long way towards getting a usable answer
– Jeff Schaller
Dec 10 at 19:17
The formatting came out terribly here, but number indicates the column number, I skipped adding columns 5-13 to avoid cluttering this question, and then column 14 is DaytimeDeltaMass. The sample data I included (my real sample size is 30 per parent colony) is supposed to indicate a value from each of the 5 columns I listed. My goal is to use an awk function to calculate the average DaytimeDeltaMass for each colony at each pH treatment (ie. average for colony E at pH 7.75, average for colony E at pH 7.85, etc.)
– John
Dec 10 at 19:57
@roaima they're new, you could always just be kind & give them a hand doing simple formatting improvements :-) @John any reason you specify
awk? Would you be happy with a solution usinggrepor other tools? Does this even have to run on a command line or have you tried software likeR?– trs
Dec 10 at 20:10
Ok I fixed the formatting. Just keep in mind that there are columns 5-13 missing here because it isn't relevant to my question, so DaytimeDeltaMass is actually column 14 not column 5. Thank you very much!
– John
Dec 10 at 20:11
@trs thank you for the response. I've already done that using terminal and Rstudio, but all of this is for my final project in a class where I am supposed to analyze my data using different types of computing techniques (ie. Rstudio, python, textwrangler, etc.), so using the awk function to answer this is my final step. I've been stuck at this point for awhile unfortunately
– John
Dec 10 at 20:13