To get average of columns for all the entries whose first column entries are duplicate

Clash Royale CLAN TAG#URR8PPP
I have a file which looks like this:
30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...
I want my output to have output like this:
30901 89
30902 51
30903 49.5
...
49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.
All I could do was to get number of unique entries usingcat <logfile> | awk 'print $1' | uniq -c
to get the count of duplicate entries based on first column (id)
text-processing awk
add a comment |
I have a file which looks like this:
30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...
I want my output to have output like this:
30901 89
30902 51
30903 49.5
...
49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.
All I could do was to get number of unique entries usingcat <logfile> | awk 'print $1' | uniq -c
to get the count of duplicate entries based on first column (id)
text-processing awk
1
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25
add a comment |
I have a file which looks like this:
30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...
I want my output to have output like this:
30901 89
30902 51
30903 49.5
...
49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.
All I could do was to get number of unique entries usingcat <logfile> | awk 'print $1' | uniq -c
to get the count of duplicate entries based on first column (id)
text-processing awk
I have a file which looks like this:
30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...
I want my output to have output like this:
30901 89
30902 51
30903 49.5
...
49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.
All I could do was to get number of unique entries usingcat <logfile> | awk 'print $1' | uniq -c
to get the count of duplicate entries based on first column (id)
text-processing awk
text-processing awk
edited Jan 30 at 15:41
Archemar
20.2k93772
20.2k93772
asked Jan 30 at 15:17
Mr_SMr_S
31
31
1
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25
add a comment |
1
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25
1
1
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25
add a comment |
2 Answers
2
active
oldest
votes
try
awk 'c[$1]++ ; t[$1]+=$4
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log
which give
30903 : 49.5
to get your output, remove if (c[cc]>1) part.
the script basically count c[$1]++ and add t[$1]+=$4 value.
c[x]is for count(x), that is$1: first fieldt[x]is for total(x), that is sum of total value for 4'th field
at the end, we loop over all values, computing and printing average.
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummycchas a longer name than thecandtglobals, though.
– Kaz
Feb 2 at 19:56
add a comment |
With Miller (http://johnkerl.org/miller/doc) you can run
mlr --nidx stats1 -a mean -f 4 -g 1 inputFile
and have
30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000
-f 4to set the Value-field names on which to compute statistics (the fourth field);-g 1to set group-by-field names (the first field).
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497716%2fto-get-average-of-columns-for-all-the-entries-whose-first-column-entries-are-dup%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
try
awk 'c[$1]++ ; t[$1]+=$4
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log
which give
30903 : 49.5
to get your output, remove if (c[cc]>1) part.
the script basically count c[$1]++ and add t[$1]+=$4 value.
c[x]is for count(x), that is$1: first fieldt[x]is for total(x), that is sum of total value for 4'th field
at the end, we loop over all values, computing and printing average.
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummycchas a longer name than thecandtglobals, though.
– Kaz
Feb 2 at 19:56
add a comment |
try
awk 'c[$1]++ ; t[$1]+=$4
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log
which give
30903 : 49.5
to get your output, remove if (c[cc]>1) part.
the script basically count c[$1]++ and add t[$1]+=$4 value.
c[x]is for count(x), that is$1: first fieldt[x]is for total(x), that is sum of total value for 4'th field
at the end, we loop over all values, computing and printing average.
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummycchas a longer name than thecandtglobals, though.
– Kaz
Feb 2 at 19:56
add a comment |
try
awk 'c[$1]++ ; t[$1]+=$4
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log
which give
30903 : 49.5
to get your output, remove if (c[cc]>1) part.
the script basically count c[$1]++ and add t[$1]+=$4 value.
c[x]is for count(x), that is$1: first fieldt[x]is for total(x), that is sum of total value for 4'th field
at the end, we loop over all values, computing and printing average.
try
awk 'c[$1]++ ; t[$1]+=$4
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log
which give
30903 : 49.5
to get your output, remove if (c[cc]>1) part.
the script basically count c[$1]++ and add t[$1]+=$4 value.
c[x]is for count(x), that is$1: first fieldt[x]is for total(x), that is sum of total value for 4'th field
at the end, we loop over all values, computing and printing average.
edited Jan 30 at 20:18
answered Jan 30 at 15:40
ArchemarArchemar
20.2k93772
20.2k93772
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummycchas a longer name than thecandtglobals, though.
– Kaz
Feb 2 at 19:56
add a comment |
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummycchas a longer name than thecandtglobals, though.
– Kaz
Feb 2 at 19:56
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
More descriptive variable names would be helpful.
– glenn jackman
Jan 30 at 18:31
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
Thank you it works fine. This is exactly what I wanted.
– Mr_S
Feb 1 at 7:06
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy
cc has a longer name than the c and t globals, though.– Kaz
Feb 2 at 19:56
@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy
cc has a longer name than the c and t globals, though.– Kaz
Feb 2 at 19:56
add a comment |
With Miller (http://johnkerl.org/miller/doc) you can run
mlr --nidx stats1 -a mean -f 4 -g 1 inputFile
and have
30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000
-f 4to set the Value-field names on which to compute statistics (the fourth field);-g 1to set group-by-field names (the first field).
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
add a comment |
With Miller (http://johnkerl.org/miller/doc) you can run
mlr --nidx stats1 -a mean -f 4 -g 1 inputFile
and have
30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000
-f 4to set the Value-field names on which to compute statistics (the fourth field);-g 1to set group-by-field names (the first field).
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
add a comment |
With Miller (http://johnkerl.org/miller/doc) you can run
mlr --nidx stats1 -a mean -f 4 -g 1 inputFile
and have
30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000
-f 4to set the Value-field names on which to compute statistics (the fourth field);-g 1to set group-by-field names (the first field).
With Miller (http://johnkerl.org/miller/doc) you can run
mlr --nidx stats1 -a mean -f 4 -g 1 inputFile
and have
30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000
-f 4to set the Value-field names on which to compute statistics (the fourth field);-g 1to set group-by-field names (the first field).
answered Jan 31 at 9:07
aborrusoaborruso
22829
22829
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
add a comment |
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.
– Mr_S
Feb 1 at 7:41
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0
– aborruso
Feb 1 at 8:21
add a comment |
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497716%2fto-get-average-of-columns-for-all-the-entries-whose-first-column-entries-are-dup%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Closely related: Awk to average column based on key in 2nd column
– steeldriver
Jan 30 at 15:25