To get average of columns for all the entries whose first column entries are duplicate

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












0















I have a file which looks like this:



30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...


I want my output to have output like this:



30901 89
30902 51
30903 49.5
...


49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.



All I could do was to get number of unique entries using
cat <logfile> | awk 'print $1' | uniq -c

to get the count of duplicate entries based on first column (id)










share|improve this question



















  • 1





    Closely related: Awk to average column based on key in 2nd column

    – steeldriver
    Jan 30 at 15:25















0















I have a file which looks like this:



30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...


I want my output to have output like this:



30901 89
30902 51
30903 49.5
...


49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.



All I could do was to get number of unique entries using
cat <logfile> | awk 'print $1' | uniq -c

to get the count of duplicate entries based on first column (id)










share|improve this question



















  • 1





    Closely related: Awk to average column based on key in 2nd column

    – steeldriver
    Jan 30 at 15:25













0












0








0








I have a file which looks like this:



30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...


I want my output to have output like this:



30901 89
30902 51
30903 49.5
...


49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.



All I could do was to get number of unique entries using
cat <logfile> | awk 'print $1' | uniq -c

to get the count of duplicate entries based on first column (id)










share|improve this question
















I have a file which looks like this:



30901 foo1 bar1 89
30902 foo2 bar2 51
30903 foo3 bar3 101
30903 foox bary 12
30903 fooz bara 23
30903 foob barc 62
30904 fooe barf 59
...
...


I want my output to have output like this:



30901 89
30902 51
30903 49.5
...


49.5 being the average of fourth column, for all four duplicate entries for id: 30903 in column 1.



All I could do was to get number of unique entries using
cat <logfile> | awk 'print $1' | uniq -c

to get the count of duplicate entries based on first column (id)







text-processing awk






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 30 at 15:41









Archemar

20.2k93772




20.2k93772










asked Jan 30 at 15:17









Mr_SMr_S

31




31







  • 1





    Closely related: Awk to average column based on key in 2nd column

    – steeldriver
    Jan 30 at 15:25












  • 1





    Closely related: Awk to average column based on key in 2nd column

    – steeldriver
    Jan 30 at 15:25







1




1





Closely related: Awk to average column based on key in 2nd column

– steeldriver
Jan 30 at 15:25





Closely related: Awk to average column based on key in 2nd column

– steeldriver
Jan 30 at 15:25










2 Answers
2






active

oldest

votes


















3














try



 awk 'c[$1]++ ; t[$1]+=$4 
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log


which give



30903 : 49.5


to get your output, remove if (c[cc]>1) part.



the script basically count c[$1]++ and add t[$1]+=$4 value.




  • c[x] is for count(x), that is $1 : first field


  • t[x] is for total(x), that is sum of total value for 4'th field

at the end, we loop over all values, computing and printing average.






share|improve this answer

























  • More descriptive variable names would be helpful.

    – glenn jackman
    Jan 30 at 18:31











  • Thank you it works fine. This is exactly what I wanted.

    – Mr_S
    Feb 1 at 7:06











  • @glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

    – Kaz
    Feb 2 at 19:56


















0














With Miller (http://johnkerl.org/miller/doc) you can run



mlr --nidx stats1 -a mean -f 4 -g 1 inputFile


and have



30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000



  • -f 4 to set the Value-field names on which to compute statistics (the fourth field);


  • -g 1 to set group-by-field names (the first field).





share|improve this answer























  • Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

    – Mr_S
    Feb 1 at 7:41











  • @Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

    – aborruso
    Feb 1 at 8:21










Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497716%2fto-get-average-of-columns-for-all-the-entries-whose-first-column-entries-are-dup%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














try



 awk 'c[$1]++ ; t[$1]+=$4 
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log


which give



30903 : 49.5


to get your output, remove if (c[cc]>1) part.



the script basically count c[$1]++ and add t[$1]+=$4 value.




  • c[x] is for count(x), that is $1 : first field


  • t[x] is for total(x), that is sum of total value for 4'th field

at the end, we loop over all values, computing and printing average.






share|improve this answer

























  • More descriptive variable names would be helpful.

    – glenn jackman
    Jan 30 at 18:31











  • Thank you it works fine. This is exactly what I wanted.

    – Mr_S
    Feb 1 at 7:06











  • @glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

    – Kaz
    Feb 2 at 19:56















3














try



 awk 'c[$1]++ ; t[$1]+=$4 
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log


which give



30903 : 49.5


to get your output, remove if (c[cc]>1) part.



the script basically count c[$1]++ and add t[$1]+=$4 value.




  • c[x] is for count(x), that is $1 : first field


  • t[x] is for total(x), that is sum of total value for 4'th field

at the end, we loop over all values, computing and printing average.






share|improve this answer

























  • More descriptive variable names would be helpful.

    – glenn jackman
    Jan 30 at 18:31











  • Thank you it works fine. This is exactly what I wanted.

    – Mr_S
    Feb 1 at 7:06











  • @glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

    – Kaz
    Feb 2 at 19:56













3












3








3







try



 awk 'c[$1]++ ; t[$1]+=$4 
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log


which give



30903 : 49.5


to get your output, remove if (c[cc]>1) part.



the script basically count c[$1]++ and add t[$1]+=$4 value.




  • c[x] is for count(x), that is $1 : first field


  • t[x] is for total(x), that is sum of total value for 4'th field

at the end, we loop over all values, computing and printing average.






share|improve this answer















try



 awk 'c[$1]++ ; t[$1]+=$4 
END for (cc in c ) if ( c[cc]>1 ) printf "%s : %.1fn",cc,t[cc]/c[cc];' file.log


which give



30903 : 49.5


to get your output, remove if (c[cc]>1) part.



the script basically count c[$1]++ and add t[$1]+=$4 value.




  • c[x] is for count(x), that is $1 : first field


  • t[x] is for total(x), that is sum of total value for 4'th field

at the end, we loop over all values, computing and printing average.







share|improve this answer














share|improve this answer



share|improve this answer








edited Jan 30 at 20:18

























answered Jan 30 at 15:40









ArchemarArchemar

20.2k93772




20.2k93772












  • More descriptive variable names would be helpful.

    – glenn jackman
    Jan 30 at 18:31











  • Thank you it works fine. This is exactly what I wanted.

    – Mr_S
    Feb 1 at 7:06











  • @glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

    – Kaz
    Feb 2 at 19:56

















  • More descriptive variable names would be helpful.

    – glenn jackman
    Jan 30 at 18:31











  • Thank you it works fine. This is exactly what I wanted.

    – Mr_S
    Feb 1 at 7:06











  • @glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

    – Kaz
    Feb 2 at 19:56
















More descriptive variable names would be helpful.

– glenn jackman
Jan 30 at 18:31





More descriptive variable names would be helpful.

– glenn jackman
Jan 30 at 18:31













Thank you it works fine. This is exactly what I wanted.

– Mr_S
Feb 1 at 7:06





Thank you it works fine. This is exactly what I wanted.

– Mr_S
Feb 1 at 7:06













@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

– Kaz
Feb 2 at 19:56





@glennjackman Descriptive names in a throwaway twp-liner in Awk? What for. It's a bit ironic that the loop dummy cc has a longer name than the c and t globals, though.

– Kaz
Feb 2 at 19:56













0














With Miller (http://johnkerl.org/miller/doc) you can run



mlr --nidx stats1 -a mean -f 4 -g 1 inputFile


and have



30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000



  • -f 4 to set the Value-field names on which to compute statistics (the fourth field);


  • -g 1 to set group-by-field names (the first field).





share|improve this answer























  • Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

    – Mr_S
    Feb 1 at 7:41











  • @Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

    – aborruso
    Feb 1 at 8:21















0














With Miller (http://johnkerl.org/miller/doc) you can run



mlr --nidx stats1 -a mean -f 4 -g 1 inputFile


and have



30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000



  • -f 4 to set the Value-field names on which to compute statistics (the fourth field);


  • -g 1 to set group-by-field names (the first field).





share|improve this answer























  • Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

    – Mr_S
    Feb 1 at 7:41











  • @Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

    – aborruso
    Feb 1 at 8:21













0












0








0







With Miller (http://johnkerl.org/miller/doc) you can run



mlr --nidx stats1 -a mean -f 4 -g 1 inputFile


and have



30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000



  • -f 4 to set the Value-field names on which to compute statistics (the fourth field);


  • -g 1 to set group-by-field names (the first field).





share|improve this answer













With Miller (http://johnkerl.org/miller/doc) you can run



mlr --nidx stats1 -a mean -f 4 -g 1 inputFile


and have



30901 89.000000
30902 51.000000
30903 49.500000
30904 59.000000



  • -f 4 to set the Value-field names on which to compute statistics (the fourth field);


  • -g 1 to set group-by-field names (the first field).






share|improve this answer












share|improve this answer



share|improve this answer










answered Jan 31 at 9:07









aborrusoaborruso

22829




22829












  • Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

    – Mr_S
    Feb 1 at 7:41











  • @Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

    – aborruso
    Feb 1 at 8:21

















  • Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

    – Mr_S
    Feb 1 at 7:41











  • @Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

    – aborruso
    Feb 1 at 8:21
















Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

– Mr_S
Feb 1 at 7:41





Thanks for the alternate solution. I cannot confirm it because I am using Fedora OS and I couldn't find Miller for fedora.

– Mr_S
Feb 1 at 7:41













@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

– aborruso
Feb 1 at 8:21





@Mr_S here you have the generic linux binary (mlr.linux.x86_64 ) github.com/johnkerl/miller/releases/tag/5.4.0

– aborruso
Feb 1 at 8:21

















draft saved

draft discarded
















































Thanks for contributing an answer to Unix & Linux Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497716%2fto-get-average-of-columns-for-all-the-entries-whose-first-column-entries-are-dup%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown






Popular posts from this blog

Peggy Mitchell

Palaiologos

The Forum (Inglewood, California)