Simplest command to print unique values of some column data with count of repeated values

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
0
down vote

favorite












A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.



sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.



Input :



A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00


Output :




A(2) -06:00
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00


or



Output :




A 2 -06:00
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00






share|improve this question


























    up vote
    0
    down vote

    favorite












    A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.



    sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.



    Input :



    A 3210 -06:00
    A 5172 -06:00
    A 3335 -07:00
    A 3258 -05:00
    B 3322 -05:00
    B 5097 -05:00
    C 3238 -06:00
    C 5364 -05:00
    C 3366 -06:00
    C 3293 -06:00


    Output :




    A(2) -06:00
    A(1) -07:00
    A(1) -05:00
    B(2) -05:00
    C(3) -06:00
    C(1) -05:00


    or



    Output :




    A 2 -06:00
    A 1 -07:00
    A 1 -05:00
    B 2 -05:00
    C 3 -06:00
    C 1 -05:00






    share|improve this question
























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.



      sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.



      Input :



      A 3210 -06:00
      A 5172 -06:00
      A 3335 -07:00
      A 3258 -05:00
      B 3322 -05:00
      B 5097 -05:00
      C 3238 -06:00
      C 5364 -05:00
      C 3366 -06:00
      C 3293 -06:00


      Output :




      A(2) -06:00
      A(1) -07:00
      A(1) -05:00
      B(2) -05:00
      C(3) -06:00
      C(1) -05:00


      or



      Output :




      A 2 -06:00
      A 1 -07:00
      A 1 -05:00
      B 2 -05:00
      C 3 -06:00
      C 1 -05:00






      share|improve this question














      A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.



      sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.



      Input :



      A 3210 -06:00
      A 5172 -06:00
      A 3335 -07:00
      A 3258 -05:00
      B 3322 -05:00
      B 5097 -05:00
      C 3238 -06:00
      C 5364 -05:00
      C 3366 -06:00
      C 3293 -06:00


      Output :




      A(2) -06:00
      A(1) -07:00
      A(1) -05:00
      B(2) -05:00
      C(3) -06:00
      C(1) -05:00


      or



      Output :




      A 2 -06:00
      A 1 -07:00
      A 1 -05:00
      B 2 -05:00
      C 3 -06:00
      C 1 -05:00








      share|improve this question













      share|improve this question




      share|improve this question








      edited Apr 11 at 19:25

























      asked Apr 11 at 18:50









      Bharat

      3058




      3058




















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          2
          down vote



          accepted











          1. Given Input use cut, sort, uniq and sed:



            cut -d ' ' -f1,3 Input | 
            sort | uniq -c |
            sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'



          2. Using datamash and sed:



            datamash -t ' ' -g1,3 -s countunique 2 < Input | 
            sed 's/(.*) (.*) (.*)/1 3 2/'


          Output of either:



          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
            – Bharat
            Apr 11 at 19:29

















          up vote
          3
          down vote













          Not precisely the format you want, but fits all other requirements:



          awk 'print $1" "$3' <inFile> | sort | uniq -c



          In english, use awk to print only the first and third columns, then sort, then uniq with count.






          share|improve this answer




















          • An awk statement that uses arrays to do the counting for you is handy.
            – DopeGhoti
            Apr 11 at 19:14










          • I see it works very well, nice one, I take my comments back ..
            – Bharat
            Apr 11 at 19:50

















          up vote
          3
          down vote













          $ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
          A(1) -07:00
          B(2) -05:00
          A(2) -06:00
          A(1) -05:00
          C(3) -06:00
          C(1) -05:00


          Note that the output may not be sorted. Pass it through sort if needed.



          The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.




          In the alternative format:



          If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):



          $ cut -d ' ' -f 1,3 file | sort | uniq -c
          1 A -05:00
          2 A -06:00
          1 A -07:00
          2 B -05:00
          1 C -05:00
          3 C -06:00


          To swap the two first columns:



          $ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
            – Bharat
            Apr 11 at 19:21










          • Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
            – Bharat
            Apr 11 at 20:36










          Your Answer







          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "106"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f437116%2fsimplest-command-to-print-unique-values-of-some-column-data-with-count-of-repeat%23new-answer', 'question_page');

          );

          Post as a guest






























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          2
          down vote



          accepted











          1. Given Input use cut, sort, uniq and sed:



            cut -d ' ' -f1,3 Input | 
            sort | uniq -c |
            sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'



          2. Using datamash and sed:



            datamash -t ' ' -g1,3 -s countunique 2 < Input | 
            sed 's/(.*) (.*) (.*)/1 3 2/'


          Output of either:



          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
            – Bharat
            Apr 11 at 19:29














          up vote
          2
          down vote



          accepted











          1. Given Input use cut, sort, uniq and sed:



            cut -d ' ' -f1,3 Input | 
            sort | uniq -c |
            sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'



          2. Using datamash and sed:



            datamash -t ' ' -g1,3 -s countunique 2 < Input | 
            sed 's/(.*) (.*) (.*)/1 3 2/'


          Output of either:



          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
            – Bharat
            Apr 11 at 19:29












          up vote
          2
          down vote



          accepted







          up vote
          2
          down vote



          accepted







          1. Given Input use cut, sort, uniq and sed:



            cut -d ' ' -f1,3 Input | 
            sort | uniq -c |
            sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'



          2. Using datamash and sed:



            datamash -t ' ' -g1,3 -s countunique 2 < Input | 
            sed 's/(.*) (.*) (.*)/1 3 2/'


          Output of either:



          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer















          1. Given Input use cut, sort, uniq and sed:



            cut -d ' ' -f1,3 Input | 
            sort | uniq -c |
            sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'



          2. Using datamash and sed:



            datamash -t ' ' -g1,3 -s countunique 2 < Input | 
            sed 's/(.*) (.*) (.*)/1 3 2/'


          Output of either:



          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 11 at 19:30

























          answered Apr 11 at 19:11









          agc

          4,0091935




          4,0091935











          • Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
            – Bharat
            Apr 11 at 19:29
















          • Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
            – Bharat
            Apr 11 at 19:29















          Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
          – Bharat
          Apr 11 at 19:29




          Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
          – Bharat
          Apr 11 at 19:29












          up vote
          3
          down vote













          Not precisely the format you want, but fits all other requirements:



          awk 'print $1" "$3' <inFile> | sort | uniq -c



          In english, use awk to print only the first and third columns, then sort, then uniq with count.






          share|improve this answer




















          • An awk statement that uses arrays to do the counting for you is handy.
            – DopeGhoti
            Apr 11 at 19:14










          • I see it works very well, nice one, I take my comments back ..
            – Bharat
            Apr 11 at 19:50














          up vote
          3
          down vote













          Not precisely the format you want, but fits all other requirements:



          awk 'print $1" "$3' <inFile> | sort | uniq -c



          In english, use awk to print only the first and third columns, then sort, then uniq with count.






          share|improve this answer




















          • An awk statement that uses arrays to do the counting for you is handy.
            – DopeGhoti
            Apr 11 at 19:14










          • I see it works very well, nice one, I take my comments back ..
            – Bharat
            Apr 11 at 19:50












          up vote
          3
          down vote










          up vote
          3
          down vote









          Not precisely the format you want, but fits all other requirements:



          awk 'print $1" "$3' <inFile> | sort | uniq -c



          In english, use awk to print only the first and third columns, then sort, then uniq with count.






          share|improve this answer












          Not precisely the format you want, but fits all other requirements:



          awk 'print $1" "$3' <inFile> | sort | uniq -c



          In english, use awk to print only the first and third columns, then sort, then uniq with count.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Apr 11 at 19:03









          hhoke1

          31416




          31416











          • An awk statement that uses arrays to do the counting for you is handy.
            – DopeGhoti
            Apr 11 at 19:14










          • I see it works very well, nice one, I take my comments back ..
            – Bharat
            Apr 11 at 19:50
















          • An awk statement that uses arrays to do the counting for you is handy.
            – DopeGhoti
            Apr 11 at 19:14










          • I see it works very well, nice one, I take my comments back ..
            – Bharat
            Apr 11 at 19:50















          An awk statement that uses arrays to do the counting for you is handy.
          – DopeGhoti
          Apr 11 at 19:14




          An awk statement that uses arrays to do the counting for you is handy.
          – DopeGhoti
          Apr 11 at 19:14












          I see it works very well, nice one, I take my comments back ..
          – Bharat
          Apr 11 at 19:50




          I see it works very well, nice one, I take my comments back ..
          – Bharat
          Apr 11 at 19:50










          up vote
          3
          down vote













          $ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
          A(1) -07:00
          B(2) -05:00
          A(2) -06:00
          A(1) -05:00
          C(3) -06:00
          C(1) -05:00


          Note that the output may not be sorted. Pass it through sort if needed.



          The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.




          In the alternative format:



          If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):



          $ cut -d ' ' -f 1,3 file | sort | uniq -c
          1 A -05:00
          2 A -06:00
          1 A -07:00
          2 B -05:00
          1 C -05:00
          3 C -06:00


          To swap the two first columns:



          $ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
            – Bharat
            Apr 11 at 19:21










          • Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
            – Bharat
            Apr 11 at 20:36














          up vote
          3
          down vote













          $ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
          A(1) -07:00
          B(2) -05:00
          A(2) -06:00
          A(1) -05:00
          C(3) -06:00
          C(1) -05:00


          Note that the output may not be sorted. Pass it through sort if needed.



          The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.




          In the alternative format:



          If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):



          $ cut -d ' ' -f 1,3 file | sort | uniq -c
          1 A -05:00
          2 A -06:00
          1 A -07:00
          2 B -05:00
          1 C -05:00
          3 C -06:00


          To swap the two first columns:



          $ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer






















          • it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
            – Bharat
            Apr 11 at 19:21










          • Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
            – Bharat
            Apr 11 at 20:36












          up vote
          3
          down vote










          up vote
          3
          down vote









          $ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
          A(1) -07:00
          B(2) -05:00
          A(2) -06:00
          A(1) -05:00
          C(3) -06:00
          C(1) -05:00


          Note that the output may not be sorted. Pass it through sort if needed.



          The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.




          In the alternative format:



          If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):



          $ cut -d ' ' -f 1,3 file | sort | uniq -c
          1 A -05:00
          2 A -06:00
          1 A -07:00
          2 B -05:00
          1 C -05:00
          3 C -06:00


          To swap the two first columns:



          $ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00





          share|improve this answer














          $ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
          A(1) -07:00
          B(2) -05:00
          A(2) -06:00
          A(1) -05:00
          C(3) -06:00
          C(1) -05:00


          Note that the output may not be sorted. Pass it through sort if needed.



          The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.




          In the alternative format:



          If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):



          $ cut -d ' ' -f 1,3 file | sort | uniq -c
          1 A -05:00
          2 A -06:00
          1 A -07:00
          2 B -05:00
          1 C -05:00
          3 C -06:00


          To swap the two first columns:



          $ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
          A 1 -05:00
          A 2 -06:00
          A 1 -07:00
          B 2 -05:00
          C 1 -05:00
          C 3 -06:00






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 11 at 19:37

























          answered Apr 11 at 19:13









          Kusalananda

          102k13199316




          102k13199316











          • it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
            – Bharat
            Apr 11 at 19:21










          • Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
            – Bharat
            Apr 11 at 20:36
















          • it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
            – Bharat
            Apr 11 at 19:21










          • Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
            – Bharat
            Apr 11 at 20:36















          it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
          – Bharat
          Apr 11 at 19:21




          it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
          – Bharat
          Apr 11 at 19:21












          Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
          – Bharat
          Apr 11 at 20:36




          Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
          – Bharat
          Apr 11 at 20:36












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f437116%2fsimplest-command-to-print-unique-values-of-some-column-data-with-count-of-repeat%23new-answer', 'question_page');

          );

          Post as a guest