Find count of unique values in columns based on date in column 1

I have 3 comma-separated, quoted fields:



last crawled,linking page,domain
"Nov 17, 2018","https://allestoringen.be/problemen/bwin/antwerpen","allestoringen.be"
"Aug 11, 2017","http://casino.linkplek.be/","linkplek.be"
"Nov 17, 2018","http://pronoroll.blogspot.com/p/blog-page_26.html","pronoroll.blogspot.com"
etc


I need to deduplicate on the date field and, for each unique date (column $1), find the count of unique linking pages (column $2) and the count of unique domains (column $3). I have tried:



awk '{ A[$1 OFS $2]++ } END { for (k in A) print k, A[k] }' FPAT='([^,]*)|("[^"]+")' file
awk '{ A[$1 OFS $3]++ } END { for (k in A) print k, A[k] }' FPAT='([^,]*)|("[^"]+")' file


But I am a little confused about how to get all 3 columns in a single pass.
      text-processing awk csv
      edited Jan 28 at 12:22
      asked Jan 28 at 12:02
Mallik Kumar
          1 Answer
# script.awk

BEGIN {
    FPAT = "[^\"]+"
}

# the NF > 1 condition avoids processing the title row and rows that do not contain quoted fields
NF > 1 {
    # the third field is the linking page column; the second field is a comma
    if ($3 != "" && !(($1 $3) in unique_linking_pages)) {
        unique_linking_pages[$1 $3]
        unique_linking_page_counts[$1]++
    }

    # the fifth field is the domain column; the fourth field is a comma
    if ($5 != "" && !(($1 $5) in unique_domains)) {
        unique_domains[$1 $5]
        unique_domain_counts[$1]++
    }

    # in case of empty fields in columns 2 and/or 3 of the input file,
    # this ensures that all the dates are recorded
    dates[$1]
}

END {
    for (date in dates)
        print date ": " unique_linking_page_counts[date] " " unique_domain_counts[date]
}
          • @NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts

            – Mallik Kumar
            Jan 28 at 13:13












          • So, for example, even though Nov 17, 2018 has two unique linking pages, you want the unique_linking_page_counts to be equal to 2 for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!

            – Niko Gambt
            Jan 28 at 13:35












          • Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.

            – Mallik Kumar
            Jan 28 at 13:41












          • @MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.

            – Niko Gambt
            Jan 28 at 13:43











          • My mistake. Should have put one in my post. Appreciate your help in this matter.

            – Mallik Kumar
            Jan 28 at 13:47
          edited Jan 28 at 14:46
          answered Jan 28 at 12:44









Niko Gambt