Find count of unique values in columns based on date in column 1
Clash Royale CLAN TAG#URR8PPP
I have 3 comma separated quoted fields.
last crawled,linking page,domain
"Nov 17, 2018","https://allestoringen.be/problemen/bwin/antwerpen","allestoringen.be"
"Aug 11, 2017","http://casino.linkplek.be/","linkplek.be"
"Nov 17, 2018","http://pronoroll.blogspot.com/p/blog-page_26.html","pronoroll.blogspot.com"
etc
I need to remove duplicates on the date fields and find the count of unique linking pages for each unique date (column $2) and count of unique domains for the unique date (column $3). I have tried :
awk 'A[$1 OFS $2]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
awk 'A[$1 OFS $3]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
But I am a little confused on getting all 3 columns in a single go.
text-processing awk csv
add a comment |
I have 3 comma separated quoted fields.
last crawled,linking page,domain
"Nov 17, 2018","https://allestoringen.be/problemen/bwin/antwerpen","allestoringen.be"
"Aug 11, 2017","http://casino.linkplek.be/","linkplek.be"
"Nov 17, 2018","http://pronoroll.blogspot.com/p/blog-page_26.html","pronoroll.blogspot.com"
etc
I need to remove duplicates on the date fields and find the count of unique linking pages for each unique date (column $2) and count of unique domains for the unique date (column $3). I have tried :
awk 'A[$1 OFS $2]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
awk 'A[$1 OFS $3]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
But I am a little confused on getting all 3 columns in a single go.
text-processing awk csv
add a comment |
I have 3 comma separated quoted fields.
last crawled,linking page,domain
"Nov 17, 2018","https://allestoringen.be/problemen/bwin/antwerpen","allestoringen.be"
"Aug 11, 2017","http://casino.linkplek.be/","linkplek.be"
"Nov 17, 2018","http://pronoroll.blogspot.com/p/blog-page_26.html","pronoroll.blogspot.com"
etc
I need to remove duplicates on the date fields and find the count of unique linking pages for each unique date (column $2) and count of unique domains for the unique date (column $3). I have tried :
awk 'A[$1 OFS $2]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
awk 'A[$1 OFS $3]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
But I am a little confused on getting all 3 columns in a single go.
text-processing awk csv
I have 3 comma separated quoted fields.
last crawled,linking page,domain
"Nov 17, 2018","https://allestoringen.be/problemen/bwin/antwerpen","allestoringen.be"
"Aug 11, 2017","http://casino.linkplek.be/","linkplek.be"
"Nov 17, 2018","http://pronoroll.blogspot.com/p/blog-page_26.html","pronoroll.blogspot.com"
etc
I need to remove duplicates on the date fields and find the count of unique linking pages for each unique date (column $2) and count of unique domains for the unique date (column $3). I have tried :
awk 'A[$1 OFS $2]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
awk 'A[$1 OFS $3]++ END for(k in A) print k, A[k]' FPAT='([^,]*)|("[^"]+")' file
But I am a little confused on getting all 3 columns in a single go.
text-processing awk csv
text-processing awk csv
edited Jan 28 at 12:22
Mallik Kumar
asked Jan 28 at 12:02
Mallik KumarMallik Kumar
154
154
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
# script.awk
BEGIN
FPAT = "[^"]+"
# the first if is to avoid processing title row and rows that do not contain fields
if (NF > 1)
# the third field is the linking page column; the second field is a comma
if ($3 != "" && $1 $3 in unique_linking_pages == 0)
unique_linking_pages[$1 $3]
unique_linking_page_counts[$1]++
# the fifth field is the domain column; the fourth field is a comma
if ($5 != "" && $1 $5 in unique_domains == 0)
unique_domains[$1 $5]
unique_domain_counts[$1]++
# in case of empty fields in columns 2 and or 3 of the input file,
# this ensures that all the dates are recorded
dates[$1]
END " unique_linking_page_counts[date] "
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even thoughNov 17, 2018
has two unique linking pages, you want theunique_linking_page_counts
to be equal to2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!
– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
|
show 2 more comments
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497178%2ffind-count-of-unique-values-in-columns-based-on-date-in-column-1%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
# script.awk
BEGIN
FPAT = "[^"]+"
# the first if is to avoid processing title row and rows that do not contain fields
if (NF > 1)
# the third field is the linking page column; the second field is a comma
if ($3 != "" && $1 $3 in unique_linking_pages == 0)
unique_linking_pages[$1 $3]
unique_linking_page_counts[$1]++
# the fifth field is the domain column; the fourth field is a comma
if ($5 != "" && $1 $5 in unique_domains == 0)
unique_domains[$1 $5]
unique_domain_counts[$1]++
# in case of empty fields in columns 2 and or 3 of the input file,
# this ensures that all the dates are recorded
dates[$1]
END " unique_linking_page_counts[date] "
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even thoughNov 17, 2018
has two unique linking pages, you want theunique_linking_page_counts
to be equal to2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!
– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
|
show 2 more comments
# script.awk
BEGIN
FPAT = "[^"]+"
# the first if is to avoid processing title row and rows that do not contain fields
if (NF > 1)
# the third field is the linking page column; the second field is a comma
if ($3 != "" && $1 $3 in unique_linking_pages == 0)
unique_linking_pages[$1 $3]
unique_linking_page_counts[$1]++
# the fifth field is the domain column; the fourth field is a comma
if ($5 != "" && $1 $5 in unique_domains == 0)
unique_domains[$1 $5]
unique_domain_counts[$1]++
# in case of empty fields in columns 2 and or 3 of the input file,
# this ensures that all the dates are recorded
dates[$1]
END " unique_linking_page_counts[date] "
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even thoughNov 17, 2018
has two unique linking pages, you want theunique_linking_page_counts
to be equal to2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!
– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
|
show 2 more comments
# script.awk
BEGIN
FPAT = "[^"]+"
# the first if is to avoid processing title row and rows that do not contain fields
if (NF > 1)
# the third field is the linking page column; the second field is a comma
if ($3 != "" && $1 $3 in unique_linking_pages == 0)
unique_linking_pages[$1 $3]
unique_linking_page_counts[$1]++
# the fifth field is the domain column; the fourth field is a comma
if ($5 != "" && $1 $5 in unique_domains == 0)
unique_domains[$1 $5]
unique_domain_counts[$1]++
# in case of empty fields in columns 2 and or 3 of the input file,
# this ensures that all the dates are recorded
dates[$1]
END " unique_linking_page_counts[date] "
# script.awk
BEGIN
FPAT = "[^"]+"
# the first if is to avoid processing title row and rows that do not contain fields
if (NF > 1)
# the third field is the linking page column; the second field is a comma
if ($3 != "" && $1 $3 in unique_linking_pages == 0)
unique_linking_pages[$1 $3]
unique_linking_page_counts[$1]++
# the fifth field is the domain column; the fourth field is a comma
if ($5 != "" && $1 $5 in unique_domains == 0)
unique_domains[$1 $5]
unique_domain_counts[$1]++
# in case of empty fields in columns 2 and or 3 of the input file,
# this ensures that all the dates are recorded
dates[$1]
END " unique_linking_page_counts[date] "
edited Jan 28 at 14:46
answered Jan 28 at 12:44
Niko GambtNiko Gambt
1836
1836
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even thoughNov 17, 2018
has two unique linking pages, you want theunique_linking_page_counts
to be equal to2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!
– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
|
show 2 more comments
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even thoughNov 17, 2018
has two unique linking pages, you want theunique_linking_page_counts
to be equal to2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!
– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
@NikoGambt..can the output be in the form of: date, unique_linking_page_counts, unique_domain_counts No pages or domains...just counts
– Mallik Kumar
Jan 28 at 13:13
So, for example, even though
Nov 17, 2018
has two unique linking pages, you want the unique_linking_page_counts
to be equal to 2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!– Niko Gambt
Jan 28 at 13:35
So, for example, even though
Nov 17, 2018
has two unique linking pages, you want the unique_linking_page_counts
to be equal to 2
for that date?? The only way to do what you're asking is to group the unique linking pages by date (and to group the unique domains by date too). However, doing so would lose the information of what linking pages (and domains) are unique!– Niko Gambt
Jan 28 at 13:35
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
Yes....I want to get only unique dates, ie. Nov 17, 2018 appears once and number of unique linking pages will be 2 as in this case...this is true for domains as well.
– Mallik Kumar
Jan 28 at 13:41
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
@MallikKumar Just an advice, you should have provided an example output in your OP. No one has the superpower to guess what you want. Just like me, anybody can misunderstand what you want.
– Niko Gambt
Jan 28 at 13:43
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
My mistake. Should have put one in my post. Appreciate your help in this matter.
– Mallik Kumar
Jan 28 at 13:47
|
show 2 more comments
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497178%2ffind-count-of-unique-values-in-columns-based-on-date-in-column-1%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown