Simplest command to print unique values of some column data with count of repeated values

I have sample input data with 3 columns. The 1st and 3rd columns contain duplicate values, and I need to print each distinct pair of them once, together with its repetition count.
sort -u helps to some extent, but it cannot print how many times each combination of the 1st and 3rd columns occurs.
Input:
A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00
Output :
A(2) -06:00
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00
or
Output :
A 2 -06:00
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00
text-processing awk
edited Apr 11 at 19:25
asked Apr 11 at 18:50
Bharat
3 Answers
Accepted answer, 2 votes
Given Input, use cut, sort, uniq and sed:
cut -d ' ' -f1,3 Input |
sort | uniq -c |
sed 's/^ *//;s/^\([0-9]*\) \([^ ]*\)/\2 \1/'
Using datamash and sed:
datamash -t ' ' -g1,3 -s countunique 2 < Input |
sed 's/\(.*\) \(.*\) \(.*\)/\1 \3 \2/'
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00
edited Apr 11 at 19:30
answered Apr 11 at 19:11
agc
Nice, it is indeed handy: cut -d ' ' -f1,3 d.txt | sort | uniq -c. I updated the question to drop the unnecessary ().
– Bharat
Apr 11 at 19:29
3 votes
Not precisely the format you want, but it fits all the other requirements:
awk '{print $1" "$3}' inFile | sort | uniq -c
In English: use awk to print only the first and third columns, then sort, then run uniq with a count.
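The sort step matters because uniq only merges adjacent identical lines. A minimal sketch of the difference, assuming whitespace-separated input; the function names pairs_unsorted and pairs_sorted are made up for illustration and are not part of the answer:

```shell
# Hypothetical helpers: both print column 1 and column 3, then count with uniq -c.
# uniq only collapses ADJACENT duplicates, so the unsorted variant can report
# the same pair more than once.
pairs_unsorted() {
    awk '{ print $1, $3 }' "$@" | uniq -c
}

pairs_sorted() {
    # LC_ALL=C is only there to make the sort order reproducible across locales.
    awk '{ print $1, $3 }' "$@" | LC_ALL=C sort | uniq -c
}

# Sample data from the question, fed on standard input.
sample() {
    printf '%s\n' 'A 3210 -06:00' 'A 5172 -06:00' 'A 3335 -07:00' 'A 3258 -05:00' \
        'B 3322 -05:00' 'B 5097 -05:00' 'C 3238 -06:00' 'C 5364 -05:00' \
        'C 3366 -06:00' 'C 3293 -06:00'
}

echo 'without sort:'; sample | pairs_unsorted
echo 'with sort:';    sample | pairs_sorted
```

On the question's sample, the unsorted variant lists C -06:00 twice (once with count 1 and once with count 2), because a C -05:00 line sits between them; the sorted variant reports it once with count 3.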
answered Apr 11 at 19:03
hhoke1
An awk statement that uses arrays to do the counting for you is handy.
– DopeGhoti
Apr 11 at 19:14
I see it works very well, nice one. I take my comments back.
– Bharat
Apr 11 at 19:50
3 votes
$ awk '{ count[$1,$3]++ } END { for (i in count) { split(i, field, SUBSEP); printf("%s(%d)%s%s\n", field[1], count[i], OFS, field[2]) } }' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00
Note that the output may not be sorted. Pass it through sort if needed.
The code counts how many times each pair of first and third fields occurs in the input and stores that count in the count array, indexed by the pair. At the end, we loop over the indexes of the array, split each one back into the original first and third fields (as field[1] and field[2] respectively), and output these together with the count in the wanted format.
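As a hedged sketch of the same idea, the one-liner can be spread over several lines and wrapped in a shell function, with the result piped through sort so the order is reproducible. The name count_pairs is hypothetical, and the literal space in the printf format stands in for the default OFS:

```shell
# Hypothetical helper: count (field 1, field 3) pairs and print them sorted.
# Reads the files given as arguments, or standard input if there are none.
count_pairs() {
    awk '
        { count[$1, $3]++ }                 # one counter per (field 1, field 3) pair
        END {
            for (key in count) {
                split(key, field, SUBSEP)   # recover the two original fields
                printf("%s(%d) %s\n", field[1], count[key], field[2])
            }
        }
    ' "$@" | LC_ALL=C sort                  # for-in order is unspecified, so sort
}

# Usage with the sample data from the question, fed on standard input.
printf '%s\n' 'A 3210 -06:00' 'A 5172 -06:00' 'A 3335 -07:00' 'A 3258 -05:00' \
    'B 3322 -05:00' 'B 5097 -05:00' 'C 3238 -06:00' 'C 5364 -05:00' \
    'C 3366 -06:00' 'C 3293 -06:00' | count_pairs
```

With the sample data this prints A(1) -05:00, A(1) -07:00, A(2) -06:00, B(2) -05:00, C(1) -05:00 and C(3) -06:00, one per line. Note that the sort is on the whole output line, so the count takes part in the ordering.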
In the alternative format:
If the input file uses a single space as the field separator (otherwise use awk '{ print $1,$3 }' instead of the cut):
$ cut -d ' ' -f 1,3 file | sort | uniq -c
1 A -05:00
2 A -06:00
1 A -07:00
2 B -05:00
1 C -05:00
3 C -06:00
To swap the first two columns:
$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk '{ print $2, $1, $3 }'
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00
edited Apr 11 at 19:37
answered Apr 11 at 19:13
Kusalananda
It does the job, but it is not quite handy; I would have to look it up again whenever I need it. Still looking...
– Bharat
Apr 11 at 19:21
Nice. I ended up with this myself: awk '{count[$1" "$3]++} END {for (key in count) print key,count[key]}' file, but I got some quite easy ones from here.
– Bharat
Apr 11 at 20:36