Simplest command to print unique values of some column data with count of repeated values

up vote
0
down vote

favorite

A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.

sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.

Input :


A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00

Output :


A(2) -06:00 
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00

Output :


A 2 -06:00 
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00

edited Apr 11 at 19:25

asked Apr 11 at 18:50

Bharat

3058

add a commentÂ |Â

up vote
0
down vote

favorite

A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.

sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.

Input :


A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00

Output :


A(2) -06:00 
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00

Output :


A 2 -06:00 
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00

edited Apr 11 at 19:25

asked Apr 11 at 18:50

Bharat

3058

add a commentÂ |Â

up vote
0
down vote

favorite

A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.

sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.

Input :


A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00

Output :


A(2) -06:00 
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00

Output :


A 2 -06:00 
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00

edited Apr 11 at 19:25

asked Apr 11 at 18:50

Bharat

3058

A sample input data with 3 columns, 1st and 3rd column has values in duplicates, need to print them uniquely with their repetition count.

sort -u does help in some sense but unable to print occurrence of repeated values relating 1st and 3rd column.

Input :


A 3210 -06:00
A 5172 -06:00
A 3335 -07:00
A 3258 -05:00
B 3322 -05:00
B 5097 -05:00
C 3238 -06:00
C 5364 -05:00
C 3366 -06:00
C 3293 -06:00

Output :


A(2) -06:00 
A(1) -07:00
A(1) -05:00
B(2) -05:00
C(3) -06:00
C(1) -05:00

Output :


A 2 -06:00 
A 1 -07:00
A 1 -05:00
B 2 -05:00
C 3 -06:00
C 1 -05:00

edited Apr 11 at 19:25

asked Apr 11 at 18:50

Bharat

3058

edited Apr 11 at 19:25

asked Apr 11 at 18:50

Bharat

3058

asked Apr 11 at 18:50

Bharat

3058

asked Apr 11 at 18:50

Bharat

3058

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
2
down vote

accepted

Given Input use cut, sort, uniq and sed:

cut -d ' ' -f1,3 Input | 
sort | uniq -c | 
sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'

Using datamash and sed:

datamash -t ' ' -g1,3 -s countunique 2 < Input | 
sed 's/(.*) (.*) (.*)/1 3 2/'

Output of either:

A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

add a commentÂ |Â

up vote
3
down vote

Not precisely the format you want, but fits all other requirements:

awk 'print $1" "$3' <inFile> | sort | uniq -c

In english, use awk to print only the first and third columns, then sort, then uniq with count.

answered Apr 11 at 19:03

hhoke1

31416

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

add a commentÂ |Â

up vote
3
down vote

$ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00

Note that the output may not be sorted. Pass it through sort if needed.

The code stores the count for how many times the first and third fields of the input has occurred together as a pair, in the count array (with the first and third fields as the index). At the end, we loop over the indexes of the array, splitting them up into the original first and third fields (as field[1] and field[2] respectively) and output these together with the count in the wanted format.

In the alternative format:

If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):

$ cut -d ' ' -f 1,3 file | sort | uniq -c
 1 A -05:00
 2 A -06:00
 1 A -07:00
 2 B -05:00
 1 C -05:00
 3 C -06:00

To swap the two first columns:

$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

add a commentÂ |Â

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f437116%2fsimplest-command-to-print-unique-values-of-some-column-data-with-count-of-repeat%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
2
down vote

accepted

Given Input use cut, sort, uniq and sed:

cut -d ' ' -f1,3 Input | 
sort | uniq -c | 
sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'

Using datamash and sed:

datamash -t ' ' -g1,3 -s countunique 2 < Input | 
sed 's/(.*) (.*) (.*)/1 3 2/'

Output of either:

A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

add a commentÂ |Â

up vote
2
down vote

accepted

Given Input use cut, sort, uniq and sed:

cut -d ' ' -f1,3 Input | 
sort | uniq -c | 
sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'

Using datamash and sed:

datamash -t ' ' -g1,3 -s countunique 2 < Input | 
sed 's/(.*) (.*) (.*)/1 3 2/'

Output of either:

A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

add a commentÂ |Â

up vote
2
down vote

accepted

Given Input use cut, sort, uniq and sed:

cut -d ' ' -f1,3 Input | 
sort | uniq -c | 
sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'

Using datamash and sed:

datamash -t ' ' -g1,3 -s countunique 2 < Input | 
sed 's/(.*) (.*) (.*)/1 3 2/'

Output of either:

A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

Given Input use cut, sort, uniq and sed:

cut -d ' ' -f1,3 Input | 
sort | uniq -c | 
sed 's/^ *//;s/^([0-9]*) ([^ ]*)/2 1/'

Using datamash and sed:

datamash -t ' ' -g1,3 -s countunique 2 < Input | 
sed 's/(.*) (.*) (.*)/1 3 2/'

Output of either:

A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

edited Apr 11 at 19:30

answered Apr 11 at 19:11

agc

4,0091935

answered Apr 11 at 19:11

agc

4,0091935

answered Apr 11 at 19:11

agc

4,0091935

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

add a commentÂ |Â

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

Nice, it is indeed handy , cut -d ' ' -f1,3 d.txt | sort | uniq -c I updated question to exclude those unnecessary (),
â€“Â Bharat
Apr 11 at 19:29

add a commentÂ |Â

up vote
3
down vote

Not precisely the format you want, but fits all other requirements:

awk 'print $1" "$3' <inFile> | sort | uniq -c

In english, use awk to print only the first and third columns, then sort, then uniq with count.

answered Apr 11 at 19:03

hhoke1

31416

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

add a commentÂ |Â

up vote
3
down vote

Not precisely the format you want, but fits all other requirements:

awk 'print $1" "$3' <inFile> | sort | uniq -c

In english, use awk to print only the first and third columns, then sort, then uniq with count.

answered Apr 11 at 19:03

hhoke1

31416

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

add a commentÂ |Â

up vote
3
down vote

Not precisely the format you want, but fits all other requirements:

awk 'print $1" "$3' <inFile> | sort | uniq -c

In english, use awk to print only the first and third columns, then sort, then uniq with count.

answered Apr 11 at 19:03

hhoke1

31416

Not precisely the format you want, but fits all other requirements:

awk 'print $1" "$3' <inFile> | sort | uniq -c

In english, use awk to print only the first and third columns, then sort, then uniq with count.

answered Apr 11 at 19:03

hhoke1

31416

answered Apr 11 at 19:03

hhoke1

31416

answered Apr 11 at 19:03

hhoke1

31416

answered Apr 11 at 19:03

hhoke1

31416

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

add a commentÂ |Â

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

An awk statement that uses arrays to do the counting for you is handy.
â€“Â DopeGhoti
Apr 11 at 19:14

I see it works very well, nice one, I take my comments back ..
â€“Â Bharat
Apr 11 at 19:50

add a commentÂ |Â

up vote
3
down vote

$ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00

Note that the output may not be sorted. Pass it through sort if needed.

In the alternative format:

If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):

$ cut -d ' ' -f 1,3 file | sort | uniq -c
 1 A -05:00
 2 A -06:00
 1 A -07:00
 2 B -05:00
 1 C -05:00
 3 C -06:00

To swap the two first columns:

$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

add a commentÂ |Â

up vote
3
down vote

$ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00

Note that the output may not be sorted. Pass it through sort if needed.

In the alternative format:

If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):

$ cut -d ' ' -f 1,3 file | sort | uniq -c
 1 A -05:00
 2 A -06:00
 1 A -07:00
 2 B -05:00
 1 C -05:00
 3 C -06:00

To swap the two first columns:

$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

add a commentÂ |Â

up vote
3
down vote

$ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00

Note that the output may not be sorted. Pass it through sort if needed.

In the alternative format:

If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):

$ cut -d ' ' -f 1,3 file | sort | uniq -c
 1 A -05:00
 2 A -06:00
 1 A -07:00
 2 B -05:00
 1 C -05:00
 3 C -06:00

To swap the two first columns:

$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

$ awk ' count[$1,$3]++ END for (i in count) split(i, field, SUBSEP); printf("%s(%d)%s%sn", field[1], count[i], OFS, field[2]) ' file
A(1) -07:00
B(2) -05:00
A(2) -06:00
A(1) -05:00
C(3) -06:00
C(1) -05:00

Note that the output may not be sorted. Pass it through sort if needed.

In the alternative format:

If the input file uses a single space for field separator (otherwise use awk ' print $1,$3 ' instead of the cut):

$ cut -d ' ' -f 1,3 file | sort | uniq -c
 1 A -05:00
 2 A -06:00
 1 A -07:00
 2 B -05:00
 1 C -05:00
 3 C -06:00

To swap the two first columns:

$ cut -d ' ' -f 1,3 file | sort | uniq -c | awk ' print $2, $1, $3 '
A 1 -05:00
A 2 -06:00
A 1 -07:00
B 2 -05:00
C 1 -05:00
C 3 -06:00

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

edited Apr 11 at 19:37

answered Apr 11 at 19:13

Kusalananda

102k13199316

answered Apr 11 at 19:13

Kusalananda

102k13199316

answered Apr 11 at 19:13

Kusalananda

102k13199316

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

add a commentÂ |Â

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

it does the job, but not quite handy, I would have to go back to find it out whenever needed, still looking ....
â€“Â Bharat
Apr 11 at 19:21

Nice, I end up with this myself, awk 'count[$1" "$3]++;END for (key in count) print key,count[key]' file .. but got some quite easy from here..
â€“Â Bharat
Apr 11 at 20:36

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

mjhjmtu