Listed Frequency of Different Strings in a Particular Column
2 votes
I need to figure out how many times a particular string shows up in column 4.
This is my data:
25 48656721 48656734 FAM132B ENSCAFT00000019683 4 0.51
X 53969937 53969950 FAM155B ENSCAFT00000026508 5 0.57
3 42203721 42203906 FAM169B ENSCAFT00000017307 5 0.54
36 28947780 28947831 FAM171B ENSCAFT00000046981 5 0.51
10 45080519 45080773 FAM171B ENSCAFT00000003744 9 -0.53
3 61627122 61627446 FAM193A ENSCAFT00000023571 13 0.64
3 61626373 61626466 FAM193A ENSCAFT00000023571 6 0.51
15 55348822 55349196 FAM193A ENSCAFT00000045012 5 0.52
This is a portion of my data. So, I'd want the output to be:
1 FAM132B
1 FAM155B
1 FAM169B
2 FAM171B
3 FAM193A
And so on - for the rest of my data. What's a command that would work?
shell command-line text-processing uniq
edited Nov 17 at 20:23 by Rui F Ribeiro
asked Sep 23 '15 at 17:47 by Justin
Are you looking for a dynamic count on all data? I.e., you need to know how many times each occurrence of an entry appears, but you don't know how many different types of entries there may be, or what those entries may be? Or do you have a set number of potential entries that you are aware of, and want a count of those known entries?
– Gravy
Sep 23 '15 at 17:55
I only see two "FAM193A" in your sample data? And, do you care if the output is sorted by column 4?
– Jeff Schaller
Sep 23 '15 at 17:57
@Gravy My data consists of 2066 lines. Above I just have 8 sample lines.
– Justin
Sep 23 '15 at 18:09
@JeffSchaller You're absolutely right! That was a mistake on my part. I've edited it now. Thanks! And yes I would like it sorted by column 4
– Justin
Sep 23 '15 at 18:09
@Justin use sort -k. From the man page: -k, --key=POS1[,POS2]: start a key at POS1 (origin 1), end it at POS2 (default end of line). See POS syntax below
– vfbsilva
Sep 23 '15 at 18:16
3 Answers
2 votes, accepted
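One simple solution is to use awk to pull out column 4, uniq -c to count the values, and sort to put them in order by the second column (the old column 4 data). This relies on equal values in column 4 already being adjacent, as they are in the sample, because uniq -c only counts consecutive duplicates:
awk '{print $4}' < data | uniq -c | sort -k2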
On your (updated) sample input, this gives:
1 FAM132B
1 FAM155B
1 FAM169B
2 FAM171B
3 FAM193A
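Because uniq -c only counts runs of adjacent identical lines, the pipeline above depends on the input already being grouped by column 4. A minimal sketch, assuming the rows may arrive in any order: sort the extracted column before counting. Here the sample rows are shuffled and held in a shell variable named data purely for illustration (the answer reads them from a file called data).

```shell
# Sample rows from the question, shuffled so that equal column-4 values
# are no longer adjacent. Holding them in a variable is an assumption
# made for this sketch; a file works the same way.
data='3 61627122 61627446 FAM193A ENSCAFT00000023571 13 0.64
36 28947780 28947831 FAM171B ENSCAFT00000046981 5 0.51
25 48656721 48656734 FAM132B ENSCAFT00000019683 4 0.51
3 61626373 61626466 FAM193A ENSCAFT00000023571 6 0.51
X 53969937 53969950 FAM155B ENSCAFT00000026508 5 0.57
10 45080519 45080773 FAM171B ENSCAFT00000003744 9 -0.53
3 42203721 42203906 FAM169B ENSCAFT00000017307 5 0.54
15 55348822 55349196 FAM193A ENSCAFT00000045012 5 0.52'

# Pull out column 4, sort so duplicates become adjacent, then count.
# The sorted input also leaves the result ordered by name.
counts=$(printf '%s\n' "$data" | awk '{print $4}' | sort | uniq -c)

# Each line is a count followed by the name; the amount of leading
# padding before the count depends on the uniq implementation.
printf '%s\n' "$counts"
```

With the sort in front of uniq -c, the trailing sort -k2 is no longer needed, since the names reach uniq already in order.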
answered Sep 23 '15 at 18:22 by Jeff Schaller
Ooo, this one works great as well! Thank you for that and your explanation!
– Justin
Sep 23 '15 at 18:26
@Jeff Schaller I messed up here, can you give me a hand? I was using: sort -k4 | awk '{print $4, $3,$2,$1}' | uniq -c. How did you get the first field as a counter?
– vfbsilva
Sep 23 '15 at 18:28
@vfbsilva you are including columns 3, 2, and 1 in your awk output when you should not; doing so changes uniq's input (and thus output)
– Jeff Schaller
Sep 23 '15 at 18:38
1 vote
Use awk:
awk '{a[$4]++} END{for(s in a) print a[s]" "s}' file
a[$4]++ increments the array element indexed by the value of the 4th column. Once awk has read the whole file, the array holds a counter for every distinct value in the 4th column.
END marks a block of code that runs once awk is through the file.
for(s in a) runs through the array...
print a[s]" "s ... and prints each count followed by its index.
The output:
1 FAM169B
3 FAM193A
1 FAM132B
1 FAM155B
2 FAM171B
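The order in which awk's for (s in a) visits array elements is unspecified, which is why the output above is not sorted by name. A small sketch, assuming sorted output is wanted: pipe the result through sort -k2. The variable names count and name, and the shell variable data holding the sample rows, are chosen here for illustration only.

```shell
# Sample rows from the question, kept in a variable for this sketch.
data='25 48656721 48656734 FAM132B ENSCAFT00000019683 4 0.51
X 53969937 53969950 FAM155B ENSCAFT00000026508 5 0.57
3 42203721 42203906 FAM169B ENSCAFT00000017307 5 0.54
36 28947780 28947831 FAM171B ENSCAFT00000046981 5 0.51
10 45080519 45080773 FAM171B ENSCAFT00000003744 9 -0.53
3 61627122 61627446 FAM193A ENSCAFT00000023571 13 0.64
3 61626373 61626466 FAM193A ENSCAFT00000023571 6 0.51
15 55348822 55349196 FAM193A ENSCAFT00000045012 5 0.52'

# Tally column 4 in an associative array, print "count name" pairs at
# the end, then sort on the name because the for-in order is unspecified.
counts=$(printf '%s\n' "$data" |
  awk '{ count[$4]++ } END { for (name in count) print count[name], name }' |
  sort -k2)

printf '%s\n' "$counts"
# 1 FAM132B
# 1 FAM155B
# 1 FAM169B
# 2 FAM171B
# 3 FAM193A
```

Unlike the uniq -c approaches, this one does not need the input sorted beforehand, because the array accumulates counts regardless of where each value appears.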
answered Sep 23 '15 at 18:15 by chaos
print a[s], s would look better
– Costas
Sep 23 '15 at 19:57
0 votes
Assuming the delimiter is a single space:
cut -d' ' -f4 infile | sort | uniq -c
Note that uniq filters adjacent matching lines, so you need to sort first, e.g. with this input:
FAM193A
FAM155B
FAM169B
FAM171B
FAM132B
FAM193A
FAM132A
FAM132B
FAM155B
FAM169B
FAM171B
FAM171A
FAM193A
FAM132A
using sort | uniq -c produces:
2 FAM132A
2 FAM132B
2 FAM155B
2 FAM169B
1 FAM171A
2 FAM171B
3 FAM193A
while uniq -c | sort -k2 produces:
1 FAM132A
1 FAM132A
1 FAM132B
1 FAM132B
1 FAM155B
1 FAM155B
1 FAM169B
1 FAM169B
1 FAM171A
1 FAM171B
1 FAM171B
1 FAM193A
1 FAM193A
1 FAM193A
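The single-space assumption matters because cut treats every individual space as a separator, so a run of spaces yields empty fields and -f4 no longer lands on the name. A hedged sketch for input padded with runs of spaces (hypothetical rows, assuming no leading whitespace on any line): squeeze the runs with tr -s before cutting.

```shell
# Hypothetical rows aligned with runs of spaces (assumption for this
# sketch; none of the lines starts with a space).
data=$(printf '%s\n' \
  '3   61627122  61627446  FAM193A  ENSCAFT00000023571  13  0.64' \
  '10  45080519  45080773  FAM171B  ENSCAFT00000003744  9   -0.53' \
  '3   61626373  61626466  FAM193A  ENSCAFT00000023571  6   0.51')

# tr -s ' ' collapses each run of spaces into a single space, so that
# field 4 of cut is the name again; then sort and count as before.
counts=$(printf '%s\n' "$data" | tr -s ' ' | cut -d' ' -f4 | sort | uniq -c)

# Prints one line per name: 1 for FAM171B and 2 for FAM193A, with
# implementation-dependent padding before each count.
printf '%s\n' "$counts"
```

awk splits on any run of blanks by default, so the awk-based approaches do not need this extra step.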
answered Sep 23 '15 at 18:55, community wiki (don_crissti)