Extracting a single line containing the highest value in a given column from a text file from each consecutively numbered sub-group/family
Clash Royale CLAN TAG#URR8PPP
Within my text file, I would like to take the line containing the highest value present in column 3, from each consecutively numbered family (i.e. family_1, family_2 etc.) from column 2 and input these data into a new text file.
Input data:
TTGSCA family_1 18.123083 681 36349 1
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
CTYAAG family_2 16.95983 657 36170 1
.GCCAAR family_3 19.436863 698 35844 1
WGCCAA. family_3 19.99668 747 38506 1
.GCCAAS family_3 17.037859 599 31922 1
WGCCAA. family_3 19.99668 747 38506 1
CCACTK family_4 17.200712 776 44550 1
CCACTY family_4 18.86465 727 38616 1
MCACTT family_4 18.0871 737 40399 1
MCACTT family_4 18.0871 737 40399 1
YCACTT family_4 19.369513 804 43376 -1
CCAYTT family_4 16.193245 752 44296 1
CCAYTT family_4 16.193245 752 44296 1
SCACTT family_4 19.759317 687 34686 1
Output data:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
I'm not sure whether to use grep or awk, and how to combine these into a single function.
shell-script shell awk grep
add a comment |
Within my text file, I would like to take the line containing the highest value present in column 3, from each consecutively numbered family (i.e. family_1, family_2 etc.) from column 2 and input these data into a new text file.
Input data:
TTGSCA family_1 18.123083 681 36349 1
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
CTYAAG family_2 16.95983 657 36170 1
.GCCAAR family_3 19.436863 698 35844 1
WGCCAA. family_3 19.99668 747 38506 1
.GCCAAS family_3 17.037859 599 31922 1
WGCCAA. family_3 19.99668 747 38506 1
CCACTK family_4 17.200712 776 44550 1
CCACTY family_4 18.86465 727 38616 1
MCACTT family_4 18.0871 737 40399 1
MCACTT family_4 18.0871 737 40399 1
YCACTT family_4 19.369513 804 43376 -1
CCAYTT family_4 16.193245 752 44296 1
CCAYTT family_4 16.193245 752 44296 1
SCACTT family_4 19.759317 687 34686 1
Output data:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
I'm not sure whether to use grep or awk, and how to combine these into a single function.
shell-script shell awk grep
2
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
grep
is no good for this;awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.
– G-Man
Jan 4 at 4:25
add a comment |
Within my text file, I would like to take the line containing the highest value present in column 3, from each consecutively numbered family (i.e. family_1, family_2 etc.) from column 2 and input these data into a new text file.
Input data:
TTGSCA family_1 18.123083 681 36349 1
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
CTYAAG family_2 16.95983 657 36170 1
.GCCAAR family_3 19.436863 698 35844 1
WGCCAA. family_3 19.99668 747 38506 1
.GCCAAS family_3 17.037859 599 31922 1
WGCCAA. family_3 19.99668 747 38506 1
CCACTK family_4 17.200712 776 44550 1
CCACTY family_4 18.86465 727 38616 1
MCACTT family_4 18.0871 737 40399 1
MCACTT family_4 18.0871 737 40399 1
YCACTT family_4 19.369513 804 43376 -1
CCAYTT family_4 16.193245 752 44296 1
CCAYTT family_4 16.193245 752 44296 1
SCACTT family_4 19.759317 687 34686 1
Output data:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
I'm not sure whether to use grep or awk, and how to combine these into a single function.
shell-script shell awk grep
Within my text file, I would like to take the line containing the highest value present in column 3, from each consecutively numbered family (i.e. family_1, family_2 etc.) from column 2 and input these data into a new text file.
Input data:
TTGSCA family_1 18.123083 681 36349 1
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
CTYAAG family_2 16.95983 657 36170 1
.GCCAAR family_3 19.436863 698 35844 1
WGCCAA. family_3 19.99668 747 38506 1
.GCCAAS family_3 17.037859 599 31922 1
WGCCAA. family_3 19.99668 747 38506 1
CCACTK family_4 17.200712 776 44550 1
CCACTY family_4 18.86465 727 38616 1
MCACTT family_4 18.0871 737 40399 1
MCACTT family_4 18.0871 737 40399 1
YCACTT family_4 19.369513 804 43376 -1
CCAYTT family_4 16.193245 752 44296 1
CCAYTT family_4 16.193245 752 44296 1
SCACTT family_4 19.759317 687 34686 1
Output data:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
I'm not sure whether to use grep or awk, and how to combine these into a single function.
shell-script shell awk grep
shell-script shell awk grep
edited Jan 4 at 5:10
P_Yadav
1,6603924
1,6603924
asked Jan 4 at 4:06
AlexAlex
84
84
2
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
grep
is no good for this;awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.
– G-Man
Jan 4 at 4:25
add a comment |
2
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
grep
is no good for this;awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.
– G-Man
Jan 4 at 4:25
2
2
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
grep
is no good for this; awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.– G-Man
Jan 4 at 4:25
grep
is no good for this; awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.– G-Man
Jan 4 at 4:25
add a comment |
3 Answers
3
active
oldest
votes
With GNU datamash (and a little help from cut
):
$ datamash -Wf groupby 2 max 3 < file.txt | cut -f1-6
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
add a comment |
I think datamash
is probably the best tool, but here is a sort-unique alternative:
<infile sort -k2,2V -k3,3n | awk 'NR==1 || $2!=p; p=$2 '
add a comment |
Below is a cleaner way of getting the desired output than my previous answer. It does require sort
to be used twice but it's a lot better than having sort
, grep
, and tail
being used four times.
sort -k3r numbers | awk '!seen[$2]++' | sort -k2
Output:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492379%2fextracting-a-single-line-containing-the-highest-value-in-a-given-column-from-a-t%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
With GNU datamash (and a little help from cut
):
$ datamash -Wf groupby 2 max 3 < file.txt | cut -f1-6
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
add a comment |
With GNU datamash (and a little help from cut
):
$ datamash -Wf groupby 2 max 3 < file.txt | cut -f1-6
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
add a comment |
With GNU datamash (and a little help from cut
):
$ datamash -Wf groupby 2 max 3 < file.txt | cut -f1-6
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
With GNU datamash (and a little help from cut
):
$ datamash -Wf groupby 2 max 3 < file.txt | cut -f1-6
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
answered Jan 4 at 4:38
steeldriversteeldriver
35.5k35286
35.5k35286
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
add a comment |
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
Thanks steeldriver! Datamash worked a treat.
– Alex
Jan 7 at 0:21
add a comment |
I think datamash
is probably the best tool, but here is a sort-unique alternative:
<infile sort -k2,2V -k3,3n | awk 'NR==1 || $2!=p; p=$2 '
add a comment |
I think datamash
is probably the best tool, but here is a sort-unique alternative:
<infile sort -k2,2V -k3,3n | awk 'NR==1 || $2!=p; p=$2 '
add a comment |
I think datamash
is probably the best tool, but here is a sort-unique alternative:
<infile sort -k2,2V -k3,3n | awk 'NR==1 || $2!=p; p=$2 '
I think datamash
is probably the best tool, but here is a sort-unique alternative:
<infile sort -k2,2V -k3,3n | awk 'NR==1 || $2!=p; p=$2 '
answered Jan 4 at 9:37
ThorThor
11.7k13359
11.7k13359
add a comment |
add a comment |
Below is a cleaner way of getting the desired output than my previous answer. It does require sort
to be used twice but it's a lot better than having sort
, grep
, and tail
being used four times.
sort -k3r numbers | awk '!seen[$2]++' | sort -k2
Output:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
add a comment |
Below is a cleaner way of getting the desired output than my previous answer. It does require sort
to be used twice but it's a lot better than having sort
, grep
, and tail
being used four times.
sort -k3r numbers | awk '!seen[$2]++' | sort -k2
Output:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
add a comment |
Below is a cleaner way of getting the desired output than my previous answer. It does require sort
to be used twice but it's a lot better than having sort
, grep
, and tail
being used four times.
sort -k3r numbers | awk '!seen[$2]++' | sort -k2
Output:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
Below is a cleaner way of getting the desired output than my previous answer. It does require sort
to be used twice but it's a lot better than having sort
, grep
, and tail
being used four times.
sort -k3r numbers | awk '!seen[$2]++' | sort -k2
Output:
TTGSCA family_1 18.123083 681 36349 1
CTTRAG family_2 17.844843 685 37001 1
WGCCAA. family_3 19.99668 747 38506 1
SCACTT family_4 19.759317 687 34686 1
answered Jan 4 at 19:41
Nasir RileyNasir Riley
2,441249
2,441249
add a comment |
add a comment |
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492379%2fextracting-a-single-line-containing-the-highest-value-in-a-given-column-from-a-t%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Please don't post images of text, as they are hard to read, will not allow us to use the data in our tests, use more bandwidth, and are no good for screen readers. Please edit your question and replace it with raw text. Welcome to U/L!
– Sparhawk
Jan 4 at 4:10
Thanks! Apologies, have now replaced it!
– Alex
Jan 4 at 4:16
No worries, thank you for the fix. I'm fairly sure I know what you want, but could you also please provide your expected output for your sample. Thanks!
– Sparhawk
Jan 4 at 4:17
Done! Cheers for the help (clearly new to this!!)
– Alex
Jan 4 at 4:21
grep
is no good for this;awk
is probably the perfect tool. If you search this site, you’ll find hundreds of questions very much like this. We encourage you to do that; find a working solution and try to adapt it to your problem. If you get stuck; edit your question to show what progress you made and what trouble you’re having. … … P.S. Since your data has “ties”, you should say how you want them broken.– G-Man
Jan 4 at 4:25