Selectively retrieve portions of a large file if a condition is met













I have a large file with many sections like this:



 Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol.
Biol. Evol. 22:1107-1118)
Positively selected sites (*: P>95%; **: P>99%)
(amino acids refer to 1st sequence: 33134_Pseudomonas_10M)

Pr(w>1) post mean +- SE for w

271 A 0.911 1.524 +- 0.000
369 D 0.955* 1.467 +- 0.153
492 S 0.916 1.439 +- 0.203



The grid (...)


I need a command that does something like this: within each section (after "BEB" and before "The grid"), if a number is immediately followed by "*" or "**", print that whole row and append, as a new column, the text that comes after "(amino acids refer to 1st sequence:" and before ")". For example:



 369 D 0.955* 1.467 +- 0.153 33134_Pseudomonas_10M


Note: if a section has two or more rows with "*" and/or "**", I only need the added text once. Example:



 369 D 0.955* 1.467 +- 0.153 33134_Pseudomonas_10M
378 R 0.987* 2.323 +- 0.254




























  • This could do it:
    awk '/BEB/,/The grid/{if(gsub(/.*sequence: |\)/,"")==2) seq=$0; else if($3~/^[0-9.]+\*+$/){print $0, seq; seq = ""}}' the_file
    But it's hard to know from your snippet. You should probably make the regexps (/BEB/, etc.) narrower.
    – mosvy
    Nov 21 at 23:29











  • Worked perfectly. Thanks @mosvy!
    – Manuel
    Nov 21 at 23:36
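
For readers who want the approach from the comment spelled out, here is a minimal sketch of the same idea as a commented awk program with narrower patterns. The function name beb_sites is hypothetical, and the patterns assume every section looks like the sample in the question (a "Bayes Empirical Bayes (BEB) analysis" header, one "1st sequence:" line, and a closing "The grid" line), so it may need adjusting for real output.

```shell
# beb_sites: hypothetical helper name; prints starred BEB rows from the given files.
# Assumes the section layout shown in the question.
beb_sites() {
  awk '
    # Start of a BEB section: switch on and forget any earlier sequence name.
    /Bayes Empirical Bayes \(BEB\) analysis/ { in_beb = 1; seq = ""; next }

    # End of the section.
    /^[ \t]*The grid/ { in_beb = 0; next }

    # Ignore everything outside a BEB section.
    !in_beb { next }

    # Remember the text between "1st sequence:" and the closing parenthesis.
    /\(amino acids refer to 1st sequence:/ {
      seq = $0
      sub(/.*1st sequence:[ \t]*/, "", seq)
      sub(/\).*/, "", seq)
      next
    }

    # A site row: third field is a number directly followed by * or **.
    $3 ~ /^[0-9.]+\*+$/ {
      if (seq != "") {
        print $0, seq   # first starred row of the section gets the name
        seq = ""
      } else {
        print $0        # later starred rows are printed as they are
      }
    }
  ' "$@"
}
```

It would be called as beb_sites the_file. Unlike the one-liner, it leaves later starred rows without a trailing separator and does not modify lines it is not interested in; whether that matters depends on what consumes the output.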














text-processing text-formatting bioinformatics














edited Nov 21 at 23:13

























asked Nov 21 at 23:07









Manuel

1399


















