Selectively retrieve portions of a large file if a condition is met

I have a large file with many sections like this:

 Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol.
 Biol. Evol. 22:1107-1118)
 Positively selected sites (*: P>95%; **: P>99%)
 (amino acids refer to 1st sequence: 33134_Pseudomonas_10M)

 Pr(w>1) post mean +- SE for w

 271 A 0.911 1.524 +- 0.000
 369 D 0.955* 1.467 +- 0.153
 492 S 0.916 1.439 +- 0.203



 The grid (...)

I need a command that does something like this: between "BEB" and "The grid", if a row has "*" or "**" right after a number, print that whole row and append, as a new column, the text that appears after "(amino acids refer to 1st sequence:" and before ")". For example:

 369 D 0.955* 1.467 +- 0.153 33134_Pseudomonas_10M

Note: if two or more rows in the same section have "*" and/or "**", I only need the added text once. Example:

 369 D 0.955* 1.467 +- 0.153 33134_Pseudomonas_10M
 378 R 0.987* 2.323 +- 0.254

edited Nov 21 at 23:13

asked Nov 21 at 23:07

Manuel


This could do it:

 awk '/BEB/,/The grid/ { if (sub(/.*\(amino acids refer to 1st sequence: */, "") + sub(/\)/, "") == 2) seq = $0; else if ($3 ~ /^[0-9.]+\*+$/) { print $0, seq; seq = "" } }' the_file

But it's hard to know from your snippet. You should probably make the regexps (/BEB/, etc.) narrower.
– mosvy
Nov 21 at 23:29

Worked perfectly. Thanks @mosvy!
– Manuel
Nov 21 at 23:36
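
The approach from the comment above can also be spelled out as a longer, commented script. This is only a sketch under assumptions: the section markers are narrowed to "Bayes Empirical Bayes (BEB) analysis" and a line starting with "The grid", the probability is assumed to be the third whitespace-separated field, and the names `sample`, `in_beb` and `label` are made up for illustration. The sample data is the section from the question plus the extra row from its second example.

```shell
#!/bin/sh
# Build a small sample file from the section shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 Bayes Empirical Bayes (BEB) analysis (Yang, Wong & Nielsen 2005. Mol.
 Biol. Evol. 22:1107-1118)
 Positively selected sites (*: P>95%; **: P>99%)
 (amino acids refer to 1st sequence: 33134_Pseudomonas_10M)

 Pr(w>1) post mean +- SE for w

 271 A 0.911 1.524 +- 0.000
 369 D 0.955* 1.467 +- 0.153
 378 R 0.987* 2.323 +- 0.254
 492 S 0.916 1.439 +- 0.203

 The grid (...)
EOF

result=$(awk '
  # Start of a section: forget any label left over from the previous one.
  /Bayes Empirical Bayes \(BEB\) analysis/ { in_beb = 1; label = ""; next }

  # End of the section.
  /^ *The grid/ { in_beb = 0; next }

  # Remember the text between "sequence: " and the closing ")".
  in_beb && /\(amino acids refer to 1st sequence:/ {
      label = $0
      sub(/.*sequence: */, "", label)
      sub(/\).*/, "", label)
      next
  }

  # A site row whose third field is a number followed by "*" or "**".
  in_beb && $3 ~ /^[0-9.]+\*+$/ {
      if (label != "") { print $0, label; label = "" }  # label only once
      else print $0
  }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this sample, the script should print the 369 row with 33134_Pseudomonas_10M appended and the 378 row unchanged, as in the question's second example. Tracking the section with a flag instead of a range pattern is a design choice, not a requirement: it makes it easy to reset the label at every section start and to skip header lines explicitly.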

text-processing text-formatting bioinformatics

