For a set of line numbers, extract content between first and last occurrence of different patterns


I have content similar to the following in a file, and a list of line numbers, say 1, 2, 4. I want to:



  1. Feed in all the required line numbers

  2. Extract the contents between the first occurrence of <book> and the last occurrence of </book>

Data:



</p><p>abc</p></book><book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book><book><div><p> 
</div><p>123</p></book><book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book><book><div><p>
<div><p>nothing !!!</p></div>
</p><p>ABC</p></book><book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book><div>


Input line numbers: 1, 2, 4 (which I want to feed to the command)



Desired Output:



<book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book>






shell-script command-line







edited Nov 17 at 0:36 by Rui F Ribeiro
asked Sep 19 '17 at 6:35 by Samuel Finny

  • In your output, the text-indent: 0em; string occurs once in the first and second lines, but once in the third line. Can you explain why?
    – Egor Vasilyev
    Sep 19 '17 at 7:15

  • Thanks, Egor, for pointing out the error. It's now corrected. Please look at the updated desired output.
    – Samuel Finny
    Sep 19 '17 at 7:27

  • The input is not valid XML. If it were, you might try xmllint --xpath ... or some similar tool.
    – Olaf Dietsche
    Sep 19 '17 at 7:34

  • Yes, @Olaf. We can't expect valid XML on each line of the input, but the content inside <book> will be well-formed XML. Hence, using a shell command, I want to fetch everything between the first occurrence of <book> and the last occurrence of </book>. After that, I need to apply XSLT.
    – Samuel Finny
    Sep 19 '17 at 7:44


2 Answers

Accepted answer:

1) Extract specific lines



In your four-line example, extracting the 1st, 2nd and 4th lines is easy: just delete the 3rd line:



sed 3d file


But your file is probably more complicated, so a more general solution would be to do



sed -e 1b -e 2b -e 4b -e d file


So for each line that should be kept, you jump to the end of the script with b, and the final d deletes all remaining lines.



For a longer list of line numbers you may want to generate the script:



sed $(for i in 1 2 4; do echo "-e ${i}b"; done) -e d file
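
As a minimal sketch of the generated script, assuming a POSIX shell and a made-up four-line input (the names lines, args and selected are only for illustration), the -e options can be built from a list of line numbers like this:

```shell
# Illustrative list of wanted line numbers (an assumption, not from the post).
lines="1 2 4"

# Build "-e 1b -e 2b -e 4b". The braces in ${i}b stop the shell from
# looking up a variable called "ib".
args=""
for i in $lines; do
    args="$args -e ${i}b"
done

# $args is deliberately left unquoted so it splits into separate options.
# The input is four throwaway lines generated by printf.
selected=$(printf '%s\n' one two three four | sed $args -e d)
printf '%s\n' "$selected"
```

With the list above this prints one, two and four, each on its own line. Writing `$ib` instead of `${i}b` would expand an unset variable and leave sed with an empty -e script, so nothing would be kept.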


But it seems that it's not really about the line numbers, but about whether there are <book>s on the line. If so, you had better forget about the line numbers and do



sed '/<book>/!d' 


2) Extract the contents



The greedy * of regular expressions is no friend for tasks like this. That's why my personal version of sed has an o option for the s command, which replaces the line with only the matched part:



sed '/<book>/!d;s_<book>.*</book>_&_o' 


But this won't work for you, so you need some more regex juggling:



sed '/<book>/!d;s_<book>_\n&_;s_.*\n__;s_\(.*</book>\).*_\1_' file


If your version of sed doesn't support \n in the replacement string, use an actual newline (escaped by a backslash):



sed '/<book>/!d;s_<book>_\
&_;s_.*\n__;s_\(.*</book>\).*_\1_' file
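
Putting both parts together, here is a hedged sketch that first selects the wanted lines and then trims each one to the span from the first <book> to the last </book>. The sample() function and its shortened data are hypothetical stand-ins for the real file, and the literal-newline form of the substitution is used so it does not depend on \n being supported in the replacement:

```shell
# Hypothetical sample input, loosely modelled on the question's data.
sample() {
    printf '%s\n' \
        'x</book><book>a</book><book>b</book><book>c' \
        'y</book><book>d</book><book>e' \
        '<div>nothing</div>' \
        'z</book><book>f</book><book>g</book><div>'
}

# Stage 1: keep only lines 1, 2 and 4.
# Stage 2: drop lines without <book>, put a newline before the first
#          <book>, delete everything up to that newline, then keep
#          everything up to the last </book> (greedy match).
result=$(sample |
    sed -e 1b -e 2b -e 4b -e d |
    sed -e '/<book>/!d' \
        -e 's_<book>_\
&_' \
        -e 's_.*\n__' \
        -e 's_\(.*</book>\).*_\1_')

printf '%s\n' "$result"
```

For this sample the three output lines are `<book>a</book><book>b</book>`, `<book>d</book>` and `<book>f</book><book>g</book>`. Two sed processes are used because the b commands of the first stage jump past any later commands in the same script.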






answered Sep 19 '17 at 8:04 by Philippos

  • First of all, I appreciate the clear and neat presentation of the answer. I am trying out more examples and will let you know soon. Thanks again.
    – Samuel Finny
    Sep 19 '17 at 9:53


With perl:



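#!/usr/bin/env perl

use strict;
use warnings;

use v5.10;

# Line numbers to keep, stored as hash keys for a simple lookup
# (the smartmatch operator ~~ is experimental in newer perls).
my %lines = map { $_ => 1 } (1, 2, 4);

while (<>) {
    next unless $lines{$.};
    chomp;
    # Keep everything from the first <book> to the last </book>.
    s#.*?(<book>.*</book>).*#$1#;
    say;
}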







answered Sep 19 '17 at 8:48 by Satō Katsura

  • @Sato, please tell me where I should provide the input file. Thanks.
    – Samuel Finny
    Sep 19 '17 at 9:51

  • On the command line or on stdin: perl script.pl file.html, or perl script.pl <file.html.
    – Satō Katsura
    Sep 19 '17 at 10:10
