For a set of line numbers … extract content between first and last occurrence of different patterns

I have content like the following in a file, along with a list of line numbers, say 1, 2, 4.

I can feed in all the required line numbers.

For each of those lines, I want to extract the contents between the first occurrence of <book> and the last occurrence of </book>.

Data:

</p><p>abc</p></book><book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book><book><div><p> 
</div><p>123</p></book><book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book><book><div><p> 
<div><p>nothing !!!</p></div> 
</p><p>ABC</p></book><book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book><div>

Input line numbers: 1, 2, 4 (which I want to feed to the command)

Desired Output:

<book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book>

edited Nov 17 at 0:36 by Rui F Ribeiro

asked Sep 19 '17 at 6:35 by Samuel Finny

In your output, the text-indent:0em; string occurs once in the first and second lines, but once in the third line. Can you explain why?
– Egor Vasilyev
Sep 19 '17 at 7:15

Thanks, Egor, for pointing out the error. It's now corrected. Please look at the updated desired output.
– Samuel Finny
Sep 19 '17 at 7:27

The input is not valid XML. If it were, you might try xmllint --xpath ... or some similar tool.
– Olaf Dietsche
Sep 19 '17 at 7:34

Yes, @Olaf. We can't expect valid XML on each line of the input, but the content inside <book> will be well-formed XML. Hence I want to use a shell command to fetch everything between the first occurrence of <book> and the last occurrence of </book>. After that, I need to apply XSLT.
– Samuel Finny
Sep 19 '17 at 7:44

shell-script command-line

2 Answers

Accepted answer (score 3)

1) Extract specific lines

In your four-line example, extracting the 1st, 2nd and 4th lines is easy: just delete the 3rd line:

sed 3d file

But your file is probably more complicated, so a more general solution would be to do

sed -e 1b -e 2b -e 4b -e d file

So for each line that should be kept, you jump to the end of the script with b (where the line is printed as usual), and the final d deletes all remaining lines.

For a longer list of line numbers you may want to generate the script:

sed $(for i in 1 2 4; do echo "-e ${i}b"; done) -e d file
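If the line numbers arrive as one comma-separated string, the sed script can be assembled from that string instead of a hand-written loop. The following is only a rough sketch of that idea: the variable names (lines, file, script, selected) and the four-word sample file are made up for illustration, and it assumes the list contains nothing but numbers, commas and spaces.

```shell
#!/bin/sh
# Hypothetical inputs: a comma-separated line list and a throwaway sample file.
lines="1, 2, 4"
file=$(mktemp)
printf '%s\n' first second third fourth > "$file"

# Turn "1, 2, 4" into one sed command per line: 1b, 2b, 4b ...
script=$(printf '%s\n' "$lines" | tr -d ' ' | tr ',' '\n' | sed 's/$/b/')
# ... and finish with d, which deletes every line that did not branch.
script="$script
d"

selected=$(sed "$script" "$file")
printf '%s\n' "$selected"
rm -f "$file"
```

Putting each command on its own line of the script avoids relying on how a particular sed treats a semicolon after b.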

But it seems that it's not really about the line numbers, but about whether there are <book>s on that line. If this is true, you had better forget about the line numbers and do

sed '/<book>/!d'

2) Extracting the contents

The greedy * of regular expressions is no friend for tasks like this. That's why my personal version of sed has an o option for the s command that keeps only the matched part:

sed '/<book>/!d;s_<book>.*</book>_&_o'

But this won't work for you, so you need some more regex juggling:

sed '/<book>/!d;s_<book>_\n&_;s_.*\n__;s_\(.*</book>\).*_\1_' file

If your version of sed doesn't support \n in the replacement string, use an actual newline (escaped by a backslash):

sed '/<book>/!d;s_<book>_\
&_;s_.*\n__;s_\(.*</book>\).*_\1_' file
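The two parts can be combined, with one caveat: b jumps to the end of the script, so a kept line would skip any s commands that follow in the same script. The sketch below therefore runs the selection and the extraction as two sed processes joined by a pipe. It uses the sample data from the question and the escaped-newline variant of the command; the variable names file and result are hypothetical.

```shell
#!/bin/sh
# Sample data from the question, written to a temporary file (hypothetical name).
file=$(mktemp)
cat > "$file" <<'EOF'
</p><p>abc</p></book><book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book><book><div><p>
</div><p>123</p></book><book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book><book><div><p>
<div><p>nothing !!!</p></div>
</p><p>ABC</p></book><book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book><div>
EOF

# Step 1: keep lines 1, 2 and 4.
# Step 2: drop lines without <book>, mark the first <book> with a newline,
#         cut everything before it, then cut everything after the last </book>.
result=$(sed -e 1b -e 2b -e 4b -e d "$file" |
    sed '/<book>/!d;s_<book>_\
&_;s_.*\n__;s_\(.*</book>\).*_\1_')

printf '%s\n' "$result"
rm -f "$file"
```

On this data the three printed lines should match the desired output given in the question.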

answered Sep 19 '17 at 8:04 by Philippos

First of all, I appreciate the clear and neat presentation of the answer. I am trying out more examples and will let you know soon. Thanks again.
– Samuel Finny
Sep 19 '17 at 9:53

Answer (score 1)

With perl:

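#!/usr/bin/env perl

use strict;
use warnings;

use v5.10;

# Wanted line numbers as hash keys (avoids the experimental ~~ smartmatch operator).
my %lines = map { $_ => 1 } (1, 2, 4);

while (<>) {
    # $. is the current input line number.
    next unless $lines{$.};
    chomp;
    # Keep everything from the first <book> to the last </book>.
    s#.*?(<book>.*</book>).*#$1#;
    say;
}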

answered Sep 19 '17 at 8:48 by Satō Katsura

@Sato, please guide me on where I should provide the input file. Thanks.
– Samuel Finny
Sep 19 '17 at 9:51

On the command line or on stdin: perl script.pl file.html, or perl script.pl <file.html.
– Satō Katsura
Sep 19 '17 at 10:10
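Where the line list has to be supplied on the command line rather than hard-coded as in @lines, one possible sketch uses awk, which is not used in either answer and is offered here only as an illustration. The awk variable list, the shell variables file and extracted, and the helper names inside the script are all hypothetical; the data is the sample from the question. awk has no non-greedy matching, so the sketch locates the first <book> and the last </book> with index() instead of a regex.

```shell
#!/bin/sh
# Sample data from the question in a temporary file (hypothetical name).
file=$(mktemp)
cat > "$file" <<'EOF'
</p><p>abc</p></book><book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book><book><div><p>
</div><p>123</p></book><book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book><book><div><p>
<div><p>nothing !!!</p></div>
</p><p>ABC</p></book><book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book><div>
EOF

# "list" is a hypothetical awk variable holding the wanted line numbers.
extracted=$(awk -v list="1, 2, 4" '
BEGIN {
    n = split(list, nums, ",")
    for (i = 1; i <= n; i++) want[nums[i] + 0] = 1   # "+ 0" strips spaces
}
(NR in want) {
    start = index($0, "<book>")          # first <book> on the line
    if (start == 0) next
    s = substr($0, start)
    end = 0; rest = s
    # Walk over every </book> to find where the last one ends.
    while ((p = index(rest, "</book>")) > 0) {
        end += p + 6                     # "</book>" is 7 characters long
        rest = substr(rest, p + 7)
    }
    if (end > 0) print substr(s, 1, end)
}' "$file")

printf '%s\n' "$extracted"
rm -f "$file"
```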
