For a set of line numbers, extract content between first and last occurrence of different patterns
I have content similar to the sample below in a file, and a list of line numbers, say 1, 2, 4.
- I want to be able to feed in all the required line numbers.
- For each of those lines, I want to extract the content between the first occurrence of <book> and the last occurrence of </book>.
Data:
</p><p>abc</p></book><book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book><book><div><p>
</div><p>123</p></book><book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book><book><div><p>
<div><p>nothing !!!</p></div>
</p><p>ABC</p></book><book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book><div>
Input line numbers: 1, 2, 4 (which I want to feed into the command)
Desired Output:
<book><p style="text-indent:0em;">def</p></book><book><p>ghi</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">456</p><p>789</p><p style="text-indent:0em;"></book>
<book><p style="text-indent:0em;">DEF</p></book><book><p>GHI</p><p style="text-indent:0em;"></book><book><div><p>JKL</p></div></book>
shell-script command-line
In your output, the string text-indent: 0em; occurs once in the first and second lines, but once in the third line. Can you explain why?
– Egor Vasilyev
Sep 19 '17 at 7:15
Thanks, Egor, for pointing out the error. It's now corrected. Please look at the updated desired output.
– Samuel Finny
Sep 19 '17 at 7:27
The input is not valid XML. If it were, you might try xmllint --xpath ... or some similar tool.
– Olaf Dietsche
Sep 19 '17 at 7:34
Yes, @Olaf. We can't expect valid XML on each line of the input, but the content inside <book> will be well-formed XML. Hence I want to use a shell command to fetch everything between the first occurrence of <book> and the last occurrence of </book>. After that, I need to apply XSLT.
– Samuel Finny
Sep 19 '17 at 7:44
edited Nov 17 at 0:36
Rui F Ribeiro
asked Sep 19 '17 at 6:35
Samuel Finny
2 Answers
accepted
1) Extract specific lines
In your four-line example, extracting the 1st, 2nd and 4th lines is easy: just delete the 3rd line:
sed 3d file
But your file is probably more complicated, so a more general solution would be
sed -e 1b -e 2b -e 4b -e d file
For each line that should be kept, b jumps to the end of the script (where the line is printed), and the final d deletes all remaining lines.
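As a quick check of the idea above, the following sketch runs the branch-and-delete form on a small made-up file and compares it with an equivalent form that turns off automatic printing with -n and prints the wanted lines with p. The sample content and the variable names (sample, kept_b, kept_p) are only illustrative.

```shell
# Build a four-line sample file (hypothetical content) in a temp file.
sample=$(mktemp)
printf '%s\n' 'line one' 'line two' 'line three' 'line four' > "$sample"

# Keep lines 1, 2 and 4 by branching past the final d ...
kept_b=$(sed -e 1b -e 2b -e 4b -e d "$sample")

# ... or by printing them explicitly with automatic printing turned off.
kept_p=$(sed -n -e 1p -e 2p -e 4p "$sample")

printf '%s\n' "$kept_b"
rm -f "$sample"
```

Both forms should select the same lines; the b form is handy when more commands are meant to follow for the kept lines.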
For a longer list of line numbers you may want to generate the script:
sed $(for i in 1 2 4; do echo "-e ${i}b"; done) -e d file
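If the line numbers arrive as a comma-separated list such as "1,2, 4", the script can be generated from that string instead. This is only a sketch: build_keep_script and keep_lines are hypothetical helper names, and the list is assumed to contain nothing but numbers, commas and spaces (no validation is done).

```shell
# build_keep_script (hypothetical helper): turn "1,2, 4" into the sed script
#   1b
#   2b
#   4b
#   d
build_keep_script() {
    # Strip spaces, put each number on its own line, then append "b" to each.
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | sed 's/$/b/'
    # Every line that did not branch away is deleted.
    printf 'd\n'
}

# keep_lines (hypothetical helper): $1 = comma-separated line numbers, $2 = input file.
keep_lines() {
    sed -e "$(build_keep_script "$1")" "$2"
}
```

It could then be called as keep_lines '1,2, 4' file.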
But it seems that this is not really about the line numbers, but about whether there are any <book>s on a line. If that is true, you had better forget about the line numbers and do
sed '/<book>/!d'
2) Extract the contents
The greedy * of regular expressions is not a friend for tasks like this. That's why my personal version of sed has an option o to the s command to replace only by the matched part:
sed '/<book>/!d;s_<book>.*</book>_&_o'
But this won't work for you, so you need some more regex juggling: mark the first <book> with a newline, delete everything up to that newline, then keep everything up to the last </book>:
sed '/<book>/!d;s_<book>_\n&_;s_.*\n__;s_\(.*</book>\).*_\1_' file
If your version of sed doesn't support \n in the replacement string, use an actual newline (escaped by a backslash):
sed '/<book>/!d;s_<book>_\
&_;s_.*\n__;s_\(.*</book>\).*_\1_' file
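Since the question asks for both a list of line numbers and the extraction, the two steps can be combined in one sed call. The following is a minimal sketch, not a tested general solution: extract_books is a hypothetical function name, the line list is assumed to be a plain comma-separated string, and the newline-marker trick is the same one used above (with a literal escaped newline in the replacement for portability).

```shell
# extract_books (hypothetical helper)
#   $1 = comma-separated line numbers, e.g. "1,2,4"
#   $2 = input file
extract_books() {
    # One "N b keep" command per requested line number.
    select=$(printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | sed 's/$/ b keep/')

    # Requested lines branch to :keep, all other lines are deleted.
    # At :keep, lines without <book> are dropped; otherwise the first <book>
    # is marked with a newline, everything before the marker is removed, and
    # everything after the last </book> is cut off.
    sed -e "$select" -e 'd' -e ':keep' -e '/<book>/!d' \
        -e 's_<book>_\
&_' -e 's_.*\n__' -e 's_\(.*</book>\).*_\1_' "$2"
}
```

With the sample data from the question, extract_books '1,2,4' file should print the three lines shown under "Desired Output".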
First of all, I appreciate the clear and neat presentation of the answer. I am trying out more examples and will let you know soon. Thanks again.
– Samuel Finny
Sep 19 '17 at 9:53
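With perl:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
my @lines = (1, 2, 4);
while (<>) {
    # Skip lines whose number ($.) is not in @lines.
    # Note: smartmatch (~~) has been marked experimental since perl 5.18 and may emit a warning.
    next unless $. ~~ @lines;
    chomp;
    # Keep everything from the first <book> to the last </book>.
    s#.*?(<book>.*</book>).*#$1#;
    say;
}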
@Sato, please guide me on where to provide the input file. Thanks.
– Samuel Finny
Sep 19 '17 at 9:51
On the command line or on stdin: perl script.pl file.html, or perl script.pl <file.html.
– Satō Katsura
Sep 19 '17 at 10:10
answered Sep 19 '17 at 8:04
Philippos
answered Sep 19 '17 at 8:48
Satō Katsura