How to remove content before a pattern in XML using Unix tools
Source file example 1 (the whole XML file is on a single line):
<HDR></HDR><b></b><c></c>
Or, source file example 2 (one tag per line):
<HDR>
</HDR>
<b>
</b>
<c>
</c>
I need to remove all the content of the file before <b>, in both source formats.
I tried the following:
sed 's/^.*b/b/'
But it does not remove that content as expected. Is there an alternative way?
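To see what the attempted sed command actually does with each layout, here is a small sketch (plain POSIX sh, printf and sed; the sample strings are copied from the two examples in the question):

```shell
#!/bin/sh
# Sample data copied from the question (single-line layout).
single='<HDR></HDR><b></b><c></c>'

# Single-line layout: ".*" is greedy, so "^.*b" runs up to the LAST
# lowercase "b" on the line (the one inside "</b>"), not the first one.
printf '%s\n' "$single" | sed 's/^.*b/b/'
# prints: b><c></c>

# Multi-line layout: sed edits one line at a time, so "^.*" can never
# reach back into earlier lines; only the lines containing a "b" change.
printf '%s\n' '<HDR>' '</HDR>' '<b>' '</b>' '<c>' '</c>' | sed 's/^.*b/b/'
# prints, one per line: <HDR> </HDR> b> b> <c> </c>
```

In short, the single-line case goes wrong because the pattern matches a bare "b" with a greedy .*, and the multi-line case cannot work with a plain s/// because sed never sees more than one line at a time.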
shell-script awk sed xml
asked Sep 3 at 6:57 by user7952074 (edited Sep 3 at 7:15)
Please be clearer as to what you want to accomplish and why. A little context might be useful in understanding your needs and helping you find a solution. - simlev, Sep 3 at 7:01
The XML is not well formed, as there is no root tag around the whole document. - Kusalananda, Sep 3 at 7:02
I have a header in the XML which needs to be replaced with a different header. - user7952074, Sep 3 at 7:14
It seems, then, that your problem is different from what you describe. Please add complete specifications of what you're trying to accomplish (replacing the header tag?) in order to avoid the XY problem. - simlev, Sep 3 at 7:42
3 Answers
Assuming your XML document is well formed, like
<document>
<HDR>
</HDR>
<b>
</b>
<c>
</c>
</document>
Then you may use XMLStarlet to remove all HDR tags like so:
xmlstarlet ed -d '//HDR' file.xml >newfile.xml
To only remove the HDR tags that are immediately followed by a b tag:
xmlstarlet ed -d '//HDR[following-sibling::*[1][name() = "b"]]' file.xml >newfile.xml
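The files in the question have no root element, so an XML parser such as the one XMLStarlet uses would reject them as they are. A minimal sketch, assuming a hypothetical wrap_fragment helper and a root element named document (both invented here for illustration, and mktemp being available), that wraps the fragment before handing it to XMLStarlet:

```shell
#!/bin/sh
# Hypothetical helper (not part of XMLStarlet): print a root-less XML
# fragment wrapped in a <document> element so that XML tools accept it.
wrap_fragment() {
    printf '<document>\n'
    cat -- "$1"
    printf '</document>\n'
}

# Example input in the question's single-line layout, in a temporary file.
tmp=$(mktemp) || exit 1
printf '%s\n' '<HDR></HDR><b></b><c></c>' > "$tmp"

# Show the wrapped, now well-formed document.
wrap_fragment "$tmp"

# Only attempt the XMLStarlet step when the tool is actually installed;
# with no file argument it reads the document from standard input.
if command -v xmlstarlet >/dev/null 2>&1; then
    wrap_fragment "$tmp" | xmlstarlet ed -d '//HDR'
fi

rm -f -- "$tmp"
```

The extra document element then appears in the output, so it may need to be removed again if the consumer of the file expects the original fragment.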
XMLStarlet may also be used to modify the contents of tags:
$ xmlstarlet ed -u '//HDR[following-sibling::*[1][name() = "b"]]' -v 'New header value' file.xml
<?xml version="1.0"?>
<document>
<HDR>New header value</HDR>
<b/>
<c/>
</document>
$ xmlstarlet ed -i '//HDR[following-sibling::*[1][name() = "b"]]' -t attr -n 'new_attribute' -v 'hello' file.xml
<?xml version="1.0"?>
<document>
<HDR new_attribute="hello"/>
<b/>
<c/>
</document>
answered Sep 3 at 7:22 by Kusalananda (edited Sep 3 at 11:44)
It has not been stressed enough that XML files should be treated as such and handled by dedicated tools, as opposed to being regarded as text files and messed with using the wrong tools. - simlev, Sep 3 at 7:47
This command is not available on our system, and we can't ask the admins to install it. Thanks! - user7952074, Sep 3 at 12:20
Type 1 (everything on one line):
echo "<HDR></HDR><b></b><c></c>" | sed 's/^.*<b>/<b>/'
<b></b><c></c>
- replaces everything up to <b> with <b>. Because .* is greedy, this means up to the last <b> on the line.
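Note that .* is greedy, so on a line with more than one <b> this sed command keeps only the text from the last one. A short sketch with a made-up line containing two <b> elements, plus an awk alternative (my own suggestion, not part of the original answer) that cuts at the first <b> instead:

```shell
#!/bin/sh
# Hypothetical input line with TWO <b> elements (not from the question).
line='<HDR></HDR><b>1</b><b>2</b>'

# Greedy sed: "^.*<b>" matches up to the LAST "<b>" on the line.
printf '%s\n' "$line" | sed 's/^.*<b>/<b>/'
# prints: <b>2</b>

# awk alternative: index() returns the position of the FIRST "<b>"
# (0 if there is none), and substr() keeps everything from there on.
printf '%s\n' "$line" | awk '{ i = index($0, "<b>"); print (i ? substr($0, i) : $0) }'
# prints: <b>1</b><b>2</b>
```

index() does a plain substring search, so no regex escaping is needed for the tag.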
Type 2 (one tag per line):
sed -n '/<b>/,$p' file
<b>
</b>
<c>
</c>
- prints everything from the first line containing <b> to the end of the file ($).
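The range form drops whole lines only, so on the single-line file it would still print the <HDR> part, while the substitution form only works within one line. The two ideas can be combined. A sketch, assuming a hypothetical from_first_b helper written in POSIX awk, that handles both layouts from the question:

```shell
#!/bin/sh
# Hypothetical helper: print everything from the FIRST "<b>" in the input
# to the end, whether the XML is on one line or spread over many lines.
from_first_b() {
    awk '
        found { print; next }            # already past the first <b>: copy the line
        {
            i = index($0, "<b>")         # position of "<b>" on this line, 0 if none
            if (i) { found = 1; print substr($0, i) }
        }
    ' "$@"
}

# Both sample layouts from the question.
printf '%s\n' '<HDR></HDR><b></b><c></c>' | from_first_b
printf '%s\n' '<HDR>' '</HDR>' '<b>' '</b>' '<c>' '</c>' | from_first_b
```

Called as from_first_b file, it reads the named file instead of standard input.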
answered Sep 3 at 7:27 by msp9011
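Question: remove all contents of the file before <b>.
Answer:
perl -0777 -pe 's/^.*?<b>/<b>/s' file
Here -0777 reads the whole file as a single string, the /s modifier lets . match newlines, and the non-greedy .*? makes the match stop at the first <b> rather than the last.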
Test run:
==> in1.txt <==
<HDR></HDR><b></b><c></c>
==> in2.txt <==
<HDR>
</HDR>
<b>
</b>
<c>
</c>
$ perl -i -0777 -pe 's/^.*?<b>/<b>/s' in1.txt in2.txt
==> in1.txt <==
<b></b><c></c>
==> in2.txt <==
<b>
</b>
<c>
</c>
answered Sep 3 at 7:15 by simlev (edited Sep 3 at 7:31)
I tried this. It removed all the content after the <b> tag as well; now there is only a <b> tag in the new file. - user7952074, Sep 3 at 7:25
@user7952074 that means there's a <b> tag at the end of the file :-) Please follow @Kusalananda's advice and stick to well-formed XML and proper XML manipulation tools. - simlev, Sep 3 at 7:39