How to remove content before a pattern in xml using unix

Source file example:
<HDR></HDR><b></b><c></c>

(XML file created in a single line)

Source file example:
<HDR>
</HDR>
<b>
</b>
<c>
</c>

I need to remove all the content of the file before <b> in both source formats.
I tried the following:

sed 's/^.*b/b/'

But this does not replace it. Please let me know if there is an alternative way.

asked Sep 3 at 6:57, edited Sep 3 at 7:15 by user7952074

Please be clearer as to what you want to accomplish and why. A little context might be useful in understanding your needs and helping you find a solution.
– simlev, Sep 3 at 7:01

The XML is not well formed as there is no root tag around the whole document.
– Kusalananda, Sep 3 at 7:02

I have a header in the XML which needs to be replaced with a different header.
– user7952074, Sep 3 at 7:14

It seems then that your problem is different from what you describe. Please add complete specifications of what you're trying to accomplish (replacing the header tag?) in order to avoid the XY problem.
– simlev, Sep 3 at 7:42

Tags: shell-script, awk, sed, xml


3 Answers

Assuming your XML document is well formed, like

<document>
<HDR>
</HDR>
<b>
</b>
<c>
</c>
</document>

Then you may use XMLStarlet to remove all HDR tags like so:

xmlstarlet ed -d '//HDR' file.xml >newfile.xml

To only remove the HDR tags that are immediately followed by a b tag:

xmlstarlet ed -d '//HDR[following-sibling::*[1][name() = "b"]]' file.xml >newfile.xml

XMLStarlet may also be used to modify the contents of tags:

$ xmlstarlet ed -u '//HDR[following-sibling::*[1][name() = "b"]]' -v 'New header value' file.xml
<?xml version="1.0"?>
<document>
 <HDR>New header value</HDR>
 <b/>
 <c/>
</document>

$ xmlstarlet ed -i '//HDR[following-sibling::*[1][name() = "b"]]' -t attr -n 'new_attribute' -v 'hello' file.xml
<?xml version="1.0"?>
<document>
 <HDR new_attribute="hello"/>
 <b/>
 <c/>
</document>
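
The sample files in the question have no root element, so an XML tool may refuse to parse them as they are. A minimal sketch of one possible workaround follows: wrap the fragment in a root element before passing it on. The helper name `wrap_in_root` and the root name `document` are assumptions for illustration only, and the approach assumes the file does not begin with an XML declaration.

```shell
# Hypothetical helper: wrap a rootless XML fragment in a <document> root
# element. Reads the named files, or standard input when none are given.
wrap_in_root() {
  printf '<document>\n'
  cat "$@"
  printf '</document>\n'
}

# Possible usage, assuming xmlstarlet is installed and reads standard input
# when no file name is given:
#   wrap_in_root file.xml | xmlstarlet ed -d '//HDR' >newfile.xml
```

The wrapper element stays in the output, so it would need to be removed again if the consumer expects the original rootless layout.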

answered Sep 3 at 7:22, edited Sep 3 at 11:44 by Kusalananda

It has not been stressed enough that XML files should be treated as such and handled by dedicated tools, as opposed to being regarded as text files and messed with using the wrong tools.
– simlev, Sep 3 at 7:47

This command is not available on our system, and we can't ask the admins to install it. Thanks!
– user7952074, Sep 3 at 12:20


Type 1:

 echo "<HDR></HDR><b></b><c></c>" | sed 's/^.*<b>/<b>/' 
 <b></b><c></c>

This replaces everything up to and including the last <b> on the line with <b>. It suits the single-line format.

Type 2:

sed -n '/<b>/,$p' file
<b>
</b>
<c>
</c>

This prints from the first line containing <b> to the end of the file ($). It suits the multi-line format.
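
The two commands above each cover one of the two layouts. As a rough sketch, a single awk filter could handle both by cutting at the first <b> it sees and printing everything from there on. The function name `strip_before_b` is made up for this example, and the sketch assumes the literal string <b> (no attributes) marks the cut point.

```shell
# Hypothetical helper: drop everything before the first literal "<b>".
# Works on the single-line and the multi-line layout alike.
strip_before_b() {
  awk '
    found { print; next }            # already past the cut point: copy the line
    (i = index($0, "<b>")) {         # first line that contains "<b>"
      found = 1
      print substr($0, i)            # keep the line from "<b>" onwards
    }
  ' "$@"
}
```

Because `index` returns the position of the first match, a later <b> on the same line or further down the file is left alone. Usage would be along the lines of `strip_before_b file >newfile`.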

answered Sep 3 at 7:27 by msp9011


Question:

remove all contents of the file before <b>

Answer:

perl -0777 -lape 's/^.*<b>/<b>/s'

Test run:

==> in1.txt <==
<HDR></HDR><b></b><c></c>

==> in2.txt <==
<HDR>
</HDR>
<b>
</b>
<c>
</c>

$ perl -i -0777 -lape 's/^.*<b>/<b>/s' in{1,2}.txt

==> in1.txt <==
<b></b><c></c>

==> in2.txt <==
<b>
</b>
<c>
</c>

answered Sep 3 at 7:15, edited Sep 3 at 7:31 by simlev

I tried this. It removed all the content after the <b> tag as well; now only the <b> tag is left in the new file.
– user7952074, Sep 3 at 7:25

@user7952074 that means there's a <b> tag at the end of the file :-) Please follow @Kusalananda's advice and stick to well-formed XML and proper XML manipulation tools.
– simlev, Sep 3 at 7:39
