How to remove content before a pattern in XML using Unix

Source file example:
<HDR></HDR><b></b><c></c>


(XML file created in a single line)



OR



Source file example:
<HDR>
</HDR>
<b>
</b>
<c>
</c>


I need to remove everything in the file that comes before <b>, in both source formats.
I tried the following:



sed 's/^.*b/b/'


But this does not produce the expected result. Please let me know if there is an alternative way.










  • Please be clearer as to what you want to accomplish and why. A little context might be useful in understanding your needs and helping you find a solution.
    – simlev
    Sep 3 at 7:01







  • The XML is not well formed as there is no root tag around the whole document.
    – Kusalananda
    Sep 3 at 7:02










  • I have a header in the XML which needs to be replaced with a different header.
    – user7952074
    Sep 3 at 7:14










  • It seems then your problem is different from what you describe. Please add complete specifications of what you're trying to accomplish (replacing header tag?) in order to avoid the XY problem.
    – simlev
    Sep 3 at 7:42















shell-script awk sed xml






asked Sep 3 at 6:57 by user7952074 (edited Sep 3 at 7:15)

3 Answers
Assuming your XML document is well formed (it has a single root node), like



<document>
<HDR>
</HDR>
<b>
</b>
<c>
</c>
</document>


Then you may use XMLStarlet to remove all HDR tags like so:



xmlstarlet ed -d '//HDR' file.xml >newfile.xml


To only remove the HDR tags that are immediately followed by a b tag:



xmlstarlet ed -d '//HDR[following-sibling::*[1][name() = "b"]]' file.xml >newfile.xml



XMLStarlet may also be used to update the value of a node or to insert an attribute:



$ xmlstarlet ed -u '//HDR[following-sibling::*[1][name() = "b"]]' -v 'New header value' file.xml
<?xml version="1.0"?>
<document>
<HDR>New header value</HDR>
<b/>
<c/>
</document>

$ xmlstarlet ed -i '//HDR[following-sibling::*[1][name() = "b"]]' -t attr -n 'new_attribute' -v 'hello' file.xml
<?xml version="1.0"?>
<document>
<HDR new_attribute="hello"/>
<b/>
<c/>
</document>





answered Sep 3 at 7:22 by Kusalananda (edited Sep 3 at 11:44)

  • It has not been stressed enough that XML files should be treated as such and handled by dedicated tools, as opposed to being regarded as text files and messed with using the wrong tools.
    – simlev
    Sep 3 at 7:47










  • This command is not available on our system, and we can't ask the admins to install it. Thanks!
    – user7952074
    Sep 3 at 12:20

















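Type 1 (file on a single line):

$ echo "<HDR></HDR><b></b><c></c>" | sed 's/^.*<b>/<b>/'
<b></b><c></c>

  • This replaces everything from the start of the line up to and including the last <b> on that line with <b>.

Type 2 (one tag per line):

$ sed -n '/<b>/,$p' file
<b>
</b>
<c>
</c>

  • This prints every line from the first one containing <b> to the end of the file ($).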
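
The two sed commands above each cover one layout. A minimal sketch of a single command that handles both, assuming the goal is to keep everything from the first literal <b> onward. The function name strip_before_b is hypothetical, and this is plain text matching, not XML parsing, so a tag written as <b attr="x"> or a <b> inside a comment would not be treated correctly:

```shell
# strip_before_b: hypothetical helper, not taken from the answers above.
# Prints the input starting at the first literal "<b>", whether the
# file is a single line or has one tag per line.
strip_before_b() {
  awk '
    found { print; next }            # already past the first <b>: copy the line as-is
    (i = index($0, "<b>")) {         # first line that contains <b>
      found = 1
      print substr($0, i)            # drop whatever precedes <b> on that line
    }
  ' "$@"
}

# Single-line layout
printf '<HDR></HDR><b></b><c></c>\n' | strip_before_b

# One-tag-per-line layout
printf '<HDR>\n</HDR>\n<b>\n</b>\n<c>\n</c>\n' | strip_before_b
```

Because index searches for a fixed string and returns the first position, a later <b> on the same line is left untouched. A file name can be passed instead of piping, for example strip_before_b file.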





answered Sep 3 at 7:27 by msp9011

Question:

remove all contents of the file before <b>

Answer:

perl -0777 -lape 's/^.*<b>/<b>/s'

Test run:

==> in1.txt <==
<HDR></HDR><b></b><c></c>

==> in2.txt <==
<HDR>
</HDR>
<b>
</b>
<c>
</c>

$ perl -i -0777 -lape 's/^.*<b>/<b>/s' in{1,2}.txt

==> in1.txt <==
<b></b><c></c>

==> in2.txt <==
<b>
</b>
<c>
</c>





answered Sep 3 at 7:15 by simlev (edited Sep 3 at 7:31)

  • I tried this. It removed all the content after the <b> tag as well; now only the <b> tag is left in the new file.
    – user7952074
    Sep 3 at 7:25

  • @user7952074 that means there's a <b> tag at the end of the file :-) Please follow @Kusalananda's advice and stick to well-formed XML and proper XML manipulation tools.
    – simlev
    Sep 3 at 7:39









