How to extract data between two different xml tags

I have looked but haven't been able to find anyone else with the same sort of problem.

I have an XML file like this:

<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>

Basically it's a whole bunch of data all on one line, with no line breaks. I need to extract the data (preferably as-is, with tags intact) between a specific <ID> tag (e.g. <ID>2</ID>) and the very next </dateAccessed> tag. I have about 50 files to check for a particular ID and the related data that follows it. I get that this is not standard; there is no nesting.

I originally tried to do this using grep and sed, but I just get the whole file returned, which seems odd to me. Can't I just treat this like a text file?

EDIT:

I didn't realise the formatter removed text enclosed in < and >, so after re-reading my question this morning, I realised it was asking something completely different.

TL;DR: I need what lies between an ID element with a specific value and the next closing dateAccessed tag, not what lies between a matching pair of opening and closing tags (i.e. between <ID> and </ID>).

So I can get something like this result:

<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>






text-processing xml






edited Feb 27 '17 at 23:36
asked Feb 27 '17 at 1:37 by averagescripter

  • I can't help but feel this is the "wrong question". If you're working with XML files you should really be using an XML parser (such as xmlstarlet). I appreciate this won't give you an unbalanced segment, and so is not a suitable answer to your question as asked. But trying to treat XML as text will almost certainly lead to unintended consequences down the road. It's not a good place to be. Really.
    – roaima, Oct 18 '17 at 7:14

  • The data is not well-formed XML. It's lacking a root node.
    – Kusalananda, Jul 11 at 20:47
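
To make the point about the missing root node concrete, here is a minimal Python 3 sketch using only the standard library. The sample string is shortened from the question's data, and the helper name is_well_formed is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened from the question's example: several top-level elements, no root.
SAMPLE = "<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"

def is_well_formed(text):
    """Return True if `text` parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(SAMPLE))                         # False
print(is_well_formed("<root>" + SAMPLE + "</root>"))  # True
```

The parser stops after the first top-level element and rejects whatever follows it, which is why any parser-based approach has to wrap the fragments in a root element first.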












4 Answers
If you want to extract the ID value, and assuming ID always comes as the first tag, you can use this:

awk -F"[<>]" '{print $3}' input.txt

If you want to search for a specific tag, try this awk command, changing the value of input=ID to the tag you want. It prints the value that follows the first field matching that tag:

awk -F"[<>]" '{for(i=1;i<=NF;i++){if($i~input){print $(i+1);next}}}' input=ID input.txt
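
To see what splitting on < and > actually produces, here is a rough Python 3 sketch of the same idea. The sample line and the helper names split_fields and values_for are invented for illustration; they are not part of the awk commands above:

```python
import re

# Two shortened records in the question's flat, single-line layout (made-up values).
LINE = "<ID>1</ID><data>a</data><ID>2</ID><data>b</data>"

def split_fields(line):
    """Split on '<' or '>', mirroring the field separator [<>]."""
    return re.split(r"[<>]", line)

def values_for(line, tag):
    """Return the field that follows each field exactly equal to `tag`."""
    fields = split_fields(line)
    return [fields[i + 1] for i in range(len(fields) - 1) if fields[i] == tag]

print(split_fields(LINE)[:5])   # ['', 'ID', '1', '/ID', '']
print(values_for(LINE, "ID"))   # ['1', '2']
```

awk numbers fields from 1, so its $3 is fields[2] here. Note also that awk's ~ is a regular-expression match, so a pattern of ID matches the /ID field as well; the sketch compares for exact equality to avoid that.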





The provided XML has no line breaks. Why not insert a newline (\n) between each >< pair, so that every element ends up on its own line?

Example: I created a file called stack containing the given XML.

Below is the sed operation that introduces the line breaks (the \n in the replacement is understood by GNU sed):

sed -e 's/></>\n</g' stack

<ID>2</ID>
<data>asdf</data>
<data2>asdf</data2>
<dataX>asdf</dataX>
<dateAccessed>somedate</dateAccessed>

Now you can access the tags you want.
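
Once every element sits on its own line, picking out one record is a matter of scanning from the wanted ID line to the next dateAccessed line. A minimal Python 3 sketch of that follow-up step, assuming the flat layout from the question (the function name extract_record and the SAMPLE constant are hypothetical):

```python
# Rebuild the question's four-record sample line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))

def extract_record(xml_text, wanted_id, start_tag="ID", end_tag="dateAccessed"):
    """Break the text at every '><', then collect lines from the wanted ID
    element through the next end_tag element and glue them back together."""
    lines = xml_text.strip().replace("><", ">\n<").split("\n")
    start_line = "<{0}>{1}</{0}>".format(start_tag, wanted_id)
    collected = []
    collecting = False
    for line in lines:
        if line == start_line:
            collecting = True
        if collecting:
            collected.append(line)
            if line.startswith("<" + end_tag + ">"):
                break
    return "".join(collected)

print(extract_record(SAMPLE, "2"))
```

Joining with an empty string undoes the inserted newlines, so the result keeps the original single-line form. An ID that does not occur yields an empty string.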






Grep

grep -oE '<data>[^<]*</data>' yourxmlfile

Bash

tag='data'
tL="<$tag>" tR="</$tag>"
xml=$(< yourxmlfile)
while case $xml in *"$tL"* ) :;; * ) break;; esac; do
  t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
  echo "$tL$t2$tR"
done

Perl (with tag set as in the Bash example)

perl -lne "print for/<$tag>.*?<\/$tag>/g" yourxmlfile

Sed (GNU sed, with tag set as in the Bash example)

sed -e "
  s|<$tag>|\n&|
  s/.*\n//
  s|</$tag>|&\n|
  /\n/P;D
" yourxmlfile

Output

<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
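
The same non-greedy matching idea can be pointed at what the question asks for: start at an ID element with a given value and stop at the next closing dateAccessed tag. A minimal Python 3 sketch, assuming the flat layout from the question; extract_by_id and search_files are illustrative names, not an existing tool:

```python
import re

# Rebuild the question's four-record sample line.
RECORD = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
SAMPLE = "".join(RECORD.format(i) for i in range(1, 5))

def extract_by_id(xml_text, wanted_id, start_tag="ID", end_tag="dateAccessed"):
    """Return the text from <start_tag>wanted_id</start_tag> through the
    next </end_tag>, or None if that ID is not present."""
    pattern = re.compile(
        "<{s}>{v}</{s}>.*?</{e}>".format(
            s=re.escape(start_tag),
            v=re.escape(wanted_id),
            e=re.escape(end_tag)),
        re.DOTALL)  # let '.' cross line breaks, in case a file has them
    match = pattern.search(xml_text)
    return match.group(0) if match else None

def search_files(paths, wanted_id):
    """Map each path that contains the wanted ID to its extracted record."""
    results = {}
    for path in paths:
        with open(path, "r") as handle:
            record = extract_by_id(handle.read(), wanted_id)
        if record is not None:
            results[path] = record
    return results

print(extract_by_id(SAMPLE, "2"))
```

Because the pattern includes the closing </ID>, a search for 2 does not accidentally match an ID such as 20. For the 50 files mentioned in the question, search_files could be given the list of file names.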






edited Feb 27 '17 at 4:33
answered Feb 27 '17 at 3:48 by Rakesh Sharma

                      As noted in the comments, your data isn't well-formed XML and it isn't completely clear what the structure of your document is, e.g. judging by your example data, it looks like you have no nested elements - is that really the case?



                      With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output data for the given example input data):



                      #!/usr/bin/env python
                      # coding: ascii
                      """extract.py

                      Extract everything between two XML tags
                      in a (possibly poorly formed) XML document."""

                      from bs4 import BeautifulSoup
                      import sys

                      # Set the opening tag name and value
                      opening_name = "ID"
                      opening_text = "2"

                      # Set the closing tag name
                      closing_name = "dateAccessed"

                      # Get the XML data from a file and instantiate a BeautifulSoup parser
                      # (the 'xml' parser requires the lxml package to be installed)
                      # We add a root node because the input data is missing a root
                      with open(sys.argv[1], 'r') as xmlfile:
                          xmldoc = "<root>" + xmlfile.read() + "</root>"
                          soup = BeautifulSoup(xmldoc, 'xml')

                      # Iterate through the elements of the XML data and collect
                      # all of the elements in between the opening and closing tags
                      elements = []
                      match = False
                      for e in soup.find_all():
                          if match is True:
                              elements.append(str(e))
                              if e.name==closing_name:
                                  break
                          else:
                              try:
                                  if e.name==opening_name and e.text==opening_text:
                                      match = True
                                      elements.append(str(e))
                              except AttributeError:
                                  pass

                      # Output the results on a single line
                      print("".join(elements))


                      You would run it something like this:



                      python extract.py data.xml


                      For your given example data:



                      <ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


                      It produces the following output:



                      <ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
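
Because the sample data has no nesting, the same slice can also be sketched with a non-greedy regular expression from the standard library, treating the input as plain text. This is a different technique from the BeautifulSoup script, not a drop-in equivalent: the helper name extract_record and its parameters are made up for this illustration, and it assumes the opening element is written exactly as <ID>value</ID>, with no attributes or extra whitespace.

```python
import re


def extract_record(text, record_id, open_tag="ID", close_tag="dateAccessed"):
    """Return the text from <open_tag>record_id</open_tag> through the
    next </close_tag>, or None if no such span exists.

    Hypothetical helper: assumes flat, attribute-free tags as in the sample.
    """
    pattern = re.compile(
        "<{o}>{v}</{o}>.*?</{c}>".format(
            o=re.escape(open_tag),
            v=re.escape(record_id),
            c=re.escape(close_tag),
        ),
        re.DOTALL,  # let '.' cross line breaks, in case a file contains some
    )
    found = pattern.search(text)
    return found.group(0) if found else None


# Shortened stand-in for the question's data (assumed shape, two records)
sample = (
    "<ID>1</ID><data>a</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>"
)
print(extract_record(sample, "2"))
# prints: <ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>
```

The non-greedy .*? is what stops the match at the very next closing tag; a greedy .* would run on to the last </dateAccessed> in the file, which may be why a plain grep or sed attempt returns the whole line.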






                      answered Jul 11 at 22:23









                      igal

                          up vote
                          -1
                          down vote













                          If you want to extract the ID value, and assuming ID always comes as the first tag, you can use this:



                          awk -F"[<>]" '{print $3}' input.txt


                          If you want to search for a specific tag, try this awk command, changing input=ID to the tag name you need:



                          awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
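
For comparison, the field-splitting idea behind these awk commands can be sketched in Python: split the line on < and >, then take the field that follows the tag name. The helper name first_value is hypothetical. Unlike the awk version, which does a regex match ($i~input), this sketch compares the tag name exactly, so a closing tag such as /ID is never mistaken for the opening one.

```python
import re


def first_value(line, tag):
    """Return the text right after the first <tag> on the line, or None.

    Hypothetical helper mirroring awk's -F"[<>]" field splitting.
    """
    fields = re.split(r"[<>]", line)
    # Stop one short of the end so fields[i + 1] always exists
    for i in range(len(fields) - 1):
        if fields[i] == tag:
            return fields[i + 1]
    return None


line = "<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"
print(first_value(line, "ID"))            # prints: 1
print(first_value(line, "dateAccessed"))  # prints: somedate
```

Like the awk commands, this only returns a single value; it does not give the whole run of tags between an ID and the next closing dateAccessed tag.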





                              answered Feb 27 '17 at 3:32









                              Kamaraj

                                  up vote
                                  -3
                                  down vote













                                  The provided XML has no line breaks.
                                  Why not insert a newline (\n) between each >< pair, so that every element ends up on its own line?



                                  Example:
                                  I created a file called stack containing the given XML.



                                  The sed command below introduces the line breaks (GNU sed understands \n in the replacement text). Part of the output is shown:



                                   sed -e 's/></>\n</g' stack

                                  <ID>2</ID>
                                  <data>asdf</data>
                                  <data2>asdf</data2>
                                  <dataX>asdf</dataX>
                                  <dateAccessed>somedate</dateAccessed>


                                  Now you can pick out the tags you want with line-oriented tools.
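
As a rough illustration of the follow-up step, the same line-splitting idea can be sketched in Python: break the text at every >< boundary, then keep the lines from the wanted ID line through the next closing dateAccessed line. The helper name record_lines is made up for this sketch, and it assumes the flat, attribute-free tags shown in the question.

```python
def record_lines(text, record_id):
    """Split text at '><' and return the lines from <ID>record_id</ID>
    through the next line ending in </dateAccessed>.

    Hypothetical helper; returns an empty list if the ID is not found.
    """
    lines = text.strip().replace("><", ">\n<").splitlines()
    start = "<ID>{}</ID>".format(record_id)
    block = []
    for line in lines:
        # Collect once the start line has been seen
        if block or line == start:
            block.append(line)
            if line.endswith("</dateAccessed>"):
                break
    return block


sample = (
    "<ID>1</ID><data>a</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>"
)
print("".join(record_lines(sample, "2")))
# prints: <ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>
```

Joining the collected lines with an empty string puts the record back on a single line with its tags intact, which matches the output the question asks for.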






                                      answered Oct 18 '17 at 7:08









                                      user256118
