How to extract data between two different xml tags

up vote
0
down vote

I have looked but haven't been able to find anyone else with the same sort of problem I have.

I have an XML file like this:

<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


Basically a whole bunch of data all on one line, no line breaks.
I need to extract the info (preferably as-is, with tags intact) between a specific <ID> tag (e.g. <ID>2</ID>) and the very next </dateAccessed> tag. I have about 50 files to check for a particular ID and the related data that follows it. I get that this is not standard; there is no nesting.

I originally tried to do this using grep and sed, but I just get the whole file returned, which seems odd to me. Can't I just treat this like a text file?
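
One likely reason for getting the whole file back: grep and sed work line by line, and this file is a single line, so any match prints that entire line. A minimal Python sketch of the difference, using a shortened, made-up sample string (the variable names are illustrative only):

```python
import re

# Shortened sample in the same shape as the file: one line, no line breaks.
text = (
    "<ID>1</ID><data>a</data><dateAccessed>d1</dateAccessed>"
    "<ID>2</ID><data>b</data><dateAccessed>d2</dateAccessed>"
)

# Line-oriented filtering (what plain grep does): keep each line that matches.
# With a single-line file, the one matching line is the whole file.
matching_lines = [line for line in text.splitlines() if "<ID>2</ID>" in line]

# Match-oriented extraction (like grep -o): keep only the matched substring.
# ".*?" is non-greedy, so it stops at the first closing tag after the ID.
match_only = re.search(r"<ID>2</ID>.*?</dateAccessed>", text).group(0)

print(matching_lines[0] == text)
print(match_only)
```

So treating the file as text is possible, as long as the tool returns the match rather than the line that contains it.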



EDIT:

I didn't realise the formatter removed text enclosed in < and >, so after re-reading my question this morning, I realised it was asking something completely different.

TL;DR: I need what lies between a specific ID element (the ID tags with a particular value) and the next closing dateAccessed tag, not what lies between a matching pair of opening and closing tags such as <ID> and </ID>.

So I can get something like this result:

<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
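
The extraction described above can be sketched in Python with a non-greedy regular expression. This is only a sketch under the assumptions stated in the question (flat data, no nesting, each record ending in a dateAccessed element); the function name extract_record and its parameters are made up for illustration:

```python
import re

def extract_record(text, record_id, id_tag="ID", end_tag="dateAccessed"):
    """Return the text from <ID>record_id</ID> through the next closing
    end tag, or None if that ID is not present."""
    pattern = re.compile(
        "<{0}>{1}</{0}>.*?</{2}>".format(
            re.escape(id_tag), re.escape(str(record_id)), re.escape(end_tag)
        ),
        re.DOTALL,  # let "." cross line breaks, in case some files contain them
    )
    found = pattern.search(text)
    return found.group(0) if found else None

# Sample data in the same shape as the file in the question.
record = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
sample = "".join(record.format(i) for i in range(1, 5))

result = extract_record(sample, 2)
print(result)
```

Because the closing </ID> is part of the pattern, asking for ID 2 does not also match an ID such as 20. For the 50 or so files, the same function could be called on the contents of each file in a loop.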









text-processing xml






edited Feb 27 '17 at 23:36
asked Feb 27 '17 at 1:37
averagescripter





  • I can't help but feel this is the "wrong question". If you're working with XML files you should really be using an XML parser (such as xmlstarlet). I appreciate this won't give you an unbalanced segment, and so is not a suitable answer to your question as asked. But trying to treat XML as text will almost certainly lead to unintended consequences down the road. It's not a good place to be. Really.
    – roaima
    Oct 18 '17 at 7:14

  • The data is not well-formed XML. It's lacking a root node.
    – Kusalananda
    Jul 11 at 20:47
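
To illustrate the two comments together, here is a hedged sketch that wraps the fragment in an artificial root element so that Python's standard xml.etree.ElementTree parser accepts it, then collects the sibling elements from the wanted ID up to the next dateAccessed. The root name "root" and the function name extract_with_parser are arbitrary choices made for this example:

```python
import xml.etree.ElementTree as ET

def extract_with_parser(fragment, record_id, id_tag="ID", end_tag="dateAccessed"):
    """Collect the serialized elements from the matching ID element up to
    and including the next end_tag element. Returns "" if nothing matches."""
    # The data has no root node, so add one before parsing.
    root = ET.fromstring("<root>" + fragment + "</root>")
    collected = []
    inside = False
    for child in root:
        if not inside and child.tag == id_tag and child.text == str(record_id):
            inside = True
        if inside:
            collected.append(ET.tostring(child, encoding="unicode"))
            if child.tag == end_tag:
                break
    return "".join(collected)

# Sample data in the same shape as the file in the question.
record = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
sample = "".join(record.format(i) for i in range(1, 5))

print(extract_with_parser(sample, 2))
```

Note that the output is re-serialized by the parser, so for real data it may not be byte-for-byte identical to the input (for example, empty elements or escaped characters can be written differently).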












4 Answers
up vote
-1
down vote

If you want to extract the ID value, and I assume ID always comes as the first tag, then you can use this:

awk -F"[<>]" '{print $3}' input.txt

If you want to search for a specific tag, try this awk command; change the value of input=ID to the tag you want:

awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
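
To see what splitting on < and > gives, here is a small Python sketch of the same field logic. The sample line is shortened, and value_after is a made-up helper name; unlike the awk version, which uses a regex match on each field, this sketch compares field names exactly:

```python
import re

line = "<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"

# Equivalent of awk -F"[<>]": every "<" or ">" separates two fields.
fields = re.split(r"[<>]", line)

# awk fields are 1-based, so awk's $3 is fields[2] here.
first_id = fields[2]

def value_after(fields, name):
    """Return the field that follows the first field equal to name."""
    for i, field in enumerate(fields[:-1]):
        if field == name:
            return fields[i + 1]
    return None

print(fields[:4])
print(first_id)
print(value_after(fields, "dateAccessed"))
```

Since the whole file is one line, this approach only ever reports the first occurrence, and it returns the value inside the tag rather than the tagged range asked for in the question.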





up vote
-3
down vote

The provided XML has no line breaks. Why don't you try inserting \n between >< so that each element ends up on its own line?

Example: I have created a file called stack containing the given XML.

Below is the sed operation that introduces the line breaks (\n in the replacement works with GNU sed):

sed -e 's/></>\n</g' stack

Excerpt of the output:

<ID>2</ID>
<data>asdf</data>
<data2>asdf</data2>
<dataX>asdf</dataX>
<dateAccessed>somedate</dateAccessed>

Now you can access the tags you want.
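
Once every element sits on its own line, picking out one record becomes a matter of selecting a range of lines. A rough Python sketch of both steps, assuming the flat layout from the question (record_lines is a hypothetical helper name):

```python
def record_lines(text, record_id, id_tag="ID", end_tag="dateAccessed"):
    """Split the one-line data into one element per line, then return the
    lines from the wanted ID line through the next end_tag line."""
    # Same idea as the sed command: put a line break between adjacent tags.
    lines = text.replace("><", ">\n<").split("\n")
    start_line = "<{0}>{1}</{0}>".format(id_tag, record_id)
    end_prefix = "<{}>".format(end_tag)
    selected = []
    inside = False
    for line in lines:
        if line == start_line:
            inside = True
        if inside:
            selected.append(line)
            if line.startswith(end_prefix):
                break
    return selected

# Sample data in the same shape as the file in the question.
record = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
sample = "".join(record.format(i) for i in range(1, 5))

lines = record_lines(sample, 2)
print("\n".join(lines))
```

Joining the selected lines with an empty string instead of a newline gives back the original single-line form.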






up vote
0
down vote

Grep

grep -oE '<data>[^<]*</data>' yourxmlfile

Bash

tag='data'
tL="<$tag>" tR="</$tag>"
xml=$(< yourxmlfile)
# Loop while an opening tag is still present in the remaining text
while case $xml in *"$tL"* ) :;; * ) break;; esac; do
    t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
    echo "$tL$t2$tR"
done

Perl

# Uses the shell variable $tag set in the Bash snippet
perl -lne "print for/<$tag>.*?<\/$tag>/g" yourxmlfile

Sed

# Uses the shell variable $tag set in the Bash snippet; \n here relies on GNU sed
sed -e "
s|<$tag>|\n&|
s/.*\n//
s|</$tag>|&\n|
/\n/P;D
" yourxmlfile

Output

<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
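
For comparison, the grep variant above can be sketched in Python with re.findall. Like the commands above, it returns every occurrence of one element, not the ID-to-dateAccessed range; find_elements is an illustrative name only:

```python
import re

def find_elements(text, tag):
    """Return every <tag>...</tag> occurrence whose content contains no '<',
    mirroring grep -oE '<data>[^<]*</data>'."""
    pattern = "<{0}>[^<]*</{0}>".format(re.escape(tag))
    return re.findall(pattern, text)

# Sample data in the same shape as the file in the question.
record = ("<ID>{}</ID><data>asdf</data><data2>asdf</data2>"
          "<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>")
sample = "".join(record.format(i) for i in range(1, 5))

matches = find_elements(sample, "data")
print("\n".join(matches))
```

The pattern requires ">" directly after the tag name, so <data2> and <dataX> are not picked up when searching for data.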






edited Feb 27 '17 at 4:33
answered Feb 27 '17 at 3:48
Rakesh Sharma






















                      up vote
                      0
                      down vote









                      As noted in the comments, your data isn't well-formed XML and it isn't completely clear what the structure of your document is, e.g. judging by your example data, it looks like you have no nested elements - is that really the case?



                      With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output data for the given example input data):



                      #!/usr/bin/env python
                      # coding: ascii
                      """extract.py

                      Extract everything between two XML tags
                      in a (possibly poorly formed) XML document."""

                      from bs4 import BeautifulSoup
                      import sys

                      # Set the opening tag name and value
                      opening_name = "ID"
                      opening_text = "2"

                      # Set the closing tag name
                      closing_name = "dateAccessed"

                      # Get the XML data from a file and instantiate a BeautifulSoup parser
                      # We add a root node because the input data is missing a root
                      with open(sys.argv[1], 'r') as xmlfile:
                      xmldoc = "<root>" + xmlfile.read() + "</root>"
                      soup = BeautifulSoup(xmldoc, 'xml')

                      # Iterate through the elements of the XML data and collect
                      # all of the elements inbetween the opening and closing tags
                      elements =
                      match = False
                      for e in soup.find_all():
                      if match is True:
                      elements.append(str(e))
                      if e.name==closing_name:
                      break
                      else:
                      try:
                      if e.name==opening_name and e.text==opening_text:
                      match = True
                      elements.append(str(e))
                      except AttributeError:
                      pass

                      # Output the results on a single line
                      print("".join(elements))


                      You would run it something like this:



                      python extract.py data.xml


                      For your given example data:



                      <ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


                      It produces the following output:



                      <ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>





                      share|improve this answer












                      As noted in the comments, your data isn't well-formed XML and it isn't completely clear what the structure of your document is, e.g. judging by your example data, it looks like you have no nested elements - is that really the case?



                      With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output data for the given example input data):



                      #!/usr/bin/env python
                      # coding: ascii
                      """extract.py

                      Extract everything between two XML tags
                      in a (possibly poorly formed) XML document."""

                      from bs4 import BeautifulSoup
                      import sys

                      # Set the opening tag name and value
                      opening_name = "ID"
                      opening_text = "2"

                      # Set the closing tag name
                      closing_name = "dateAccessed"

                      # Get the XML data from a file and instantiate a BeautifulSoup parser
                      # We add a root node because the input data is missing a root
                      with open(sys.argv[1], 'r') as xmlfile:
                          xmldoc = "<root>" + xmlfile.read() + "</root>"
                      soup = BeautifulSoup(xmldoc, 'xml')

                      # Iterate through the elements of the XML data and collect
                      # all of the elements in between the opening and closing tags
                      elements = []
                      match = False
                      for e in soup.find_all():
                          if match:
                              # Already inside the wanted range: keep every element
                              # until the closing tag has been collected
                              elements.append(str(e))
                              if e.name == closing_name:
                                  break
                          else:
                              try:
                                  if e.name == opening_name and e.text == opening_text:
                                      match = True
                                      elements.append(str(e))
                              except AttributeError:
                                  pass

                      # Output the results on a single line
                      print("".join(elements))


                      You would run it something like this:



                      python extract.py data.xml


                      For your given example data:



                      <ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>


                      It produces the following output:



                      <ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
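Because the example data has no nesting, the same extraction can also be sketched with only the standard library by treating the file contents as plain text. This is a minimal sketch, not a replacement for a real parser: the function name `extract_record` and its parameters are hypothetical, and it assumes the opening tag appears exactly as `<ID>value</ID>` with no attributes or extra whitespace.

```python
import re


def extract_record(text, record_id, open_tag="ID", close_tag="dateAccessed"):
    """Return the text from <open_tag>record_id</open_tag> through the
    next </close_tag>, or None if no such span exists."""
    o = re.escape(open_tag)
    c = re.escape(close_tag)
    v = re.escape(record_id)
    # Non-greedy .*? stops at the FIRST closing tag after the matching ID
    pattern = re.compile(rf"<{o}>{v}</{o}>.*?</{c}>", re.DOTALL)
    found = pattern.search(text)
    return found.group(0) if found else None


# Sample input rebuilt from the example data in the question (IDs 1 to 4)
sample = "".join(
    f"<ID>{n}</ID><data>asdf</data><data2>asdf</data2>"
    f"<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>"
    for n in range(1, 5)
)

print(extract_record(sample, "2"))
```

The non-greedy match is what keeps the result from running on to the last `</dateAccessed>` in the file, which is a common reason a text-based attempt returns the whole line.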
















                      answered Jul 11 at 22:23









                      igal


































                          If you want to extract the ID value, and assuming ID always comes as the first tag, you can use this:



                          awk -F"[<>]" '{print $3}' input.txt


                          If you want to search for a specific tag, try this awk command instead. Change the value of input=ID to the tag you want:



                          awk -F"[<>]" '{for(i=1;i<=NF;i++)if($i~input){print $(i+1);next}}' input=ID input.txt
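The field-splitting idea behind these awk commands can be sketched in Python as well, which may make it easier to see which field holds what. This is only an illustration: `value_after_tag` is a hypothetical helper, and unlike the awk `~` operator (a regular-expression match) it compares the tag name for exact equality.

```python
import re


def value_after_tag(line, tag):
    """Split on '<' and '>' and return the field that follows the first
    field equal to `tag`, or None if the tag is not present."""
    fields = re.split(r"[<>]", line)
    # For "<ID>1</ID>..." the fields are ['', 'ID', '1', '/ID', '', ...],
    # so the value sits directly after the tag name (awk's $3 for ID).
    for i, field in enumerate(fields[:-1]):
        if field == tag:
            return fields[i + 1]
    return None


line = "<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"
print(value_after_tag(line, "ID"))
print(value_after_tag(line, "dateAccessed"))
```

Like the awk version, this returns only the first matching value on the line, not the whole record between the two tags.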















                              answered Feb 27 '17 at 3:32









                              Kamaraj


































                                  The provided XML has no line breaks.
                                  Why not insert a newline (\n) between each > and <, so that every element ends up on its own line?



                                  Example:
                                  I created a file called stack containing the given XML.



                                  The sed command below introduces the line breaks (it relies on GNU sed interpreting \n in the replacement):



                                   cat stack | sed -e 's/></>\n</g'

                                  <ID>2</ID>
                                  <data>asdf</data>
                                  <data2>asdf</data2>
                                  <dataX>asdf</dataX>
                                  <dateAccessed>somedate</dateAccessed>


                                  Now you can access the tags you want, one per line.
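Once each element is on its own line, picking out one record is a matter of collecting lines from the wanted ID line through the next dateAccessed line. The following is a rough sketch of that step, assuming no `><` sequence occurs inside element text; `select_record` and its parameters are hypothetical names used for illustration.

```python
def select_record(text, record_id, open_tag="ID", close_tag="dateAccessed"):
    """Break the text at every '><' boundary, then return the lines from
    <open_tag>record_id</open_tag> through the next <close_tag> line."""
    lines = text.replace("><", ">\n<").splitlines()
    start_line = f"<{open_tag}>{record_id}</{open_tag}>"
    selected = []
    collecting = False
    for line in lines:
        if line == start_line:
            collecting = True
        if collecting:
            selected.append(line)
            # Stop as soon as the closing element has been collected
            if line.startswith(f"<{close_tag}>"):
                break
    return selected


# Sample input rebuilt from the example data in the question (IDs 1 to 4)
sample = "".join(
    f"<ID>{n}</ID><data>asdf</data><data2>asdf</data2>"
    f"<dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>"
    for n in range(1, 5)
)

for line in select_record(sample, "2"):
    print(line)
```

Joining the returned lines with an empty string gives the record back on a single line, with the tags intact.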
















                                      answered Oct 18 '17 at 7:08









                                      user256118




























                                           













































































