How to extract data between two different xml tags
I have looked but haven't been able to find anyone else with the same sort of problem I have.
I have an xml file like this:
<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
Basically it's a whole bunch of data all on one line, with no line breaks.
I need to extract the info (preferably as-is, with tags intact) from a specific <ID> tag (e.g. <ID>2) up to the very next </dateAccessed> tag. I have about 50 files to check for a particular ID and its related data. I get that this is not standard; there is no nesting.
I originally tried to do this using grep and sed, but I just get the whole file returned, which seems odd to me. Can't I just treat this like a text file?
EDIT:
I didn't realise the formatter removed text enclosed in < and >, so after re-reading my question this morning I realised it was asking something completely different.
TL;DR
I need everything from an ID element with a specific value up to the next closing dateAccessed tag, not what lies between a matching pair of opening and closing tags (i.e. between <ID> and </ID>).
So I can get a result like this:
<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
text-processing xml
edited Feb 27 '17 at 23:36
asked Feb 27 '17 at 1:37
averagescripter
I can't help but feel this is the "wrong question". If you're working with XML files you should really be using an XML parser (such as xmlstarlet). I appreciate this won't give you an unbalanced segment, and so is not a suitable answer to your question as asked. But trying to treat XML as text will almost certainly lead to unintended consequences down the road. It's not a good place to be. Really.
roaima
Oct 18 '17 at 7:14
The data is not well-formed XML. It's lacking a root node.
Kusalananda
Jul 11 at 20:47
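To make the root-node point concrete, here is a minimal sketch in Python 3 (an assumption; neither commenter uses Python) with the standard-library xml.etree.ElementTree parser. The raw data is rejected because it has several top-level elements, while the same data wrapped in a single, hypothetical <root> element parses fine. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's one-line data (two records only).
raw = ("<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"
       "<ID>2</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>")


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Several top-level elements and no single root: not a valid document.
print(is_well_formed(raw))                       # False
# Wrapping everything in one (hypothetical) <root> element fixes that.
print(is_well_formed("<root>" + raw + "</root>"))  # True
```

This is only a well-formedness check; it does not extract anything, but it shows why any parser-based approach has to add a wrapper first.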
4 Answers
Grep
grep -oE '<data>[^<]*</data>' yourxmlfile
Bash
tag='data'
tL="<$tag>" tR="</$tag>"
xml=$(< yourxmlfile)
while case $xml in *"$tL"* ) :;; * ) break;; esac; do
  # t1: text after the first opening tag; t2: element content; xml: the rest
  t1=${xml#*"$tL"} t2=${t1%%"$tR"*} xml=${t1#*"$tR"}
  echo "$tL$t2$tR"
done
Perl
perl -lne "print for /<$tag>.*?<\/$tag>/g" yourxmlfile
Sed (GNU sed, for \n in the replacement)
sed -e "
s|<$tag>|\n&|
s/.*\n//
s|</$tag>|&\n|
/\n/P;D
" yourxmlfile
Output
<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
<data>asdf</data>
edited Feb 27 '17 at 4:33
answered Feb 27 '17 at 3:48
Rakesh Sharma
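The non-greedy .*? match used in the Perl line above can also be pointed at the question's actual goal (from a given <ID> up to the next </dateAccessed>) rather than at <data> elements. Below is a minimal Python 3 sketch of that text-based approach. It assumes, as the question states, flat tags with no nesting, attributes or extra whitespace inside the tags; the names extract_record_re and scan_files and the *.xml glob are hypothetical.

```python
import glob
import re


def extract_record_re(text, id_value):
    """Return the text from <ID>id_value</ID> through the next
    </dateAccessed>, or None if that ID does not occur.

    Purely text-based: assumes flat, attribute-free tags laid out
    exactly as in the question's sample."""
    pattern = re.compile(
        # re.escape guards against regex metacharacters in the ID value;
        # the non-greedy .*? stops at the FIRST following </dateAccessed>.
        r"<ID>" + re.escape(id_value) + r"</ID>.*?</dateAccessed>",
        re.DOTALL,  # let .*? cross line breaks if a file happens to have any
    )
    match = pattern.search(text)
    return match.group(0) if match else None


def scan_files(id_value, file_glob="*.xml"):
    """Hypothetical helper: print the matching record from each file."""
    for path in sorted(glob.glob(file_glob)):
        with open(path, encoding="utf-8") as f:
            record = extract_record_re(f.read(), id_value)
        if record is not None:
            print(path + ": " + record)
```

Calling scan_files("2") from the directory holding the ~50 files would print one line per file that contains ID 2. Because "</ID>" is part of the pattern, ID 2 does not accidentally match ID 12 or 20.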
As noted in the comments, your data isn't well-formed XML, and it isn't completely clear what the structure of your document is. Judging by your example data, it looks like you have no nested elements; is that really the case?
With that caveat in mind, here's a Python script that uses the BeautifulSoup4 parsing library to do what you want (i.e. it produces the desired output for the given example input). Note that BeautifulSoup's 'xml' parser requires the lxml package to be installed.
#!/usr/bin/env python
# coding: ascii
"""extract.py
Extract everything between two XML tags
in a (possibly poorly formed) XML document."""
from bs4 import BeautifulSoup
import sys
# Set the opening tag name and value
opening_name = "ID"
opening_text = "2"
# Set the closing tag name
closing_name = "dateAccessed"
# Get the XML data from a file and instantiate a BeautifulSoup parser
# We add a root node because the input data is missing a root
with open(sys.argv[1], 'r') as xmlfile:
    xmldoc = "<root>" + xmlfile.read() + "</root>"
soup = BeautifulSoup(xmldoc, 'xml')
# Iterate through the elements of the XML data and collect
# all of the elements in between the opening and closing tags
elements = []
match = False
for e in soup.find_all():
    if match:
        elements.append(str(e))
        # Stop at the first closing element after the match
        if e.name == closing_name:
            break
    else:
        try:
            if e.name == opening_name and e.text == opening_text:
                match = True
                elements.append(str(e))
        except AttributeError:
            pass
# Output the results on a single line
print("".join(elements))
You would run it something like this:
python extract.py data.xml
For your given example data:
<ID>1</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>3</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed><ID>4</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
It produces the following output:
<ID>2</ID><data>asdf</data><data2>asdf</data2><dataX>asdf</dataX><dateAccessed>somedate</dateAccessed>
answered Jul 11 at 22:23
igal
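If installing bs4 and lxml is not an option, the same idea (add a root node, then walk the flat list of elements) can be sketched with only Python 3's standard-library xml.etree.ElementTree. This is an alternative sketch, not igal's script; the function name extract_record and its end_tag parameter are hypothetical, and it assumes the file content is otherwise well-formed and flat, as in the question.

```python
import xml.etree.ElementTree as ET


def extract_record(xml_text, id_value, end_tag="dateAccessed"):
    """Return the elements from <ID>id_value</ID> up to and including the
    next <end_tag> element, serialized back to XML as one string.

    Assumes a flat run of elements with no root node, as in the question,
    so a dummy <root> wrapper is added before parsing."""
    root = ET.fromstring("<root>" + xml_text + "</root>")
    collected = []
    inside = False
    for child in root:  # direct children only; there is no nesting
        if not inside and child.tag == "ID" and (child.text or "").strip() == id_value:
            inside = True
        if inside:
            # Drop trailing text (e.g. a final newline) so it is not serialized.
            child.tail = None
            collected.append(ET.tostring(child, encoding="unicode"))
            if child.tag == end_tag:
                break
    return "".join(collected)
```

Usage would be along the lines of extract_record(open("data.xml").read(), "2"); an empty string means the ID was not found in that file.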
If you want to extract the ID value, and assuming ID always comes as the first tag, you can use this:
awk -F"[<>]" '{print $3}' input.txt
If you want to search for a specific tag, try this awk command; change the value of input=ID to the tag you want:
awk -F"[<>]" '{for(i=1;i<=NF;i++) if($i~input) print $(i+1); next}' input=ID input.txt
answered Feb 27 '17 at 3:32
Kamaraj
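To see which fields these awk commands refer to, here is a small Python 3 illustration (not part of the answer) that splits a line on the same [<>] separator set. Python lists are 0-based, so awk's $3 corresponds to fields[2]. It also shows that the unanchored regex test $i~input with input=ID matches both "ID" and "/ID", so the second command prints an empty field after each closing </ID> in addition to the ID value.

```python
import re

line = "<ID>1</ID><data>asdf</data><dateAccessed>somedate</dateAccessed>"

# Same separator set as awk -F"[<>]": split on every < and >.
# The leading "<" produces an empty first field, just as in awk.
fields = re.split(r"[<>]", line)
# awk numbers fields from 1, Python from 0, so awk's $3 is fields[2].
print(fields[2])  # 1

# Mimic: for(i=1;i<=NF;i++) if($i~input) print $(i+1)  with input=ID.
# re.search is an unanchored match, like awk's ~ operator.
pattern = "ID"
following = [fields[i + 1] for i in range(len(fields) - 1)
             if re.search(pattern, fields[i])]
print(following)  # ['1', '']
```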
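Given that the XML has no line breaks, why don't you try inserting a newline (\n) between each >< pair? That puts every element on its own line, which is much easier to process.
Example:
I created a file called stack containing the given XML.
Below is the sed command that introduces the line breaks (GNU sed, which understands \n in the replacement), followed by the part of its output for ID 2:
sed -e 's/></>\n</g' stack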
<ID>2</ID>
<data>asdf</data>
<data2>asdf</data2>
<dataX>asdf</dataX>
<dateAccessed>somedate</dateAccessed>
Now you can access the tags you want with ordinary line-based tools.
answered Oct 18 '17 at 7:08
user256118
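Once each element sits on its own line, the question's range can be taken with a simple start/stop scan over the lines. A minimal Python 3 sketch of that idea, assuming the flat layout from the question; the function names split_tags and lines_between are hypothetical.

```python
def split_tags(text):
    """Put each element on its own line by inserting a newline between >< pairs."""
    return text.replace("><", ">\n<").splitlines()


def lines_between(lines, start, end):
    """Return lines from the first one equal to start through the next
    line ending with end (inclusive), or [] if start never appears."""
    out = []
    for line in lines:
        if not out and line == start:
            out.append(line)           # found the opening line
        elif out:
            out.append(line)           # inside the range
            if line.endswith(end):
                break                  # stop at the first closing line
    return out
```

For example, "".join(lines_between(split_tags(data), "<ID>2</ID>", "</dateAccessed>")) rebuilds the single-line record the question asks for. Matching the whole line "<ID>2</ID>" rather than a substring avoids picking up ID 12 or 20.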