Finding specific string in XML file and storing in another file [closed]
The text in the input file looks like this:
<title>
<band height="21" isSplitAllowed="true" >
<staticText>
<reportElement
x="1"
y="1"
width="313"
height="20"
key="staticText-1"/>
<box></box>
<textElement>
<font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
</textElement>
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</staticText>
</band>
</title>
Output file should have:
4) Computation of Tier I and Tier II Capital :
The file has many <title> elements and many CDATA sections, but I want to copy only the text that sits in a CDATA section under a <title> element and save it in another file.
Tags: filenames, string, search, xml
closed as unclear what you're asking by G-Man, roaima, Sparhawk, Mr Shunz, msp9011 Jan 30 at 18:34
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
Using what? Bash? Not likely to survive minor formatting changes in the file. SMOP in Python...
– xenoid
Jan 29 at 9:31
grep '4) Computation of Tier I and Tier II Capital :' input.txt > output.txt
:/ You'll have to give us more specific details about what strings are allowed, and what are not. Perhaps give us an example of what is not allowed, and a few that are allowed.
– Sparhawk
Jan 29 at 9:32
If the required string is always within **, then try: cat file | grep -in '\*' | cut -d '*' -f 3
– rajaganesh87
Jan 29 at 9:57
Question asked Jan 29 at 9:13 by Ankita Jain; edited Jan 30 at 5:29
2 Answers
It looks like you may have tried to put a pair of ** sequences into your CDATA section to highlight it here. Unfortunately, that has turned it into invalid XML. Assuming you meant this instead,
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
you can use an XML parser to parse your XML:
xmlstarlet sel -T -t -v '//text' -n x.xml
4) Computation of Tier I and Tier II Capital :
If you have a tighter constraint than just "the contents of the <text/> element", you can adjust the XPath filter appropriately. For example:
xmlstarlet sel -T -t -v '/title/band/staticText/text' -n x.xml
4) Computation of Tier I and Tier II Capital :
xmlstarlet is now working on my Unix machine
– Ankita Jain
Jan 30 at 5:28
Great stuff. So you're sorted then?
– roaima
Jan 30 at 7:38
@AnkitaJain Good! If this solves your issue, please consider accepting the answer.
– Kusalananda
Jan 30 at 12:11
Like this?
$ sed -n '/<title>/,/<\/title>/p' input.txt | grep -oP '(?<=\[CDATA\[).*(?=\]\])'
sed will print everything between <title> and </title> (including these tags). If your [CDATA[ only ever appears in this area, you can omit this step.
grep will print everything that is preceded by [CDATA[ and followed by ]].
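A possible sed-only variant of the same idea, in case grep -P is not available; it also redirects the result to a second file, as the question asks. This is a minimal sketch, not a real XML parser: it assumes each CDATA section starts and ends on a single line, as in the sample, and it reuses the file names input.txt and output.txt from the comments under the question. The <report> wrapper and the <detail> block in the sample data are made up, only to show that text outside <title> is skipped.

```shell
#!/bin/sh
# Hypothetical sample input: one CDATA section inside <title>, one outside it.
cat > input.txt <<'EOF'
<report>
<title>
<band height="21" isSplitAllowed="true" >
<staticText>
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</staticText>
</band>
</title>
<detail>
<text><![CDATA[Not a title]]></text>
</detail>
</report>
EOF

# Step 1: keep only the lines from <title> through </title>
#         (the slash in </title> must be escaped inside /.../).
# Step 2: on those lines, print what sits between "<![CDATA[" and "]]>".
sed -n '/<title>/,/<\/title>/p' input.txt |
  sed -n 's/.*<!\[CDATA\[\(.*\)]]>.*/\1/p' > output.txt

cat output.txt
```

With the sample above, output.txt should contain just the title line. If a CDATA section spans several lines, or several sections share one line, this line-based approach will not work and an XML parser such as xmlstarlet is the safer choice.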
The ** around CDATA was put there by mistake. The correct line is: [CDATA[4) Computation of Tier I and Tier II Capital :]]
– Ankita Jain
Jan 30 at 5:29
Then just remove the **. I've edited my answer.
– finswimmer
Jan 30 at 9:43
xmlstarlet answer: answered Jan 29 at 14:05 by roaima
sed/grep answer: answered Jan 29 at 12:15 by finswimmer; edited Jan 30 at 9:42