Finding specific string in XML file and storing in another file [closed]

-4

The text in the input file looks like this:

<title>
  <band height="21" isSplitAllowed="true" >
    <staticText>
      <reportElement
        x="1"
        y="1"
        width="313"
        height="20"
        key="staticText-1"/>
      <box></box>
      <textElement>
        <font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
      </textElement>
      <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
    </staticText>
  </band>
</title>

The output file should contain:

4) Computation of Tier I and Tier II Capital :

The file has many <title> elements and CDATA sections, but I want to copy only the text from the CDATA section inside each <title> element and save it in another file.

edited Jan 30 at 5:29

asked Jan 29 at 9:13

Ankita Jain

closed as unclear what you're asking by G-Man, roaima, Sparhawk, Mr Shunz, msp9011 Jan 30 at 18:34


Using what? Bash? Not likely to survive minor formatting changes in the file. SMOP in Python...

– xenoid
Jan 29 at 9:31

4

grep '4) Computation of Tier I and Tier II Capital :' input.txt > output.txt :/ You'll have to give us more specific details about what strings are allowed, and what are not. Perhaps give us an example of what is not allowed, and a few that are allowed.

– Sparhawk
Jan 29 at 9:32

If the required string is always within **, then try: grep '\*\*' file | cut -d '*' -f 3

– rajaganesh87
Jan 29 at 9:57


filenames string search xml

2 Answers

It looks like you may have tried to put a pair of ** sequences into your CDATA section to highlight it here. Unfortunately that has turned it into invalid XML. Assuming you meant this instead,

<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>

you can use an XML parser to parse your XML:

xmlstarlet sel -T -t -v '//text' -n x.xml
4) Computation of Tier I and Tier II Capital :

If you have a tighter constraint than just "the contents of the <text/> element" you can adjust the XPath filter appropriately. For example:

xmlstarlet sel -T -t -v '/title/band/staticText/text' -n x.xml
4) Computation of Tier I and Tier II Capital :
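
Since the question asks for the result in a separate file and mentions several <title> elements, the following is a minimal sketch of how that might look. The <report> root element, the second heading, and the file names x.xml and output.txt are assumptions made purely for illustration. If the real file declares a default XML namespace, the XPath would need to be adjusted accordingly.

```shell
# Hypothetical sample: two <title> elements under an assumed <report> root.
cat > x.xml <<'EOF'
<report>
 <title>
  <band><staticText>
   <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
  </staticText></band>
 </title>
 <title>
  <band><staticText>
   <text><![CDATA[5) Another heading]]></text>
  </staticText></band>
 </title>
</report>
EOF

# -m matches every <text> element found below any <title>;
# -v . prints its string value and -n adds a newline after each match.
# The shell redirection stores the result in output.txt.
xmlstarlet sel -T -t -m '//title//text' -v . -n x.xml > output.txt

cat output.txt
```

Using -m with -v . and -n prints one heading per line, which keeps the output file easy to process further.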

answered Jan 29 at 14:05

roaima


xmlstarlet is now working on my Unix machine.

– Ankita Jain
Jan 30 at 5:28

Great stuff. So you're sorted then?

– roaima
Jan 30 at 7:38

2

@AnkitaJain Good! If this solves your issue, please consider accepting the answer.

– Kusalananda
Jan 30 at 12:11


Like this?

$ sed -n '/<title>/,/<\/title>/p' input.txt | grep -oP '(?<=\[CDATA\[).*?(?=\]\]>)'

sed prints everything between <title> and </title>, including the tags themselves. If CDATA sections only ever appear inside <title> elements, you can omit this step.

grep (GNU grep, for the -P option) prints everything that is preceded by [CDATA[ and followed by ]]>. The slash in </title> and the literal brackets must be escaped with backslashes.
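
To try this pipeline end to end, here is a small sketch under a few assumptions: the sample input (including the <report> root and the <pageHeader> element) is made up, GNU grep with -P support is available, and each CDATA text sits on a single line. The slash in </title> and the literal brackets are escaped so that sed and grep read them as plain characters, and the lookahead stops at ]]> so the closing brackets are not printed.

```shell
# Hypothetical sample: one CDATA section inside <title>, one outside it.
cat > input.txt <<'EOF'
<report>
<title>
 <band height="21" isSplitAllowed="true" >
 <staticText>
 <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
 </staticText>
 </band>
</title>
<pageHeader>
 <text><![CDATA[Not wanted]]></text>
</pageHeader>
</report>
EOF

# 1. sed keeps only the lines from <title> through </title>
#    (the slash is escaped because / delimits the sed address).
# 2. grep -oP prints only the text between "[CDATA[" and "]]>".
# 3. The redirection saves the result in output.txt.
sed -n '/<title>/,/<\/title>/p' input.txt |
  grep -oP '(?<=\[CDATA\[).*?(?=\]\]>)' > output.txt

cat output.txt
# prints: 4) Computation of Tier I and Tier II Capital :
```

The "Not wanted" line is dropped because it lies outside the <title> range that sed passes on, which is the reason for keeping the sed step when CDATA sections also occur elsewhere in the file.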

edited Jan 30 at 9:42

answered Jan 29 at 12:15

finswimmer


The ** around the CDATA text was added by mistake. The correct line is: [CDATA[4) Computation of Tier I and Tier II Capital :]]

– Ankita Jain
Jan 30 at 5:29

1

Then just remove the **. I've edited my answer.

– finswimmer
Jan 30 at 9:43

