Finding specific string in XML file and storing in another file [closed]

Text in input file is like this



<title>
<band height="21" isSplitAllowed="true" >
<staticText>
<reportElement
x="1"
y="1"
width="313"
height="20"
key="staticText-1"/>
<box></box>
<textElement>
<font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
</textElement>
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</staticText>
</band>
</title>


Output file should have:



4) Computation of Tier I and Tier II Capital :


The file has many <title> elements and many CDATA sections, but I only want to copy the text inside the CDATA sections that sit under a <title> element, and save that text in another file.

closed as unclear what you're asking by G-Man, roaima, Sparhawk, Mr Shunz, msp9011 Jan 30 at 18:34

Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

  • Using what? Bash? Not likely to survive minor formatting changes in the file. SMOP in Python...
    – xenoid, Jan 29 at 9:31

  • grep '4) Computation of Tier I and Tier II Capital :' input.txt > output.txt :/ You'll have to give us more specific details about which strings are allowed and which are not. Perhaps give us an example of what is not allowed, and a few that are allowed.
    – Sparhawk, Jan 29 at 9:32

  • If the required string is always within **, then try: cat file | grep -rin \* | cut -d \* -f 3
    – rajaganesh87, Jan 29 at 9:57

Tags: filenames string search xml

asked Jan 29 at 9:13 by Ankita Jain; edited Jan 30 at 5:29

2 Answers

It looks like you may have tried to put a pair of ** sequences into your CDATA section to highlight it here. Unfortunately that has turned it into invalid XML. Assuming you meant this instead,



<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>


you can use an XML parser to parse your XML:



xmlstarlet sel -T -t -v '//text' -n x.xml
4) Computation of Tier I and Tier II Capital :


If you have a tighter constraint than just "the contents of the <text/> element" you can adjust the XPath filter appropriately. For example:



xmlstarlet sel -T -t -v '/title/band/staticText/text' -n x.xml
4) Computation of Tier I and Tier II Capital :
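
The same parser-based idea can also be sketched in Python with the standard library's xml.etree.ElementTree. This is only a minimal sketch under stated assumptions: the input is well-formed XML without namespaces, the wanted text lives in <text> elements nested anywhere under a <title> element, and the names title_texts and SAMPLE are hypothetical helpers introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shortened from the snippet in the question.
SAMPLE = """<title>
<band height="21" isSplitAllowed="true" >
<staticText>
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</staticText>
</band>
</title>"""


def title_texts(xml_source):
    """Return the text of every <text> element found under a <title> element."""
    root = ET.fromstring(xml_source)
    results = []
    # iter() also yields the starting element, so a <title> root is covered.
    for title in root.iter("title"):
        for text_el in title.iter("text"):
            # The parser has already unwrapped the CDATA section.
            results.append(text_el.text or "")
    return results


if __name__ == "__main__":
    for line in title_texts(SAMPLE):
        print(line)
```

To work on a file instead of a string, the root element can be obtained with ET.parse("x.xml").getroot(), and the returned lines can be written to the output file one per line.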





  • xmlstarlet is now working on my Unix machine.
    – Ankita Jain, Jan 30 at 5:28

  • Great stuff. So you're sorted then?
    – roaima, Jan 30 at 7:38

  • @AnkitaJain Good! If this solves your issue, please consider accepting the answer.
    – Kusalananda, Jan 30 at 12:11

Like this?



$ sed -n '/<title>/,/<\/title>/p' input.txt | grep -oP '(?<=\[CDATA\[).*?(?=\]\]>)'

  • sed prints everything between <title> and </title>, including the tags themselves. If your CDATA sections only ever occur inside <title> elements, you can omit this step.

  • grep prints everything that is preceded by [CDATA[ and followed by ]]>. Because grep works line by line, this only matches when the whole CDATA section sits on a single line.
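
For comparison, the two steps described above (restrict the input to the <title> ... </title> range, then take what sits between [CDATA[ and the closing ]]>) can be sketched in Python with the re module. This is a text-matching sketch, not a parser: it assumes <title> elements are not nested and carry no attributes, and the names cdata_in_titles and SAMPLE are hypothetical. Unlike the line-based grep step, it also tolerates a CDATA section that spans several lines.

```python
import re

# Step 1: everything between <title> and </title> (non-greedy, across lines).
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)
# Step 2: the content of each CDATA section inside such a block.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

# Hypothetical sample: one CDATA section inside <title>, one outside it.
SAMPLE = (
    "<title>\n"
    "<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>\n"
    "</title>\n"
    "<other><![CDATA[not wanted]]></other>\n"
)


def cdata_in_titles(text):
    """Return the CDATA contents found inside <title> ... </title> blocks."""
    found = []
    for block in TITLE_RE.findall(text):
        found.extend(CDATA_RE.findall(block))
    return found


if __name__ == "__main__":
    for line in cdata_in_titles(SAMPLE):
        print(line)
```

As with the sed and grep pipeline, this breaks as soon as the markup deviates from the assumed shape, which is why a real XML parser is the more robust choice.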





  • The ** around the CDATA content was put there by mistake. The correct line is: [CDATA[4) Computation of Tier I and Tier II Capital :]]
    – Ankita Jain, Jan 30 at 5:29

  • Then just remove the **. I've edited my answer.
    – finswimmer, Jan 30 at 9:43

The xmlstarlet answer: answered Jan 29 at 14:05 by roaima

The sed and grep answer: answered Jan 29 at 12:15 by finswimmer; edited Jan 30 at 9:42