Find XML files with specific values













I have a folder with ~10K XML files. Each of them looks like this:



...
<object>
<name>Cat</name>
</object>
<object>
<name>Cow</name>
</object>
...


The name can be person, cat, dog, cow, ... I want to pick out only the XML files that contain cat and/or dog. How can I do this?
























  • I use an app called PowerGrep, but I have to switch to Windows for it.
    – Huyen
    May 18 at 6:24










  • Should <attribute><name>Cat</name></attribute> also give a hit?
    – Kusalananda
    May 18 at 7:53























edited May 18 at 7:37 by karel
asked May 18 at 5:06 by Huyen


















3 Answers













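The following commands assume GNU grep.

Since you say that all the files look like this, you can use grep.

To list the files that contain Cat or Dog, use

grep -l '<name>\(Cat\|Dog\)</name>' *


To list the files that contain both Cat and Dog, use

grep -l '<name>Cat</name>' * | xargs grep -l '<name>Dog</name>'


For a case-insensitive search, add the -i option to grep.

The -l option prints only the names of the files that contain a match.

In a basic regular expression, GNU grep needs the characters (, | and ) to be escaped with a backslash before they act as grouping and alternation, so I have escaped them. With grep -E (extended regular expressions), the same pattern is written without the backslashes.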
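As a minimal sketch of the two searches, the script below builds a throwaway directory with three sample files (the names a.xml, b.xml and c.xml are made up for illustration) and runs both an "either" and a "both" search. It assumes grep -E (extended regular expressions, where (, | and ) need no backslashes) plus grep --null and xargs -0, which GNU and BSD tools provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical sample data; the file names are illustrative only.
dir=$(mktemp -d)
printf '<object>\n<name>Cat</name>\n</object>\n<object>\n<name>Dog</name>\n</object>\n' > "$dir/a.xml"
printf '<object>\n<name>Cow</name>\n</object>\n' > "$dir/b.xml"
printf '<object>\n<name>Dog</name>\n</object>\n' > "$dir/c.xml"

cd "$dir" || exit 1

# Files that contain Cat or Dog (extended regex, so no backslashes).
either=$(grep -lE '<name>(Cat|Dog)</name>' -- *.xml)

# Files that contain both Cat and Dog: the second grep only sees the
# files that already matched the first one. --null and -0 pass the file
# names NUL-terminated, so names with spaces survive the pipe.
both=$(grep -l --null '<name>Cat</name>' -- *.xml |
    xargs -0 grep -l '<name>Dog</name>')

printf 'either:\n%s\n' "$either"
printf 'both:\n%s\n' "$both"

# Clean up the temporary directory.
cd / && rm -rf "$dir"
```

Matching on a single line like this only works while each name element sits on one line, as in the sample from the question.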

























  • Basic regular expressions do not support alternation, neither with | nor with \|. You have to either use grep -E to enable extended regular expressions or specify that you're using GNU grep (which does support alternation with \| in basic regular expressions).
    – Kusalananda
    May 18 at 5:34






























To get all the Cat or Dog values out of the name node in an XML document like yours, you may use xmlstarlet like this:



xmlstarlet sel -t -v '//object/name[text() = "Cat" or text() = "Dog"]' file.xml


This would generate the words Cat and Dog as output if they exist in the document as the values of an object node's name child node. This operation would be tricky to get right with grep if there are other name nodes that are not child nodes of object nodes, or if some name nodes have attributes, etc.



Unfortunately, xmlstarlet does not exit with a non-zero exit status if it can't find anything in the XML input file, so we need to tack on a grep at the end of this to check whether we got any output at all (this will be used in the next step):



xmlstarlet sel -t -v '//object/name[text() = "Cat" or text() = "Dog"]' file.xml | grep '.'


We can then run this on all the 10k files through find:

find . -type f -name '*.xml' -exec sh -c '
xmlstarlet sel -t -v "//object/name[text() = \"Cat\" or text() = \"Dog\"]" "$1" |
grep -q "."' sh {} ';' -print


This would first find all regular files in or below the current directory whose names end with .xml. For each such file, xmlstarlet is run to extract the Cat and Dog strings from the correct XML nodes, and grep is used to check whether xmlstarlet found anything. Running grep with its -q option makes the utility quiet, but it will exit with the appropriate exit status depending on whether it matched anything or not.



If grep found anything, find then prints the pathname of the file that contained the data.
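The find invocation relies on -exec acting as a test: -print is only evaluated for files where the command given to -exec exits with status 0. The sketch below shows just that mechanism on made-up sample files, with a plain grep -q standing in for the xmlstarlet pipeline. That substitution is an assumption made so the example runs without xmlstarlet installed, and it does not check the XML structure the way the XPath expression does.

```shell
#!/bin/sh
# Hypothetical sample data; the file and directory names are illustrative.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '<object>\n<name>Cat</name>\n</object>\n' > "$dir/one.xml"
printf '<object>\n<name>Cow</name>\n</object>\n' > "$dir/sub/two.xml"
printf '<object>\n<name>Dog</name>\n</object>\n' > "$dir/sub/three.xml"

cd "$dir" || exit 1

# For each *.xml file, find runs a small sh script with the pathname as $1.
# The script exits with the status of grep -q: 0 if a match was found.
# find evaluates -print only for files where -exec reported success.
matches=$(find . -type f -name '*.xml' -exec sh -c '
    grep -q -E "<name>(Cat|Dog)</name>" "$1"' sh {} ';' -print | sort)

printf '%s\n' "$matches"

# Clean up the temporary directory.
cd / && rm -rf "$dir"
```

The sort is only there to make the output order predictable, since find lists files in directory order.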






share|improve this answer











































    If you have many files, consider using an indexing tool such as Beagle, Tracker, glimpse or similar.

    Example:

    $ glimpseindex -H . MyDir
    $ glimpse -l -H . 'cat;dog'

    to get the files containing both cat and dog.












































      edited May 18 at 5:37
      answered May 18 at 5:16 by mkmayank



























          edited May 18 at 8:53
          answered May 18 at 5:59 by Kusalananda































                  answered May 18 at 8:32 by JJoao






















                       
