Find XML files with specific values
I have a folder with ~10K XML files. Each of them looks like this:
...
<object>
<name>Cat</name>
</object>
<object>
<name>Cow</name>
</object>
...
The name values include person, cat, dog, cow, and so on. I want to pick out only the XML files that contain cat and/or dog. How can I do this?
linux command-line find xml
edited May 18 at 7:37 by karel
asked May 18 at 5:06 by Huyen
I use an app called PowerGrep, but I have to switch to Windows to use it. – Huyen, May 18 at 6:24
Should <attribute><name>Cat</name></attribute> also give a hit? – Kusalananda, May 18 at 7:53
3 Answers
The following commands assume GNU grep.
Since you said that all the files look like this, you can use grep.
For Cat or Dog, use
grep -l '<name>\(Cat\|Dog\)</name>' *
For both Cat and Dog to be present, use
grep -l '<name>Cat</name>' * | xargs grep -l '<name>Dog</name>'
If you want a case-insensitive search, add the -i option to grep.
The -l option prints only the names of the files that contain a match.
In a basic regular expression, the characters (, | and ) need to be escaped with a backslash to act as grouping and alternation, so I have escaped them.
edited May 18 at 5:37
answered May 18 at 5:16 by mkmayank
Basic regular expressions do not support alternation with | nor with \|. You have to either use grep -E to enable extended regular expressions or specify that you're using GNU grep (which does support alternation with \| in basic regular expressions). – Kusalananda, May 18 at 5:34
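As a hedged sketch of the portability point raised in the comment above, the following script uses grep -E, so the alternation needs no backslashes and does not depend on a GNU extension. The scratch directory and the sample file names (a.xml, b.xml, c.xml) are made up for illustration, and mktemp -d is assumed to be available. The "both" case is written as a loop rather than a pipe into xargs, which is one way to keep file names containing spaces intact.

```shell
#!/bin/sh
# Build a throwaway directory with three hypothetical sample files.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf '<object>\n<name>Cat</name>\n</object>\n<object>\n<name>Cow</name>\n</object>\n' > a.xml
printf '<object>\n<name>Cow</name>\n</object>\n' > b.xml
printf '<object>\n<name>Cat</name>\n</object>\n<object>\n<name>Dog</name>\n</object>\n' > c.xml

# Cat OR Dog: -E enables extended regular expressions,
# so ( | ) work as grouping and alternation without backslashes.
either=$(grep -E -l '<name>(Cat|Dog)</name>' ./*.xml)

# Cat AND Dog: test each file twice and print it only if both tests pass.
both=$(
  for f in ./*.xml; do
    grep -q '<name>Cat</name>' "$f" &&
      grep -q '<name>Dog</name>' "$f" &&
      printf '%s\n' "$f"
  done
)

printf 'either:\n%s\n' "$either"
printf 'both:\n%s\n' "$both"

# Remove the scratch directory again.
cd / && rm -rf "$tmp"
```

With these sample files, a.xml and c.xml are listed under "either" and only c.xml under "both". Adding -i to each grep call would make the match case-insensitive.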
To get all the Cat or Dog values out of the name node in an XML document like yours, you may use xmlstarlet like this:
xmlstarlet sel -t -v '//object/name[text() = "Cat" or text() = "Dog"]' file.xml
This would generate the words Cat and Dog as output if they exist in the document as the values of an object node's name child node. This operation would be tricky to get right with grep in case there are other name nodes that are not child nodes of object nodes, or if some name nodes have attributes, etc.
Unfortunately, xmlstarlet does not exit with a non-zero exit status if it can't find anything in the XML input file, so we need to tack on a grep at the end of this to check whether we got any output at all (this will be used in the next step):
xmlstarlet sel -t -v '//object/name[text() = "Cat" or text() = "Dog"]' file.xml | grep '.'
We can then run this on all the 10k files through find:
find . -type f -name '*.xml' -exec sh -c '
xmlstarlet sel -t -v "//object/name[text() = \"Cat\" or text() = \"Dog\"]" "$1" |
grep -q "."' sh {} ';' -print
This would first find all regular files in or below the current directory whose names end with .xml. For each such file, xmlstarlet is run to extract the Cat and Dog strings from the correct XML nodes, and grep is used to check whether xmlstarlet found anything. Running grep with its -q option makes the utility quiet, but it will still exit with the appropriate exit status depending on whether it matched anything or not.
If grep found anything, find then prints the pathname of the file that contained the data.
edited May 18 at 8:53
answered May 18 at 5:59 by Kusalananda
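The find command in the answer above relies on -exec acting as a test: find only goes on to -print when the command run by -exec exits with status 0. The sketch below shows that mechanism in isolation. As an assumption made so that it runs without xmlstarlet installed, a plain grep -E stands in for the xmlstarlet extraction step, so it does not have the XML awareness that the answer is about. The scratch directory and file names are hypothetical.

```shell
#!/bin/sh
# Scratch tree with one matching file, one non-matching file,
# and one matching file in a subdirectory with a space in its name.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/sub"
printf '<object>\n<name>Dog</name>\n</object>\n' > "$tmp/hit.xml"
printf '<object>\n<name>Cow</name>\n</object>\n' > "$tmp/miss.xml"
printf '<object>\n<name>Cat</name>\n</object>\n' > "$tmp/sub/nested hit.xml"

# Stand-in extractor (NOT xmlstarlet): prints matching lines, or nothing.
# The trailing "grep -q ." turns "any output at all" into exit status 0,
# and find runs -print only for files where the -exec command succeeded.
# Inside the sh -c script, $0 is "sh" and $1 is the pathname from {}.
found=$(
  cd "$tmp" &&
    find . -type f -name '*.xml' -exec sh -c '
      grep -E "<name>(Cat|Dog)</name>" "$1" | grep -q "."
    ' sh {} ';' -print | LC_ALL=C sort
)

printf '%s\n' "$found"
rm -rf "$tmp"
```

With these sample files, the script lists ./hit.xml and ./sub/nested hit.xml and leaves out ./miss.xml. Passing the pathname as "$1" rather than pasting {} into the script text is what keeps the name with a space in it working.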
If you have many files, consider using an indexing tool such as Beagle, Tracker, glimpse or similar.
Example:
$ glimpseindex -H . MyDir
$ glimpse -l -H . 'cat;dog'
to get the files containing both cat and dog.
answered May 18 at 8:32 by JJoao