Looking for duplicate instances of a tag in a file
Multiple snippets of code exist in a file similar to the following:
<blah>Spread the peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> on good looking bread <ramout assot="f0123_fun10" bapel="3 or 5"/> that does not have peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> already on the bread this that and the other <ramout assot="f0123_fun10" bapel="4"/> with something else.</blah>
I am trying to find duplicate instances of the ramout tag in a single file.
If the following exists:
<ramout assot="f0123_fun10" bapel="2 or 6"/>
I want to know if it is repeated again within the opening and closing blah tags.
I've tried multiple things but one of the latest was the following:
grep -Eoi '<blah>.*([[:space:]]<ramout assot).*\1.*</blah>' *.xml | less
which returned nothing.
I also tried:
grep -Eio '<blah>.*([[:space:]]<ramout assot="[a-z][0-9]5_fig[0-9]+" bapel="[0-9]+.*)' *.xml
which does not include the backreference, but it also does not show all results. It looks like this only shows results that are on one line (that do not span more than one line).
Should I use sed if I want to search for something that may or may not be on one line?
Is awk a viable candidate? I saw and tried: awk '/Start pattern/,/End pattern/' filename which returned more results but I am still not getting all results.
Any help being able to find a) all results in the entire file and separately b) all results that are duplicates within blah tags would be appreciated.
Expected results would look something like:
results for search a) showing all ramout results:
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="3 or 5"/>
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="4"/>
results for search b) showing duplicate results would show:
<ramout assot="f0123_fun10" bapel="2 or 6"/>
text-processing awk sed grep regular-expression
edited Mar 12 '17 at 3:03
Jeff Schaller
asked Mar 11 '17 at 23:42
regexnoob
Related: stackoverflow.com/questions/1732348/…
– Kusalananda
Mar 12 '17 at 13:27
2 Answers
Using XMLStarlet (sometimes installed as xmlstarlet instead of just xml) to extract the relevant tags, then sort and uniq to find the duplicates:
$ xml sel -t -m '/blah/ramout' -c '.' -nl test.xml | sort | uniq -d
<ramout assot="f0123_fun10" bapel="2 or 6"/>
The xml command will match all <ramout> tags directly under the <blah> tag, and for each of these copy the tag followed by a newline to standard output. sort sorts the lines, and uniq -d prints one copy of each line that occurs more than once in the output of sort.
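The sort | uniq -d step can be tried on its own, without XMLStarlet installed. The following is only a minimal sketch: it uses grep -o (a widely available extension, not POSIX) as a stand-in for the XML-aware extraction, and it assumes each tag sits on a single line and has no ">" inside an attribute value. The variable names sample, all_tags and dup_tags are made up for this illustration.

```shell
# Sample input copied from the question (one <blah> element on one line).
sample='<blah>Spread the peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> on good looking bread <ramout assot="f0123_fun10" bapel="3 or 5"/> that does not have peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> already on the bread this that and the other <ramout assot="f0123_fun10" bapel="4"/> with something else.</blah>'

# Stand-in for the XMLStarlet step: print each self-closing ramout tag on its own line.
# Assumption: a tag never spans lines and never contains ">" in an attribute value.
all_tags=$(printf '%s\n' "$sample" | grep -o '<ramout [^>]*/>')

# uniq only compares adjacent lines, which is why the input is sorted first.
# -d prints one copy of each line that occurs more than once.
dup_tags=$(printf '%s\n' "$all_tags" | sort | uniq -d)

printf 'all:\n%s\n' "$all_tags"
printf 'duplicates:\n%s\n' "$dup_tags"
```

Because grep works line by line and knows nothing about XML, this stand-in would miss tags that are split across lines, which is the case the XML parser in the answer handles.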
edited Feb 16 at 15:01
answered Mar 12 '17 at 13:26
Kusalananda
I'm new to starlet so thank you... new tool for my toolbox.
– regexnoob
Mar 13 '17 at 18:42
Something like this works OK in my tests:
awk -F"/>" -v RS="<ramout assot=" 'NR>1{print RS $1 FS}' file1
echo "Finding Duplicates:"
awk -F"/>" -v RS="<ramout assot=" 'NR==1{next}seen[$1]++==1{print RS $1 FS}' file1
Output:
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="3 or 5"/>
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="4"/>
Finding Duplicates:
<ramout assot="f0123_fun10" bapel="2 or 6"/>
We take advantage of awk's ability to declare a custom record separator (RS) and a custom field separator (FS). POSIX leaves the behaviour of a multi-character RS unspecified, so this assumes an awk such as GNU awk or mawk.
The two commands above can of course be combined into one awk invocation; this was just a test.
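One way the two passes might be combined into a single awk invocation is sketched below. It keeps the same RS/FS trick, prints every tag as it is read, and collects each duplicated tag once for the END block. The temporary file, the variable combined and the array names seen and dups are choices made for this illustration, and the sketch still assumes an awk that accepts a multi-character RS (GNU awk or mawk, for instance).

```shell
# Sample input from the question, written to a temporary file.
file1=$(mktemp)
cat > "$file1" <<'EOF'
<blah>Spread the peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> on good looking bread <ramout assot="f0123_fun10" bapel="3 or 5"/> that does not have peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> already on the bread this that and the other <ramout assot="f0123_fun10" bapel="4"/> with something else.</blah>
EOF

combined=$(awk -F'/>' -v RS='<ramout assot=' '
    NR == 1 { next }                  # skip the text before the first tag
    {
        tag = RS $1 FS                # rebuild the tag from separator + first field
        print tag                     # search a): every tag, in file order
        if (seen[$1]++ == 1)          # true only on the second sighting
            dups[++n] = tag
    }
    END {
        print "Finding Duplicates:"   # search b): each duplicated tag once
        for (i = 1; i <= n; i++) print dups[i]
    }' "$file1")

printf '%s\n' "$combined"
rm -f "$file1"
```

Like the two original commands, this counts duplicates across the whole file rather than per blah element; resetting the seen array at each closing blah tag would be one way to narrow the scope.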
edited Mar 12 '17 at 13:12
answered Mar 12 '17 at 13:05
George Vasiliou