Looking for duplicate instances of a tag in a file

Multiple snippets of code similar to the following exist in a file:



<blah>Spread the peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> on good looking bread <ramout assot="f0123_fun10" bapel="3 or 5"/> that does not have peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> already on the bread this that and the other <ramout assot="f0123_fun10" bapel="4"/> with something else.</blah>


I am trying to find duplicate instances of the ramout tag in a single file.
If the following exists:



<ramout assot="f0123_fun10" bapel="2 or 6"/> 


I want to know if it is repeated again within the opening and closing blah tags.



I've tried multiple things but one of the latest was the following:



grep -Eoi '<blah>.*([[:space:]]<ramout assot).*\1.*</blah>' *.xml | less


which returned nothing.



I also tried:



grep -Eio '<blah>.*([[:space:]]<ramout assot="[a-z][0-9]5_fig[0-9]+" bapel="[0-9]+.*)' *.xml


which does not include the backreference, but it also does not show all results. It looks like it only shows results that are on one line (that do not span more than one line).



Should I use sed if I want to search for something that may or may not be on one line?



Is awk a viable candidate? I saw and tried: awk '/Start pattern/,/End pattern/' filename which returned more results but I am still not getting all results.



Any help finding a) all results in the entire file and, separately, b) all results that are duplicates within blah tags would be appreciated.



Expected results would look something like:



results for search a) showing all ramout results:



<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="3 or 5"/>
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="4"/>


results for search b) showing duplicate results would show:



<ramout assot="f0123_fun10" bapel="2 or 6"/>






text-processing awk sed grep regular-expression














edited Mar 12 '17 at 3:03
Jeff Schaller

asked Mar 11 '17 at 23:42
regexnoob












Related: stackoverflow.com/questions/1732348/…

– Kusalananda
Mar 12 '17 at 13:27





2 Answers

Using XMLStarlet (sometimes installed as xmlstarlet instead of just xml) to extract the relevant tags, then sort and uniq to find the duplicates:

$ xml sel -t -m '/blah/ramout' -c '.' -nl test.xml | sort | uniq -d
<ramout assot="f0123_fun10" bapel="2 or 6"/>

The xml command will match all <ramout> tags directly under the <blah> tag and, for each of these, copy the tag followed by a newline to standard output.

sort sorts that output, and uniq -d then prints one copy of each line that is repeated in the sorted output.
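To illustrate why the sort step matters, here is a small sketch that does not need XMLStarlet. The tags function and its hard-coded lines are made up for this example; they only stand in for what the xml command prints for the sample in the question.

```shell
# Hypothetical stand-in for the extraction step: one tag per line, in document order.
tags() {
    printf '%s\n' \
        '<ramout assot="f0123_fun10" bapel="2 or 6"/>' \
        '<ramout assot="f0123_fun10" bapel="3 or 5"/>' \
        '<ramout assot="f0123_fun10" bapel="2 or 6"/>' \
        '<ramout assot="f0123_fun10" bapel="4"/>'
}

# uniq only compares adjacent lines, so on unsorted input the two
# "2 or 6" lines are not seen as duplicates and nothing is printed.
tags | uniq -d

# After sorting, identical lines sit next to each other and
# uniq -d prints each repeated line once.
tags | sort | uniq -d

# uniq -c shows every distinct line prefixed with how often it occurs.
tags | sort | uniq -c
```

Leaving off the sort | uniq -d part of the pipeline should give the plain list of all tags, which is what part a) of the question asks for.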















edited Feb 16 at 15:01

answered Mar 12 '17 at 13:26
Kusalananda












I'm new to starlet so thank you... new tool for my toolbox.

– regexnoob
Mar 13 '17 at 18:42

















Something like this works OK in my tests:

awk -F"/>" -v RS="<ramout assot=" 'NR>1{print RS $1 FS}' file1

echo "Finding Duplicates:"
awk -F"/>" -v RS="<ramout assot=" 'NR==1{next}seen[$1]++==1{print RS $1 FS}' file1

<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="3 or 5"/>
<ramout assot="f0123_fun10" bapel="2 or 6"/>
<ramout assot="f0123_fun10" bapel="4"/>
Finding Duplicates:
<ramout assot="f0123_fun10" bapel="2 or 6"/>

Test it online here

We take advantage of awk's ability to declare a custom record separator (RS) and a custom field separator (FS).
The two commands above can of course be combined into one awk command; this was just a test.
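One way the two commands might be combined into a single pass is sketched below. The sample file, the list_tags function name and the "All tags:" / "Duplicates:" headings are my own choices for illustration. It also assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX only requires the first character of RS to be honoured).

```shell
# Hypothetical sample file, modelled on the snippet in the question
# but wrapped over several lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<blah>Spread the peanut butter <ramout assot="f0123_fun10" bapel="2 or 6"/> on good looking bread
<ramout assot="f0123_fun10" bapel="3 or 5"/> that does not have peanut butter
<ramout assot="f0123_fun10" bapel="2 or 6"/> already on the bread this that and the other
<ramout assot="f0123_fun10" bapel="4"/> with something else.</blah>
EOF

list_tags() {
    awk -F'/>' -v RS='<ramout assot=' '
        NR == 1 { next }                      # skip the text before the first tag
        { tag = RS $1 FS; all[++n] = tag }    # rebuild the tag, remember it in input order
        ++seen[tag] == 2 { dup[++d] = tag }   # note a tag the second time it shows up
        END {
            print "All tags:"
            for (i = 1; i <= n; i++) print all[i]
            print "Duplicates:"
            for (i = 1; i <= d; i++) print dup[i]
        }
    ' "$1"
}

list_tags "$sample"
```

Because the record separator is the start of the tag rather than a newline, it does not matter whether the tags sit on one line or on several. Note that, like the two original commands, this counts duplicates across the whole file and does not reset the count at each <blah> element.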















edited Mar 12 '17 at 13:12

answered Mar 12 '17 at 13:05
George Vasiliou


























