wget --spider: how to tell where broken links are coming from

I use wget's built-in spider mode as a convenience sometimes to quickly check a local site for broken links. This morning I turned its attention to a production site that we'd just put major changes on, and it's coming up with 3 broken links, but it seems impossible to tell where they are! (It only says what they're linking to and there's no straightforward way of relating that alone back to a page.)



The options I'm currently using are wget -r -nv --spider http://www.domain.com/ -o /path/to/log.txt. Does anyone know of an option I'm overlooking, a way to read the output, or even a simple substitute for this command that will also let me know what file the links appear in (and ideally a line #)?










asked Jun 29 '12 at 16:49 by user19866

  • I get this while working on zedboard. ![enter image description here](i.stack.imgur.com/SkNpQ.png)
    – Saj
    Dec 20 '18 at 8:34

















2 Answers

You should be able to watch the web server's access log alongside the wget run. Look for the 404s in the log and read the referrer field of each one: that tells you which page contains the broken link.

It should then just be a matter of examining that page for the offending link.
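
As a minimal sketch of that approach, assuming an Apache- or nginx-style access log in the common "combined" format (client, identity, user, time, "request", status, bytes, "referer", "user-agent"), the awk filter below prints each referring page next to the path that returned 404, keeping only requests whose User-Agent starts with Wget/. The function name find_404_referers and the log path in the usage line are hypothetical; adjust the field handling if your server uses a different log format.

```shell
#!/bin/sh
# find_404_referers: hypothetical helper, not part of wget or the web server.
# Assumes access-log lines in the "combined" format and prints
# "<referring page> -> <requested path>" for every wget request that got a 404.
find_404_referers() {
    # Splitting each line on the double quote gives:
    #   $2 = request line, e.g. GET /missing.html HTTP/1.1
    #   $3 = " status bytes "
    #   $4 = Referer header, $6 = User-Agent header
    awk -F'"' '
        $6 ~ /^Wget\// {              # only requests made by wget
            split($3, resp, " ")
            if (resp[1] == "404") {
                split($2, req, " ")
                print $4 " -> " req[2]
            }
        }
    ' "$@" | sort -u                  # one line per (page, broken link) pair
}
```

Usage, with an assumed log location: find_404_referers /var/log/apache2/access.log. With no file argument it reads the log from stdin. As noted in the comments, this only sees requests that hit your own server, so it will not report broken links to external sites.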







answered Jun 29 '12 at 18:48 by bahamat

    • Good idea. I forgot I asked this on here, actually! What I ended up doing was using it in combination with grep on my local copy of the site (particularly using the -n option to get line numbers).
      – user19866
      Jul 10 '12 at 19:52










    • This is good for broken internal links, but not for links to external sites.
      – Screenack
      Jan 10 '18 at 20:15
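
The grep approach from the asker's comment above can be wrapped in a small function. This is a sketch under a few assumptions: you have a local copy of the site in a directory, and a page refers to the broken link either by its full URL or by its root-relative path, so both are searched as fixed strings. Purely relative hrefs such as ../page.html are not caught. The function name find_link_sources is hypothetical.

```shell
#!/bin/sh
# find_link_sources URL SITE_DIR: hypothetical helper.
# Prints file:line:text for every line under SITE_DIR that contains URL or
# its root-relative path (http://www.domain.com/a/b.html -> /a/b.html).
find_link_sources() {
    url=$1
    site_dir=$2
    # Strip the scheme and host to get the root-relative path.
    rel_path=$(printf '%s\n' "$url" | sed 's#^[A-Za-z][A-Za-z]*://[^/]*##')
    # -r: recurse, -n: prefix line numbers, -F: treat patterns as fixed strings
    if [ -n "$rel_path" ] && [ "$rel_path" != "/" ]; then
        grep -rnF -e "$url" -e "$rel_path" -- "$site_dir"
    else
        # A bare "/" would match almost every line, so search the full URL only.
        grep -rnF -e "$url" -- "$site_dir"
    fi
}
```

For example, with the broken URLs from the wget log saved one per line in a file (broken.txt is a hypothetical name): while read -r url; do find_link_sources "$url" /path/to/local/site; done < broken.txt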
















A good way (not involving the web server logs) is to use the --debug flag and grep for ^Referer:

On the command line:

wget -r -nv --spider --debug http://www.domain.com/ 2>&1 | grep -E -A 1 '(^---response end---$|^--[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{4}-[0-9]{2}-[0-9]{2} ERROR|^Referer:|^Remote file does not)'

You can run the same grep over an existing log file. Caveat: some wget builds are compiled without support for --debug.
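
If you would rather have each broken link paired with the page that references it than read the grep -A 1 output by eye, a small awk filter over the same debug output can do the pairing. This sketch assumes that wget's --debug output frames each request between ---request begin--- and ---request end--- lines (request line first, then headers including Referer:) and each response between ---response begin--- and ---response end--- lines (status line first); check that against the output of your own wget version before relying on it. The function name pair_referers_with_404s is hypothetical.

```shell
#!/bin/sh
# pair_referers_with_404s: hypothetical helper. Reads wget --debug output on
# stdin and prints "<Referer> -> <requested path>" for each request whose
# response status was 404. Assumes the request/response framing described above.
pair_referers_with_404s() {
    awk '
        /^---request begin---$/    { in_req = 1; path = ""; ref = "-"; next }
        /^---request end---$/      { in_req = 0; next }
        # First non-empty line of a request: METHOD PATH HTTP/x.y
        in_req && path == "" && NF { path = $2; next }
        in_req && /^Referer: /     { ref = $2; next }
        /^---response begin---$/   { want_status = 1; next }
        # First non-empty line of a response: HTTP/x.y STATUS REASON
        want_status && NF {
            want_status = 0
            if ($2 == "404") print ref " -> " path
        }
    ' | sort -u
}
```

Usage: wget -r --spider --debug http://www.domain.com/ 2>&1 | pair_referers_with_404s, or pair_referers_with_404s < /path/to/log.txt for a log written with -o and --debug. Requests with no Referer (such as the start URL) are printed with "-" as the referring page.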







answered May 31 '16 at 10:56 by Tsojcanth



























Source: Unix & Linux Stack Exchange

