wget --spider: how to tell where broken links are coming from
I use wget's built-in spider mode as a convenience sometimes to quickly check a local site for broken links. This morning I turned its attention to a production site that we'd just put major changes on, and it's coming up with 3 broken links, but it seems impossible to tell where they are! (It only says what they're linking to and there's no straightforward way of relating that alone back to a page.)
The options I'm currently using are `wget -r -nv --spider http://www.domain.com/ -o /path/to/log.txt`. Does anyone know of an option I'm overlooking, a way to read the output, or even a simple substitute for this command, that would also tell me which file the links appear in (and ideally a line number)?
wget
I get this while working on zedboard. ![enter image description here](i.stack.imgur.com/SkNpQ.png)
– Saj
Dec 20 '18 at 8:34
asked Jun 29 '12 at 16:49
user19866
2 Answers
You should be able to watch the web server logs in conjunction with the wget run. Look for the 404s in the log file and pull the referrer field; that will tell you the page that contains the broken link. It should then just be a matter of examining that page for the offending link.
answered Jun 29 '12 at 18:48
bahamat
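A minimal sketch of that log-grepping step, assuming an Apache/Nginx "combined" log format. The log path and the sample entries below are made up for illustration; point the awk command at your real access log instead.

```shell
# Fake a two-line combined-format access log so the pipeline runs standalone.
cat > /tmp/access.sample.log <<'EOF'
1.2.3.4 - - [29/Jun/2012:16:49:00 +0000] "GET /old-page HTTP/1.1" 404 153 "http://www.domain.com/about.html" "Wget/1.13"
1.2.3.4 - - [29/Jun/2012:16:49:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Wget/1.13"
EOF

# In the combined format, $7 is the request path, $9 the status code,
# and $11 the (quoted) Referer field; strip the quotes before printing.
awk '$9 == 404 {gsub(/"/, "", $11); print $7, "<-", $11}' /tmp/access.sample.log | sort -u
# -> /old-page <- http://www.domain.com/about.html
```

Each output line pairs a broken target with the page that linked to it, which is exactly the mapping the plain `--spider` log is missing.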
Good idea. I forgot I asked this on here, actually! What I ended up doing was using it in combination with grep on my local copy of the site (particularly using the -n option to get line numbers).
– user19866
Jul 10 '12 at 19:52
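The local-copy approach from the comment above can be sketched like this; `/tmp/site` and the dead link target `/old-page` are hypothetical stand-ins for your site checkout and the URL wget reported as broken.

```shell
# Build a tiny stand-in for a local site checkout.
mkdir -p /tmp/site
printf '<a href="/old-page">old</a>\n' > /tmp/site/about.html

# -r: recurse through the tree, -n: prefix each hit with its line number,
# so every match names the file and line containing the broken link.
grep -rn 'href="/old-page"' /tmp/site
# -> /tmp/site/about.html:1:<a href="/old-page">old</a>
```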
This is good for broken internal links, but not for links to external sites.
– Screenack
Jan 10 '18 at 20:15
A good way (not involving the webserver logs) is to use the `--debug` flag and grep for `^Referer:`. On the command line:
wget -r --spider --debug http://www.domain.com/ 2>&1 | egrep -A 1 '(^---response end---$|^--[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{4}-[0-9]{2}-[0-9]{2} ERROR|^Referer:|^Remote file does not)'
You can do similar grepping on your saved log. Caveat: some wget builds are not compiled with `--debug` support.
answered May 31 '16 at 10:56
Tsojcanth
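The debug output can also be condensed into a "broken URL <- referring page" list. The fragment below imitates a slice of `--debug` output so the awk command can be demonstrated standalone; the real format varies somewhat by wget version, so adjust the patterns to match yours.

```shell
# Fake a fragment of wget --debug output (hypothetical, simplified).
cat > /tmp/wget-debug.sample <<'EOF'
---request begin---
HEAD /old-page HTTP/1.1
Referer: http://www.domain.com/about.html
---request end---
HTTP/1.1 404 Not Found
EOF

# Remember the last request path and Referer header seen; when a 404
# status line appears, print the pair.
awk '/^(GET|HEAD) / {url=$2} /^Referer:/ {ref=$2} / 404 / {print url, "<-", ref}' /tmp/wget-debug.sample
# -> /old-page <- http://www.domain.com/about.html
```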