wget --spider: how to tell where broken links are coming from
I use wget's built-in spider mode as a convenience sometimes to quickly check a local site for broken links. This morning I turned its attention to a production site that we'd just put major changes on, and it's coming up with 3 broken links, but it seems impossible to tell where they are! (It only says what they're linking to and there's no straightforward way of relating that alone back to a page.)
The options I'm currently using are `wget -r -nv --spider http://www.domain.com/ -o /path/to/log.txt`. Does anyone know of an option I'm overlooking, a way to read the output, or even a simple substitute for this command, that would also tell me which file the links appear in (and ideally a line number)?
wget
I get this while working on zedboard. ![enter image description here](i.stack.imgur.com/SkNpQ.png)
– Saj
Dec 20 '18 at 8:34
asked Jun 29 '12 at 16:49
user19866
2 Answers
You should be able to watch the web server logs in conjunction with the wget run. Look for the 404s in the log file and pull the referrer field; that will tell you the page that contains the broken link. It should then just be a matter of examining that page for the offending link.
answered Jun 29 '12 at 18:48
bahamat
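A minimal sketch of that log-grepping step, assuming an Apache/Nginx "combined" log format. The log path and the sample entries below are made up for illustration; point the awk command at your real access log instead.

```shell
# Fake a two-line combined-format access log so the pipeline runs standalone.
cat > /tmp/access.sample.log <<'EOF'
1.2.3.4 - - [29/Jun/2012:16:49:00 +0000] "GET /old-page HTTP/1.1" 404 153 "http://www.domain.com/about.html" "Wget/1.13"
1.2.3.4 - - [29/Jun/2012:16:49:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Wget/1.13"
EOF

# In the combined format, $7 is the request path, $9 the status code,
# and $11 the (quoted) Referer field; strip the quotes before printing.
awk '$9 == 404 {gsub(/"/, "", $11); print $7, "<-", $11}' /tmp/access.sample.log | sort -u
# -> /old-page <- http://www.domain.com/about.html
```

Each output line pairs a broken target with the page that linked to it, which is exactly the mapping the plain `--spider` log is missing.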
Good idea. I forgot I asked this on here, actually! What I ended up doing was using it in combination with grep on my local copy of the site (particularly using the -n option to get line numbers).
– user19866
Jul 10 '12 at 19:52
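The local-copy approach from the comment above can be sketched like this; `/tmp/site` and the dead link target `/old-page` are hypothetical stand-ins for your site checkout and the URL wget reported as broken.

```shell
# Build a tiny stand-in for a local site checkout.
mkdir -p /tmp/site
printf '<a href="/old-page">old</a>\n' > /tmp/site/about.html

# -r: recurse through the tree, -n: prefix each hit with its line number,
# so every match names the file and line containing the broken link.
grep -rn 'href="/old-page"' /tmp/site
# -> /tmp/site/about.html:1:<a href="/old-page">old</a>
```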
This is good for broken internal links, but not for links to external sites.
– Screenack
Jan 10 '18 at 20:15
A good way (not involving the webserver logs) is to use the `--debug` flag and grep for `^Referer:`. On the command line:
wget -r --spider --debug http://www.domain.com/ 2>&1 | egrep -A 1 '(^---response end---$|^--[0-9]{4}-[0-9]{2}-[0-9]{2}|^[0-9]{4}-[0-9]{2}-[0-9]{2} ERROR|^Referer:|^Remote file does not)'
You can do similar grepping on your saved log. Caveat: some wget builds are not compiled with `--debug` support.
answered May 31 '16 at 10:56
Tsojcanth
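The debug output can also be condensed into a "broken URL <- referring page" list. The fragment below imitates a slice of `--debug` output so the awk command can be demonstrated standalone; the real format varies somewhat by wget version, so adjust the patterns to match yours.

```shell
# Fake a fragment of wget --debug output (hypothetical, simplified).
cat > /tmp/wget-debug.sample <<'EOF'
---request begin---
HEAD /old-page HTTP/1.1
Referer: http://www.domain.com/about.html
---request end---
HTTP/1.1 404 Not Found
EOF

# Remember the last request path and Referer header seen; when a 404
# status line appears, print the pair.
awk '/^(GET|HEAD) / {url=$2} /^Referer:/ {ref=$2} / 404 / {print url, "<-", ref}' /tmp/wget-debug.sample
# -> /old-page <- http://www.domain.com/about.html
```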