wget - Mirroring a full website with requisites on different hosts

I am trying to make a full static copy of a WordPress website with wget, to be browsed without any network connection (all links and images must be converted).

The page requisites (images, CSS, JS, ...) are served from 3 different WordPress hosts and always live under the same wp-content/uploads directories.



I tried to limit the recursion on the other domains to the wp-content/uploads directories with --domains and --include-directories, but I can't restrict wget to fetching only those directories on $URL1 and $URL2.

Here is the command line (which doesn't limit retrieval to $URL0 and [$URL1|$URL2]/wp-content/uploads):



wget --convert-links --recursive -l inf -N -e robots=off -R -nc \
  --default-page=index.html -E -D$URL1,$URL2,$URL0 --page-requisites \
  -B$URL0 -X$URL1,$URL2 --cut-dirs=1 -I*/wp-content/uploads/*, -H -F $URL0


Is there any possibility to limit wget's recursion on the other domains to only some directories?
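One workaround worth noting: wget matches its -I/--include-directories list against the directory part of URLs on every accepted host, so the filter cannot be scoped to $URL1 and $URL2 only. A sketch (placeholder hostnames, not verified against the real sites) is to split the mirror into two passes, each with its own filters:

```shell
#!/bin/sh
# Sketch with placeholder hostnames; a real run would substitute the
# actual $URL0..$URL2 values and needs testing against the live sites.
URL0=main.example.com
URL1=cdn1.example.com
URL2=cdn2.example.com

# Pass 1: recursively mirror the main host only (no host spanning).
PASS1="wget --recursive -l inf -N -E -k -p -e robots=off -D$URL0 http://$URL0/"

# Pass 2: re-run with host spanning (-H) enabled, but with -I so that
# only wp-content/uploads is followed on any of the accepted hosts;
# -N skips files already fetched in pass 1.
PASS2="wget --recursive -N -E -k -p -e robots=off -H -D$URL0,$URL1,$URL2 -I /wp-content/uploads http://$URL0/"

# Printed rather than executed, since this is only a sketch:
echo "$PASS1"
echo "$PASS2"
```

The caveat is that --convert-links (-k) rewrites links per invocation, so the two passes may need the same output directory for the converted links to stay consistent.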










  • 1




    Do I understand correctly that you want only directories below wp-content/uploads? If so, is the -np (no parent) flag what you're looking for?
    – Kevin
    Dec 13 '11 at 19:18














regular-expression wget hosts domain






edited Sep 8 at 0:52 by Jeff Schaller

asked Oct 19 '11 at 22:34 by user11689
















2 Answers

















Answer (score 1):













wget --mirror --convert-links yourdomain.com
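For readers unfamiliar with the shorthand: per the wget manual, --mirror is equivalent to -r -N -l inf --no-remove-listing, so the answer's one-liner (yourdomain.com is the answer's placeholder) expands to:

```shell
# --mirror is documented shorthand for -r -N -l inf --no-remove-listing,
# so the answer's command is equivalent to the expanded form below.
# Printed rather than executed, since yourdomain.com is a placeholder:
CMD="wget -r -N -l inf --no-remove-listing --convert-links yourdomain.com"
echo "$CMD"
```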





  • This seems like it does the opposite of what he asked; the man page says --mirror "sets infinite recursion depth"
    – Michael Mrozek♦
    Nov 7 '11 at 12:29






  • 3




    Also, could you tell us a bit about what the command actually does. Simply stating a command is not enough.
    – n0pe
    Nov 8 '11 at 3:05

















Answer (score 0):













Think you might be looking for the include_directories switch?



From the manual:




‘include_directories = list’
‘-I’ option accepts a comma-separated list of directories included in the retrieval. Any other directories will simply be ignored. The directories are absolute paths.
So, if you wish to download from ‘http://host/people/bozo/’ following only links to bozo's colleagues in the /people directory and the bogus scripts in /cgi-bin, you can specify:




 wget -I /people,/cgi-bin http://host/people/bozo/
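Note that when host spanning is enabled with -H, the -I list is still matched against the directory part of URLs on every accepted host; it cannot be scoped to one host. A sketch extending the manual's example across two placeholder hosts:

```shell
# Placeholder hosts: with -H and -D, the same -I list applies on
# host and mirror.host alike, so /people and /cgi-bin are followed
# on either host, but nothing outside those directories is.
# Printed rather than executed, since the hosts are hypothetical:
CMD="wget -r -H -Dhost,mirror.host -I /people,/cgi-bin http://host/people/bozo/"
echo "$CMD"
```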





The first answer was answered Nov 7 '11 at 9:38 by Peter and edited Nov 7 '11 at 12:29 by Michael Mrozek♦.
The second answer was answered Jul 13 '12 at 12:28 by James.



























             
