robots.txt is redirecting to default page

Hullo,



Typically, if I type into my address bar, "oneofmysites.com/robots.txt", any browser will display the content of robots.txt. As you can see, this is pretty standard behaviour.



I have just one web server which does not. Instead, robots.txt redirects to the default web page (i.e. "thesiteinquestion.com/"). This notable difference (only one of seven sites) worries me.



Questions: Is this something to be concerned about? If so, what is the likely error that I am missing?



Notes:



  • This site is the only one with a separate service provider that I use.

  • CentOS release 6.10 (Final)

  • Webmin

  • robots.txt file permissions are 644









      redirect robots.txt






      asked Feb 6 at 21:34









      Parapluie

          3 Answers
          It depends on the server configuration; .txt files may not be allowed. There may be a rule somewhere in the config or in an .htaccess file specifying that any URL that doesn't match a certain pattern (say .html, .php, .htm, etc.) is redirected to the index page of the web root.







          answered Feb 6 at 22:09









          Serge Rivest

          • Well blue blistering barnacles! You are right. And I did it to myself with this rewrite: RewriteRule .(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]. I did this to prevent direct access, but I forgot that I added txt files as well. Comment it out, and it works in a trice. Question: is there any way to conditionally exclude files (this robots.txt file, in particular) from a rewrite?

            – Parapluie
            Feb 7 at 0:34











          • Wishing I could upvote this twice!

            – Parapluie
            Feb 7 at 0:34











          • @Parapluie possibly with a rule that allows robots.txt before the one you have there. I think the webserver would go sequentially through the rules and act on the first match. So if it matches robots.txt then it will act on that line. Examples here: serverfault.com/questions/213422/…

            – Serge Rivest
            Feb 11 at 22:09
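
A minimal sketch of the exclusion discussed in the comments above, assuming the rule lives in a per-directory .htaccess file in the web root and that mod_rewrite is enabled. The pass-through rule for robots.txt and the escaped dot in the second pattern are assumptions, not part of the original configuration:

```apache
RewriteEngine On

# Assumption: per-directory (.htaccess) context, where the pattern is matched
# against the path without a leading slash.
# Serve robots.txt as-is: "-" means no substitution, and [L] stops
# processing the remaining rules for this request.
RewriteRule ^robots\.txt$ - [L]

# The existing rule, now reached only by requests other than robots.txt.
# The dot is escaped here so it matches a literal "." before the extension.
RewriteRule \.(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]
```

In the main server or virtual host configuration the path keeps its leading slash, so the first pattern would be ^/robots\.txt$ instead. An alternative with the same effect is to place RewriteCond %{REQUEST_URI} !^/robots\.txt$ directly above the existing rule.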













          To add a bit of information: the web provider is not at all obliged to respect the robots.txt standard, so they can do whatever they want with it and, as Serge said, it can be redirected anywhere.







          answered Feb 6 at 22:18









          yagmoth555

          • The "web provider" is not forced to respect the standard? Am I misunderstanding? Do you mean the crawler?

            – Parapluie
            Feb 7 at 0:35











          • @Parapluie I mean the hoster is not forced to follow the robots.txt standard, and thus the crawler must adapt to such a case.

            – yagmoth555
            Feb 7 at 0:37











          • That is interesting and germane. Thankfully, I have full access to the config in this case (even though my having access was the problem in the first place, at least I can fix it!) Thanks!

            – Parapluie
            Feb 7 at 0:40

















          A crawler should read robots.txt and follow its restrictions, but the web server cannot enforce this.



          .htaccess (or the server config file) can be used to keep out crawlers that don’t comply, if you know who they are.







          edited Feb 8 at 4:15

























          answered Feb 8 at 2:15









          WGroleau

          • Yes, indeed. I am currently using a jail script to ban IPs who ignore the robots.txt directives. i.e.

            – Parapluie
            Feb 8 at 16:45
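
As a rough illustration of the blocking described in this answer, assuming mod_rewrite is enabled and the offending crawler identifies itself with a User-Agent containing the hypothetical string ExampleBadBot:

```apache
RewriteEngine On

# "ExampleBadBot" is a placeholder; substitute the User-Agent substring of
# the crawler you want to keep out. [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} ExampleBadBot [NC]
# Match every request from that agent and answer 403 Forbidden.
RewriteRule ^ - [F]
```

User-Agent strings are easy to fake, so this only stops crawlers that identify themselves honestly.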

















