Download file with actual name by wget

I am trying to download a file through HTTP from a web site using wget.



When I use:



wget http://abc/geo/download/?acc=GSE48191&format=file


I get only a file called index.html?acc=GSE48191.



When I use:



wget http://abc/geo/download/?acc=GSE48191&format=file -o asd.rpm


I get asd.rpm, but I want the file to be saved under its actual name, without having to rename the downloaded file manually.







filenames wget














edited Sep 29 '17 at 9:13
Kusalananda

asked Sep 26 '17 at 5:58
Neha











  • You might want to ask this sort of question on Bioinformatics next time. It's on topic here as well, and welcome to stay, but you might get more help from people who work in the field.
    – terdon♦
    Sep 26 '17 at 12:49

  • @terdon How is asking about wget and *nix shell behavior on topic on Bioinformatics?
    – Michael Kjörling
    Sep 26 '17 at 13:35

  • @MichaelKjörling extracting information from NCBI would be, that's why I suggested it. An answer there would likely involve a simpler, more direct approach to get at the information the OP is looking for rather than a shell solution. Something like "you can get this information more easily from here" for instance.
    – terdon♦
    Sep 26 '17 at 13:40

  • Look at the --trust-server-names argument to wget.
    – ivanivan
    Sep 26 '17 at 17:11

  • It's important to note that there is no such thing as "the actual name" of a resource referenced by a URL. A web server responds to a request with some content, and possibly some headers which describe that content in some way, but there doesn't have to be a file involved at all.
    – IMSoP
    Sep 26 '17 at 21:08


























2 Answers
wget --content-disposition 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

The file you are downloading is a tar archive (a binary file), served through a dynamic link on a web server. wget normally names the saved file after the last part of the URL, but here that is just a REST API endpoint (or something similar), so the resulting name would be unfriendly to work with (it would still be a valid name, and the file contents would be the same).

However, in this case the server sends a Content-Disposition header containing the actual file name, which wget will use if you pass the --content-disposition option. This option is marked "experimental" in my manual for wget.
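To make the role of that header more concrete, here is a minimal sketch of pulling a file name out of a Content-Disposition value in plain shell. The header value and the helper name filename_from_header are assumptions made for illustration (they are not copied from the server's real response), and this is not how wget itself implements the option. It also ignores the extended filename*= form and assumes the value contains a filename= parameter.

```shell
# Hypothetical helper: print the filename parameter of a
# Content-Disposition header value passed as $1.
filename_from_header() {
    value=${1#*filename=}    # drop everything up to and including 'filename='
    value=${value%%;*}       # drop any parameters that follow
    value=${value#\"}        # strip a leading double quote, if present
    value=${value%\"}        # strip a trailing double quote, if present
    printf '%s\n' "$value"
}

# Assumed example value; a real server may format it differently.
header='attachment; filename="GSE48191_RAW.tar"'
name=$(filename_from_header "$header")
printf 'would save as: %s\n' "$name"
```

With --content-disposition, wget does this kind of lookup for you, so a helper like this should only be needed if you want to inspect the name yourself.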



You also need to quote the URL so that the shell does not interpret the & and ? characters in it.

The equivalent using curl:

curl -J -O 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

Or, using the equivalent long options:

curl --remote-header-name --remote-name 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

Once you have downloaded the file, you need to unpack it:

tar -xvf GSE48191_RAW.tar

Because of the way this particular archive was created, this unpacks the archive's files into the current directory (so creating a new directory, moving the archive there and unpacking it there may be a good idea). The files in this archive are gzip-compressed CEL files.
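The "unpack into a separate directory" suggestion could look roughly like the sketch below. So that it runs without a download, it first builds a tiny stand-in archive; the file sample.CEL, the temporary working directory and the target directory name GSE48191_RAW are all assumptions for illustration, not the real contents of the archive.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Build a stand-in for the downloaded archive: one gzip-compressed file
# stored at the top level of the tar, with no enclosing directory.
printf 'demo\n' > "$workdir/sample.CEL"
gzip "$workdir/sample.CEL"                  # produces sample.CEL.gz
tar -cf "$workdir/GSE48191_RAW.tar" -C "$workdir" sample.CEL.gz
rm "$workdir/sample.CEL.gz"

# Unpack into a dedicated directory instead of the current one.
mkdir "$workdir/GSE48191_RAW"
tar -xvf "$workdir/GSE48191_RAW.tar" -C "$workdir/GSE48191_RAW"

# Decompress the extracted files in place.
gunzip "$workdir/GSE48191_RAW"/*.gz
ls "$workdir/GSE48191_RAW"
```

For the real download, only the mkdir, tar and gunzip steps should be needed, with the paths adjusted to wherever you saved GSE48191_RAW.tar.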















edited Sep 26 '17 at 13:28

answered Sep 26 '17 at 6:25
Kusalananda






















The shell interprets certain characters in the usual way, notably ? as a wildcard (which doesn't matter here) and & as "run in the background". You should have noticed the latter, because the shell's response differs from that of a command run directly.

So you need to quote:

wget 'http://abc/geo/download/?acc=GSE48191&format=file'
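A small sketch of the difference, using a stand-in function instead of wget so that nothing is downloaded. The name show_args is hypothetical; it simply prints whatever arguments it receives. The unquoted case assumes the ? does not happen to match an existing path, in which case most shells pass it through unchanged.

```shell
# Stand-in for wget: print each argument that actually arrives.
show_args() { printf 'got: %s\n' "$@"; }

# Unquoted: the shell ends the command at '&', runs it in the background,
# and treats 'format=file' as a separate variable assignment.
unquoted=$(show_args http://abc/geo/download/?acc=GSE48191&format=file; wait)

# Quoted: the whole URL is passed as a single argument.
quoted=$(show_args 'http://abc/geo/download/?acc=GSE48191&format=file')

printf 'unquoted -> %s\n' "$unquoted"
printf 'quoted   -> %s\n' "$quoted"
```

In the unquoted case the command only ever sees the URL up to acc=GSE48191, which would be consistent with the index.html?acc=GSE48191 file name reported in the question.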
















answered Sep 26 '17 at 6:10
dirkt



























                       
