Download file with actual name by wget
I am trying to download a file over HTTP from a web site using wget.
When I use:
    wget http://abc/geo/download/?acc=GSE48191&format=file
I get only a file called index.html?acc=GSE48191.
When I use:
    wget http://abc/geo/download/?acc=GSE48191&format=file -o asd.rpm
I get asd.rpm, but I want to download the file with its actual name, and I don't want to have to change the name of the downloaded file manually.
filenames wget
edited Sep 29 '17 at 9:13 by Kusalananda
asked Sep 26 '17 at 5:58 by Neha
You might want to ask this sort of question on Bioinformatics next time. It's on topic here as well, and welcome to stay, but you might get more help from people who work in the field.
– terdon♦ Sep 26 '17 at 12:49
@terdon How is asking about wget and *nix shell behavior on topic on Bioinformatics?
– Michael Kjörling Sep 26 '17 at 13:35
@MichaelKjörling extracting information from NCBI would be, that's why I suggested it. An answer there would likely involve a simpler, more direct approach to get at the information the OP is looking for rather than a shell solution. Something like "you can get this information more easily from here" for instance.
– terdon♦ Sep 26 '17 at 13:40
Look at the --trust-server-names argument to wget.
– ivanivan Sep 26 '17 at 17:11
It's important to note that there is no such thing as "the actual name" of a resource referenced by a URL. A web server responds to a request with some content, and possibly some headers which describe that content in some way, but there doesn't have to be a file involved at all.
– IMSoP Sep 26 '17 at 21:08
2 Answers
    wget --content-disposition 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
The file you are downloading is a tar archive (a binary file), provided by a dynamic link from a web server. wget would normally name the saved file after part of the URL that you're using, but in this case that's just a REST API endpoint (or something similar), so the name would be unfriendly to work with (it would still be a valid name and the file contents would be the same).
However, in this case the server provides a Content-Disposition header containing the actual file name, which wget is able to use if you give it the --content-disposition option. This option is marked "experimental" in my manual for wget.
You also need to quote the URL so that the shell does not interpret the & and ? characters in it.
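To make the role of the header concrete, here is a minimal sketch of pulling a file name out of a Content-Disposition header line. The header text below is an assumed example (only the file name GSE48191_RAW.tar appears in this thread), and filename_from_header is a hypothetical helper that handles only the simple quoted form; wget's own parsing covers more cases.

```shell
# Assumed example of a header line; the real server response may differ.
header='Content-Disposition: attachment; filename="GSE48191_RAW.tar"'

# Hypothetical helper: print the quoted filename parameter of a header line.
# Prints nothing if the line has no filename="..." part.
filename_from_header() {
    printf '%s\n' "$1" | sed -n 's/.*filename="\([^"]*\)".*/\1/p'
}

filename_from_header "$header"
```

With the assumed header above, the helper should print the name that the server suggests for the download.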
The equivalent thing using curl:
    curl -J -O 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
Or, using the equivalent long options:
    curl --remote-header-name --remote-name 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'
Once you have downloaded the file, you need to unpack it:
    tar -xvf GSE48191_RAW.tar
Due to the way that this particular archive was created, this will unpack the archive's files into the current directory (so creating a new directory, moving the archive there and unpacking it there may be a good idea). The files in this archive are gzip-compressed CEL files.
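One way to keep the unpacked files out of the current directory, as a sketch: instead of moving the archive, this variant uses tar's -C option to extract into a new directory named after the archive. The function name unpack_into_own_dir is hypothetical, and the .gz suffix in the usage comment is an assumption about how the compressed CEL files are named.

```shell
# Hypothetical helper: unpack ARCHIVE.tar into a directory called ARCHIVE.
unpack_into_own_dir() {
    archive=$1
    dir=${archive%.tar}          # strip the .tar suffix to get the directory name
    mkdir -p "$dir" && tar -xf "$archive" -C "$dir"
}

# Usage, once the download has finished:
#   unpack_into_own_dir GSE48191_RAW.tar
#   gunzip GSE48191_RAW/*.gz     # assumes the members end in .gz
```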
edited Sep 26 '17 at 13:28
answered Sep 26 '17 at 6:25 by Kusalananda
The shell applies its usual interpretation of special characters, in particular ? as a wildcard (which doesn't matter here) and & as "put into background". You should have noticed the latter, because the shell's response differs from that of a command run in the foreground.
So you need to quote the URL:
    wget 'http://abc/geo/download/?acc=GSE48191&format=file'
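The effect of the unquoted & can be seen without touching the network. In this sketch, show_args is a hypothetical stand-in for wget that only prints the arguments it receives, and eval is used to simulate typing the URL unquoted. It assumes a POSIX sh or bash with default globbing settings and no local file matching the ? pattern.

```shell
# Hypothetical stand-in for wget: print each received argument on its own line.
show_args() { printf 'arg: %s\n' "$@"; }

url='http://abc/geo/download/?acc=GSE48191&format=file'

# Quoted: the whole URL arrives as a single argument.
show_args "$url"

# Unquoted (simulated with eval): the line is split at '&'.
# show_args runs in the background with the URL cut off before '&',
# and 'format=file' is then executed as a separate variable assignment.
eval "show_args $url"
wait
echo "format is now: $format"
```

The second call should report a URL ending in ?acc=GSE48191, which matches the index.html?acc=GSE48191 file name from the question, and the shell variable format ends up set to file.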
answered Sep 26 '17 at 6:10 by dirkt