Download file with actual name by wget

I am trying to download a file through HTTP from a web site using wget.

When I use:

wget http://abc/geo/download/?acc=GSE48191&format=file

I get only a file called index.html?acc=GSE48191.

When I use:

wget http://abc/geo/download/?acc=GSE48191&format=file -o asd.rpm

I get asd.rpm, but I want the file saved under its actual name, without having to rename it manually afterwards.

edited Sep 29 '17 at 9:13 by Kusalananda
asked Sep 26 '17 at 5:58 by Neha

You might want to ask this sort of question on Bioinformatics next time. It's on topic here as well, and welcome to stay, but you might get more help from people who work in the field.
– terdon♦
Sep 26 '17 at 12:49

@terdon How is asking about wget and *nix shell behavior on topic on Bioinformatics?
– Michael Kjörling
Sep 26 '17 at 13:35

@MichaelKjörling extracting information from NCBI would be, that's why I suggested it. An answer there would likely involve a simpler, more direct approach to get at the information the OP is looking for rather than a shell solution. Something like "you can get this information more easily from here" for instance.
– terdon♦
Sep 26 '17 at 13:40

Look at the --trust-server-names argument to wget -
– ivanivan
Sep 26 '17 at 17:11

It's important to note that there is no such thing as "the actual name" of a resource referenced by a URL. A web server responds to a request with some content, and possibly some headers which describe that content in some way, but there doesn't have to be a file involved at all.
– IMSoP
Sep 26 '17 at 21:08

filenames wget

2 Answers

wget --content-disposition 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

The file you are downloading is a tar archive (a binary file), served through a dynamic link by a web server. wget normally names the saved file after the last part of the URL, but here that is just a REST API endpoint (or something similar), so the resulting name would be awkward to work with (it would still be a valid name, and the file contents would be the same).

However, in this case the server sends a Content-Disposition header containing the actual file name, which wget will use if you give it the --content-disposition option. This option is marked "experimental" in my manual for wget.
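As a rough illustration of what that header carries, the sketch below pulls the file name out of a Content-Disposition value with sed. The header string is an assumption modelled on the archive name used later in this answer, not captured output from the server, and wget's own parsing handles more cases than this.

```shell
# Assumed header value; the real server response may be formatted differently.
header='Content-Disposition: attachment; filename="GSE48191_RAW.tar"'

# Keep only the text after filename=, dropping optional surrounding quotes.
filename=$(printf '%s\n' "$header" |
    sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')

printf '%s\n' "$filename"
```

This is the name that --content-disposition would be expected to use for the saved file, provided the server really sends a header of this shape.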

You also need to quote the URL so that the shell does not interpret the & and ? characters in it.

The equivalent thing using curl:

curl -J -O 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

Or, using the equivalent long options:

curl --remote-header-name --remote-name 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE48191&format=file'

Once you have downloaded the file, you need to unpack it:

tar -xvf GSE48191_RAW.tar

Due to the way that this particular archive was created, this will unpack the archive's files into the current directory (so creating a new directory, moving the archive there and unpacking it there may be a good idea). The files in this archive are gzip-compressed CEL files.
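One way to keep the current directory clean is to unpack into a dedicated directory with tar's -C option instead of moving the archive. The sketch below builds a tiny stand-in archive so that it runs without network access; the temporary directory and the member name sample1.CEL are hypothetical and only the archive name comes from this answer.

```shell
# Build a small stand-in archive in a scratch directory (hypothetical contents).
workdir=$(mktemp -d)
printf 'dummy\n' > "$workdir/sample1.CEL"
tar -cf "$workdir/GSE48191_RAW.tar" -C "$workdir" sample1.CEL
rm "$workdir/sample1.CEL"

# Unpack into its own directory rather than the current one.
mkdir "$workdir/GSE48191_RAW"
tar -xf "$workdir/GSE48191_RAW.tar" -C "$workdir/GSE48191_RAW"

ls "$workdir/GSE48191_RAW"
```

With the real download, only the mkdir and the extracting tar command are needed, pointed at wherever the archive was saved.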

edited Sep 26 '17 at 13:28
answered Sep 26 '17 at 6:25 by Kusalananda

The shell interprets special characters in the usual way, in particular ? as a wildcard (which doesn't matter here) and & as "run in the background". You should have noticed the latter, because the shell's response differs from that of a normal foreground command.

So you need to quote:

wget 'http://abc/geo/download/?acc=GSE48191&format=file'
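The effect of the quotes can be seen without touching the network. In this sketch printf stands in for wget and simply prints the argument it receives; it assumes a POSIX sh or bash with default settings, where a ? that matches no existing path is left unchanged.

```shell
# Quoted: the whole URL reaches the command as a single argument.
quoted=$(printf '%s\n' 'http://abc/geo/download/?acc=GSE48191&format=file')

# Unquoted: the shell ends the command at &, runs it in the background,
# and then treats format=file as a separate variable assignment.
unquoted=$(printf '%s\n' http://abc/geo/download/?acc=GSE48191&format=file)

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

The unquoted form loses everything from the & onward, which is why wget requested only the acc=GSE48191 part of the query and saved index.html?acc=GSE48191.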

answered Sep 26 '17 at 6:10 by dirkt
