How can I wget from a list with multiple lines into one file name?
I would like to wget a list of items that I'm retrieving from an XML file.
I'm using sed to clean up the XML, and I'm ending up with output like this:
CountofMonteCristo.zip
English.
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
Alexandre.
Dumas.
LettersofTwoBrides.zip
English.
http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
Honoréde.
Balzac.
BleakHouse.zip
English.
http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip
Charles.
Dickens.
I'd like to use wget -i to download these files as
Language.Lastname.Firstname.Title.zip
I'm open to re-arranging the file somehow so that I can use
$filename $url
I've tried a few different sed commands. Sed is what I've used to clean up the XML tags, but I can't figure out how to move text to the appropriate place. The titles, names, and languages will vary for each file.
EDIT: Before cleaning up the tags with sed, each line is wrapped in tags, such as English and FileTitle.
I think this could be helpful in identifying patterns to re-arrange things.
EDIT2: Here's the XML source
EDIT3: Something like this looks like it would work, but I'm having trouble modifying it to suit my needs.
My ultimate goal is to organize all of the files into folders, with a hierarchy of Language -> AuthorLastnameFirstname -> Files.zip
If what I'm doing is not best practice, I'm open to other methods.
Thanks
bash wget
Can we see an example of the original .xml file?
— D'Arcy Nader
Apr 2 at 17:39
Thanks for asking, I've edited my question to add a link to the XML source.
— Matt Zabojnik
Apr 2 at 17:41
Why in the name of feral ponies are you trying to use a regular expression tool to parse and reform XML data? Use a DOM parser or other tool that is designed to parse XML to ingest the data and spit out what you need.
— DopeGhoti
Apr 2 at 17:42
@DopeGhoti, Can you elaborate? I've done this before on another site using this method, so this is all I'm familiar with. Docs, examples, suggestions for DOM parsing would be helpful. Also, I'm doing this on a headless Ubuntu machine, using wget to retrieve the XML file, if that matters.
— Matt Zabojnik
Apr 2 at 17:45
Question comments are not a place to dive into the weeds, but I will point you first to the manual page for xpath.
— DopeGhoti
Apr 2 at 18:12
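A minimal illustration of the DOM-parsing approach the comments suggest, using Python's built-in xml.etree instead of sed; the tag names used here (book, language, url_zip_file, first_name, last_name) are assumptions about the feed's structure, and the code runs against an inline sample rather than the live API:

```python
import xml.etree.ElementTree as ET

# Inline sample shaped like the (assumed) LibriVox feed structure.
xml_data = """<books>
  <book>
    <title>CountofMonteCristo</title>
    <language>English</language>
    <url_zip_file>http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip</url_zip_file>
    <authors>
      <author><first_name>Alexandre</first_name><last_name>Dumas</last_name></author>
    </authors>
  </book>
</books>"""

root = ET.fromstring(xml_data)
pairs = []
for book in root.findall('book'):
    lang = book.findtext('language')
    first = book.findtext('.//first_name')
    last = book.findtext('.//last_name')
    title = book.findtext('title')
    url = book.findtext('url_zip_file')
    # One "filename url" line per book, ready for a download loop
    pairs.append('{}.{}.{}.{}.zip {}'.format(lang, last, first, title, url))

for p in pairs:
    print(p)
```

Because the parser works on the document tree rather than on lines of text, the reordering into Language.Lastname.Firstname.Title.zip is just a format string.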
asked Apr 2 at 17:25 by Matt Zabojnik; edited Apr 2 at 19:12
3 Answers
If what I'm doing is not best practice, I'm open to other methods.
I am going to suggest you don't use bash or sed for this, and instead go with Python, which is a much better fit for parsing the XML you need to parse. I have just written and tested this with Python 3.6 and it does exactly what you have asked.
#!/usr/bin/python3
# Let's import the modules we need
import os
import requests
import wget
from bs4 import BeautifulSoup as bs
# Assign the url to a variable (not essential as we
# only use it once, but it's pythonic)
url = 'https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B'
# Use requests to fetch the raw xml
r = requests.get(url)
# Use BeautifulSoup and lxml to parse the raw xml so
# we can do stuff with it
s = bs(r.text, 'lxml')
# We need to find the data we need. This will find it and create some
# python lists for us to loop through later
# Find all xml tags named 'url_zip_file' and assign them to a variable
links = s.find_all('url_zip_file')
# Find all xml tags named 'last_name' and assign them to a variable
last_names = s.find_all('last_name')
# Find all xml tags named 'first_name' and assign them to a variable
first_names = s.find_all('first_name')
# Find all xml tags named 'language' and assign them to a variable
language = s.find_all('language')
# Assign the language to a variable
english = language[0].text
# Make our new language directory
os.mkdir(english)
# cd into our new language directory
os.chdir(english)
# Loop through the last names (ln), first names (fn) and links
# so we can make the directories, download the file, rename the
# file, then go back a directory and loop again.
# Note: the '{}' format placeholders were stripped when this code was
# rendered to the page; the patterns below are a reconstruction.
for ln, fn, link in zip(last_names, first_names, links):
    os.mkdir('Author{}{}'.format(ln.text, fn.text))
    os.chdir('Author{}{}'.format(ln.text, fn.text))
    filename = wget.download(link.text)
    os.rename(filename, '{}.{}.{}'.format(ln.text, fn.text, filename))
    os.chdir('../')
You can either save this to a file or just paste/type into a python3 interpreter cli, it's up to you.
You will need to install python3-wget and beautifulsoup4 using pip or easy_install etc.
The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
— muru
Apr 3 at 4:37
If you can use jq
, the Librivox API also provides JSON output, and it's probably easier to parse JSON with jq
than XML with proper XML tools.
u='https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B&format=json'
curl "$u" -sL |
  jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file'
Gives output like:
English.DumasAlexandre.Count of Monte Cristo.zip
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
English.BalzacHonoré de.Letters of Two Brides.zip
http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
English.DickensCharles.Bleak House.zip
http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip
After that, it's relatively simple to use xargs
:
curl "$u" -sL |
  jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file' |
  xargs -d '\n' -n2 wget -O
Here xargs passes two lines at a time as arguments to wget, with the first line becoming the -O option's parameter (the output filename) and the second the URL.
Though I'd recommend a Python-based solution like Jamie's, except using JSON and Python's builtin JSON capabilities instead of bs4.
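A sketch of that JSON-plus-stdlib variant, assuming the API's response is shaped like the jq output above (a top-level books array with language, title, authors, and url_zip_file fields); it runs against a hard-coded sample here rather than a live request, which a real script would fetch with urllib.request.urlopen:

```python
import json

# Sample shaped like the (assumed) LibriVox JSON response.
raw = json.dumps({
    "books": [
        {
            "title": "Bleak House",
            "language": "English",
            "url_zip_file": "http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip",
            "authors": [{"first_name": "Charles", "last_name": "Dickens"}],
        }
    ]
})

data = json.loads(raw)
downloads = []
for book in data["books"]:
    author = book["authors"][0]
    # Same naming scheme as the jq filter: Language.LastFirst.Title.zip
    name = "{}.{}{}.{}.zip".format(
        book["language"], author["last_name"], author["first_name"], book["title"]
    )
    downloads.append((name, book["url_zip_file"]))

for name, url in downloads:
    print(name, url)
```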
Brute force.
If your parsed xml is in books
while read a; read b; read c; read d; read e; do wget "$c" -O "$b$e$d$a"; echo "$c"; done < books
Just recompose your lines as variables and you are good to go as long as your record blocks are padded to 5 lines.
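The same recomposition can be sketched in Python, which sidesteps the quoting pitfalls of shell variables; it assumes the same 5-line record layout (title, language, URL, first name, last name, each already ending in a dot where needed) and uses an inline sample in place of the books file:

```python
# Group the cleaned-up list into 5-line records:
# Title.zip / Language. / URL / First. / Last.
lines = """CountofMonteCristo.zip
English.
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
Alexandre.
Dumas.""".splitlines()

records = [lines[i:i + 5] for i in range(0, len(lines), 5)]
jobs = []
for title, language, url, first, last in records:
    # Language.Lastname.Firstname.Title.zip, as asked for in the question
    jobs.append((language + last + first + title, url))

for name, url in jobs:
    # a real run would call: subprocess.run(['wget', url, '-O', name])
    print(name, url)
```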
answered Apr 2 at 23:15 by Jamie Lindsey; edited Apr 3 at 6:14
answered Apr 3 at 5:03 by muru; edited Apr 3 at 6:26
answered Apr 2 at 21:06 by bu5hman