How can I wget from a list with multiple lines into one file name?

I would like to wget a list of items that I'm retrieving from an XML file.
I'm using sed to clean up the XML, and I'm ending up with output like this:



CountofMonteCristo.zip
English.
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
Alexandre.
Dumas.
LettersofTwoBrides.zip
English.
http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
Honoréde.
Balzac.
BleakHouse.zip
English.
http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip
Charles.
Dickens.


I'd like to use wget -i to download these files as
Language.Lastname.Firstname.Title.zip



I'm open to re-arranging the file somehow so that I can use
$filename $url



I've tried a few different sed commands. Sed is what I've used to clean up the XML tags, but I can't figure out how to move text to the appropriate place. The titles, names, and languages will vary for each file.
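Not from the question itself, but one minimal sketch of that re-arrangement, assuming the cleaned-up output always arrives in five-line records ordered title / language / URL / first name / last name, as in the sample above (`list.txt` is a hypothetical file holding that output): awk folds each record into a `filename url` pair that a plain shell loop can feed to wget.

```shell
#!/bin/sh
# Sample of the cleaned-up sed output from the question (one record);
# recreated here so the sketch is self-contained.
cat > list.txt <<'EOF'
CountofMonteCristo.zip
English.
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
Alexandre.
Dumas.
EOF

# Fold each five-line record into a "filename url" pair; the trailing
# dots already present on the language/name lines supply the separators.
awk '
  NR % 5 == 1 { title = $0 }
  NR % 5 == 2 { lang  = $0 }
  NR % 5 == 3 { url   = $0 }
  NR % 5 == 4 { first = $0 }
  NR % 5 == 0 { print lang $0 first title, url }
' list.txt |
while read -r name url; do
    echo wget -O "$name" "$url"   # drop the echo to actually download
done
```

This prints one `wget -O Language.Lastname.Firstname.Title.zip URL` command per record; the `echo` makes it a dry run.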



EDIT: Before cleaning up the tags with sed, each line is wrapped in XML tags (one tag for the language, one for the file title, and so on).
I think this could be helpful in identifying patterns to re-arrange things.



EDIT2: Here's the XML source



EDIT3: Something like this looks like it would work, but I'm having trouble modifying it to suit my needs.



My ultimate goal is to organize all of the files into folders, with a hierarchy of Language -> AuthorLastnameFirstname -> Files.zip



If what I'm doing is not best practice, I'm open to other methods.



Thanks

  • Can we see an example of the original .xml file?
    – D'Arcy Nader
    Apr 2 at 17:39

  • Thanks for asking, I've edited my question to add a link to the XML source.
    – Matt Zabojnik
    Apr 2 at 17:41

  • Why in the name of feral ponies are you trying to use a regular expression tool to parse and reform XML data? Use a DOM parser or other tool that is designed to parse XML to ingest the data and spit out what you need.
    – DopeGhoti
    Apr 2 at 17:42

  • @DopeGhoti, Can you elaborate? I've done this before on another site using this method, so this is all I'm familiar with. Docs, examples, or suggestions for DOM parsing would be helpful. Also, I'm doing this on a headless Ubuntu machine, using wget to retrieve the XML file, if that matters.
    – Matt Zabojnik
    Apr 2 at 17:45

  • Question comments are not a place to dive into the weeds, but I will point you first to the manual page for xpath.
    – DopeGhoti
    Apr 2 at 18:12

asked Apr 2 at 17:25 by Matt Zabojnik · edited Apr 2 at 19:12

3 Answers

If what I'm doing is not best practice, I'm open to other methods.




I am going to suggest that you don't use bash or sed here at all, and instead go the Python way, which is a much better fit for parsing the XML. I wrote and tested this with Python 3.6, and it does exactly what you asked.



#!/usr/bin/python3
# Let's import the modules we need
import os
import requests
import wget
from bs4 import BeautifulSoup as bs

# Assign the url to a variable (not essential as we
# only use it once, but it's pythonic)
url = 'https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B'

# Use requests to fetch the raw xml
r = requests.get(url)

# Use BeautifulSoup and lxml to parse the raw xml so
# we can do stuff with it
s = bs(r.text, 'lxml')

# Find the data we need and build some Python lists
# to loop through later

# Find all xml tags named 'url_zip_file' and assign them to a variable
links = s.find_all('url_zip_file')

# Find all xml tags named 'last_name' and assign them to a variable
last_names = s.find_all('last_name')

# Find all xml tags named 'first_name' and assign them to a variable
first_names = s.find_all('first_name')

# Find all xml tags named 'language' and assign them to a variable
language = s.find_all('language')

# Assign the language to a variable
english = language[0].text

# Make our new language directory
os.mkdir(english)

# cd into our new language directory
os.chdir(english)

# Loop through the last names (ln), first names (fn) and links so we
# can make the directories, download the file, rename the file, then
# go back up a directory and loop again. (The curly-brace format
# placeholders were stripped by the page renderer; they are
# reconstructed here.)
for ln, fn, link in zip(last_names, first_names, links):
    os.mkdir('{0}{1}'.format(ln.text, fn.text))
    os.chdir('{0}{1}'.format(ln.text, fn.text))
    filename = wget.download(link.text)
    os.rename(filename, '{0}{1}.zip'.format(ln.text, fn.text))
    os.chdir('../')


You can either save this to a file or paste/type it into a python3 interpreter CLI; it's up to you.



You will need to install python3-wget and beautifulsoup4 using pip or easy_install etc.

  • The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
    – muru
    Apr 3 at 4:37

If you can use jq, the Librivox API also provides JSON output, and it's probably easier to parse JSON with jq than XML with proper XML tools.



u='https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B&format=json'
curl "$u" -sL |
jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file'


Gives output like:



English.DumasAlexandre.Count of Monte Cristo.zip
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
English.BalzacHonoré de.Letters of Two Brides.zip
http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
English.DickensCharles.Bleak House.zip
http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip


After that, it's relatively simple to use xargs:



curl "$u" -sL |
jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file' |
xargs -d '\n' -n2 wget -O


Here xargs passes two lines at a time to wget, with the first line becoming the -O option's parameter (the output filename) and the second the URL.
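As a side note, `xargs -d` is a GNU extension; a portable stand-in (a sketch, not from the answer) is a plain shell loop that reads the same alternating filename/URL lines two at a time. `pairs.txt` is a hypothetical file standing in for the jq output shown above.

```shell
#!/bin/sh
# pairs.txt stands in for the jq output: alternating filename/URL lines.
cat > pairs.txt <<'EOF'
English.DumasAlexandre.Count of Monte Cristo.zip
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
EOF

# Read two lines per iteration: filename first, URL second.
while IFS= read -r name && IFS= read -r url; do
    echo wget -O "$name" "$url"   # drop the echo to actually download
done < pairs.txt
```

Unlike the xargs version, this keeps filenames containing spaces intact without depending on GNU-specific options.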



Though I'd recommend a Python-based solution like Jamie's, except using JSON and Python's built-in JSON capabilities instead of bs4.






    Brute force.



If your parsed XML is in a file named books:



    while read a; read b; read c; read d; read e; do wget "$c" -O "$b$e$d$a"; echo "$c"; done < books


    Just recompose your lines as variables and you are good to go, as long as your record blocks are padded to five lines.
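For clarity, here is a dry run of that loop on one sample record (a sketch: `books` is recreated here so it is self-contained, and `echo` shows the wget call each record would produce). With the five reads, a=title, b=language, c=URL, d=first name, e=last name, so "$b$e$d$a" assembles Language.Lastname.Firstname.Title.zip.

```shell
#!/bin/sh
# One five-line record from the question, in the order the loop reads it.
cat > books <<'EOF'
CountofMonteCristo.zip
English.
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
Alexandre.
Dumas.
EOF

# a=title, b=language, c=url, d=first name, e=last name; the && chain
# skips any trailing partial record instead of running on empty fields.
while read -r a && read -r b && read -r c && read -r d && read -r e; do
    echo wget "$c" -O "$b$e$d$a"   # drop the echo to actually download
done < books
```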






(Python answer: answered Apr 2 at 23:15, edited Apr 3 at 6:14, by Jamie Lindsey)

      564




      564











      • The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
        – muru
        Apr 3 at 4:37
















      • The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
        – muru
        Apr 3 at 4:37















      The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
      – muru
      Apr 3 at 4:37




      The API also provides JSON output (with &format=json), so bs4 might not be needed at all.
      – muru
      Apr 3 at 4:37












      up vote
      1
      down vote













      If you can use jq, the Librivox API also provides JSON output, and it's probably easier to parse JSON with jq than XML with proper XML tools.



      u='https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B&format=json'
      curl "$u" -sL |
      jq -r '.books | "(.language).(.authors[0].last_name + .authors[0].first_name).(.title).zip", .url_zip_file'


      Gives output like:



      English.DumasAlexandre.Count of Monte Cristo.zip
      http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
      English.BalzacHonoré de.Letters of Two Brides.zip
      http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
      English.DickensCharles.Bleak House.zip
      http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip


      After that, it's relatively simple to use xargs:



      curl "$u" -sL |
      jq -r '.books | "(.language).(.authors[0].last_name + .authors[0].first_name).(.title).zip", .url_zip_file' |
      xargs -d 'n' -n2 wget -O


      Where xargs use two lines as an argument each to wget, with the first line becoming the -O option parameter and the second the URL.



      Though I'd recommend a Python-based solution like Jamie's, except using JSON and Python's builtin JSON capabilities instead of bs4.






      share|improve this answer


























        up vote
        1
        down vote













        If you can use jq, the Librivox API also provides JSON output, and it's probably easier to parse JSON with jq than XML with proper XML tools.



        u='https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7D&format=json'
        curl "$u" -sL |
        jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file'


        Gives output like:



        English.DumasAlexandre.Count of Monte Cristo.zip
        http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
        English.BalzacHonoré de.Letters of Two Brides.zip
        http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
        English.DickensCharles.Bleak House.zip
        http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip


        After that, it's relatively simple to use xargs:



        curl "$u" -sL |
        jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file' |
        xargs -d '\n' -n2 wget -O


        Here xargs passes two lines as arguments to each wget invocation, with the first line becoming the -O option parameter and the second the URL.



        Though I'd recommend a Python-based solution like Jamie's, except using JSON and Python's builtin JSON capabilities instead of bs4.
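        Since the ultimate goal in the question is a Language -> AuthorLastnameFirstname -> Files.zip hierarchy, the dotted filename can also be split back into a directory path before downloading. Here is a minimal sketch of that idea; the inline sample list stands in for the real jq output, and wget is only echoed so the sketch can be dry-run safely:

```shell
#!/bin/sh
# Split "Language.AuthorLastFirst.Title.zip" names into a
# Language/AuthorLastFirst/Title.zip directory hierarchy.
# The printf sample stands in for the jq pipeline's output.
printf '%s\n' \
  'English.DumasAlexandre.Count of Monte Cristo.zip' \
  'http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip' |
while IFS= read -r name && IFS= read -r url; do
  lang=${name%%.*}          # text before the first dot
  rest=${name#*.}
  author=${rest%%.*}        # text between the first and second dots
  file=${rest#*.}           # everything after the second dot
  mkdir -p "$lang/$author"
  # echo instead of running wget, so the sketch is safe to test:
  echo wget "$url" -O "$lang/$author/$file"
done
```

        Replacing the printf with the curl | jq pipeline, and dropping the echo, turns this into the real downloader.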




















          edited Apr 3 at 6:26

          answered Apr 3 at 5:03

          muru




















              Brute force.



              If your parsed XML is in books:



              while read -r a; read -r b; read -r c; read -r d; read -r e; do wget "$c" -O "$b$e$d$a"; echo "$c"; done < books


              Just recompose your lines as variables and you are good to go, as long as your record blocks are padded to 5 lines.
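              The same brute-force idea extends to the folder hierarchy asked for in the question: since the language and author lines each end in a trailing dot, stripping that dot gives the directory components. A sketch, assuming the five-line record order title/language/url/firstname/lastname shown in the question (sample records stand in for the parsed XML, and wget is only echoed):

```shell
#!/bin/sh
# Same 5-line records, but language and author become directories
# instead of parts of one long dotted filename.
printf '%s\n' 'CountofMonteCristo.zip' 'English.' \
  'http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip' \
  'Alexandre.' 'Dumas.' > books
while read -r a && read -r b && read -r c && read -r d && read -r e; do
  lang=${b%.}              # "English." -> "English"
  author=${e%.}${d%.}      # "Dumas." + "Alexandre." -> "DumasAlexandre"
  mkdir -p "$lang/$author"
  # echo instead of running wget, so the sketch is safe to test:
  echo wget "$c" -O "$lang/$author/$a"
done < books
```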






                  answered Apr 2 at 21:06

                  bu5hman






















                       
