Web scraping the titles and descriptions of trending YouTube videos
This scrapes the titles and descriptions of trending YouTube videos and writes them to a CSV file. What improvements can I make?
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get("https://www.youtube.com/feed/trending").text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('YouTube Trending Titles on 12-30-18.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Title', 'Description'])

for content in soup.find_all('div', class_="yt-lockup-content"):
    try:
        title = content.h3.a.text
        print(title)
        description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        print(description)
    except Exception as e:
        description = None
    print('\n')
    csv_writer.writerow([title, description])

csv_file.close()
Tags: python, csv, web-scraping, beautifulsoup, youtube
asked Dec 30 '18 at 20:57 by austingae; edited Dec 30 '18 at 21:41 by Jamal♦
4 Answers
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostPopular list of videos? If you make a GET request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostPopular, you will get the same information in a documented JSON format.
Using the Python client, the code looks like:
import csv
import googleapiclient.discovery

def most_popular(yt, **kwargs):
    popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
    for video in popular['items']:
        yield video['snippet']

yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)

with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Description'])
    csv_writer.writerows(
        [snip['title'], snip['description']]
        for snip in most_popular(yt, maxResults=20, regionCode=…)
    )
I've also restructured the code so that all of the CSV-writing code appears together, inside a with open(…) as f: … block.
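For reference, the endpoint's JSON nests each video's fields under a snippet key, so extracting the same two columns without the client library takes only a couple of lines. A minimal sketch; the response body below is made up for illustration:

```python
import json

# A trimmed, made-up example of the videos.list response shape
response_text = '''
{"items": [
  {"snippet": {"title": "Video A", "description": "Desc A"}},
  {"snippet": {"title": "Video B", "description": "Desc B"}}
]}
'''

data = json.loads(response_text)
rows = [[v['snippet']['title'], v['snippet']['description']]
        for v in data['items']]
print(rows)  # → [['Video A', 'Desc A'], ['Video B', 'Desc B']]
```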
Great suggestion! I searched but didn't find that API. Are mostPopular and trending synonyms?
– Josay
Dec 30 '18 at 21:56
@Josay When I include the regionCode in the API call ('CA' for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
Is the developer key something that has to be obtained from Google? If so, scraping doesn't require it, and scraping may be better from that point of view. Why sign up for something that you don't need?
– Whirl Mind
Dec 31 '18 at 15:32
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
answered Dec 30 '18 at 21:51, edited Dec 30 '18 at 22:05, by 200_success
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files): context managers. They guarantee that the cleanup is performed at the end, even in case of an exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as file:
    ....
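To see what the with statement buys you, here is the same guarantee written out by hand with try/finally (the file name is arbitrary; newline='' is what the csv module's documentation recommends for files it writes):

```python
import csv

# Written out by hand: close() runs even if the write raises
f = open('example.csv', 'w', newline='')
try:
    csv.writer(f).writerow(['Title', 'Description'])
finally:
    f.close()

# The equivalent, shorter context-manager form
with open('example.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['Title', 'Description'])
```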
Exception
All exceptions are caught by except Exception as e. It may look like a good idea at first, but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except in special situations). For instance, if you make a typo, you'll end up with a silently ignored NameError or AttributeError, and debugging will be more painful than it should be.
Also, from the content of the except block, it looks like you only expect the logic about description to fail. If so, it would be clearer to put the smallest amount of code in the try (...) except.
For instance:
title = content.h3.a.text
try:
    description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
    description = None
print(title)
print(description)
print('\n')
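The failure here is quite specific: when the description element is absent, .find() returns None, and accessing .text on None raises AttributeError. Catching just that one error documents the expectation; a minimal stdlib-only sketch of the pattern:

```python
title = "Some video"   # the title lookup stays outside the try, as above
found = None           # what .find() returns when the element is absent

try:
    description = found.text   # AttributeError: 'NoneType' object has no attribute 'text'
except AttributeError:
    description = None         # only the one expected error is silenced

print(title)        # → Some video
print(description)  # → None
```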
Proper solution
Google usually offers an API to retrieve things like trending videos. I haven't found it, but I'll let you try to find something that works properly. Google is your friend...
answered Dec 30 '18 at 21:55 by Josay
I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code, focused mostly on HTML parsing:

- You could get some speed and memory improvements by using a SoupStrainer to let BeautifulSoup parse out only the desired elements from the HTML:

  The SoupStrainer class allows you to choose which parts of an incoming document are parsed.

  from bs4 import BeautifulSoup, SoupStrainer

  trending_containers = SoupStrainer(class_="yt-lockup-content")
  soup = BeautifulSoup(source, 'lxml', parse_only=trending_containers)

- Instead of .find_all() and .find() you could use more concise CSS selectors. You would have:

  soup.select('.yt-lockup-content')

  instead of:

  soup.find_all('div', class_="yt-lockup-content")

  and:

  content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')

  instead of:

  content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")

  Note how I've omitted the div tag names above - I think they are irrelevant, as the class values actually define the type of an element in this case.

- Organize imports as per PEP 8.
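A quick self-contained check that the CSS selectors and the find calls match the same elements (the HTML snippet below is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '''
<div class="yt-lockup-content">
  <h3><a>Video A</a></h3>
  <div class="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2">Desc A</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# The CSS selector and the find_all() call return the same elements
assert soup.select('.yt-lockup-content') == soup.find_all('div', class_="yt-lockup-content")

content = soup.select_one('.yt-lockup-content')
desc = content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')
print(desc.text)  # → Desc A
```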
answered Dec 31 '18 at 0:45, edited Dec 31 '18 at 0:51, by alecxe
I would separate the "input" (finding titles and descriptions) from the output (writing to screen or file). One good way to do that is to use a generator:
from bs4 import BeautifulSoup
import requests
import csv

def soup():
    source = requests.get("https://www.youtube.com/feed/trending").text
    return BeautifulSoup(source, 'lxml')

def find_videos(soup):
    for content in soup.find_all('div', class_="yt-lockup-content"):
        try:
            title = content.h3.a.text
            description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        except Exception as e:
            description = None
        yield (title, description)

with open('YouTube Trending Titles on 12-30-18.csv', 'w') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Title', 'Description'])
    for (title, description) in find_videos(soup()):
        csv_writer.writerow([title, description])
Disclaimer: I haven't tested this code.
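One payoff of this separation is testability: find_videos() can be exercised against static HTML with no network access. A sketch with made-up markup (the generator body is repeated so the snippet stands alone):

```python
from bs4 import BeautifulSoup

def find_videos(soup):
    # Same generator as above: yields one (title, description) pair per video
    for content in soup.find_all('div', class_="yt-lockup-content"):
        try:
            title = content.h3.a.text
            description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
        except AttributeError:
            description = None
        yield (title, description)

html = '''
<div class="yt-lockup-content"><h3><a>Video A</a></h3>
  <div class="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2">Desc A</div></div>
<div class="yt-lockup-content"><h3><a>Video B</a></h3></div>
'''
videos = list(find_videos(BeautifulSoup(html, 'html.parser')))
print(videos)  # → [('Video A', 'Desc A'), ('Video B', None)]
```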
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210613%2fweb-scraping-the-titles-and-descriptions-of-trending-youtube-videos%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular
list of videos? If you make a GET
request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular
, you will get the same information in a documented JSON format.
Using the Python client, the code looks like:
import csv
import googleapiclient.discovery
def most_popular(yt, **kwargs):
popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
for video in popular['items']:
yield video['snippet']
yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(['Title', 'Description'])
csv_writer.writerows(
[snip['title'], snip['description']]
for snip in most_popular(yt, maxResults=20, regionCode=…)
)
I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: …
block.
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
5
@Josay When I include theregionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
3
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
3
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
add a comment |
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular
list of videos? If you make a GET
request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular
, you will get the same information in a documented JSON format.
Using the Python client, the code looks like:
import csv
import googleapiclient.discovery
def most_popular(yt, **kwargs):
popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
for video in popular['items']:
yield video['snippet']
yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(['Title', 'Description'])
csv_writer.writerows(
[snip['title'], snip['description']]
for snip in most_popular(yt, maxResults=20, regionCode=…)
)
I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: …
block.
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
5
@Josay When I include theregionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
3
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
3
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
add a comment |
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular
list of videos? If you make a GET
request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular
, you will get the same information in a documented JSON format.
Using the Python client, the code looks like:
import csv
import googleapiclient.discovery
def most_popular(yt, **kwargs):
popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
for video in popular['items']:
yield video['snippet']
yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(['Title', 'Description'])
csv_writer.writerows(
[snip['title'], snip['description']]
for snip in most_popular(yt, maxResults=20, regionCode=…)
)
I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: …
block.
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular
list of videos? If you make a GET
request to https://www.googleapis.com/youtube/v3/videos?key=…&part=snippet&chart=mostpopular
, you will get the same information in a documented JSON format.
Using the Python client, the code looks like:
import csv
import googleapiclient.discovery
def most_popular(yt, **kwargs):
popular = yt.videos().list(chart='mostPopular', part='snippet', **kwargs).execute()
for video in popular['items']:
yield video['snippet']
yt = googleapiclient.discovery.build('youtube', 'v3', developerKey=…)
with open('YouTube Trending Titles on 12-30-18.csv', 'w') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(['Title', 'Description'])
csv_writer.writerows(
[snip['title'], snip['description']]
for snip in most_popular(yt, maxResults=20, regionCode=…)
)
I've also restructured the code so that all of the CSV-writing code appears together, an inside a with open(…) as f: …
block.
edited Dec 30 '18 at 22:05
answered Dec 30 '18 at 21:51
200_success200_success
129k15152414
129k15152414
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
5
@Josay When I include theregionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
3
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
3
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
add a comment |
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
5
@Josay When I include theregionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.
– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
3
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
3
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
Great suggestion! I searched but didn't find that API. Are mostpopular and trending synonyms ?
– Josay
Dec 30 '18 at 21:56
5
5
@Josay When I include the
regionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.– 200_success
Dec 30 '18 at 22:06
@Josay When I include the
regionCode
in the API call ('CA'
for me), then the first 20 results are identical to the list on the "Trending" page, which indicates that they are indeed different names for the same thing.– 200_success
Dec 30 '18 at 22:06
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
Oh, I see; I didn't know the purpose of API, but I will look more into it. Thanks for the info.
– austingae
Dec 30 '18 at 23:55
3
3
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
Is the developerkey a thing required to be taken from Google ? If yes, then the scraping doesn't require it and scraping may be better from that point of view. why sign up for something that you don't need to ?
– Whirl Mind
Dec 31 '18 at 15:32
3
3
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
@WhirlMind Because the HTML structure of the Trending page could change at any time, and is essentially an undocumented API. The YouTube Data API is guaranteed to be stable, and any changes to it will be preceded by a suitable transition period. Obtaining a developer key is a trivial process, if you already have a Google account.
– 200_success
Dec 31 '18 at 16:09
add a comment |
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
....
Exception
All exceptions are caught by except Exception as e
. It may look like a good idea at first but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except for special situations). For instance, if you write a typo, you'll end up with an ignored
NameError
orAttributeError
and debugging will be more painful than it should be.
Also, from the content of the except
block, it looks like you are only expecting the logic about description
to fail. If so, it would be clearer to put in the try (...) except
the smallest amount of code.
For instance:
title = content.h3.a.text
try:
description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
description = None
print(title)
print(description)
print('n')
Proper solution
Google usually offers API to retrieve things such like trending videos. I haven't found it but I'll let you try to find something that works properly. Google is your friend...
add a comment |
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
....
Exception
All exceptions are caught by except Exception as e
. It may look like a good idea at first but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except for special situations). For instance, if you write a typo, you'll end up with an ignored
NameError
orAttributeError
and debugging will be more painful than it should be.
Also, from the content of the except
block, it looks like you are only expecting the logic about description
to fail. If so, it would be clearer to put in the try (...) except
the smallest amount of code.
For instance:
title = content.h3.a.text
try:
description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
description = None
print(title)
print(description)
print('n')
Proper solution
Google usually offers API to retrieve things such like trending videos. I haven't found it but I'll let you try to find something that works properly. Google is your friend...
add a comment |
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
....
Exception
All exceptions are caught by except Exception as e
. It may look like a good idea at first but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except for special situations). For instance, if you write a typo, you'll end up with an ignored
NameError
orAttributeError
and debugging will be more painful than it should be.
Also, from the content of the except
block, it looks like you are only expecting the logic about description
to fail. If so, it would be clearer to put in the try (...) except
the smallest amount of code.
For instance:
title = content.h3.a.text
try:
description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
description = None
print(title)
print(description)
print('n')
Proper solution
Google usually offers API to retrieve things such like trending videos. I haven't found it but I'll let you try to find something that works properly. Google is your friend...
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are called Context managers. They give you the guarantee that the cleanup is performed at the end even in case of exception.
In your case, you could write:
with open('YouTube Trending Titles on 12-30-18.csv','w') as file:
....
Exception
All exceptions are caught by except Exception as e
. It may look like a good idea at first but this can lead to various issues:
- it's hard to know what types of error are actually expected here
- most errors are better not caught (except for special situations). For instance, if you write a typo, you'll end up with an ignored
NameError
orAttributeError
and debugging will be more painful than it should be.
Also, from the content of the except
block, it looks like you are only expecting the logic about description
to fail. If so, it would be clearer to put in the try (...) except
the smallest amount of code.
For instance:
title = content.h3.a.text
try:
description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
except Exception as e:
description = None
print(title)
print(description)
print('n')
Proper solution
Google usually offers API to retrieve things such like trending videos. I haven't found it but I'll let you try to find something that works properly. Google is your friend...
answered Dec 30 '18 at 21:55
JosayJosay
25.6k14087
25.6k14087
add a comment |
add a comment |
I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code focused mostly around HTML parsing:
you could get some speed and memory improvements if you would use a
SoupStrainer
to allowBeautifulSoup
parse out only the desired elements from the HTML:
The
SoupStrainer
class allows you to choose which parts of an incoming document are parsed.from bs4 import BeautifulSoup, SoupStrainer
trending_containers = SoupStrainer(class_="yt-lockup-content")
soup = BeautifulSoup(source, 'lxml', parse_only=trending_containers)instead of
.find_all()
and.find()
you could have used more concise CSS selectors. You would have:soup.select('.yt-lockup-content')
instead of:
soup.find_all('div', class_= "yt-lockup-content")
and:
content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')
instead of:
content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")
note how I've omitted
div
tag names above - I think they are irrelevant as the class values actually define the type of an element in this case- organize imports as per PEP8
add a comment |
I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code focused mostly around HTML parsing:
you could get some speed and memory improvements if you would use a
SoupStrainer
to allowBeautifulSoup
parse out only the desired elements from the HTML:
The
SoupStrainer
class allows you to choose which parts of an incoming document are parsed.from bs4 import BeautifulSoup, SoupStrainer
trending_containers = SoupStrainer(class_="yt-lockup-content")
soup = BeautifulSoup(source, 'lxml', parse_only=trending_containers)instead of
.find_all()
and.find()
you could have used more concise CSS selectors. You would have:soup.select('.yt-lockup-content')
instead of:
soup.find_all('div', class_= "yt-lockup-content")
and:
content.select_one('.yt-lockup-description.yt-ui-ellipsis.yt-ui-ellipsis-2')
instead of:
content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2")
note how I've omitted
div
tag names above - I think they are irrelevant as the class values actually define the type of an element in this case- organize imports as per PEP8
add a comment |
edited Dec 31 '18 at 0:51
answered Dec 31 '18 at 0:45
alecxe
15k53478
I would separate the "input" (finding titles and descriptions) from the output (writing to screen or file). One good way to do that is to use a generator:

    from bs4 import BeautifulSoup
    import requests
    import csv

    def soup():
        source = requests.get("https://www.youtube.com/feed/trending").text
        return BeautifulSoup(source, 'lxml')

    def find_videos(soup):
        for content in soup.find_all('div', class_="yt-lockup-content"):
            try:
                title = content.h3.a.text
                description = content.find('div', class_="yt-lockup-description yt-ui-ellipsis yt-ui-ellipsis-2").text
            except Exception:
                description = None
            yield (title, description)

    with open('YouTube Trending Titles on 12-30-18.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['Title', 'Description'])
        for (title, description) in find_videos(soup()):
            csv_writer.writerow([title, description])

Disclaimer: I haven't tested this code.
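To show the input/output separation in isolation, here is a self-contained sketch of the same pattern with the network and HTML parsing replaced by dummy data (the dict keys are made up for illustration), and the file replaced by an in-memory buffer:

```python
import csv
import io

def find_videos(items):
    # "Input" side: a generator yielding (title, description) pairs.
    # Here it walks dummy dicts instead of parsed HTML containers.
    for item in items:
        yield (item.get("title"), item.get("description"))

# "Output" side: write whatever the generator yields, without knowing
# or caring where the data came from.
sample = [
    {"title": "A", "description": "first"},
    {"title": "B"},  # missing description -> None -> empty CSV field
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Title", "Description"])
for title, description in find_videos(sample):
    writer.writerow([title, description])

rows = buffer.getvalue().splitlines()
print(rows)
```

Because the generator is lazy, the output loop could just as easily print to the screen or feed a database insert; only the consumer changes, not the parsing code.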
answered Jan 1 at 4:49
Anders
1312