Selecting Custom Field from RSS feed - python

So I'm trying to select the information within a custom field from an RSS file. So far I'm using feedparser to access the more common fields within the file. I was wondering if anyone would know how to achieve this, given that there are numerous custom fields. The specific custom field I'm trying to select is the venue address.
<item>
<title>First 5 Forever babies, books and rhymes</title>
<description>Thursday, June 9, 2022, 9:30&nbsp;&ndash;&nbsp;10am <br/><br/><img src="https://www.trumba.com/i/DgBXwGE0qzQ0Bl7V7ynz3DTV.jpg" title="First 5 Forever babies, books and rhymes"
<link>https://www.brisbane.qld.gov.au/trumba?trumbaEmbed=view%3devent%26eventid%3d159467519</link>
<x-trumba:ealink>https://eventactions.com/eventactions/brisbane-city-council#/actions/030ma0wxb5wympxw4nud1vrr72</x-trumba:ealink>
<category>2022/06/09 (Thu)</category>
<guid isPermaLink="false">http://uid.trumba.com/event/159467519</guid>
<x-trumba:masterid isPermaLink="false">http://uid.trumba.com/master/159467517</x-trumba:masterid>
<xCal:summary>First 5 Forever babies, books and rhymes</xCal:summary>
<xCal:location>Nundah Library</xCal:location>
<xCal:dtstart>2022-06-08T23:30:00Z</xCal:dtstart>
<x-trumba:localstart tzAbbr="EAST" tzCode="260">2022-06-09T09:30:00</x-trumba:localstart>
<x-trumba:formatteddatetime>Thursday, June 9, 2022, 9:30 - 10am</x-trumba:formatteddatetime>
<xCal:dtend>2022-06-09T00:00:00Z</xCal:dtend>
<x-trumba:localend tzAbbr="EAST" tzCode="260">2022-06-09T10:00:00</x-trumba:localend>
<x-microsoft:cdo-alldayevent>false</x-microsoft:cdo-alldayevent>
<x-trumba:customfield name="Event Type" id="21" type="number">Library events</x-trumba:customfield>
<x-trumba:customfield name="Venue" id="22542" type="text">Nundah Library</x-trumba:customfield>
<x-trumba:customfield name="Venue address" id="22505" type="text">Nundah Library, 1 Bage Street (via Primrose Lane), Nundah</x-trumba :customfield>
<x-trumba:customfield name="Parent event" id="42212" type="text">First 5 Forever children's literacy sessions</x-trumba:customfield>
<x-trumba:customfield name="Age range" id="21858" type="text">Infants and toddlers</x-trumba:customfield>
<x-trumba:customfield name="Cost" id="22177" type="text">Free</x-trumba:customfield>
<x-trumba:customfield name="Event type" id="21859" type="text">Free</x-trumba:customfield>
<x-trumba:customfield name="Library event types" id="22496" type="text">Babies, books & rhymes,Children's literacy</x-trumba:customfield>
<x-trumba:customfield name="Event image" id="40" type="uri" imageWidth="1290" imageHeight="775">https://www.trumba.com/i/DgBXwGE0qzQ0Bl7V7ynz3DTV.jpg</x-trumba:customfield>
<x-trumba:customfield name="Age" id="23562" type="text">0-1 year olds</x-trumba:customfield>
<x-trumba:categorycalendar>Brisbane's calendar|Library events</x-trumba:categorycalendar>
Examples of the code I have used previously to retrieve information can be seen below:
blog_feed = feedparser.parse(url)
posts = blog_feed.entries

counter = 0
for post in posts:
    # collecting the title for each individual item in the RSS file
    title = post.title
    # selecting the entire item as "word"
    word = posts[counter]
    counter = counter + 1
    # we know that the date of the event is always stored after the code (category)
    date = word.category
After experimenting with BS4 I can successfully retrieve the address, but I am still unsure whether this method can be used within a loop to find the address of each item in the RSS file and then append each address to a main list when another condition is true.
with open("brisbane-city-council.rss") as fp:
soup = BeautifulSoup(fp, "html.parser")
addrress = soup.find("x-trumba:customfield", id="22505")
print(addrress)
Below is the for loop I am using.
counter = 0
for post in posts:
    # collecting the title for each individual item in the RSS file
    title = post.title
    # selecting the entire item as "word"
    word = posts[counter]
    counter = counter + 1
    # we know that the date of the event is always stored after the code (category)
    date = word.category
    # pulling down the link as it is unique for each event
    link = word.link
    # formatting the date for ease of use and to allow functionality to be completed
    date = date.split(' (')
    date = date[0]
    date = datetime.datetime.strptime(date, "%Y/%m/%d").date()
    if date > start_date and date < end_date:
        post_list.append(title)
        description = post.summary
        h = html2text.HTML2Text()
        h.ignore_links = True
        description = h.handle(description)
        description_list.append(description)
        link_list.append(link)
    else:
        continue
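In case it helps, a minimal sketch of how the BS4 approach could be extended to every item (assuming the feed is saved locally as brisbane-city-council.rss, and reusing the id 22505 from the sample item above):

from bs4 import BeautifulSoup

with open("brisbane-city-council.rss") as fp:
    soup = BeautifulSoup(fp, "html.parser")

address_list = []
for item in soup.find_all("item"):
    # scope the search to this item's subtree so each address matches its own event
    field = item.find("x-trumba:customfield", id="22505")
    if field is not None:  # not every item is guaranteed to carry a venue address
        address_list.append(field.get_text())

Any extra condition (for example the date check from the loop above) can simply wrap the append.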

Related

Extract data from text file using Python (or any language)

I have a text file that looks like:
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob@bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
First Name john
Last name Smith
Phone 555-555-4444
Email john@gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
.... and so on
I need to extract each entry to a csv file, so the data should look like: first name, last name, phone, email, etc. I don't even know where to start on something like this.
First of all, you'll need to open the text file in read mode.
I'd suggest using a context manager like so:
with open('path/to/your/file.txt', 'r') as file:
    for line in file.readlines():
        pass  # do something with the line (it is a string)
As for managing the info, you could build some intermediate structure, for example a dictionary or a list of dictionaries, and then translate that into a CSV file with the csv module.
You could, for example, split the file whenever there is a blank line, maybe like this:
with open('Downloads/test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()    # this contains each user info as a dict
    for line in f.readlines():
        if line.strip() == "":     # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
        else:                      # otherwise split the line and add to the current dict
            line_items = line.split(' ')
            print(line_items)
            entry[line_items[0]] = line_items[1]
print(my_list)
This code won't work as-is because your text is not formatted consistently: you need a reliable way to split each line between "title" and "content" (like "First name" and "Bob"). I suggest looking at regex, or fixing the txt file by making the spacing more consistent.
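As a hedged sketch of the regex idea: if you list the known field names, each line can be split consistently no matter how many words the key or the value contains (the FIELDS list here is an assumption based on the sample data):

import re

# the first alternative that matches wins, so list longer names before any shorter prefix
FIELDS = ["First Name", "Last name", "Phone", "Email", "Date of Birth",
          "Preferred Method of Contact", "Desired Appointment Date",
          "Desired Appointment Time", "City", "Location",
          "IP Address", "User-Agent (Browser/OS)", "Referrer"]
pattern = re.compile(r'^(%s)\s+(.*)$' % '|'.join(re.escape(f) for f in FIELDS))

m = pattern.match("First Name Bob")
if m:
    key, value = m.group(1), m.group(2)  # ('First Name', 'Bob')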
Assuming the data resides in the variable a:
a="""
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob@bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
First Name john
Last name Smith
Phone 555-555-4444
Email john@gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
"""
line_sep = "\n" # CHANGE ME ACCORDING TO DATA
fields = ["First Name", "Last name", "Phone",
"Email", "Date of Birth", "Preferred Method of Contact",
"Desired Appointment Date", "Desired Appointment Time",
"City", "Location", "IP Address", "User-Agent","Referrer"]
records = a.split(line_sep * 2)
all_records = []
for record in records:
    splitted_record = record.split(line_sep)
    one_record = {}
    csv_record = []
    for f in fields:
        found = False
        for one_field in splitted_record:
            if one_field.startswith(f):
                data = one_field[len(f):].strip()
                one_record[f] = data
                csv_record.append(data)
                found = True
        if not found:
            csv_record.append("")
    all_records.append(";".join(csv_record))
one_record will hold the record as a dictionary, and csv_record will hold it as a list of fields (ordered as in the fields variable).
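A possible finishing step, if you want the records on disk as an actual CSV file (the filename output.csv and the semicolon delimiter are assumptions carried over from the join above):

with open("output.csv", "w") as out:
    out.write(";".join(fields) + "\n")        # header row with the field names
    out.write("\n".join(all_records) + "\n")  # one semicolon-separated line per record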
Edited to add: ignore this answer, the code from Koko Jumbo looks infinitely more sensible and actually gives you a CSV file at the end of it! It was a fun exercise though :)
Just to expand on fcagnola's code a bit.
If it's a quick and dirty one-off, and you know that the data will be consistently presented, the following should work to create a list of dictionaries with the correct key/value pairing. Each line is processed by splitting the line and comparing the line number (reset to 0 with each new dict) against an array of values that represent where the boundary between key and value falls.
For example, "First Name Bob" becomes ["First","Name","Bob"]. The function has been told that linenumber= 0 so it checks entries[linenumber] to get the value "2", which it uses to join the key name (items 0 & 1) and then join the data (items 2 onwards). The end result is ["First Name", "Bob"] which is then added to the dictionary.
class Extract:
    def extractEntry(self, linedata, lineindex):
        # Hardcoded list! The quick and dirty part.
        # This is specific to the example data provided. The entries
        # represent the index to be used when splitting the string
        # between the key and the data
        entries = (2, 2, 1, 1, 3, 4, 3, 3, 1, 1, 2, 2, 1)
        return self.createNewEntry(linedata, entries[lineindex])

    def createNewEntry(self, linedata, dataindex):
        list_data = linedata.split()
        key = " ".join(list_data[:dataindex])
        data = " ".join(list_data[dataindex:])
        return [key, data]

with open('test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()    # this contains each user info as a dict
    extr = Extract()  # class for splitting the entries into key/value
    x = 0
    for line in f.readlines():
        if line.strip() == "":     # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
            x = 0
        else:                      # otherwise split the line and add to the current dict
            extracted_data = extr.extractEntry(line, x)
            entry[extracted_data[0]] = extracted_data[1]
            x += 1
    my_list.append(entry)  # append the final entry
print(my_list)
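Since my_list ends up as a list of dictionaries, a short hedged follow-up using the csv module mentioned in the first answer could turn it straight into the CSV file the question asks for (output.csv is an assumed filename):

import csv

fieldnames = list(my_list[0].keys())  # on older Pythons dict order is not guaranteed, so a fixed list may be safer
with open('output.csv', 'w') as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(my_list)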

Running multiple queries on YouTube API by looping through title columns of CSV - python

I am using YouTube's API to get comment data from a list of music videos. The way I have it working right now is by manually typing in my query, writing the data to a CSV file, and repeating for each song, like so:
query = "song title"
query_results = service.search().list(
part = 'snippet',
q = query,
order = 'relevance', # You can consider using viewCount
maxResults = 20,
type = 'video', # Channels might appear in search results
relevanceLanguage = 'en',
safeSearch = 'moderate',
).execute()
What I would like to do is use the title and artist columns from a CSV file containing the songs I am trying to gather data for, so I can run the program once without having to manually type in each song.
A friend suggested using something like this
import pandas as pd

data = pd.read_csv("metadata.csv")

def songtitle():
    for i in data.index:
        title = data.loc[i, 'title']
        title = '\"' + title + '\"'
        artist = data.loc[i, 'artist']
        return(artist, title)
But I am not sure how to make this work: when I run this it only returns one row of data, and even if it did run correctly, I don't know how I would get the entire program to repeat itself for every new song.
You can save the song titles and artists to lists, then loop over the titles to get the details for each song.
def get_songTitles():
    data = pd.read_csv("metadata.csv")
    return data['artist'].tolist(), data['title'].tolist()

artist, song_titles = get_songTitles()

for song in song_titles:
    query_results = service.search().list(
        part='snippet',
        q=song,
        order='relevance',       # You can consider using viewCount
        maxResults=20,
        type='video',            # Channels might appear in search results
        relevanceLanguage='en',
        safeSearch='moderate',
    ).execute()
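If each query should include the artist as well as the title, a hedged variant is to zip the two lists together (searching on "artist title" as a single query string is an assumption about what you want):

artists, song_titles = get_songTitles()

for artist, song in zip(artists, song_titles):
    query_results = service.search().list(
        part='snippet',
        q='{} {}'.format(artist, song),  # combine artist and title into one query
        order='relevance',
        maxResults=20,
        type='video',
        relevanceLanguage='en',
        safeSearch='moderate',
    ).execute()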

Trying to format text when pulling from webpage HTML

I've created a basic counter for words in a song, but am having trouble formatting the album title and artist name from a given page on this lyrics website. Here's an example of what I am focused on:
I want to format it in this way:
Album Title: [Album Title] (Release_year)
Artist: [Artist Name]
I'm running into two problems:
The album title isn't enclosed in its own tag, so if I call the h1 tag I get the album name, release year and artist name all together. How do I call them separately, or how do I break them up when calling them?
The album name has two blank lines and two blank spaces included in the string. How do I get rid of them? The release year prints right next to the album title, which is exactly what I'm looking for, but I can't get the album title to format properly.
This is what I currently have:
song_artist = soup.find("a",{"class":"artist"}).get_text()
album_title = soup.find("h1",{"class":"album_name"}).get_text()
print "Album Title: " + str(album_title)
print "Song Artist: " + str(song_artist.title())
which produces:
Thank you!!
album_title = soup.find("h1",{"class":"album_name"}).find(text=True).strip()
album_year = soup.find("span",{"class":"release_year"}).get_text().strip()
print 'Album Title: {} {}'.format(album_title, album_year)
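For context on why this works: find(text=True) returns only the first text node inside the h1 (here, presumably the album title that precedes the nested release-year span), and .strip() removes the surrounding blank lines and spaces that were polluting the title.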

Searching datastore for data according to grandparent ancestor

Below is Python code where I am trying to get reservation information from the Reservations model.
i = 0
for c in courts:
    court = names[i]
    i = i + 1
    c_key = c.key()
    logging.info("c_key: %s " % c_key)
    weekday_key = db.Key.from_path('Courts', 'c_key', 'Days', weekday)
    logging.info("weekday_key: %s " % weekday_key)
    logging.info("weekday: %s " % weekday)
    logging.info("court: %s " % court)
    reservation = db.Query(Reservations)
    nlimit = 2 * len(times)
    reservations = reservation.fetch(limit=nlimit)
    logging.info("reservations: %s " % len(reservations))
There are only two court entities in my Courts database, court1 and court2.
There are also only 14 weekday entities in my Days database, 7 for court1 and 7 for court2, named Sunday, ..., Saturday. In the current example I am trying to get the keys for the 2 Monday Days, one for court1 and one for court2.
I don't understand why, according to the log below, I am getting the same weekday_key for the two different courts, which have different keys c_key themselves.
In the log below, whether I pass 'c_key' or 'court' into the db.Key.from_path() call, I get exactly the same result, which shows that the values of the 2 weekday_keys are identical, not different as I expected.
INFO 2012-09-10 21:25:19,189 views.py:226] c_key: ag1kZXZ-c2NoZWR1bGVycicLEglMb2NhdGlvbnMiBlJvZ2VycwwLEgZDb3VydHMiBmNvdXJ0MQw
INFO 2012-09-10 21:25:19,189 views.py:228] weekday_key: ag1kZXZ-c2NoZWR1bGVyciELEgZDb3VydHMiBWNfa2V5DAsSBERheXMiBk1vbmRheQw
INFO 2012-09-10 21:25:19,189 views.py:229] weekday: Monday
INFO 2012-09-10 21:25:19,189 views.py:230] court: court1
INFO 2012-09-10 21:25:19,192 views.py:235] reservations: 1
INFO 2012-09-10 21:25:19,192 views.py:226] c_key: ag1kZXZ-c2NoZWR1bGVycicLEglMb2NhdGlvbnMiBlJvZ2VycwwLEgZDb3VydHMiBmNvdXJ0Mgw
INFO 2012-09-10 21:25:19,192 views.py:228] weekday_key: ag1kZXZ-c2NoZWR1bGVyciELEgZDb3VydHMiBWNfa2V5DAsSBERheXMiBk1vbmRheQw
INFO 2012-09-10 21:25:19,192 views.py:229] weekday: Monday
INFO 2012-09-10 21:25:19,192 views.py:230] court: court2
INFO 2012-09-10 21:25:19,195 views.py:235] reservations: 1
My Models are as follows.
class Courts(db.Model):  # parent is Locations, courtname is key_name
    location = db.ReferenceProperty(Locations)
    timezone = db.StringProperty()

class Days(db.Model):  # parent is Courts, name is key_name, day of week
    court = db.ReferenceProperty(Courts)
    startTime = db.ListProperty(int)
    endTime = db.ListProperty(int)

class Reservations(db.Model):  # parent is Days, hour, minute HH:MM is key_name
    weekday = db.ReferenceProperty(Days)
    day = db.IntegerProperty()
    nowweekday = db.IntegerProperty()
    name = db.StringProperty()
    hour = db.IntegerProperty()
    minute = db.IntegerProperty()
You're calculating the keys using the string 'c_key' each time, not the value of the variable c_key.
However even if you fix this it still won't work, since you want the ID of the court, not the full key path.
i = 0
for c in courts:
    court_id = names[i]
    i = i + 1
    weekday_key = db.Key.from_path('Courts', c.key().name(), 'Days', weekday)
    reservation = Reservations.all()
    reservation.ancestor(weekday_key)
    nlimit = 2 * len(times)
    reservations = reservation.fetch(limit=nlimit)
What I don't like about this answer is that weekday_key is the same for all c in courts. That does not seem right.
How do I construct a query for all Reservations on a specific day in Days at a specific court in Courts on a specific weekday in (week)Days?
You know the values for the keys so you make up a key by hand (so to speak) and then make your query with that key as the ancestor.
So for example:
key = ndb.Key(BlogPost, 12345)
qry = Comment.query(ancestor=key)
but here you'd use something like
key = ndb.Key(Locations, "Place1", Courts, "Name_Of_Court")
result = Reservations.query(ancestor=key)
and so on, so you are working your way down the chain and building the key with all the information you have (i.e what court they want to reserve).
Then the results of your ancestor query will be those models that have the key you passed as their ancestors.
https://developers.google.com/appengine/docs/python/ndb/queries#ancestor
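Translated back to the db API and the models in the question, the full chain might look like this (a sketch; 'Rogers', 'court1' and 'Monday' are the key names visible in the logged keys above):

day_key = db.Key.from_path('Locations', 'Rogers',
                           'Courts', 'court1',
                           'Days', 'Monday')
reservations = Reservations.all().ancestor(day_key).fetch(limit=nlimit)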

Youtube API v3 and python to generate list of views/likes on own youtube videos

I'm trying to get a RaspberryPi3 Python based birdhouse reading the popularity of its own uploaded videos (which enables it to deduce which ones should be deleted, avoiding hundreds of uploaded files).
I thought the best way to read the views/likes was to use yt_analytics_report.py.
It always returns 0 values. When I input:
$python yt_analytics_report.py --filters="video==bb_o--nP1mg"
or
$python yt_analytics_report.py --filters="channel==UCGXIq7h2UeFh9RhsSEsFsiA"
The output is:
$ python yt_analytics_report.py --filters="video==bb_o--nP1mg"
{'metrics': 'views,estimatedMinutesWatched,averageViewDuration', 'filters': 'video==bb_o--nP1mg', 'ids': 'channel==MINE', 'end_date': '2018-01-12', 'start_date': '2018-01-06'}
Please visit this URL to authorize this application: [note: here was the URL with the auth sequence etc., which I acknowledged] followed by the result:
views estimatedMinutesWatched averageViewDuration
0.0 0.0 0.0
I'm new to this; the last 3 days I've been testing a variety of filters, but the result is always the same. I guess I'm doing something severely wrong.
The (auto sensor triggered) video uploads work fine, so I presume the root cause is related to the way I'm using the yt-analytics example.
Any suggestions on the root cause, or alternative methods to retrieve the views/likes of self-uploaded videos, are appreciated.
After a few days of trying I have found a solution for how to generate, with Python and the YouTube API v3, a list containing the views, likes etc. of the uploaded videos of my own YouTube channel.
I would like to share the complete code just in case anyone is facing the same challenge. The code contains remarks and referrals to additional information.
Please be aware that using the API consumes API credits... This implies that (when you run this script continuously or often) you can run out of the daily maximum number of API credits set by Google.
# This Python 2.7.14 example shows how to retrieve with YouTube API v3 a list of the YouTube videos uploaded to a channel, and also
# shows additional statistics of each individual video such as the number of views, likes etc.
# Please notice that YOU HAVE TO change API_KEY and the YouTube channelID.
# Have a look at the referred examples to get to understand how the API works.
#
# The code consists of two parts:
# - The first part queries the videos in a channel and stores them in a list
# - The second part queries each individual video in detail
#
# Credits to the Coding 101 team, the guy previously guiding me to a query, and the Google API explorer who/which got me on track.
#
# RESULTING EXAMPLE OUTPUT: the output will look a bit like this:
#
# https://www.youtube.com/watch?v=T3U2oz_Y8T0
# Upload date: 2018-01-13T09:43:27.000Z
# Number of views: 8
# Number of likes: 2
# Number of dislikes: 0
# Number of favorites:0
# Number of comments: 0
#
# https://www.youtube.com/watch?v=EFyC8nhusR8
# Upload date: 2018-01-06T14:24:34.000Z
# Number of views: 6
# Number of likes: 2
# Number of dislikes: 0
# Number of favorites:0
# Number of comments: 0
#
import urllib    # importing to use its urlencode function
import urllib2   # for making http requests
import json      # for decoding a JSON response
#
API_KEY = 'PLACE YOUR OWN YOUTUBE API KEY HERE'  # What? How? Learn here: https://www.youtube.com/watch?v=JbWnRhHfTDA
ChannelIdentifier = 'PLACE YOUR OWN YOUTUBE channelID HERE'  # What? How? Learn here: https://www.youtube.com/watch?v=tf42K4pPWkM
#
# This first part queries the list of videos uploaded to a specific channel.
# The identification is done through the ChannelIdentifier which you have defined as a variable.
# The results from this first part are stored in the list videoMetadata. This is used in the second part of the code below.
#
# This code is based on a very good example from Coding 101 which you can find here: https://www.youtube.com/watch?v=_M_wle0Iq9M
#
url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='+ChannelIdentifier+'&maxResults=50&type=video&key='+API_KEY
response = urllib2.urlopen(url)  # makes the call to YouTube
videos = json.load(response)     # decodes the response so we can work with it
videoMetadata = []               # declaring our list
for video in videos['items']:
    if video['id']['kind'] == 'youtube#video':
        videoMetadata.append(video['id']['videoId'])  # appends each videoId to our list
#
# In this second part, a loop runs through the list videoMetadata.
# During each step the details of a specific video are retrieved and displayed.
# The structure of the API return can be tested with the API explorer (which you can execute without OAuth):
# https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.videos.list?part=snippet%252CcontentDetails%252Cstatistics&id=Ks-_Mh1QhMc&_h=1&
#
for metadata in videoMetadata:
    print "https://www.youtube.com/watch?v="+metadata  # here the videoId is printed
    SpecificVideoID = metadata
    SpecificVideoUrl = 'https://www.googleapis.com/youtube/v3/videos?part=snippet%2CcontentDetails%2Cstatistics&id='+SpecificVideoID+'&key='+API_KEY
    response = urllib2.urlopen(SpecificVideoUrl)  # makes the call for this specific video
    videos = json.load(response)                  # decodes the response so we can work with it
    for video in videos['items']:
        if video['kind'] == 'youtube#video':
            print "Upload date: "+video['snippet']['publishedAt']       # the upload date of the specific video
            print "Number of views: "+video['statistics']['viewCount']  # the number of views of the specific video
            print "Number of likes: "+video['statistics']['likeCount']  # etc
            print "Number of dislikes: "+video['statistics']['dislikeCount']
            print "Number of favorites:"+video['statistics']['favoriteCount']
            print "Number of comments: "+video['statistics']['commentCount']
            print "\n"
Building on Sefo's answer above, I was able to clean up the outputs a bit.
The first function creates a list of relevant videos (you could replace this however you want), and the second iterates through this list and grabs the statistics and basic text data associated with each individual video.
The output is a list of dictionaries, perfect for conversion into a pandas DataFrame.
def youtube_search_list(q, max_results=10):
    # Call the search.list method to retrieve results matching the specified
    # query term.
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)
    search_response = youtube.search().list(
        q=q,
        part='id,snippet',
        maxResults=max_results,
        order='viewCount'
    ).execute()
    return search_response

def youtube_search_video(q='spinners', max_results=10):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)

    # Return list of matching records up to max_results
    search_result = youtube_search_list(q, max_results)

    videos_list = []
    for search_result in search_result.get("items", []):
        if search_result["id"]["kind"] == 'youtube#video':
            temp_dict_ = {}
            # Available from initial search
            temp_dict_['title'] = search_result['snippet']['title']
            temp_dict_['vidId'] = search_result['id']['videoId']

            # Secondary call to find statistics results for the individual video
            response = youtube.videos().list(
                part='statistics, snippet',
                id=search_result['id']['videoId']
            ).execute()
            response_statistics = response['items'][0]['statistics']
            response_snippet = response['items'][0]['snippet']

            snippet_list = ['publishedAt', 'channelId', 'description',
                            'channelTitle', 'tags', 'categoryId',
                            'liveBroadcastContent', 'defaultLanguage']
            for val in snippet_list:
                try:
                    temp_dict_[val] = response_snippet[val]
                except KeyError:
                    temp_dict_[val] = 'xxNoneFoundxx'  # not stored if not present

            stats_list = ['favoriteCount', 'viewCount', 'likeCount',
                          'dislikeCount', 'commentCount']
            for val in stats_list:
                try:
                    temp_dict_[val] = response_statistics[val]
                except KeyError:
                    temp_dict_[val] = 'xxNoneFoundxx'  # not stored if not present

            # add back to main list
            videos_list.append(temp_dict_)
    return videos_list
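As a hedged usage example of the DataFrame conversion mentioned above (assuming DEVELOPER_KEY and the API constants are already defined):

import pandas as pd

# each dict in the returned list becomes one row in the DataFrame
df = pd.DataFrame(youtube_search_video(q='spinners', max_results=10))
print(df[['title', 'viewCount', 'likeCount']].head())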
This code will help you a ton; I was struggling with this for a long time. Just provide the API key, the YouTube channel name, and the channel ID in the search list.
from googleapiclient.discovery import build

DEVELOPER_KEY = "paste your API key here"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

title = []
channelId = []
channelTitle = []
categoryId = []
videoId = []
viewCount = []
likeCount = []
dislikeCount = []
commentCount = []
favoriteCount = []
category = []
tags = []
videos = []

max_search = 50
order = "relevance"
token = None
location = None
location_radius = None

youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                developerKey=DEVELOPER_KEY)

# this call gets the videos of the channel by its name and ID
search_result = youtube.search().list(q="put the channel name here", type="video",
                                      pageToken=token, order=order, part="id,snippet",
                                      maxResults=max_search, location=location,
                                      locationRadius=location_radius,
                                      channelId='put the channel ID here').execute()

for search_result in search_result.get("items", []):
    if search_result["id"]["kind"] == 'youtube#video':
        title.append(search_result['snippet']['title'])  # the title of the video
        videoId.append(search_result['id']['videoId'])   # the ID of the video
        # the second request: the statistics and snippet details
        # have to be fetched through videos().list because of the API
        response = youtube.videos().list(part='statistics, snippet',
                                         id=search_result['id']['videoId']).execute()
        channelId.append(response['items'][0]['snippet']['channelId'])        # channel ID, which is constant here
        channelTitle.append(response['items'][0]['snippet']['channelTitle'])  # channel title, also constant
        categoryId.append(response['items'][0]['snippet']['categoryId'])      # stores the categories of the videos
        favoriteCount.append(response['items'][0]['statistics']['favoriteCount'])  # stores the favourite count of the videos
        viewCount.append(response['items'][0]['statistics']['viewCount'])          # stores the view counts
        # the likes and dislikes had a bug all along, which required the if/else
        # instead of just behaving like the viewCount
        if 'likeCount' in response['items'][0]['statistics'].keys():  # checks for likes count, stores 0 if absent
            likeCount.append(response['items'][0]['statistics']['likeCount'])
        else:
            likeCount.append('0')
        if 'dislikeCount' in response['items'][0]['statistics'].keys():  # checks for dislikes count, stores 0 if absent
            dislikeCount.append(response['items'][0]['statistics']['dislikeCount'])
        else:
            dislikeCount.append('0')
        if 'commentCount' in response['items'][0]['statistics'].keys():  # checks for comment count, stores 0 if absent
            commentCount.append(response['items'][0]['statistics']['commentCount'])
        else:
            commentCount.append('0')
        if 'tags' in response['items'][0]['snippet'].keys():  # checks for tags, stores 0 if absent
            tags.append(response['items'][0]['snippet']['tags'])
        else:
            tags.append('0')

youtube_dict = {
    'tags': tags,
    'channelId': channelId,
    'channelTitle': channelTitle,
    'categoryId': categoryId,
    'title': title,
    'videoId': videoId,
    'viewCount': viewCount,
    'likeCount': likeCount,
    'dislikeCount': dislikeCount,
    'commentCount': commentCount,
    'favoriteCount': favoriteCount
}

for x in youtube_dict:
    print(x)
    for y in youtube_dict[x]:
        print(y)
Good luck!
