Google Drive API, uploading file [Python] - python

I'm using the API to upload a backup from my server to my Google Drive. Authorization passes without problems and the file is uploaded, but the file does not appear in the list.
Code:
import httplib2
import pprint
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials
from apiclient.http import MediaFileUpload
# Read the service account's private key
with open('privatekey.p12', 'rb') as f:
    key = f.read()
credentials = SignedJwtAssertionCredentials('1234567890@developer.gserviceaccount.com', key, scope='https://www.googleapis.com/auth/drive')
http = httplib2.Http()
http = credentials.authorize(http)
drive_service = build('drive', 'v2', http=http)
media_body = MediaFileUpload('/path/to/file/document.txt', mimetype='text/plain', resumable=True)
body = {
    'title': 'My document',
    'description': 'A test document',
    'mimeType': 'text/plain'
}
file = drive_service.files().insert(body=body, media_body=media_body).execute()
pprint.pprint(file)
Response from api:
{u'alternateLink': u'https://docs.google.com/file/d/0B-FWSwzP0SeyamY1MXFIMmFDZWc/edit?usp=drivesdk',
u'appDataContents': False,
u'copyable': True,
u'createdDate': u'2014-01-04T14:41:19.108Z',
u'description': u'A test document',
u'downloadUrl': u'https://doc-0c-6s-docs.googleusercontent.com/docs/securesc/376up7hhina7i2kr3lb8jjr3i1qgs9i8/hbgdu1q3abhdnhdr75jrpjohg4aphvci/1388844000000/08619299632362135867/08619299632362135867/0B-FWSwzP0SeyamY1MXFIMmFDZWc?h=16653014193614665626&e=download&gd=true',
u'editable': True,
u'etag': u'"G9loKy74Mg0FQ-YRqtCj_yTTrpg/MTM4ODg0NjQ3OTAwMw"',
u'fileExtension': u'',
u'fileSize': u'5',
u'iconLink': u'https://ssl.gstatic.com/docs/doclist/images/icon_10_text_list.png',
u'id': u'0B-FWSwzP0SeyamY1MXFIMmFDZWc',
u'kind': u'drive#file',
u'labels': {u'hidden': False,
u'restricted': False,
u'starred': False,
u'trashed': False,
u'viewed': True},
u'lastModifyingUser': {u'displayName': u'1234567890@developer.gserviceaccount.com',
u'isAuthenticatedUser': True,
u'kind': u'drive#user',
u'permissionId': u'08619299632362135867'},
u'lastModifyingUserName': u'1234567890@developer.gserviceaccount.com',
u'lastViewedByMeDate': u'2014-01-04T14:41:19.003Z',
u'md5Checksum': u'ad0234829205b9033196ba818f7a872b',
u'mimeType': u'text/plain',
u'modifiedByMeDate': u'2014-01-04T14:41:19.003Z',
u'modifiedDate': u'2014-01-04T14:41:19.003Z',
u'originalFilename': u'My document',
u'ownerNames': [u'1234567890@developer.gserviceaccount.com'],
u'owners': [{u'displayName': u'1234567890@developer.gserviceaccount.com',
u'isAuthenticatedUser': True,
u'kind': u'drive#user',
u'permissionId': u'08619299632362135867'}],
u'parents': [{u'id': u'0AOFWSwzP0SeyUk9PVA',
u'isRoot': True,
u'kind': u'drive#parentReference',
u'parentLink': u'https://www.googleapis.com/drive/v2/files/0AOFWSwzP0SeyUk9PVA',
u'selfLink': u'https://www.googleapis.com/drive/v2/files/0B-FWSwzP0SeyamY1MXFIMmFDZWc/parents/0AOFWSwzP0SeyUk9PVA'}],
u'quotaBytesUsed': u'5',
u'selfLink': u'https://www.googleapis.com/drive/v2/files/0B-FWSwzP0SeyamY1MXFIMmFDZWc',
u'shared': False,
u'title': u'My document',
u'userPermission': {u'etag': u'"G9loKy74Mg0FQ-YRqtCj_yTTrpg/ebrUqOkKZ6bmVEtr5zEJa5EOB38"',
u'id': u'me',
u'kind': u'drive#permission',
u'role': u'owner',
u'selfLink': u'https://www.googleapis.com/drive/v2/files/0B-FWSwzP0SeyamY1MXFIMmFDZWc/permissions/me',
u'type': u'user'},
u'webContentLink': u'https://docs.google.com/uc?id=0B-FWSwzP0SeyamY1MXFIMmFDZWc&export=download',
u'writersCanShare': True}

Just search for "My document" in your Google Drive; you will find the uploaded file.

In Drive v3, you can upload using the create method:
A = service.files().create(media_body = 'pig.png',body = {'name':'pig'}).execute()
However, when I tried it, this only worked for media file types.
API link:
https://developers.google.com/resources/api-libraries/documentation/drive/v3/python/latest/drive_v3.files.html
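One likely reason an upload made with a service account never shows up in your own file list is ownership: the response in the question names the service account as the owner, so the file lives in that account's Drive rather than yours. A minimal sketch of one way around this, assuming a Drive v3 `service` object built with `googleapiclient.discovery.build`; the helper name `upload_and_share` and its parameters are hypothetical, not part of the library:

```python
def upload_and_share(service, media, name, share_with, folder_id=None):
    """Upload a file with a Drive v3 client, then share it with a user.

    `service` is assumed to be a Drive v3 client; `media` is a file name
    (the client guesses the MIME type from the extension) or a MediaFileUpload.
    `upload_and_share` is an illustrative helper, not a library function.
    """
    metadata = {'name': name}
    if folder_id is not None:
        # Optionally place the file in a folder that was shared with the
        # service account beforehand.
        metadata['parents'] = [folder_id]

    created = service.files().create(
        body=metadata, media_body=media, fields='id').execute()

    # Grant a regular Google account access to the new file.
    service.permissions().create(
        fileId=created['id'],
        body={'type': 'user', 'role': 'writer', 'emailAddress': share_with},
    ).execute()
    return created['id']
```

After sharing, the file should be reachable from the named account, typically under "Shared with me". Uploading straight into a folder that you own and have shared with the service account is another option.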

Related

Python Download Roblox Data

I'm attempting to format json from the Roblox API. I have a names.txt that stores all of the names. This is how the file looks
rip_robson0007
Abobausrip
app_58230
kakoytochelik123
Ameliathebest727
Sherri0708
HixPlayk
mekayla_091
ddddorffg
ghfgrgt7nfdbfj
TheWolfylikedog
paquita12345jeje
hfsgfhsgfhgfhds
It stores a bunch of usernames separated by newlines. The code is supposed to take each name, get the JSON from the endpoint https://api.roblox.com/users/get-by-username?username={name}, and format it as I have in my code. It always returns error 429 and doesn't save any of the data.
This is the code:
import json
import requests
import time
# Read the names from the text file
with open("./txt/names.txt", "r") as f:
    names = f.read().split("\n")
# Initialize an empty dictionary to store the users
users = {}
# Iterate through the names
for name in names:
    time.sleep(5)
    response = requests.get(f"https://api.roblox.com/users/get-by-username?username={name}")
    # Check the status code of the response
    if response.status_code != 200:
        print(f"Failed to get data for {name}: {response.status_code}")
        continue
    # Try to parse the response as JSON
    try:
        user_data = response.json()
    except ValueError:
        print(f"Failed to parse JSON for {name}")
        continue
    # Extract the necessary information from the response
    user_id = user_data["Id"]
    username = user_data["Username"]
    avatar_uri = user_data["AvatarUri"]
    avatar_final = user_data["AvatarFinal"]
    is_online = user_data["IsOnline"]
    # Add the user's information to the dictionary
    users[user_id] = {
        "Id": user_id,
        "Username": username,
        "AvatarUri": avatar_uri,
        "AvatarFinal": avatar_final,
        "IsOnline": is_online
    }
# Save the dictionary to a JSON file
with open("users.json", "w") as f:
    json.dump(users, f)
You can often overcome this with judicious use of proxies.
Start by getting a list of proxies from which you will make random selections. I have a scraper that acquires proxies from https://free-proxy-list.net
My list (extract) looks like this:-
http://103.197.71.7:80 - no
http://163.116.177.33:808 - yes
'yes' means that HTTPS is supported. This list currently contains 95 proxies; the number varies depending on what my scraper returns.
So we start by parsing the proxy list. We then choose proxies at random before each attempt to access the Roblox API. This may not run quickly because the proxies are not necessarily reliable. They are free, after all.
from requests import get as GET, packages as PACKAGES
from random import choice as CHOICE
from concurrent.futures import ThreadPoolExecutor as TPE
PACKAGES.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ALL:@SECLEVEL=1'
ROBLOX_API = 'https://api.roblox.com/users/get-by-username'
TIMEOUT = 1
def get_proxies():
    # Split the proxy list into HTTP-only and HTTPS-capable proxies
    http, https = list(), list()
    with open('proxylist.txt') as p:
        for line in p:
            proxy_url, _, supports_https = line.split()
            _list = https if supports_https == 'yes' else http
            _list.append(proxy_url)
    return http, https
http, https = get_proxies()
def process(name):
    params = {'username': name.strip()}
    # Keep trying with a new random proxy until a request succeeds
    while True:
        try:
            proxy = {'http': CHOICE(http), 'https': CHOICE(https)}
            (r := GET(ROBLOX_API, params=params, proxies=proxy, timeout=TIMEOUT)).raise_for_status()
            if (j := r.json()).get('success', True):
                print(j)
            break
        except Exception as e:
            pass
with open('names.txt') as names:
    with TPE() as executor:
        executor.map(process, names)
In principle, the while loop in process() could get stuck so it might make sense to limit the number of retries.
This produces the following output:
{'Id': 4082578648, 'Username': 'paquita12345jeje', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 2965702542, 'Username': 'mekayla_091', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 4079018794, 'Username': 'app_58230', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 3437922948, 'Username': 'kakoytochelik123', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 4082346906, 'Username': 'Abobausrip', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 2988555289, 'Username': 'HixPlayk', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 3286921649, 'Username': 'Sherri0708', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 1441252794, 'Username': 'ghfgrgt7nfdbfj', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 4088896225, 'Username': 'ddddorffg', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 3443374919, 'Username': 'TheWolfylikedog', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 3980932331, 'Username': 'Ameliathebest727', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 3773237135, 'Username': 'rip_robson0007', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
{'Id': 4082991447, 'Username': 'hfsgfhsgfhgfhds', 'AvatarUri': None, 'AvatarFinal': False, 'IsOnline': False}
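The remark about limiting the number of retries could be sketched roughly as follows. This is an illustrative helper, not part of the answer's code: `fetch_with_retries`, `max_attempts` and `base_delay` are hypothetical names, and `fetch` stands for any zero-argument callable that performs one request and raises on failure.

```python
import time

def fetch_with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns the result of fetch(), or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                return None  # give up instead of looping forever
            sleep(base_delay * 2 ** attempt)
```

Inside `process()`, the request could then be wrapped in a small function or lambda and passed as `fetch`, so that an unreachable username no longer blocks a worker thread indefinitely.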

Get folder and files Google Drive API with Shared Device and Service Account

I'm working with a Google service account; I have access to the Google Drive API and to a shared drive.
I need to get access to all the files and folders in that shared drive.
I tried a lot of different ways to do this.
drive_service.files().list(
    q = f"'{parent_folder}' in parents",
    spaces = 'drive',
    supportsTeamDrives=True
).execute()
>> {'kind': 'drive#fileList', 'incompleteSearch': False, 'files': []}
drive_service.files().list(
    q = f" parents in '{parent_folder}'",
    spaces = 'drive',
    supportsTeamDrives=True
).execute()
>> {'kind': 'drive#fileList', 'incompleteSearch': False, 'files': []}
drive_service.files().list(
    spaces = 'drive',
    supportsTeamDrives=True
).execute()
>> {'kind': 'drive#fileList', 'incompleteSearch': False, 'files': []}
drive_service.drives().list().execute()
>> {'kind': 'drive#driveList',
    'drives': [{'kind': 'drive#drive',
                'id': '0AOELwkzr21lFUk9VA',
                'name': 'foo'}]}
I know I have access because I can upload files to the parent folder.
Also, there are files in the parent folder.
Do you have any clue?
Thank you for your time
I figured it out.
Two additional parameters had to be passed:
includeItemsFromAllDrives = True,
supportsAllDrives = True
This works:
drive_service.files().list(
    q = f"'{parent_folder}' in parents",
    spaces = 'drive',
    includeItemsFromAllDrives = True,
    supportsAllDrives = True
).execute()
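A single `files().list` call returns only one page of results, so a folder with many items needs a loop over `nextPageToken`. A minimal sketch, assuming a Drive v3 `service` object like `drive_service` above; the helper name `list_folder`, the `trashed = false` filter and the chosen `fields` are my own additions for illustration:

```python
def list_folder(service, parent_folder, page_size=100):
    """Return every file directly inside parent_folder, following pagination.

    `service` is assumed to be a Drive v3 client; `list_folder` is a
    hypothetical helper, not part of the Drive client library.
    """
    files, page_token = [], None
    while True:
        response = service.files().list(
            q=f"'{parent_folder}' in parents and trashed = false",
            spaces='drive',
            includeItemsFromAllDrives=True,
            supportsAllDrives=True,
            pageSize=page_size,
            pageToken=page_token,  # None on the first request
            fields='nextPageToken, files(id, name, mimeType)',
        ).execute()
        files.extend(response.get('files', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return files
```

Calling `list_folder(drive_service, parent_folder)` would then return a plain list of dictionaries, one per file or subfolder.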

Using Python and YouTube API to get all comment and replies [duplicate]

I have been desperately seeking a way to crawl all comments and their corresponding replies for my research. I am having a very hard time creating a data frame that holds the comment data in the correct, corresponding order.
I am going to share my code here so you professionals can take a look and give me some insights.
def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
            comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
            if 'replies' in item.keys():
                for reply in item['replies']['comments']:
                    rauthor = reply['snippet']['authorDisplayName']
                    rtext = reply['snippet']['textDisplay']
                    rtime = reply['snippet']['publishedAt']
                    rlike = reply['snippet']['likeCount']
                    data = {'Reply ID': [rauthor], 'Reply Time': [rtime], 'Reply Comments': [rtext], 'Reply Likes': [rlike]}
                    print(rauthor)
                    print(rtext)
            data = {'Comment':[comment],'Date':[comment2],'ID':[comment3], 'Likes':[comment4]}
            result = pd.DataFrame(data)
            result.to_csv('youtube.csv', mode='a',header=False)
            print(comment)
            print(comment2)
            print(comment3)
            print(comment4)
            print('==============================')
            comments.append(comment)
        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return comments
When I do this, my crawler collects comments but doesn't collect some of the replies that are under certain comments.
How can I make it collect comments and their corresponding replies and put them in a single data frame?
Update
So, somehow I managed to pull the information I wanted at the output section of Jupyter Notebook. All I have to do now is to append the result at the data frame.
Here is my updated code:
def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
            comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
            if 'replies' in item.keys():
                for reply in item['replies']['comments']:
                    rauthor = reply['snippet']['authorDisplayName']
                    rtext = reply['snippet']['textDisplay']
                    rtime = reply['snippet']['publishedAt']
                    rlike = reply['snippet']['likeCount']
                    print(rtext)
                    print(rtime)
                    print(rauthor)
                    print('Likes: ', rlike)
            print(comment)
            print(comment2)
            print(comment3)
            print("Likes: ", comment4)
            print('==============================')
            comments.append(comment)
        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return comments
The result is:
As you can see, the comments grouped under ======== lines are the comment and corresponding replies underneath.
What would be a good way to append the result into the data frame?
According to the official doc, the property replies.comments[] of CommentThreads resource has the following specification:
replies.comments[] (list)
A list of one or more replies to the top-level comment. Each item in the list is a comment resource.
The list contains a limited number of replies, and unless the number of items in the list equals the value of the snippet.totalReplyCount property, the list of replies is only a subset of the total number of replies available for the top-level comment. To retrieve all of the replies for the top-level comment, you need to call the Comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.
Consequently, if wanting to obtain all reply entries associated to a given top-level comment, you will have to use the Comments.list API endpoint queried appropriately.
I recommend you to read my answer to a very much related question; there are three sections:
Top-Level Comments and Associated Replies,
The property nextPageToken and the parameter pageToken, and
API Limitations Imposed by Design.
From the get-go, you'll have to acknowledge that the API (as currently implemented) does not allow obtaining all top-level comments associated with a given video when the number of those comments exceeds a certain (unspecified) upper bound.
As for a Python implementation, I would suggest structuring the code as follows:
def get_video_comments(service, video_id):
    request = service.commentThreads().list(
        videoId = video_id,
        part = 'id,snippet,replies',
        maxResults = 100
    )
    comments = []
    while request:
        response = request.execute()
        for comment in response['items']:
            reply_count = comment['snippet'] \
                ['totalReplyCount']
            replies = comment.get('replies')
            if replies is not None and \
               reply_count != len(replies['comments']):
                replies['comments'] = get_comment_replies(
                    service, comment['id'])
            # 'comment' is a 'CommentThreads Resource' that has its
            # 'replies.comments' an array of 'Comments Resource'
            # Do fill in the 'comments' data structure
            # to be provided by this function:
            ...
        request = service.commentThreads().list_next(
            request, response)
    return comments
def get_comment_replies(service, comment_id):
    request = service.comments().list(
        parentId = comment_id,
        part = 'id,snippet',
        maxResults = 100
    )
    replies = []
    while request:
        response = request.execute()
        replies.extend(response['items'])
        request = service.comments().list_next(
            request, response)
    return replies
Note that the ellipsis dots above -- ... -- would have to be replaced with actual code that fills in the array of structures to be returned by get_video_comments to its caller.
The simplest way (useful for quick testing) would be to have ... replaced with comments.append(comment) and then the caller of get_video_comments to simply pretty print (using json.dump) the object obtained from that function.
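To get from the nested structure to something that fits a single data frame, one option is to flatten every thread into one row per comment or reply, with each reply pointing back to its top-level comment. A minimal sketch, assuming the `...` above was replaced with `comments.append(comment)`; the function names and column names here are my own choices, not part of the API:

```python
def comment_row(comment, parent_id):
    """Build one flat row from a Comments resource (hypothetical helper)."""
    snippet = comment['snippet']
    return {
        'comment_id': comment['id'],
        'parent_id': parent_id,  # None for top-level comments
        'author': snippet['authorDisplayName'],
        'text': snippet['textDisplay'],
        'published_at': snippet['publishedAt'],
        'likes': snippet['likeCount'],
    }

def flatten_threads(threads):
    """Turn CommentThreads resources into rows, replies right after their parent."""
    rows = []
    for thread in threads:
        top = thread['snippet']['topLevelComment']
        rows.append(comment_row(top, parent_id=None))
        for reply in thread.get('replies', {}).get('comments', []):
            rows.append(comment_row(reply, parent_id=top['id']))
    return rows
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(...)`, which keeps each reply directly below the comment it belongs to.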
Based on stvar's answer and the original publication here, I built this code:
import os
import pickle
import csv
import json
import google.oauth2.credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
CLIENT_SECRETS_FILE = "client_secret.json" # for more information to create your credentials json please visit https://python.gotrained.com/youtube-api-extracting-comments/
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'
def get_authenticated_service():
    credentials = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            credentials = pickle.load(token)
    # Check if the credentials are invalid or do not exist
    if not credentials or not credentials.valid:
        # Check if the credentials have expired
        if credentials and credentials.expired and credentials.refresh_token:
            credentials.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_console()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(credentials, token)
    return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)
def get_video_comments(service, **kwargs):
    request = service.commentThreads().list(**kwargs)
    comments = []
    while request:
        response = request.execute()
        for comment in response['items']:
            reply_count = comment['snippet'] \
                ['totalReplyCount']
            replies = comment.get('replies')
            if replies is not None and \
               reply_count != len(replies['comments']):
                replies['comments'] = get_comment_replies(
                    service, comment['id'])
            # 'comment' is a 'CommentThreads Resource' that has its
            # 'replies.comments' an array of 'Comments Resource'
            # Do fill in the 'comments' data structure
            # to be provided by this function:
            comments.append(comment)
        request = service.commentThreads().list_next(
            request, response)
    return comments
def get_comment_replies(service, comment_id):
    request = service.comments().list(
        parentId = comment_id,
        part = 'id,snippet',
        maxResults = 100  # the API documents 1 to 100 as acceptable values
    )
    replies = []
    while request:
        response = request.execute()
        replies.extend(response['items'])
        request = service.comments().list_next(
            request, response)
    return replies
if __name__ == '__main__':
    # When running locally, disable OAuthlib's HTTPs verification. When
    # running in production *do not* leave this option enabled.
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
    service = get_authenticated_service()
    videoId = input('Enter Video id : ') # video id here (the video id of https://www.youtube.com/watch?v=vedLpKXzZqE -> is vedLpKXzZqE)
    # maxResults is capped at 100 per page; list_next() fetches the remaining pages
    comments = get_video_comments(service, videoId=videoId, part='id,snippet,replies', maxResults = 100)
    with open('youtube_comments', 'w', encoding='UTF8') as f:
        writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for row in comments:
            # each row is the dict of one comment thread; write it as a single CSV field
            writer.writerow([row])
It produces a file called youtube_comments with this format:
"{'kind': 'youtube#commentThread', 'etag': 'gvhv4hkH0H2OqQAHQKxzfA-K_tA', 'id': 'UgzSgI1YEvwcuF4cPwN4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'topLevelComment': {'kind': 'youtube#comment', 'etag': 'qpuKZcuD4FKf6BHgRlMunersEeU', 'id': 'UgzSgI1YEvwcuF4cPwN4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'This is a comment', 'textOriginal': 'This is a comment', 'authorDisplayName': 'Gabriell Magana', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLRGBvo2ZncDP1xGjlX6anfUufNYi9b3w9kYZFDl=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCKAa4FYftXsN7VKaPSlCivg', 'authorChannelId': {'value': 'UCKAa4FYftXsN7VKaPSlCivg'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 8, 'publishedAt': '2019-05-22T12:38:34Z', 'updatedAt': '2019-05-22T12:38:34Z'}}, 'canReply': True, 'totalReplyCount': 0, 'isPublic': True}}"
"{'kind': 'youtube#commentThread', 'etag': 'DsgDziMk7mB7xN4OoX7cmqlbDYE', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'topLevelComment': {'kind': 'youtube#comment', 'etag': 'NYjvYM9W_umBafAfQkdg1P9apgg', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'This is another comment', 'textOriginal': 'This is another comment', 'authorDisplayName': 'Mary Montes', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLTg1b1yw8BX8Af0PoTR_t5OOwP9Cfl9_qL-o1iikw=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UC_GP_8HxDPsqJjJ3Fju_UeA', 'authorChannelId': {'value': 'UC_GP_8HxDPsqJjJ3Fju_UeA'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 9, 'publishedAt': '2019-05-15T05:10:49Z', 'updatedAt': '2019-05-15T05:10:49Z'}}, 'canReply': True, 'totalReplyCount': 3, 'isPublic': True}, 'replies': {'comments': [{'kind': 'youtube#comment', 'etag': 'Tu41ENCZYNJ2KBpYeYz4qgre0H8', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF79DbfJ9zMKxM', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'this is first reply', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'JULIO EMPRESARIO', 'authorProfileImageUrl': 'https://yt3.ggpht.com/eYP4MBcZ4bON_pHtdbtVsyWnsKbpNKye2wTPhgkffkMYk3ZbN0FL6Aa1o22YlFjn2RVUAkSQYw=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCrpB9oZZZfmBv1aQsxrk66w', 'authorChannelId': {'value': 'UCrpB9oZZZfmBv1aQsxrk66w'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2020-09-15T04:06:50Z', 'updatedAt': '2020-09-15T04:06:50Z'}}, {'kind': 'youtube#comment', 'etag': 'OrpbnJddwzlzwGArCgtuuBsYr94', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF795E1w8RV1DJ', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'the second replay', 'textOriginal': 'the second replay', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'Anatolio27 Diaz', 'authorProfileImageUrl': 
'https://yt3.ggpht.com/ytc/AKedOLR1hOySIxEkvRCySExHjo3T6zGBNkvuKpPkqA=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UC04N8BM5aUwDJf-PNFxKI-g', 'authorChannelId': {'value': 'UC04N8BM5aUwDJf-PNFxKI-g'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2020-02-19T18:21:06Z', 'updatedAt': '2020-02-19T18:21:06Z'}}, {'kind': 'youtube#comment', 'etag': 'sPmIwerh3DTZshLiDVwOXn_fJx0', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF78wwH6Aabh4y', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'A third reply', 'textOriginal': 'A third reply', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'Voy detrás de mi pasión', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLTgzZ3ZFvkmmAlMzA77ApM-2uGFfvOBnzxegYEX=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCvv6QMokO7KcJCDpK6qZg3Q', 'authorChannelId': {'value': 'UCvv6QMokO7KcJCDpK6qZg3Q'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2019-07-03T18:45:34Z', 'updatedAt': '2019-07-03T18:45:34Z'}}]}}"
A second step is now needed to extract the required information. For this I use a set of shell tools such as cut, awk and sed:
cut -d ":" -f 10- youtube_comments | sed -e "s/', '/\n/g" -e "s/'//g" | awk '/replies/{print "------------------------****---------::: Replies: "$6" :::---------******--------------------------------"}!/replies/{print}' |sed '/^textOriginal:/,/^authorDisplayName:/{/^authorDisplayName/!d}' |sed '/^authorProfileImageUrl:\|^authorChannelUrl:\|^authorChannelId:\|^etag:\|^updatedAt:\|^parentId:\|^id:/d' |sed 's/<[^>]*>//g' | sed 's/{textDisplay/{\ntextDisplay/' |sed '/^snippet:/d' | awk -F":" '(NF==1){print "========================================COMMENT==========================================="}(NF>1){a=0; print $0}' | sed 's/textDisplay: //g' | sed 's/authorDisplayName/User/g' | sed 's/T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}Z//g' | sed 's/likeCount: /Likes:/g' | sed 's/publishedAt: //g' > output_file
The final result is a file called output_file with this format:
========================================COMMENT===========================================
This is a comment
User: Robert Everest
Likes:8, 2019-05-22
========================================COMMENT===========================================
This is another comment
User: Anna Davis
Likes:9, 2019-05-15
------------------------****---------::: Replies: 3, :::---------******--------------------------------
this is first reply
User: John Doe
Likes:2, 2020-09-15
the second replay
User: Caraqueno
Likes:2, 2020-02-19
A third reply
User: Rebeca
Likes:2, 2019-07-03
The Python script needs the file token.pickle to work. It is generated the first time the script runs; when the token expires, the file has to be deleted and generated again.
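As an alternative to post-processing the text with shell tools, the list returned by `get_video_comments` could be formatted directly in Python. A rough sketch that produces a similar layout; the helper names `describe` and `format_threads` and the exact separator strings are my own choices, not part of the script above:

```python
def describe(snippet):
    """Return the three display lines for one comment or reply snippet."""
    date = snippet['publishedAt'].split('T')[0]  # keep only YYYY-MM-DD
    return [
        snippet['textDisplay'],
        f"User: {snippet['authorDisplayName']}",
        f"Likes:{snippet['likeCount']}, {date}",
    ]

def format_threads(threads):
    """Render comment threads as text, each comment followed by its replies."""
    lines = []
    for thread in threads:
        lines.append('=' * 20 + 'COMMENT' + '=' * 20)
        lines.extend(describe(thread['snippet']['topLevelComment']['snippet']))
        replies = thread.get('replies', {}).get('comments', [])
        if replies:
            lines.append(f'----- Replies: {len(replies)} -----')
            for reply in replies:
                lines.extend(describe(reply['snippet']))
    return '\n'.join(lines)
```

Writing `format_threads(comments)` to a file would avoid parsing the stringified dictionaries, which is fragile when comment text itself contains quotes or colons.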
I had an issue similar to the OP's and managed to solve it, but someone in the community closed my question after I solved it, so I can't post the answer there. I'm posting it here for posterity.
The YouTube API doesn't allow users to grab nested replies to comments, i.e. the full chain Video --> Comments --> Comment Replies --> Reply To Reply and so on. What it does allow is getting all the comments and the replies to those comments. Knowing this limitation, we can write code to get all the top-level comments and then break into those comments to get the first-level replies.
Modules
import os
import googleapiclient.discovery #required for using googleapi
import pandas as pd #require for data munging. We use pd.json_normalize to create the tables
import numpy as np #just good to have
import json # the requests are returned as json objects.
from datetime import datetime #good to have for date modification
Get All Comments Function
For a given vidId, this function gets the first 100 comments and places them into a DataFrame. It then uses a while loop to check whether the API response contains nextPageToken. While it does, the loop keeps running until either all the comments are pulled or you run out of credits, whichever happens first.
def vidcomments(vidId):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "yourapikey" #<--- insert API key here
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)
    request = youtube.commentThreads().list(
        part="snippet, replies",
        order="time",
        maxResults=100,
        textFormat="plainText",
        videoId=vidId
    )
    response = request.execute()
    full = pd.json_normalize(response, record_path=['items'])
    while response:
        if 'nextPageToken' in response:
            response = youtube.commentThreads().list(
                part="snippet",
                maxResults=100,
                textFormat='plainText',
                order='time',
                videoId=vidId,
                pageToken=response['nextPageToken']
            ).execute()
            df2 = pd.json_normalize(response, record_path=['items'])
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            full = pd.concat([full, df2], sort=False)
        else:
            break
    return full
Get All Replies To Comments Function
For a particular parentId, get all the first-level replies. Like the vidcomments() function noted above, it will run until all replies to all comments are pulled or you run out of credits, whichever happens first.
def repliesto(parentId):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = DevKey #your dev key
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)
    request = youtube.comments().list(
        part="snippet",
        maxResults=100,
        parentId=parentId,
        textFormat="plainText"
    )
    response = request.execute()
    replies = pd.json_normalize(response, record_path=['items'])
    while response:
        if 'nextPageToken' in response:
            response = youtube.comments().list(
                part="snippet",
                maxResults=100,
                parentId=parentId,
                textFormat="plainText",
                pageToken=response['nextPageToken']
            ).execute()
            df2 = pd.json_normalize(response, record_path=['items'])
            replies = pd.concat([replies, df2], sort=False)
        else:
            break
    return replies
Putting It Together
First, run the vidcomments function to get all the comments information. Then use the code below to get all the reply information using a for loop to pull in each topLevelComment.id into a list, then use the list and another for loop to build the replies dataframe. This will create two separate Dataframes, one for Comments and another for Replies. After creating both of these Dataframes you can then join them in a way that makes sense for your purpose, either concat/union or a join/merge.
full = vidcomments(vidId)  # vidId is the ID of the video to crawl
replyto = []
# collect the ID of every top-level comment that has at least one reply
for reply in full[(full['snippet.totalReplyCount']>0)]['snippet.topLevelComment.id']:
    replyto.append(reply)
# create an empty DF to store all the replies
# use a for loop to place each item in our replyto list into the function defined above
replies = pd.DataFrame()
for reply in replyto:
    df = repliesto(reply)
    replies = pd.concat([replies, df], ignore_index=True)

YouTube Data API to crawl all comments and replies

I have been desperately seeking a solution to crawl all comments and corresponding replies for my research. Am having a very hard time creating a data frame that includes comment data in correct and corresponding orders.
I am gonna share my code here so you professionals can take a look and give me some insights.
def get_video_comments(service, **kwargs):
comments = []
results = service.commentThreads().list(**kwargs).execute()
while results:
for item in results['items']:
comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
if 'replies' in item.keys():
for reply in item['replies']['comments']:
rauthor = reply['snippet']['authorDisplayName']
rtext = reply['snippet']['textDisplay']
rtime = reply['snippet']['publishedAt']
rlike = reply['snippet']['likeCount']
data = {'Reply ID': [rauthor], 'Reply Time': [rtime], 'Reply Comments': [rtext], 'Reply Likes': [rlike]}
print(rauthor)
print(rtext)
data = {'Comment':[comment],'Date':[comment2],'ID':[comment3], 'Likes':[comment4]}
result = pd.DataFrame(data)
result.to_csv('youtube.csv', mode='a',header=False)
print(comment)
print(comment2)
print(comment3)
print(comment4)
print('==============================')
comments.append(comment)
# Check if another page exists
if 'nextPageToken' in results:
kwargs['pageToken'] = results['nextPageToken']
results = service.commentThreads().list(**kwargs).execute()
else:
break
return comments
When I do this, my crawler collects comments but doesn't collect some of the replies that are under certain comments.
How can I make it collect comments and their corresponding replies and put them in a single data frame?
Update
So, somehow I managed to pull the information I wanted at the output section of Jupyter Notebook. All I have to do now is to append the result at the data frame.
Here is my updated code:
def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()
    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
            comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
            if 'replies' in item.keys():
                for reply in item['replies']['comments']:
                    rauthor = reply['snippet']['authorDisplayName']
                    rtext = reply['snippet']['textDisplay']
                    rtime = reply['snippet']['publishedAt']
                    rlike = reply['snippet']['likeCount']
                    print(rtext)
                    print(rtime)
                    print(rauthor)
                    print('Likes: ', rlike)
            print(comment)
            print(comment2)
            print(comment3)
            print("Likes: ", comment4)
            print('==============================')
            comments.append(comment)
        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break
    return comments
The result is:
As you can see, the comments grouped under ======== lines are the comment and corresponding replies underneath.
What would be a good way to append the result into the data frame?
According to the official doc, the property replies.comments[] of CommentThreads resource has the following specification:
replies.comments[] (list)
A list of one or more replies to the top-level comment. Each item in the list is a comment resource.
The list contains a limited number of replies, and unless the number of items in the list equals the value of the snippet.totalReplyCount property, the list of replies is only a subset of the total number of replies available for the top-level comment. To retrieve all of the replies for the top-level comment, you need to call the Comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.
Consequently, to obtain all the replies associated with a given top-level comment, you will have to query the Comments.list API endpoint with the parentId parameter set to that comment's ID.
I recommend reading my answer to a closely related question; it has three sections:
Top-Level Comments and Associated Replies,
The property nextPageToken and the parameter pageToken, and
API Limitations Imposed by Design.
From the outset, you'll have to accept that the API (as currently implemented) does not let you obtain all the top-level comments associated with a given video when the number of those comments exceeds a certain (unspecified) upper bound.
As for a Python implementation, I would suggest structuring the code as follows:
def get_video_comments(service, video_id):
    request = service.commentThreads().list(
        videoId = video_id,
        part = 'id,snippet,replies',
        maxResults = 100
    )
    comments = []

    while request:
        response = request.execute()

        for comment in response['items']:
            reply_count = comment['snippet'] \
                ['totalReplyCount']
            replies = comment.get('replies')
            if replies is not None and \
               reply_count != len(replies['comments']):
                replies['comments'] = get_comment_replies(
                    service, comment['id'])

            # 'comment' is a 'CommentThreads Resource' that has its
            # 'replies.comments' an array of 'Comments Resource'

            # Do fill in the 'comments' data structure
            # to be provided by this function:
            ...

        request = service.commentThreads().list_next(
            request, response)

    return comments

def get_comment_replies(service, comment_id):
    request = service.comments().list(
        parentId = comment_id,
        part = 'id,snippet',
        maxResults = 100
    )
    replies = []

    while request:
        response = request.execute()
        replies.extend(response['items'])
        request = service.comments().list_next(
            request, response)

    return replies
Note that the ellipsis dots above -- ... -- would have to be replaced with actual code that fills in the array of structures to be returned by get_video_comments to its caller.
The simplest way (useful for quick testing) would be to have ... replaced with comments.append(comment) and then the caller of get_video_comments to simply pretty print (using json.dump) the object obtained from that function.
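To get from the nested resources to something a data frame can hold, one option is to flatten each thread into one row per comment or reply. The following is only a sketch under assumptions: the helper names flatten_threads and _row, the output column names, and the sample data are all invented for illustration; only the resource field names (topLevelComment, replies.comments, textDisplay and so on) come from the code and API output shown in this thread.

```python
def flatten_threads(threads):
    """Turn commentThread resources into flat rows, one per comment or reply."""
    rows = []
    for thread in threads:
        top = thread["snippet"]["topLevelComment"]
        rows.append(_row(top, parent_id=None))
        # 'replies' is absent when a thread has no replies
        for reply in thread.get("replies", {}).get("comments", []):
            rows.append(_row(reply, parent_id=top["id"]))
    return rows


def _row(comment, parent_id):
    """Pick the fields of interest out of a single comment resource."""
    snippet = comment["snippet"]
    return {
        "comment_id": comment["id"],
        "parent_id": parent_id,  # None marks a top-level comment
        "author": snippet["authorDisplayName"],
        "text": snippet["textDisplay"],
        "published_at": snippet["publishedAt"],
        "likes": snippet["likeCount"],
    }


# Made-up sample shaped like the API output (not real data)
sample_threads = [
    {"snippet": {"totalReplyCount": 0, "topLevelComment": {
        "id": "c1", "snippet": {
            "authorDisplayName": "A", "textDisplay": "hello",
            "publishedAt": "2019-05-22T12:38:34Z", "likeCount": 8}}}},
    {"snippet": {"totalReplyCount": 1, "topLevelComment": {
        "id": "c2", "snippet": {
            "authorDisplayName": "B", "textDisplay": "hi",
            "publishedAt": "2019-05-15T05:10:49Z", "likeCount": 9}}},
     "replies": {"comments": [
         {"id": "c2.r1", "snippet": {
             "authorDisplayName": "C", "textDisplay": "a reply",
             "publishedAt": "2020-09-15T04:06:50Z", "likeCount": 2}}]}},
]

rows = flatten_threads(sample_threads)
```

If pandas is available, pd.DataFrame(flatten_threads(comments)) should then give a single frame in which replies sit directly under their parent comment and can be told apart by the parent_id column.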
Based on stvar's answer and the original publication here, I built this code:
import os
import pickle
import csv
import json
import google.oauth2.credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

CLIENT_SECRETS_FILE = "client_secret.json" # for more information to create your credentials json please visit https://python.gotrained.com/youtube-api-extracting-comments/
SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

def get_authenticated_service():
    credentials = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            credentials = pickle.load(token)
    # Check if the credentials are invalid or do not exist
    if not credentials or not credentials.valid:
        # Check if the credentials have expired
        if credentials and credentials.expired and credentials.refresh_token:
            credentials.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                CLIENT_SECRETS_FILE, SCOPES)
            credentials = flow.run_console()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(credentials, token)
    return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

def get_video_comments(service, **kwargs):
    request = service.commentThreads().list(**kwargs)
    comments = []
    while request:
        response = request.execute()
        for comment in response['items']:
            reply_count = comment['snippet'] \
                ['totalReplyCount']
            replies = comment.get('replies')
            if replies is not None and \
               reply_count != len(replies['comments']):
                replies['comments'] = get_comment_replies(
                    service, comment['id'])
            # 'comment' is a 'CommentThreads Resource' that has its
            # 'replies.comments' an array of 'Comments Resource'
            # Do fill in the 'comments' data structure
            # to be provided by this function:
            comments.append(comment)
        request = service.commentThreads().list_next(
            request, response)
    return comments

def get_comment_replies(service, comment_id):
    request = service.comments().list(
        parentId = comment_id,
        part = 'id,snippet',
        maxResults = 100  # the API accepts values from 1 to 100
    )
    replies = []
    while request:
        response = request.execute()
        replies.extend(response['items'])
        request = service.comments().list_next(
            request, response)
    return replies

if __name__ == '__main__':
    # When running locally, disable OAuthlib's HTTPs verification. When
    # running in production *do not* leave this option enabled.
    os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
    service = get_authenticated_service()
    videoId = input('Enter Video id : ') # video id here (the video id of https://www.youtube.com/watch?v=vedLpKXzZqE -> is vedLpKXzZqE)
    # maxResults is capped at 100 per page; paging fetches the rest
    comments = get_video_comments(service, videoId=videoId, part='id,snippet,replies', maxResults = 100)
    with open('youtube_comments', 'w', encoding='UTF8') as f:
        writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for row in comments:
            # each row is a commentThread dict; it is written as a single quoted field
            writer.writerow([row])
It produces a file called youtube_comments with this format:
"{'kind': 'youtube#commentThread', 'etag': 'gvhv4hkH0H2OqQAHQKxzfA-K_tA', 'id': 'UgzSgI1YEvwcuF4cPwN4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'topLevelComment': {'kind': 'youtube#comment', 'etag': 'qpuKZcuD4FKf6BHgRlMunersEeU', 'id': 'UgzSgI1YEvwcuF4cPwN4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'This is a comment', 'textOriginal': 'This is a comment', 'authorDisplayName': 'Gabriell Magana', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLRGBvo2ZncDP1xGjlX6anfUufNYi9b3w9kYZFDl=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCKAa4FYftXsN7VKaPSlCivg', 'authorChannelId': {'value': 'UCKAa4FYftXsN7VKaPSlCivg'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 8, 'publishedAt': '2019-05-22T12:38:34Z', 'updatedAt': '2019-05-22T12:38:34Z'}}, 'canReply': True, 'totalReplyCount': 0, 'isPublic': True}}"
"{'kind': 'youtube#commentThread', 'etag': 'DsgDziMk7mB7xN4OoX7cmqlbDYE', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'topLevelComment': {'kind': 'youtube#comment', 'etag': 'NYjvYM9W_umBafAfQkdg1P9apgg', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'This is another comment', 'textOriginal': 'This is another comment', 'authorDisplayName': 'Mary Montes', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLTg1b1yw8BX8Af0PoTR_t5OOwP9Cfl9_qL-o1iikw=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UC_GP_8HxDPsqJjJ3Fju_UeA', 'authorChannelId': {'value': 'UC_GP_8HxDPsqJjJ3Fju_UeA'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 9, 'publishedAt': '2019-05-15T05:10:49Z', 'updatedAt': '2019-05-15T05:10:49Z'}}, 'canReply': True, 'totalReplyCount': 3, 'isPublic': True}, 'replies': {'comments': [{'kind': 'youtube#comment', 'etag': 'Tu41ENCZYNJ2KBpYeYz4qgre0H8', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF79DbfJ9zMKxM', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'this is first reply', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'JULIO EMPRESARIO', 'authorProfileImageUrl': 'https://yt3.ggpht.com/eYP4MBcZ4bON_pHtdbtVsyWnsKbpNKye2wTPhgkffkMYk3ZbN0FL6Aa1o22YlFjn2RVUAkSQYw=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCrpB9oZZZfmBv1aQsxrk66w', 'authorChannelId': {'value': 'UCrpB9oZZZfmBv1aQsxrk66w'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2020-09-15T04:06:50Z', 'updatedAt': '2020-09-15T04:06:50Z'}}, {'kind': 'youtube#comment', 'etag': 'OrpbnJddwzlzwGArCgtuuBsYr94', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF795E1w8RV1DJ', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'the second replay', 'textOriginal': 'the second replay', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'Anatolio27 Diaz', 'authorProfileImageUrl': 
'https://yt3.ggpht.com/ytc/AKedOLR1hOySIxEkvRCySExHjo3T6zGBNkvuKpPkqA=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UC04N8BM5aUwDJf-PNFxKI-g', 'authorChannelId': {'value': 'UC04N8BM5aUwDJf-PNFxKI-g'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2020-02-19T18:21:06Z', 'updatedAt': '2020-02-19T18:21:06Z'}}, {'kind': 'youtube#comment', 'etag': 'sPmIwerh3DTZshLiDVwOXn_fJx0', 'id': 'UgytsI51LU6BWRmYtBB4AaABAg.8uwduw6ppF78wwH6Aabh4y', 'snippet': {'videoId': 'tGTaBt4Hfd0', 'textDisplay': 'A third reply', 'textOriginal': 'A third reply', 'parentId': 'UgytsI51LU6BWRmYtBB4AaABAg', 'authorDisplayName': 'Voy detrás de mi pasión', 'authorProfileImageUrl': 'https://yt3.ggpht.com/ytc/AKedOLTgzZ3ZFvkmmAlMzA77ApM-2uGFfvOBnzxegYEX=s48-c-k-c0x00ffffff-no-rj', 'authorChannelUrl': 'http://www.youtube.com/channel/UCvv6QMokO7KcJCDpK6qZg3Q', 'authorChannelId': {'value': 'UCvv6QMokO7KcJCDpK6qZg3Q'}, 'canRate': True, 'viewerRating': 'none', 'likeCount': 2, 'publishedAt': '2019-07-03T18:45:34Z', 'updatedAt': '2019-07-03T18:45:34Z'}}]}}"
A second step is now needed to extract the required information. For this I use a set of shell tools such as cut, awk and sed:
cut -d ":" -f 10- youtube_comments | sed -e "s/', '/\n/g" -e "s/'//g" | awk '/replies/{print "------------------------****---------::: Replies: "$6" :::---------******--------------------------------"}!/replies/{print}' |sed '/^textOriginal:/,/^authorDisplayName:/{/^authorDisplayName/!d}' |sed '/^authorProfileImageUrl:\|^authorChannelUrl:\|^authorChannelId:\|^etag:\|^updatedAt:\|^parentId:\|^id:/d' |sed 's/<[^>]*>//g' | sed 's/{textDisplay/{\ntextDisplay/' |sed '/^snippet:/d' | awk -F":" '(NF==1){print "========================================COMMENT==========================================="}(NF>1){a=0; print $0}' | sed 's/textDisplay: //g' | sed 's/authorDisplayName/User/g' | sed 's/T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}Z//g' | sed 's/likeCount: /Likes:/g' | sed 's/publishedAt: //g' > output_file
The final result is a file called output_file with this format:
========================================COMMENT===========================================
This is a comment
User: Robert Everest
Likes:8, 2019-05-22
========================================COMMENT===========================================
This is another comment
User: Anna Davis
Likes:9, 2019-05-15
------------------------****---------::: Replies: 3, :::---------******--------------------------------
this is first reply
User: John Doe
Likes:2, 2020-09-15
the second replay
User: Caraqueno
Likes:2, 2020-02-19
A third reply
User: Rebeca
Likes:2, 2019-07-03
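As an alternative to the shell pipeline, a similar report can be sketched directly in Python from the list of thread dicts, before they are written to disk. This is only an illustration: format_comment and format_thread are hypothetical helpers, the separator lines are simplified rather than copied from the output above, and the sample thread is made up.

```python
import re


def format_comment(comment):
    """Render one comment resource as text, user, likes and date."""
    snippet = comment["snippet"]
    # Strip HTML tags from the text, as the sed step 's/<[^>]*>//g' does
    text = re.sub(r"<[^>]*>", "", snippet["textDisplay"])
    # Keep only the date part of the ISO timestamp
    date = snippet["publishedAt"][:10]
    return f"{text}\nUser: {snippet['authorDisplayName']}\nLikes:{snippet['likeCount']}, {date}"


def format_thread(thread):
    """Render a top-level comment followed by its replies, if any."""
    lines = ["=" * 20 + "COMMENT" + "=" * 20,
             format_comment(thread["snippet"]["topLevelComment"])]
    replies = thread.get("replies", {}).get("comments", [])
    if replies:
        lines.append(f"----- Replies: {len(replies)} -----")
        lines.extend(format_comment(reply) for reply in replies)
    return "\n".join(lines)


# Made-up sample shaped like the API output (not real data)
sample_thread = {
    "snippet": {"topLevelComment": {"id": "c1", "snippet": {
        "authorDisplayName": "A", "textDisplay": "This is <b>a</b> comment",
        "publishedAt": "2019-05-22T12:38:34Z", "likeCount": 8}}},
    "replies": {"comments": [{"id": "c1.r1", "snippet": {
        "authorDisplayName": "B", "textDisplay": "a reply",
        "publishedAt": "2020-09-15T04:06:50Z", "likeCount": 2}}]},
}

report = format_thread(sample_thread)
```

Writing "\n".join(format_thread(t) for t in comments) to a file would then replace both the CSV step and the shell post-processing, assuming comments is the list returned by get_video_comments.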
The Python script needs the file token.pickle to work. It is generated the first time the script runs; if the stored credentials expire and cannot be refreshed, delete the file and let the script generate it again.
I had a similar issue to the OP's and managed to solve it, but someone in the community closed my question after I solved it, so I can't post the answer there. I'm posting it here for posterity.
The YouTube API doesn't let you grab a whole tree of nested replies in one go (Video --> Comments --> Comment Replies --> Reply To Reply, and so on). What it does let you do is get all the comments on a video and, separately, the replies to a given comment. Knowing this limitation, we can write code to get all the top-level comments and then go through those comments to get the first-level replies.
Modules
import os
import googleapiclient.discovery #required for using googleapi
import pandas as pd #require for data munging. We use pd.json_normalize to create the tables
import numpy as np #just good to have
import json # the requests are returned as json objects.
from datetime import datetime #good to have for date modification
Get All Comments Function
For a given vidId, this function gets the first 100 comments and places them into a DataFrame. It then uses a while loop to check whether the API response contains nextPageToken. While it does, it keeps requesting pages until either all the comments are pulled or you run out of API quota, whichever happens first.
def vidcomments(vidId):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "yourapikey" #<--- insert API key here
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)
    request = youtube.commentThreads().list(
        part="snippet, replies",
        order="time",
        maxResults=100,
        textFormat="plainText",
        videoId=vidId
    )
    response = request.execute()
    full = pd.json_normalize(response, record_path=['items'])
    while response:
        if 'nextPageToken' in response:
            response = youtube.commentThreads().list(
                part="snippet",
                maxResults=100,
                textFormat='plainText',
                order='time',
                videoId=vidId,
                pageToken=response['nextPageToken']
            ).execute()
            df2 = pd.json_normalize(response, record_path=['items'])
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead
            full = pd.concat([full, df2], sort=False)
        else:
            break
    return full
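The paging logic in the function above can be isolated into a small generator, which makes it easier to test without spending quota. This is a sketch only: iter_pages, fake_fetch and the fake pages are hypothetical stand-ins, with fake_fetch playing the role of a call such as youtube.commentThreads().list(..., pageToken=token).execute().

```python
def iter_pages(fetch_page):
    """Yield successive response dicts until no nextPageToken is returned.

    fetch_page is a hypothetical callable taking a page token (None for the
    first page) and returning one API response as a dict.
    """
    token = None
    while True:
        response = fetch_page(token)
        yield response
        token = response.get("nextPageToken")
        if not token:
            break


# Fake three-page result set keyed by page token (illustration only)
_FAKE_PAGES = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3], "nextPageToken": "p3"},
    "p3": {"items": [4, 5]},
}


def fake_fetch(token):
    """Stand-in for the real API call; looks up a canned page."""
    return _FAKE_PAGES[token]


all_items = [item for page in iter_pages(fake_fetch) for item in page["items"]]
```

With the real client, fetch_page could be a small lambda or nested function wrapping the list(...).execute() call, so the request parameters are written only once.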
Get All Replies To Comments Function
For a particular parentId, get all the first-level replies. Like the vidcomments() function noted above, it will run until all replies to all comments are pulled or you run out of credits, whichever happens first.
def repliesto(parentId):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = DevKey #your dev key
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)
    request = youtube.comments().list(
        part="snippet",
        maxResults=100,
        parentId=parentId,
        textFormat="plainText"
    )
    response = request.execute()
    replies = pd.json_normalize(response, record_path=['items'])
    while response:
        if 'nextPageToken' in response:
            response = youtube.comments().list(
                part="snippet",
                maxResults=100,
                parentId=parentId,
                textFormat="plainText",
                pageToken=response['nextPageToken']
            ).execute()
            df2 = pd.json_normalize(response, record_path=['items'])
            replies = pd.concat([replies, df2], sort=False)
        else:
            break
    return replies
Putting It Together
First, run the vidcomments function to get all the comments information. Then use the code below to get all the reply information using a for loop to pull in each topLevelComment.id into a list, then use the list and another for loop to build the replies dataframe. This will create two separate Dataframes, one for Comments and another for Replies. After creating both of these Dataframes you can then join them in a way that makes sense for your purpose, either concat/union or a join/merge.
# 'full' is the DataFrame returned by vidcomments(vidId)
replyto = []
for reply in full[(full['snippet.totalReplyCount']>0)]['snippet.topLevelComment.id']:
    replyto.append(reply)
# create an empty DF to store all the replies
# use a for loop to place each item in our replyto list into the function defined above
replies = pd.DataFrame()
for reply in replyto:
    df = repliesto(reply)
    replies = pd.concat([replies, df], ignore_index=True)
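For the final join, one possibility is a left merge that keeps every top-level comment and attaches its replies where they exist. The sketch below uses tiny hand-made frames rather than real API output; the column names follow what pd.json_normalize would produce for the fields used above, but the values and the helper name join_comments_and_replies are assumptions for illustration.

```python
import pandas as pd

# Hand-made stand-ins for the frames built by vidcomments() and repliesto()
comments_df = pd.DataFrame({
    "snippet.topLevelComment.id": ["c1", "c2"],
    "snippet.topLevelComment.snippet.textDisplay": ["first", "second"],
    "snippet.totalReplyCount": [0, 2],
})
replies_df = pd.DataFrame({
    "snippet.parentId": ["c2", "c2"],
    "snippet.textDisplay": ["reply a", "reply b"],
})


def join_comments_and_replies(comments, replies):
    """Left-join replies onto their parent comments via the parent ID."""
    return comments.merge(
        replies,
        how="left",
        left_on="snippet.topLevelComment.id",
        right_on="snippet.parentId",
        suffixes=("", "_reply"),  # disambiguate columns present in both frames
    )


merged = join_comments_and_replies(comments_df, replies_df)
```

A comment with several replies appears once per reply, and a comment without replies appears once with empty reply columns. If a stacked layout is preferred instead, pd.concat on the two frames would be the union-style alternative mentioned above.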

Choosing a value from an API response in Python

I'm trying to work with API responses.
Here is an example response that comes from the API:
{u'blog': {u'followed': False, u'is_adult': False, u'can_subscribe': False, u'is_nsfw': False, u'ask': True, u'likes': 920, u'is_blocked_from_primary': False, u'can_submit': True, u'ask_anon': True, u'subscribed': False, u'share_likes': True, u'updated': 1493576375, u'description': u'<p>"Che hai dei bellissimi occhi quando mi cerchi."</p><p>18 </p><p>Beginner Wiccan and Witch </p><p>\U0001f312\U0001f315\U0001f318</p>', u'total_posts': 13992, u'submission_page_title': u'Submit', u'submission_terms': {u'title': u'Submit', u'tags': [], u'guidelines': u'', u'accepted_types': [u'text', u'photo', u'quote', u'link', u'video']}, u'name': u'darknessinmyheartt', u'url': u'http://darknessinmyheartt.tumblr.com/', u'ask_page_title': u'lets ask something!/ haydi sor!', u'title': u'"Laurel"', u'posts': 13992, u'reply_conditions': u'3', u'can_send_fan_mail': False}}
How can I get only the value of u'updated' from that response?
u'updated': 1493576375
I need to assign that value to "x".
I think the Tumblr API response is a Python dict. Based on that, try:
x = client.blog_info('darknessinmyheartt')
print x['blog']['updated']
You got a JSON response, and in Python you can get a value from JSON data by its key.
For your example, you need to do the following:
your_data['blog']['updated']
Via your_data['blog'] you will get an object with keys and values:
{'followed': False, u'is_adult': False, u'can_subscribe': False, ....}
and via your_data['blog']['updated'] you will get the value 1493576375.
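Put together, the lookup and the assignment to x might look like the sketch below. The response dict is a trimmed copy of the one in the question; treating updated as a Unix timestamp in seconds is an assumption based on how the value looks, and the chained .get() call is just one way to avoid a KeyError when a key is absent.

```python
from datetime import datetime, timezone

# Trimmed copy of the response from the question
response = {
    "blog": {
        "name": "darknessinmyheartt",
        "likes": 920,
        "updated": 1493576375,
    }
}

# Index the outer dict by 'blog', then the inner dict by 'updated'
x = response["blog"]["updated"]

# Assumption: the value is seconds since the Unix epoch
updated_at = datetime.fromtimestamp(x, tz=timezone.utc)

# .get() returns None instead of raising when a key is missing
missing = response.get("blog", {}).get("no_such_key")
```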
