Crawling YouTube user info - Python

I'm trying to crawl Youtube to retrieve information about a group of users (approx. 200 people). I'm interested in looking for relationships between the users:
contacts
subscribers
subscriptions
what videos they commented on
etc
I've managed to get contact information with the following source:
import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
from pub_author import KEY, NAME_REGEX

def get_details(name):
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    contacts = [e.title.text for e in contact_feed.entry]
    return contacts
I can't seem to get the other bits of information I need. The reference guide says that I can grab the XML feed from http://gdata.youtube.com/feeds/api/users/username/subscriptions?v=2 (for some arbitrary user). However, if I try to get other users' subscriptions, I get a 403 error with the following message:
User must be logged in to access these subscriptions.
If I use the gdata API:
sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
sub = [e.title.text for e in sub_feed.entry]
then I get the same error.
How can I get these subscriptions without logging in? It should be possible, as you can access this information without logging in to the YouTube website.
Also, there seems to be no feed for the subscribers of a particular user. Is this information available through the API?
EDIT
So, it appears this can't be done through the API. I had to do this the quick and dirty way:
for f in `cat users.txt`; do wget "www.youtube.com/profile?user=$f&view=subscriptions" --output-document subscriptions/$f.html; done
Then use this script to get out the usernames from the downloaded HTML files:
"""Extract usernames from a Youtube profile using regex"""
import re
def main():
import sys
lines = open(sys.argv[1]).read().split('\n')
#
# The html files has two <a href="..."> tags for each user: once for an
# image thumbnail, and once for a text link.
#
users = set()
for l in lines:
match = re.search('<a href="/user/(?P<name>[^"]+)" onmousedown', l)
if match:
users.add(match.group('name'))
users = list(users)
users.sort()
print users
if __name__ == '__main__':
main()

In order to access a user's subscriptions feed without the user being logged in, the user must check the "Subscribe to a channel" checkbox under their Account Sharing settings.
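If a user has that box checked, the feed call from the question works without logging in. A minimal sketch (reusing the yt_service setup from the question; the helper name is mine) that collects subscriptions and skips users whose feeds are private:

from gdata.service import RequestError

def get_subscriptions(yt_service, name):
    # Returns subscription titles, or None if the feed is private (403).
    try:
        sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
    except RequestError:
        # "User must be logged in to access these subscriptions."
        return None
    return [e.title.text for e in sub_feed.entry]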
Currently, there is no direct way to get a channel's subscribers through the gdata API. In fact, there has been an outstanding feature request for it that has remained open for over 3 years! See Retrieving a list of a user's subscribers?.

Related

I need a way to pull 100 of the most popular channels and their IDs from YouTube using the YouTube API in Python

I've already checked the YouTube API dev info (http://www.youtube.com/dev/) and have not found anything pertaining to this.
Obtaining a YouTube API key
To be able to make requests of this type you must have an API key:
Go to https://console.developers.google.com/
Create a new project
Find YouTube Data API v3 and click on it
Enable the API
Go to Credentials and create one for the project
Write the key down and insert it in the script below
This script uses the API key created above to make requests for channels, generating channel names randomly. It writes the data to two files: the first stores all the info; the second stores only the id, the channel name, and the link to the channel. I hope it is what you are looking for ;)
import json
import urllib.request
import string
import random

channels_to_extract = 100
API_KEY = ''  # your api key

while True:
    # build a random channel name to search for
    random_name = ''.join(random.choice(string.ascii_uppercase) for _ in range(random.randint(3, 10)))
    # NOTE: search.list caps maxResults at 50, so request at most 50 per call
    urlData = "https://www.googleapis.com/youtube/v3/search?key={}&maxResults={}&part=snippet&type=channel&q={}".format(API_KEY, min(channels_to_extract, 50), random_name)
    webURL = urllib.request.urlopen(urlData)
    data = webURL.read()
    encoding = webURL.info().get_content_charset('utf-8')
    results = json.loads(data.decode(encoding))
    results_id = {}
    if results['pageInfo']["totalResults"] >= channels_to_extract:  # a random name may return 0 results
        break  # stop once a name with enough matches is found

for result in results['items']:
    # record the id, title, and link of every channel in the result
    results_id[result['id']['channelId']] = [result["snippet"]["title"],
                                             'https://www.youtube.com/channel/' + result['id']['channelId']]

with open("all_info_channels.json", "w") as f:  # write the full search response to a JSON file
    json.dump(results, f, indent=4)

with open("only_id_channels.json", "w") as f:  # write only id, name, and link of each channel
    json.dump(results_id, f, indent=4)

for channelId in results_id.keys():
    print('Link --> https://www.youtube.com/channel/' + channelId)  # channel link for each result
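The loop above gets at most 50 items per call, since search.list caps maxResults at 50. To actually collect 100 channels you can follow the nextPageToken field across pages; a minimal sketch using the same endpoint and key (the helper name is mine):

import json
import urllib.request

def search_channels(api_key, query, wanted=100):
    # Collect up to `wanted` channels by following nextPageToken pages.
    base = ("https://www.googleapis.com/youtube/v3/search"
            "?key={}&maxResults=50&part=snippet&type=channel&q={}")
    url = base.format(api_key, query)
    channels = {}
    page_token = ''
    while len(channels) < wanted:
        with urllib.request.urlopen(url + page_token) as resp:
            data = json.loads(resp.read().decode('utf-8'))
        for item in data.get('items', []):
            channels[item['id']['channelId']] = item['snippet']['title']
        token = data.get('nextPageToken')
        if not token:  # no more pages for this query
            break
        page_token = '&pageToken=' + token
    return channels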

We are trying to import instagramy in Python to scrape Instagram, but it's not working

I want to scrape data from different public and private Instagram accounts, but it fails when I run the following code:
from instagramy import Instagram
# Connecting the profile
user = Instagram("ihtishamKhattak")
# printing the basic details like
# followers, following, bio
print(user.is_verified())
print(user.popularity())
print(user.get_biography())
# return list of dicts
posts = user.get_posts_details()
print('\n\nLikes', 'Comments')
for post in posts:
    likes = post["likes"]
    comments = post["comment"]
    print(likes, comments)
You can have a look at the instagramy pypi page.
There is no Instagram class in the instagramy module. Also make sure that you have no local script called instagramy.py, which would shadow the package.
Change your script to:
from instagramy import InstagramUser
# Connecting the profile
user = InstagramUser("ihtishamKhattak")
Beware that this might work only for a few requests if you do not provide a session id.
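A minimal sketch of passing a session id (the sessionid value is the cookie of a logged-in browser session; the attribute names below are assumptions taken from the instagramy docs, so check the PyPI page for the exact API):

from instagramy import InstagramUser

session_id = ""  # value of the `sessionid` cookie from a logged-in browser (assumption)

user = InstagramUser("ihtishamKhattak", sessionid=session_id)

# instagramy exposes these as attributes, not methods (assumed names)
print(user.is_verified)
print(user.number_of_followers)
print(user.biography)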

Download photo stats for a given ID from Flickr with Python

I am trying to download some photos from Flickr. With my KEY and SECRET, I am able to search and download using these lines of code:
image_tag = 'seaside'
extras = ','.join(SIZES[0])
flickr = FlickrAPI(KEY, SECRET)
photos = flickr.walk(text=image_tag,  # it will search by image title and image tags
                     extras=extras,  # get the urls for each size we want
                     privacy_filter=1,  # search only for public photos
                     per_page=50,
                     sort='relevance',
                     safe_search=1)
Using this I am able to acquire the url and the photo ID, but I would like to download the photo stats too (likes, views). I can't find an appropriate command that, starting from the ID of the photo, allows me to download the stats.
You can find exactly what you are looking for on the Flickr website, in the API documentation:
https://www.flickr.com/services/api/flickr.stats.getPhotoStats.html
Calling the method:
flickr.stats.getPhotoStats
with arguments:
api_key, date, photo_id
You will receive what you are looking for in the following format:
<stats views="24" comments="4" favorites="1" />
Remember to generate your authentication token first; there is a link on that same page explaining how to generate it, if you haven't already.
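With the flickrapi package used in the question, the call looks roughly like the sketch below. Note that stats.getPhotoStats is an authenticated call (read permissions) and, as far as I know, only returns stats for your own photos; the format='parsed-json' option comes from the flickrapi docs, and the photo id and date are placeholders:

import flickrapi

flickr = flickrapi.FlickrAPI(KEY, SECRET, format='parsed-json')
flickr.authenticate_via_browser(perms='read')  # obtain an auth token interactively

# stats are reported per day, so the date argument is required
stats = flickr.stats.getPhotoStats(photo_id='12345678901',  # placeholder id
                                   date='2015-10-01')       # placeholder date
print(stats['stats'])  # e.g. {'views': 24, 'comments': 4, 'favorites': 1}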

Get information for a Facebook page with the Facebook Graph API

I have a simple question in Python.
If I have a Facebook page ID, let's say '6127898346' for example, how can I retrieve this page's information, such as the likes count, and store the result in a file?
Use some sort of Facebook API package, like https://github.com/pythonforfacebook/facebook-sdk, available via pip install facebook-sdk.
import facebook
graph = facebook.GraphAPI()
page = graph.get_object('6127898346')
print '{} has {} likes.'.format(page['name'], page['likes'])
Easy way to save everything:
import json

with open('outf.json', 'w') as f:
    json.dump(page, f)
Facebook has updated their API, and now the same requests need an access token. The Graph API has also changed; to get the likes count, you can do the following:
def get_fb_page_like_count(fb_page_id):
    graph = facebook.GraphAPI(access_token=your_access_token)
    args = {'fields': 'likes'}
    page = graph.get_object(fb_page_id, **args)
    return page.get('likes', 0)
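Note that from Graph API v2.6 on, the page like count is exposed as fan_count rather than likes; a minimal sketch under that assumption (you still need a valid access token):

import facebook

graph = facebook.GraphAPI(access_token=your_access_token, version='2.7')
page = graph.get_object('6127898346', fields='name,fan_count')
print('{} has {} likes.'.format(page['name'], page['fan_count']))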

Using the Python GData API, cannot get editable video entry

I am having trouble getting a video entry which includes a link rel="edit". I need such an entry in order to be able to call DeleteVideoEntry(...) on it.
I am retrieving the video using GetYouTubeVideoEntry(youtube_id=XXXXXXX). My yt_service is initialized with a username, password, and a developer key. I use ProgrammaticLogin. This part seems to work fine. I used the same yt_service to upload said video earlier. Also, if I change the developer key to something bogus (during debugging) and try to authenticate, I get a 403 error. This leads me to believe that authentication works OK.
Needless to say, the video entry retrieved with GetYouTubeVideoEntry(youtube_id=XXXXXXX) does not contain the edit link, and I cannot use the entry in a DeleteVideoEntry(...) call.
Is there some special way to get a video entry which will contain a link element with a rel="edit"? Can anyone suggest some way to resolve my issue? Could this possibly be a bug?
Update:
For the record: when I tried getting the feed of all my uploads and then looping through the video entries, the video entries do have an edit link. So using this works:
uri = 'http://gdata.youtube.com/feeds/api/users/%s/uploads' % username
feed = yt_service.GetYouTubeVideoFeed(uri)
for entry in feed.entry:
    yt_service.DeleteVideoEntry(entry)
But this does not:
entry = yt_service.GetYouTubeVideoEntry(video_id = video.youtube_id)
yt_service.DeleteVideoEntry(entry)
Using the same yt_service.
I've just deleted a YouTube video using gdata and ProgrammaticLogin().
Here are the steps to reproduce:
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
yt_service.developer_key = 'developer_key'
yt_service.email = 'email'
yt_service.password = 'password'
yt_service.ProgrammaticLogin()
# video_id should look like 'iu6Gq-tUsTc'
uri = 'https://gdata.youtube.com/feeds/api/users/%s/uploads/%s' % (username, video_id)
entry = yt_service.GetYouTubeUserEntry(uri=uri)
response = yt_service.DeleteVideoEntry(entry)
print response # True
yt_service.GetYouTubeVideoFeed(uri) works because GetYouTubeVideoFeed doesn't check the uri and just calls self.Get(uri, ...), but originally, I think, it expected a 'https://gdata.youtube.com/feeds/api/videos' uri.
Conversely, yt_service.GetYouTubeVideoEntry() uses YOUTUBE_VIDEO_URI = 'https://gdata.youtube.com/feeds/api/videos', but entries from that feed don't contain rel="edit".
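To see the difference directly, you can inspect an entry's links; a small sketch (assuming the yt_service, uri, and video_id from the snippets above):

def has_edit_link(entry):
    # DeleteVideoEntry needs an entry carrying a rel="edit" link
    return any(link.rel == 'edit' for link in entry.link)

video_entry = yt_service.GetYouTubeVideoEntry(video_id=video_id)
print has_edit_link(video_entry)   # False: fetched from the public videos feed

user_entry = yt_service.GetYouTubeUserEntry(uri=uri)  # authenticated uploads feed
print has_edit_link(user_entry)    # True: safe to pass to DeleteVideoEntry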
Hope that helps you out
You can view the HTTP headers of the generated requests by setting the debug flag to True. This is as simple as:
yt_service = gdata.youtube.service.YouTubeService()
yt_service.debug = True
You can read about this in the documentation here.
