I want to be able to use urllib2.urlopen() or requests.get() with http://plus.google.com/* URLs.
How would I go about doing that in Python? I need to log in first, but how?
The following code returns something along the lines of:
"Your browser's cookie functionality is turned off. Please turn it on."
Well, the cookie itself is created, and I checked robots.txt; there are no disallows. I also tried switching user agents, with no luck.
import cookielib
import getpass
import re
import urllib
import urllib2

cookie_filename = "google.cookie"
email = raw_input("Enter your Google username: ")
password = getpass.getpass("Enter your password: ")

self.cj = cookielib.MozillaCookieJar(cookie_filename)
try:
    self.cj.load()  # only succeeds if the cookie file already exists
except IOError:
    pass
self.opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(self.cj)
)
urllib2.install_opener(self.opener)

login_page_url = 'https://www.google.com/accounts/ServiceLogin?passive=true&service=grandcentral'
authenticate_url = 'https://www.google.com/accounts/ServiceLoginAuth?service=grandcentral'
gv_home_page_url = 'https://www.google.com/voice/#inbox'

# Load sign-in page
login_page_contents = self.opener.open(login_page_url).read()

# Find GALX value (guard against a missing match before calling .group())
galx_match_obj = re.search(r'name="GALX"\s*value="([^"]+)"', login_page_contents, re.IGNORECASE)
galx_value = galx_match_obj.group(1) if galx_match_obj else ''

# Set up login credentials
login_params = urllib.urlencode({
    'Email': email,
    'Passwd': password,
    'continue': 'https://www.google.com/voice/account/signin',
    'GALX': galx_value
})

# Login (POST the credentials once; the second, duplicate request was redundant)
resp = self.opener.open(authenticate_url, login_params).read()
print resp
self.cj.save()

# Open GV home page
gv_home_page_contents = self.opener.open(gv_home_page_url).read()
print gv_home_page_contents
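The GALX lookup is the fragile step in the code above: if the regex does not match, the match object is None and calling `.group(1)` on it raises AttributeError. A minimal defensive sketch (Python 3; the HTML fragment below is made up for illustration):

```python
import re

def extract_galx(html):
    """Return the value of the hidden GALX form field, or '' if it is absent."""
    match = re.search(r'name="GALX"\s*value="([^"]+)"', html, re.IGNORECASE)
    return match.group(1) if match else ''

# Made-up sign-in page fragment for illustration
sample = '<input type="hidden" name="GALX" value="abc123XYZ">'
print(extract_galx(sample))                  # prints: abc123XYZ
print(extract_galx('<form></form>') == '')   # prints: True
```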
The website I need to log in to has an initial page for entering only the username, and a second page where I enter just the password (the username block is still displayed, with the username filled in).
I post the username and password like this:
import requests

session = requests.Session()

# Page 1: username
prelogin_url = ''
result = session.get(prelogin_url)
payload = {'j_username': 'username'}
result = session.post(
    prelogin_url,
    data=payload
)

# Page 2: password
login_url = ''
result = session.get(login_url)
password = {'j_password': 'password'}
result = session.post(
    login_url,
    data=password
)
When I check the response content, it shows me that the login failed. There are no tokens on the page.
What am I missing here?
Thanks.
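One common cause of this symptom is that the login form carries hidden fields (a CSRF token, a flow id) that a real browser submits automatically but a bare `session.post` omits. A sketch of collecting every hidden input with the standard-library HTMLParser so they can be merged into the POST payload; the form HTML and field names below are made up for illustration:

```python
from html.parser import HTMLParser

class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        a = dict(attrs)
        if a.get('type') == 'hidden' and 'name' in a:
            self.fields[a['name']] = a.get('value', '')

# Illustrative login form; a real page would come from session.get(...)
html = '''
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="deadbeef">
  <input type="text" name="j_username">
</form>
'''
parser = HiddenFieldCollector()
parser.feed(html)
payload = dict(parser.fields, j_username='username')
print(payload['csrf_token'])  # prints: deadbeef
```

Merging the collected hidden fields into the data you POST on each page (username page and password page) usually carries the token through a two-step flow.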
I'm trying to scrape usernames and media information from a list of Instagram users, using the unofficial InstagramAPI library with Python.
The library is there
I understand how to scrape information for the user who is logged in, but I can't work out how to refer to another username.
This is the code for fetching my own Instagram information:
from InstagramAPI import InstagramAPI
import time

username = 'myUser'
pwd = 'mypass'

API = InstagramAPI(username, pwd)
API.login()
time.sleep(2)

API.getProfileData()
pk = API.LastJson['user']['pk']
maxid = ''

while True:
    # NOTE: maxid is never advanced here, so this only ever fetches the first page
    API.getUserFeed(pk, maxid)
    feed = API.LastJson
    if 'fail' in feed['status']:
        break
    for i in range(len(feed['items'])):  # was range(0, len(...) - 1), which skipped the last item
        mediadata = feed['items'][i]
        print("\033[0;34m"
              "\n------------------------\n"
              "Media number: {}\n"
              "Like count: {}\n"
              "Comment count: {}"
              .format(i,
                      mediadata['like_count'],
                      mediadata['comment_count']))
        if mediadata['caption'] is None:
            print("Caption: ["
                  "\033[0;31m"
                  "No caption available"
                  "\033[0;34m"
                  "]\n")
        else:
            caption = mediadata['caption']['text']
            if len(caption) > 30:
                caption = caption[:30] + ' (...)'
            print("Caption: [\033[0;32m{}\033[0;34m]\n".format(caption))
I solved it by calling API.searchUsername('IG_USERNAME') before the line pk = API.LastJson['user']['pk'].
It works for me.
Enjoy :D
I'm trying to write a script that does the following:
obtains a list of album (photoset) IDs from my Flickr account
lists the image titles from each album (photoset) in a text file named after the album title
Here's what I have so far:
import flickrapi
from xml.etree import ElementTree

api_key = 'xxxx'
api_secret = 'xxxx'

flickr = flickrapi.FlickrAPI(api_key, api_secret)
(token, frob) = flickr.get_token_part_one(perms='write')
if not token:
    raw_input("Press ENTER after you authorized this program")
flickr.get_token_part_two((token, frob))

sets = flickr.photosets_getList(user_id='xxxx')
for elm in sets.getchildren()[0]:
    title = elm.getchildren()[0].text
    print("id: %s setname: %s photos: %s" % (elm.get('id'), title, elm.get('photos')))
The above simply outputs the result to the screen like this:
id: 12453463463252553 setname: 2006-08 photos: 371
id: 23523523523532523 setname: 2006-07 photos: 507
id: 53253253253255532 setname: 2006-06 photos: 20
... etc ...
From there, I've got the following which I assumed would list all the image titles in the above album:
import flickrapi
from xml.etree import ElementTree

api_key = 'xxxx'
api_secret = 'xxxx'

flickr = flickrapi.FlickrAPI(api_key, api_secret)
(token, frob) = flickr.get_token_part_one(perms='write')
if not token:
    raw_input("Press ENTER after you authorized this program")
flickr.get_token_part_two((token, frob))

photos = flickr.photosets_getPhotos(photoset_id='12453463463252553')
for elm in photos.getchildren()[0]:
    title = elm.getchildren()[0].text
    print("%s" % (elm.get('title')))
Unfortunately, it just spits out an "index out of range" error.
I stuck with it and, with a hand from a friend, came up with the following, which works as planned:
import flickrapi
import os
from xml.etree import ElementTree

api_key = 'xxxx'
api_secret = 'xxxx'

flickr = flickrapi.FlickrAPI(api_key, api_secret)
(token, frob) = flickr.get_token_part_one(perms='write')
if not token:
    raw_input("Press ENTER after you authorized this program")
flickr.get_token_part_two((token, frob))

sets = flickr.photosets_getList(user_id='xxxx')
for photoset in sets.getchildren()[0]:  # renamed from 'set', which shadowed the built-in
    title = photoset.getchildren()[0].text
    filename = "%s.txt" % (title)
    f = open(filename, 'w')
    print("Getting photos from set: %s" % (title))
    for photo in flickr.walk_set(photoset.get('id')):
        f.write("%s\n" % (photo.get('title')))  # newline added so titles land one per line
    f.close()
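One fragile spot in the script above is that the album title is used verbatim as a filename; a title containing a slash or colon would fail or land in the wrong directory. A small sanitizer sketch, an assumption on my part rather than anything flickrapi provides:

```python
import re

def safe_filename(title, ext='.txt'):
    """Replace characters that are unsafe in filenames with underscores."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip() + ext

print(safe_filename('2006-08'))          # prints: 2006-08.txt
print(safe_filename('trip: NYC/2006'))   # prints: trip_ NYC_2006.txt
```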
It's quite easy if you use python-flickr-api. The complicated part is getting authorization from Flickr to access private information.
Here is some (untested) code you can use:
import os
import flickr_api as flickr

# If all you want to do is get public information,
# then you need to set the api key and secret
flickr.set_keys(api_key='key', api_secret='sekret')

# If you want to fetch private/hidden information
# then in addition to the api key and secret,
# you also need to authorize your application.
# To do that, we request the authorization URL
# to get the value of `oauth_verifier`, which
# is what we need.
# This step is done only once, and we save
# the token. So naturally, we first check
# if the token exists or not:
if os.path.isfile('token.key'):
    flickr.set_auth_handler('token.key')
else:
    # This is the first time we are running,
    # so get the token and save it
    auth = flickr.auth.AuthHandler()
    url = auth.get_authorization_url('read')  # Get read permissions
    session_key = raw_input('''
Please visit {} and then copy the value of oauth_verifier: '''.format(url))
    if len(session_key.strip()):
        auth.set_verifier(session_key.strip())
        flickr.set_auth_handler(auth)
        # Save this token for next time
        auth.save('token.key')
    else:
        raise Exception("No authorization token provided, quitting.")

# If we reached this point, we are good to go!
# First thing we want to do is enable the cache, so
# we don't hit the API when not needed
flickr.enable_cache()

# Fetching a user, by their username
user = flickr.Person.findByUserName('username')
# Or, if we don't know the username:
user = flickr.Person.findByEmail('some@user.com')
# Or, if we want to use the authenticated user:
user = flickr.test.login()

# Next, fetch the photosets and their corresponding photos:
photo_sets = user.getPhotosets()
for pset in photo_sets:
    print("Getting pictures for {}".format(pset.title))
    photos = pset.getPhotos()
    for photo in photos:
        print('{}'.format(photo.info.title))

# Or, just get me _all_ the photos:
photos = user.getPhotos()
# If you haven't logged in, use:
# photos = user.getPublicPhotos()
for photo in photos:
    print('{}'.format(photo.info.title))
Hi, this is my piece of code to get the number of friends a user has, in Python. It returns nothing. Can anyone tell me which access privilege I should grant, or whether there is anything wrong with what I have done so far?
import logging
import urllib
import urllib2
from xml.dom.minidom import parse  # assuming minidom, given the getElementsByTagName call below

friend_count = 0
q = urllib.urlencode({'query': 'SELECT friend_count FROM user WHERE uid = 784877761'})
url = 'https://api.facebook.com/method/fql.query?' + q
request = urllib2.Request(url)
data = urllib2.urlopen(request)
doc = parse(data)
friend_count_node = doc.getElementsByTagName("friend_count")
test = friend_count_node[0].firstChild.nodeValue
logging.info(test)
Try replacing your code with this (the query string goes in as the value of the q parameter; the original passed a set literal to urlencode, which raises a TypeError):
q = urllib.urlencode({'q': 'SELECT friend_count FROM user WHERE uid = 784877761'})
url = 'https://graph.facebook.com/fql?' + q
You should not need an access token to get the friend_count.
I have the following handlers.
First the user calls this handler and gets redirected to Facebook:
class LoginFacebookHandler(BasicHandler):
    def get(self):
        user = self.auth.get_user_by_session()
        if not user:
            h = hashlib.new('sha512')
            h.update(str(datetime.now()) + "abc")
            nonce = h.hexdigest()
            logging.info("hash " + str(nonce))
            memcache.set(str(nonce), True, 8600)
            data = {"client_id": 20773, "redirect_uri": "http://***.appspot.com/f", "state": str(nonce), "scope": "email"}
            facebook_uri = "https://www.facebook.com/dialog/oauth?%s" % (urllib.urlencode(data))
            self.redirect(facebook_uri)
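As a side note, deriving the state nonce from `datetime.now()` makes it somewhat predictable. The `state` parameter exists to stop CSRF on the OAuth redirect, so an unguessable value is safer; a sketch using the standard-library `secrets` module (Python 3):

```python
import secrets

def make_state_nonce():
    """Return an unguessable value for the OAuth `state` parameter."""
    return secrets.token_hex(32)  # 32 random bytes -> 64 hex characters

nonce = make_state_nonce()
print(len(nonce))                   # prints: 64
print(nonce != make_state_nonce())  # prints: True
```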
After he authorizes my app, Facebook redirects to the redirect URI (handler):
class CreateUserFacebookHandler(BasicHandler):
    def get(self):
        state = self.request.get('state')
        code = self.request.get('code')
        logging.info("state " + state)
        logging.info("code " + code)
        if len(code) > 3 and len(state) > 3:
            cached_state = memcache.get(str(state))
            logging.info("cached_state " + str(cached_state))
            if cached_state:
                #memcache.delete(str(state))
                data = {"client_id": 20773, "redirect_uri": "http://***.appspot.com/f", "client_secret": "7f587", "code": str(code)}
                graph_url = "https://graph.facebook.com/oauth/access_token?%s" % (urllib.urlencode(data))
                logging.info("graph url " + graph_url)
                result = urlfetch.fetch(url=graph_url, method=urlfetch.GET)
                if result.status_code == 200:
                    fb_response = urlparse.parse_qs(result.content)
                    access_token = fb_response["access_token"][0]
                    token_expires = fb_response["expires"][0]
                    logging.info("access token " + str(access_token))
                    logging.info("token expires " + str(token_expires))
                    if access_token:
                        api_data = {"access_token": str(access_token)}
                        api_url = "https://graph.facebook.com/me?%s" % (urllib.urlencode(api_data))
                        logging.info("api url " + api_url)
                        api_result = urlfetch.fetch(url=api_url, method=urlfetch.GET)
                        if api_result.status_code == 200:
                            api_content = json.loads(api_result.content)
                            user_id = str(api_content["id"])
                            email = str(api_content["email"])
                            logging.info("user id " + str(user_id))
                            logging.info("email " + str(email))
                            h = hashlib.new('sha512')
                            h.update(str(user_id) + "abc")
                            password = h.hexdigest()
                            expire_data = datetime.now() + timedelta(seconds=int(token_expires))
                            user = self.auth.store.user_model.create_user(email, password_raw=password, access_token=access_token, token_expires=expire_data, fb_id=user_id)
                        else:
                            self.response.out.write("error contacting the graph api")  # fixed: was self.response.write.out.write
                    else:
                        self.response.out.write("access token not long enough")
                else:
                    self.response.out.write("error while contacting facebook server")
            else:
                self.response.out.write("error no cached state")
        else:
            self.response.out.write("error too short")
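For reference, the token endpoint at the time returned a form-encoded body rather than JSON, which is why the handler runs it through parse_qs; that call yields a dict mapping each key to a list of values. A Python 3 sketch on a made-up response body:

```python
from urllib.parse import parse_qs

# Illustrative body only; not a real token
body = 'access_token=AAAC12345&expires=5180795'
fb_response = parse_qs(body)
access_token = fb_response['access_token'][0]
token_expires = fb_response['expires'][0]
print(access_token)   # prints: AAAC12345
print(token_expires)  # prints: 5180795
```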
Mostly this works, until the code tries to retrieve an access_token and I end up getting "error while contacting....".
The funny thing is that I log all URLs, states, etc., so I can go into my logs, copy and paste the URL that urlfetch tried to open (the fb api -> access_token URL) into my browser, and voilà, I get my access_token + expires.
The same thing sometimes happens when the code tries to fetch the user information from the graph (graph/me).
The key problem was not Facebook.
It was the App Engine deployment process.
I always tested code changes live rather than locally, since the OAuth flow wouldn't work properly on localhost.
The deploy -> flush cache -> flush database process seems to have a certain delay, leaving artifacts behind that confused the code.
So if you have to test things like OAuth live, I'd recommend deploying the changes as a new version of the app, and after deployment deleting all data that could act as artifacts in the new version.