Browserless access to LinkedIn with Python

I'm writing a command-line application that accesses LinkedIn. I'm using the python-linkedin API.
Things work as I expected, but I have a really big gripe about the authentication process. Currently, I need to:
1. Start my application and wait for it to print an authentication URL
2. Go to that URL with my browser
3. Give my blessing for the application and wait for it to redirect me to a URL
4. Extract the access token from the URL
5. Input that access token into my application
6. Do what I need to do with LinkedIn
I don't like doing steps 2 to 5 manually so I would like to automate them. What I was thinking of doing was:
1. Use a headless client like mechanize to access the URL from step 1 above
2. Scrape the screen and give my blessing automatically (may be required to input username and password -- I know these, so it's OK)
3. Wait to be redirected and grab the redirection URL
4. Extract the token from the URL
5. PROFIT!
Question time:
Looking around, this guy right here on SO tried to do something similar but was told that it's impossible. Why?
Then, this guy here does it in Jython and HtmlUnit. Should be possible with straight Python and mechanize, right?
Finally, has anybody seen a solution with straight Python and mechanize (or any other headless browser alternative)? I don't want to reinvent the wheel, but will code it up if necessary.
EDIT:
Code to initialize tokens (using the approach of the accepted answer):
# Runs inside a function, hence the bare returns (Python 2).
api = linkedin.LinkedIn(KEY, SECRET, RETURN_URL)
result = api.request_token()
if not result:
    print 'Initialization error:', api.get_error()
    return
print 'Go to URL:', api.get_authorize_url()
print 'Enter verifier: ',
verifier = sys.stdin.readline().strip()
result = api.access_token(verifier=verifier)
if not result:
    print 'Initialization error:', api.get_error()
    return
# Pickle data is binary, so open the file in binary mode.
fin = open('tokens.pickle', 'wb')
for t in (api._request_token, api._request_token_secret,
          api._access_token, api._access_token_secret):
    pickle.dump(t, fin)
fin.close()
print 'Initialization complete.'
Code to use tokens:
api = linkedin.LinkedIn(KEY, SECRET, RETURN_URL)
tokens = tokens_fname()
try:
    # Tokens are loaded in the same order they were dumped.
    fin = open(tokens, 'rb')
    api._request_token = pickle.load(fin)
    api._request_token_secret = pickle.load(fin)
    api._access_token = pickle.load(fin)
    api._access_token_secret = pickle.load(fin)
    fin.close()
except IOError, ioe:
    print ioe
    print 'Please run `python init_tokens.py\' first'
    return
profiles = api.get_search({ 'name' : name })

As you are planning on authorizing yourself just once, and then making calls to the API for your own information, I would just manually retrieve your access token rather than worrying about automating it.
The user access token generated by LinkedIn when you authorize a given application is permanent unless you specify otherwise on the authorization screen. All you need to do is to generate the authorization screen with your application, go through the process and upon success echo out and store your user access token (token and secret). Once you have that, you can hard code those into a file, database, etc and when making calls to the API, use those.
It's in PHP, but this demo does basically this. Just modify the demo.php script to echo out your token as needed.
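In Python, the store-once, reuse-later idea above could be sketched roughly as follows. The file name `linkedin_tokens.json` and the helpers `save_tokens` / `load_tokens` are made up for illustration and are not part of python-linkedin; the token and secret are whatever your one-time authorization run printed.

```python
import json
import os

TOKEN_FILE = "linkedin_tokens.json"  # hypothetical file name


def save_tokens(token, secret, path=TOKEN_FILE):
    """Write the user access token and secret to a JSON file."""
    # Mode 0o600 applies only when the file is newly created, and only
    # on systems that honour POSIX permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fout:
        json.dump({"access_token": token, "access_token_secret": secret}, fout)


def load_tokens(path=TOKEN_FILE):
    """Return (token, secret), or None if no token file exists yet."""
    try:
        with open(path) as fin:
            data = json.load(fin)
    except FileNotFoundError:
        return None
    return data["access_token"], data["access_token_secret"]
```

On startup, call `load_tokens()` first and only fall back to the manual browser step when it returns None.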

I have not tried it myself, but I believe in theory it should be possible with Selenium WebDriver with PyVirtualDisplay. This idea is described here.

Related

Microsoft Graph API Read Mail with Python

I'm trying to create a Python script that continuously reads mail from a service account in my organization. I'm attempting to use the Microsoft Graph API, but the more I read, the more confused I get. I have registered an app in the Azure Portal and have my client ID, client secret, etc. As I understand it, you then use those to call an API that requires you to paste a URL into your browser, log in and consent to access, and that provides a token that only lasts an hour. How can I do this programmatically?
I guess my question is: has anyone had any luck doing this with the Graph API? How can I do this without having to do the browser handshake every hour? I would like to be able to just start this script and let it run without worrying about refreshing a token every so often. Am I just dumb, or is this way too complicated? Any Python examples of how people authenticate to the Graph API and stay authenticated would be greatly appreciated!

I was just working on something similar today. (Microsoft recently deprecated basic authentication for Exchange, and I can no longer send mail using a simple username/password from a web application I support.)
Using Microsoft's MSAL Python library (https://github.com/AzureAD/microsoft-authentication-library-for-python) and the example in sample/device_flow_sample.py, I was able to build a user-based login that retrieves an access token and a refresh token in order to stay logged in (using "device flow authentication"). The msal library handles storing and reloading the token cache, as well as refreshing the token whenever necessary.
Below is the code for logging in the first time:
# see https://github.com/AzureAD/microsoft-authentication-library-for-python/blob/dev/sample/device_flow_sample.py
import sys
import json
import logging
import os
import atexit
import requests
import msal

# logging
logging.basicConfig(level=logging.DEBUG)  # Enable DEBUG log for entire script
logging.getLogger("msal").setLevel(logging.INFO)  # Optionally disable MSAL DEBUG logs

# config
config = dict(
    authority = "https://login.microsoftonline.com/common",
    client_id = 'YOUR CLIENT ID',
    scope = ["User.Read"],
    username = 'user#domain',
    cache_file = 'token.cache',
    endpoint = 'https://graph.microsoft.com/v1.0/me'
)

# cache
cache = msal.SerializableTokenCache()
if os.path.exists(config["cache_file"]):
    cache.deserialize(open(config["cache_file"], "r").read())
atexit.register(lambda:
    open(config["cache_file"], "w").write(cache.serialize())
    if cache.has_state_changed else None)

# app
app = msal.PublicClientApplication(
    config["client_id"], authority=config["authority"],
    token_cache=cache)

# exists?
result = None
accounts = app.get_accounts()
if accounts:
    logging.info("found accounts in the app")
    for a in accounts:
        print(a)
        if a["username"] == config["username"]:
            result = app.acquire_token_silent(config["scope"], account=a)
            break
else:
    logging.info("no accounts in the app")

# initiate
if result:
    logging.info("found a token in the cache")
else:
    logging.info("No suitable token exists in cache. Let's get a new one from AAD.")
    flow = app.initiate_device_flow(scopes=config["scope"])
    if "user_code" not in flow:
        raise ValueError(
            "Fail to create device flow. Err: %s" % json.dumps(flow, indent=4))
    print(flow["message"])
    sys.stdout.flush()  # Some terminal needs this to ensure the message is shown
    # Ideally you should wait here, in order to save some unnecessary polling
    input("Press Enter after signing in from another device to proceed, CTRL+C to abort.")
    result = app.acquire_token_by_device_flow(flow)  # By default it will block
    # You can follow this instruction to shorten the block time
    # https://msal-python.readthedocs.io/en/latest/#msal.PublicClientApplication.acquire_token_by_device_flow
    # or you may even turn off the blocking behavior,
    # and then keep calling acquire_token_by_device_flow(flow) in your own customized loop.

if result and "access_token" in result:
    # Calling graph using the access token
    graph_data = requests.get(  # Use token to call downstream service
        config["endpoint"],
        headers={'Authorization': 'Bearer ' + result['access_token']},).json()
    print("Graph API call result: %s" % json.dumps(graph_data, indent=2))
else:
    print(result.get("error"))
    print(result.get("error_description"))
    print(result.get("correlation_id"))  # You may need this when reporting a bug
You'll need to fix up the config, and update the scope for the appropriate privileges.
All the magic is in here:
result = app.acquire_token_silent(config["scope"], account=a)
and putting the Authorization access_token in the requests headers:
graph_data = requests.get(  # Use token to call downstream service
    config["endpoint"],
    headers={'Authorization': 'Bearer ' + result['access_token']},).json()
As long as you call acquire_token_silent before you invoke any Graph APIs, the tokens will stay up to date. The refresh token is good for about 90 days by default and is renewed automatically as it is used. Once you log in, the tokens are updated and stored in the cache (and persisted to a file), and they stay alive more or less indefinitely (there are some things that can invalidate them on the server side).
Unfortunately, I'm still having problems because mine is an unverified multi-tenant application. I successfully added the user as a guest in my tenant, and the login works, but as soon as I request more interesting privileges in the scope, the user can't log in. I'll either have to get my MPN ID verified, or get the admin at my client's third-party IT provider to grant permission for this app in their tenant. If I had admin privileges for their tenant, I'd probably be looking at the daemon authentication method instead of the user-based one.
(To be clear, the code above is the msal example almost verbatim, with config and persistence tweaks.)
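For a long-running script, the "call acquire_token_silent before every Graph call" rule could be wrapped in a small helper like the sketch below. The name `get_access_token` is made up for illustration; `app` is assumed to be an `msal.PublicClientApplication` built with a token cache, as in the code above.

```python
def get_access_token(app, scopes, username):
    """Return a current access token from the MSAL cache, or None if a new login is needed."""
    # get_accounts(username=...) returns only cached accounts matching that username.
    for account in app.get_accounts(username=username):
        # acquire_token_silent returns None when no usable token can be
        # found in the cache or obtained with the cached refresh token.
        result = app.acquire_token_silent(scopes, account=account)
        if result and "access_token" in result:
            return result["access_token"]
    return None
```

Call it right before each request and put the returned value in the Authorization header. When it returns None, fall back to the device flow shown above.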

Fetching URL from a redirected target using Python

I'm building a Twitch chat-bot that integrates some Spotify features using the Spotipy library.
The goal is fully automated Spotify API authentication for the bot.
The Spotify API and the Spotipy library need an authorization token before they can do anything on the Spotify end. So whenever the bot is first run on my VPS, it prompts me to copy a URL from the console, open it in a browser, wait for the redirect, and paste the redirected URL (which includes the desired token) back into the console. That's how the authentication object retrieves the token data.
To automate this process, I've seen several solutions using Flask or Django.
A Django implementation would be useful for me, since I also have a Django environment active on the same VPS, except that it runs on Python 2.7 while my Twitch chat-bot runs in a separate Python 3.6 environment. Hence, I would like to keep them separate unless there is no way to implement such automation without listening for redirects through Django, Flask or another web framework. Unfortunately, my bot can only run on Python 3.6 or higher.
I'm specifically curious whether there is a built-in function or a lightweight library to handle this.
The function which I'm using to fetch Spotify Auth token is:
def fetchSpotiToken():
    global spotiToken, spoti
    spotiToken = spotifyAuth.get_cached_token()
    if not spotiToken:
        spAuthURL = spotifyAuth.get_authorize_url()
        print(spAuthURL)
        # Prints the URL that Spotify API will redirect to
        authResp = input("Enter URL")
        # Console user is expected to visit the URL and submit the new redirected URL on console
        respCode = spotifyAuth.parse_response_code(authResp)
        spotiToken = spotifyAuth.get_access_token(respCode)
    elif spotifyAuth.is_token_expired(spotifyAuth.get_cached_token()):
        spotiToken = spotifyAuth.refresh_access_token(spotiToken["refresh_token"])
    spoti = spotipy.Spotify(auth=spotiToken["access_token"])
    return [spotiToken, spoti]
PS: I've only been developing in Python for a couple of weeks. Even after doing some research, I wasn't able to find a solution to this problem that fits my needs, and I'm not sure it's even possible to do it this way. If it's impossible, please excuse my lack of knowledge.

I've found the solution myself.
It seems that requests is a good match for this use case.
The following snippet works perfectly for now.
def tryFetchSpotiToken():
    global spotiToken, spoti
    try:
        spotiToken = spotifyAuth.get_cached_token()
    except:
        if not spotiToken:
            spAuthURL = spotifyAuth.get_authorize_url()
            htReq = requests.get(spAuthURL)
            htRed = htReq.url
            respCode = spotifyAuth.parse_response_code(htRed)
            spotiToken = spotifyAuth.get_access_token(respCode)
        elif spotifyAuth.is_token_expired(spotifyAuth.get_cached_token()):
            spotiToken = spotifyAuth.refresh_access_token(spotiToken["refresh_token"])
    spoti = spotipy.Spotify(auth=spotiToken["access_token"])
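On the question's request for something built in rather than Flask or Django: a rough sketch using only the standard library's http.server to catch the redirect. The function name `wait_for_redirect`, the port 8888 and the host are assumptions for illustration. It also assumes the redirect URI registered with Spotify points at an address and port where this listener is reachable from the browser doing the login, and it still needs one approval in a browser before the refresh token takes over.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def wait_for_redirect(port=8888, host="127.0.0.1"):
    """Serve exactly one GET request and return the URL that was requested."""
    captured = {}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # self.path holds the path and query string, e.g. "/callback?code=..."
            captured["url"] = "http://%s:%d%s" % (host, port, self.path)
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Authorization received. You can close this tab.")

        def log_message(self, fmt, *args):
            pass  # suppress the default per-request log line on stderr

    server = HTTPServer((host, port), Handler)
    try:
        server.handle_request()  # blocks until one request has been handled
    finally:
        server.server_close()
    return captured.get("url")
```

The returned URL could then be passed to spotifyAuth.parse_response_code(...) in place of the input() call in the question's function.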

python linkedin oauth2 - where is http_api.py?

I'm trying to get the example from https://github.com/ozgur/python-linkedin to work. When I run the code below, I don't get the RETURN_URL and authorization_code mentioned in the example. I'm not sure why; I think it is because I'm not setting up the HTTP API example correctly. I can't find http_api.py, and when I visit http://localhost:8080, I get "This site can't be reached".
from linkedin import linkedin
API_KEY = 'wFNJekVpDCJtRPFX812pQsJee-gt0zO4X5XmG6wcfSOSlLocxodAXNMbl0_hw3Vl'
API_SECRET = 'daJDa6_8UcnGMw1yuq9TjoO_PMKukXMo8vEMo7Qv5J-G3SPgrAV0FqFCd0TNjQyG'
RETURN_URL = 'http://localhost:8000'
authentication = linkedin.LinkedInAuthentication(API_KEY, API_SECRET, RETURN_URL, linkedin.PERMISSIONS.enums.values())
# Optionally one can send custom "state" value that will be returned from OAuth server
# It can be used to track your user state or something else (it's up to you)
# Be aware that this value is sent to OAuth server AS IS - make sure to encode or hash it
#authorization.state = 'your_encoded_message'
print authentication.authorization_url # open this url on your browser
application = linkedin.LinkedInApplication(authentication)

http_api.py is one of the examples provided in the package. It is an HTTP server that handles the response from LinkedIn's OAuth endpoint, so you'll need to start it for the example to work.
As stated in the guide, you'll need to execute that example file to get the server working. Note you'll also need to supply the following environment variables: LINKEDIN_API_KEY and LINKEDIN_API_SECRET.
You can run the example file by downloading the repo and calling LINKEDIN_API_KEY=yourkey LINKEDIN_API_SECRET=yoursecret python examples/http_api.py. Note you'll need Python 3.4 for it to work.

How do I access onedrive in an automated fashion without user interaction?

I am trying to access my own docs and spreadsheets via OneDrive's API. I have:
import requests
client_id = 'my_id'
client_secret = 'my_secret'
scopes = 'wl.offline_access%20wl.signin%20wl.basic'
response_type = 'token' # also have tried "code"
redirect_uri = 'https://login.live.com/oauth20_desktop.srf'
base_url = 'https://apis.live.net/v5.0/'
r = requests.get('https://login.live.com/oauth20_authorize.srf?client_id=%s&scope=%s&response_type=%s&redirect_uri=%s' % (client_id, scopes, response_type, redirect_uri))
print r.text
(For my client I've also tried both "Mobile or desktop client app:" set to "Yes" and "No")
This returns the HTML of the login page, which the user is meant to click through manually. Since the user is me and it's my own account, how do I access the API without user interaction?
EDIT #1:
For those confused on what I'm looking for it would be the equivalent of Google's Service Account (OAuth2): https://console.developers.google.com/project

You cannot "bypass" the user interaction.
However, you are very close to getting it to work. If you want to obtain an access token in Python, you have to do it through the browser. You can use the webbrowser module to open the default web browser. It will look something like this (your app must be a desktop app):
import webbrowser
webbrowser.open("https://login.live.com/oauth20_authorize.srf?client_id=foo&scope=bar&response_type=code&redirect_uri=https://login.live.com/oauth20_desktop.srf")
This will bring you to the auth page. Sign in and agree to the terms (they will differ depending on scope). You will then be directed to a page whose URL looks like:
https://login.live.com/oauth20_desktop.srf?code=<THISISTHECODEYOUWANT>&lc=foo
Copy this code from the browser and have your Python script take it as input.
You can then make a request as described here using the code you received from the browser.
You will receive a response described here
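If pasting the whole redirected URL is easier than picking the code out by hand, the script can extract it itself. A minimal sketch using the standard library; the function name `extract_code` is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_code(redirected_url):
    """Return the value of the `code` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(redirected_url).query)
    values = query.get("code")
    return values[0] if values else None
```

A None result usually means the login was cancelled or the URL was copied before the redirect finished.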

Incorrect URL without an access token for facebook login using rauth

I have the code from rauth site:
https://github.com/litl/rauth/blob/master/examples/facebook-cli.py
(The code can be found at the end of this post for reference)
Running the program from the command line opens a Firefox window, and the following message from Facebook is shown:
Success
SECURITY WARNING: Please treat the URL above as you would your password and do not share it with anyone.
This happens when I am already logged in to Facebook. If I am not logged in, the Facebook login window opens first, and after I log in with my username and password the same message is shown in the Firefox window.
The URL shown in the address bar is:
https://www.facebook.com/connect/blank.html#_=_
This is obviously incorrect, and the subsequent Python code raises an exception.
How can I debug what the problem is?
Thanks
PS:
from rauth.service import OAuth2Service
import re
import webbrowser
# Get a real consumer key & secret from:
# https://developers.facebook.com/apps
facebook = OAuth2Service(
    client_id='xxxxxxx',
    client_secret='yyyyyyy',
    name='facebook',
    authorize_url='https://graph.facebook.com/oauth/authorize',
    access_token_url='https://graph.facebook.com/oauth/access_token',
    base_url='https://graph.facebook.com/')
redirect_uri = 'https://www.facebook.com/connect/login_success.html'
params = {'scope': 'read_stream',
          'response_type': 'token',
          'redirect_uri': redirect_uri}
authorize_url = facebook.get_authorize_url(**params)
print 'Visit this URL in your browser: ' + authorize_url
webbrowser.open(authorize_url)
url_with_code = raw_input('Copy URL from your browser\'s address bar: ')
access_token = re.search('\#access_token=([^&]*)', url_with_code).group(1)
session = facebook.get_session(access_token)
user = session.get('me').json()
print 'currently logged in as: ' + user['link']

This is happening due to a change on Facebook's end that strips the URL of the access_token programmatically. It happens on a timer, before a human could conceivably copy it out of the URL bar. The example is broken but I don't have an immediate fix for you so I might suggest you take a look at the Flask application instead, which is a more practical demonstration of rauth anyway.
The relevant bit of JS you're fighting:
setTimeout(function() {window.history.replaceState && window.history.replaceState({}, "", "blank.html#_=_");},500);
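This does not get around the stripping, but on the debugging side the failure can be made explicit. The example's regex returns None and then crashes on .group(1); parsing the fragment and raising a descriptive error shows what was actually pasted. A small sketch; the function name `token_from_fragment` is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def token_from_fragment(url):
    """Return the access_token found after '#', or raise ValueError showing the fragment seen."""
    fragment = urlparse(url).fragment
    values = parse_qs(fragment).get("access_token")
    if not values:
        raise ValueError("no access_token in URL fragment: %r" % fragment)
    return values[0]
```

With the stripped URL from the question, this raises an error naming the fragment `_=_`, which points straight at the replaceState call above.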

Not sure whether your question is still outstanding. I came up with a solution last year, but the code is in PowerShell, so it can only be used directly on Windows machines. Run this script in PowerShell and the stripped-out access token is printed in the shell. Let me know whether it works on your end. Thanks!
http://groups.yahoo.com/neo/groups/sas_academy/conversations/messages/591
