Python SAML Authentication Automation - python

I am working on a project that collects all my monthly utility amounts and splits the amounts owed among my roommates. I have managed to log in programmatically to two websites, but I am having trouble with the last one, as it uses SAML (https://www.blackhillsenergy.com/). I have inspected the web requests with Chrome's Developer Tools but have not had any breakthroughs. I also tried requests_ecp, without any luck. I get the idea of SAML, but I am having a hard time understanding their implementation and how I can use it in my script. Below is my sample code. Any ideas?
import requests
from bs4 import BeautifulSoup
from requests_ecp import HTTPECPAuth

def get_bh_bill():
    url = 'https://www.blackhillsenergy.com/cpm/v1/user/accounts?username={fill here}'
    bh_login = ''
    bh_pass = ''
    # Start a session so we can have persistent cookies
    session = requests.Session()
    session.auth = HTTPECPAuth('https://sso.blackhillsenergy.com', username=bh_login, password=bh_pass)
    acc_res = session.get(url)
    acc_soup = BeautifulSoup(acc_res.text, "html.parser")
    print(acc_soup.prettify())
    return '0000'

SAML, typically, works like this.
You hit the desired site. It sees that you are not authenticated, so it creates a SAML request, routes it through your browser, and sends you to an Identity Provider (IdP).
The IdP reads the SAML request and then asks you for credentials. Once you are authenticated, it creates a SAML response and routes that back to the original site, again through your browser.
The routing is done by presenting a simple HTML form containing the SAML request or response, plus a small piece of JavaScript that submits it. This is how SAML moves information across domains (SAML is typically used across domains, which is why it doesn't rely on cookies).
What your script needs to do is follow that workflow: submit the forms automatically, log in when asked, and submit the resulting forms back. It's a multi-step workflow, and there may well be a number of redirects involved too.
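As a rough sketch of the "submit the forms automatically" step, the snippet below pulls the target URL and hidden fields out of an auto-submitting form using only the standard library. The sample page, the example.com URL, and the helper names are made up for illustration; the real pages from the service provider and the IdP will differ and may need extra handling.

```python
from html.parser import HTMLParser


class AutoPostFormParser(HTMLParser):
    """Collect the action URL and the input fields of the first form in a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def extract_saml_form(html):
    """Return (action_url, fields) for the first form in the page."""
    parser = AutoPostFormParser()
    parser.feed(html)
    return parser.action, parser.fields


# Hypothetical page of the kind an IdP might return after a successful login.
sample_page = """
<html><body onload="document.forms[0].submit()">
<form method="post" action="https://sp.example.com/saml/acs">
  <input type="hidden" name="SAMLResponse" value="PHNhbWxwOlJlc3BvbnNlPg=="/>
  <input type="hidden" name="RelayState" value="/account"/>
</form>
</body></html>
"""

action, fields = extract_saml_form(sample_page)
print(action)
print(sorted(fields))
```

In a real script, each hop would presumably post the extracted fields back with the same session, for example `session.post(action, data=fields)` with requests, and then repeat the extraction on whatever page comes back.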

Related

How to access parameters in a redirect URI in Flask?

I'm working in Flask on creating a JMML ("Join my mailing list") widget that submits data to an email marketing platform, and the platform follows an OAuth2 flow. The basic flow is:
I create an access URL using the base API URL, an API key, and a redirect URI
The program accesses this URL, and the user of the program is redirected to the marketing platform to log in and grant access.
The marketing platform performs another redirect back to the redirect URI that I provided. The URI is appended with the access token that I need to provide with app POST requests of my JMML. Here's an example of what the returned URI looks like:
http://localhost:5000/redirect_url#access_token=2C1zxo3O0J1yo5Odolypuo9DSmcI
Here's the problem I'm having: I have no idea how, programmatically, to use that final redirect URL/URI as a variable in Python. I could make the user copy/paste it into a field, but there's gotta be a better way. I honestly don't even know the terminology for a redirected-redirect like this.
It's pathetic, and I'm lost, but here's what I have so far:
@app.route('/redirect_url')
def redirect_url():
    # I have no idea how to actually get the parameter out of the redirect URL.
    pass
I've checked the API documentation for the email marketing company's API, but they only provide code tips for handling Oauth2 in Ruby and PHP. Help!
There is a good blog post by Miguel Grinberg in which he describes how to work with OAuth in a Flask application. I think the workflow stays the same with any other web application, though.
Based on this, it seems like you should be able to get the access token by reading the variable parameter from the URL. I do not have your full code, so I can't test this, nor have I tried it with a # in the URL, but this should work:
@app.route('/originalurl')
@app.route('/redirect_url#<access_token>')
def show_user_profile(access_token=None):
    if access_token:
        # do work
        return redirect(url_for('Anotherview'))
    return render_template('template.html')
Otherwise, we need more info on the API you are using OAuth with.
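One caveat worth noting: browsers normally do not send the part of a URL after `#` to the server, so a Flask route may never see the token on its own. A common workaround is a small script on the redirect page that forwards the fragment to the server. Once the full URL (or just the fragment string) reaches Python, extracting the token could look like the sketch below; `token_from_redirect` is a hypothetical helper name, and the URL is the example from the question.

```python
from urllib.parse import urlsplit, parse_qs


def token_from_redirect(url):
    """Return the access_token carried in the fragment of a redirect URL, or None."""
    fragment = urlsplit(url).fragment   # the text after '#'
    params = parse_qs(fragment)         # maps each key to a list of values
    values = params.get("access_token")
    return values[0] if values else None


redirected = "http://localhost:5000/redirect_url#access_token=2C1zxo3O0J1yo5Odolypuo9DSmcI"
print(token_from_redirect(redirected))
```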

Scraping data from external site with username and password

I have an application with many users, some of these users have an account on an external website with data I want to scrape.
This external site has a members area protected with an email/password form. This sets some cookies when submitted (a couple of ASP ones). You can then pull up the needed page and grab the data the external site holds for the user that just logged in.
The external site has no API.
I envisage my application asking users for their credentials to the external site, logging in on their behalf and grabbing the data we want.
How would I go about this in Python, i.e. do I need to run a GUI web browser on the server that Python prods to handle the cookies (I'd rather not)?
Find the call the page makes to the backend by inspecting the format of the login request in your browser's inspector.
Make the same request after getting the user's credentials, either from the terminal with getpass or via a GUI. You can use urllib2 (urllib.request in Python 3) to make the requests.
Save all the cookies from the response in a cookiejar.
Reuse the cookies in subsequent requests and fetch data.
Then, profit.
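The steps above can be sketched with the Python 3 standard library (the successor to urllib2). The login URL and the form field names `email` and `password` are assumptions for illustration; the real ones have to be copied from the login request seen in the browser's inspector.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, email, password):
    """Build the POST request that mimics the site's login form."""
    # The field names "email" and "password" are assumptions; use the real
    # names from the login call shown in the browser's inspector.
    body = urllib.parse.urlencode({"email": email, "password": password})
    return urllib.request.Request(login_url, data=body.encode("utf-8"), method="POST")


opener, jar = make_cookie_opener()
login = build_login_request("https://members.example.com/login", "me@example.com", "secret")
print(login.get_method(), login.full_url)
print(login.data)

# Against a real site, the calls would look like this (not executed here):
# opener.open(login)                                   # cookies land in `jar`
# page = opener.open("https://members.example.com/data").read()
```

No browser is needed on the server for this: the cookie jar plays the role the browser's cookie store would.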
Usually, this is done with a session.
I recommend using the requests library (http://docs.python-requests.org/en/latest/) for this.
You can use its Session feature (http://docs.python-requests.org/en/latest/user/advanced/#session-objects). Simply perform an authentication HTTP request (the URL and parameters depend on the website you are targeting), and then request the resource you want to scrape.
Without further information, we cannot help you more.

How can I scrape information from HowLongToBeat.com? It doesn't use a variable in the URL

I'm trying to scrape information from How Long to Beat, how can I make a request for a search without having to put the search-term in the URL?
EDIT for clarity:
The problem I face is that the site doesn't use something like http://www.howlongtobeat.com/search.php?s=search-term, therefore I cannot do something like
url = 'http://www.howlongtobeat.com/search.php?s='
search_term = raw_input("Search: ")
r = requests.get(url + search_term)
In other words, when you type the search-term in the search dialog, the site doesn't refresh nor show a change in the URL so I can't find a way to search from outside the site.
I'm sorry if I made grammar mistakes; English is not my first language.
This is because the page is driven by AJAX requests: it updates automatically without changing the visible URL.
If you open the developer tools in your browser (F12) and navigate to the Network panel, you will see that requests are indeed sent to the server. I typed "test2" and watched the request go out.
The request is sent to a URL that looks like this: http://www.howlongtobeat.com/search_main.php?t=games&page=1&sorthead=popular&sortd=Normal%20Order&plat=&detail=0.
I typed "test2", but it's nowhere to be seen in that URL.
That's because it was sent using a POST request, i.e. the parameters were embedded in the body of the HTTP request, not in the URL. When I navigated to the "Params" tab in the developer tools, I could indeed see my input:
queryString: "test2"
So in order to use this search form, you should send a POST request to that URL containing the variable "queryString" set to whatever value you need.
I strongly encourage asking the site owners about an API, though. Using publicly available forms that were designed for end users in an automated fashion is considered unethical.
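A minimal sketch of the POST request described above, using only the standard library. The URL and the `queryString` field are taken from the answer; `build_search_request` is a hypothetical helper name, and the site may have changed since this was written, so treat it as an illustration rather than a working client.

```python
import urllib.parse
import urllib.request


def build_search_request(term):
    """Build a POST request carrying the search term in the body, not the URL."""
    url = ("http://www.howlongtobeat.com/search_main.php"
           "?t=games&page=1&sorthead=popular&sortd=Normal%20Order&plat=&detail=0")
    body = urllib.parse.urlencode({"queryString": term}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


req = build_search_request("test2")
print(req.get_method())
print(req.data)

# Sending it would be: html = urllib.request.urlopen(req).read()
# (not executed here)
```

With the requests library used in the question, the equivalent call would be `requests.post(url, data={"queryString": search_term})`.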

Python urllib2 accesses page without sending authentication details

I was reading urllib2 tutorial wherein it mentions that in order to access a page that requires authentication (e.g. valid username and password), the server first sends an HTTP header with error code 401 and (python) client then sends a request with authentication details.
Now, the problem in my case is that there exist two different versions of a webpage, one that can be accessed without supplying any authentication details and one that is quite different when authentication details are supplied (i.e. when the user is logged in the system). As an example think about url www.gmail.com, when you are not logged in you get a log-in page, but if your browser remembers you from your last login then the result is your email account homepage with your inbox displayed.
I followed all the steps to set up a handler for authentication and install an opener. However, every time I request the page I get back the version of the webpage that does not have the user logged in.
How can I access the other version of webpage that has the user logged-in?
Requests makes this easy. As its creators say:
Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken.
Try using Mechanize. It has cookie handling features that would allow your program to be "logged in" even though it's not a real person.
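One possible explanation, assuming the site uses HTTP Basic authentication: the standard auth handler only sends credentials after the server answers with 401, so a page that returns a normal response to anonymous visitors never triggers it. The sketch below attaches the header up front, using Python 3's urllib.request (the successor to urllib2). The URL and credentials are placeholders. If the site uses a login form and cookies instead, as Gmail does, this will not help and a cookie-based approach is needed.

```python
import base64
import urllib.request


def request_with_basic_auth(url, username, password):
    """Attach an Authorization header so credentials go out with the first request."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


req = request_with_basic_auth("http://www.example.com/page", "user", "pass")
print(req.get_header("Authorization"))
```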

Python - Facebook connect without browser based-redirect

The title might be a bit of a misnomer.
I have two subdomains set up in a web application: one for the application frontend at www. (which uses a loose PHP router coupled with Backbone, Require, and jQuery for the UI) and a data tier at data. (built entirely in Python using Werkzeug).
We decided to go with this kind of architecture since at some point a mobile app will be integrated into the equation, and it can just as easily send HTTP requests to the data subdomain. The data subdomain renders all its responses in JSON, which the JS frontend expects universally as a response, be it a valid query response or an error.
This feels like the most organized arrangement I've ever had on a project, until Facebook connect appeared on the manifesto.
Ideally, I'd like to keep as much of the 'action' of the site behind the scenes on the data subdomain, especially authentication stuff, so that the logic is available to the phones and mobile devices as well down the line if need be. Looking into the facebook docs, it appears however that the browser and its session object are critical components in their OAuth flow.
User authentication and app authorization are handled at the same time
by redirecting the user to our OAuth Dialog. When invoking this
dialog, you must pass in your app id that is generated when you create
your application in our Developer App (the client_id parameter) and
the URL that the user's browser will be redirected back to once app
authorization is completed (the redirect_uri parameter). The
redirect_uri must be in the path of the Site URL you specify in
Website section of the Summary tab in the Developer App. Note, your
redirect_uri can not be a redirector.
I've done this before in PHP and am familiar with the process, and it's a trivial matter of about 3 header redirects when the logic in question has direct access to the browser. My problem is that the way our new app is structured, all the business logic is sequestered on the other subdomain. What are my options to simulate the redirect? Can I have the Python subdomain send the HTTP data, receiving it from the www domain using cURL like all the other requests? Can I return a JSON object to the user's browser that instructs it to do the redirects? Will Facebook even accept requests from a data subdomain whose redirect_uri is at the www subdomain? I suspect the last sentence in the quoted section rules that out, but I've looked through their docs and they don't explicitly say that this would be treated as a violation.
Has anyone had any experience setting up something like this with Facebook with a similar architecture? Perhaps I should just fold and put the FB login logic directly into PHP and handle it there.
Thanks!
