Better way to auth to services - python

I am learning web-scraping with python-mechanize. At the moment, to enter a secure site, I have been entering data into forms manually then submitting. Like this:
import mechanize
br = mechanize.Browser()
br.open("http://www.example.org/login.html")
br.select_form(nr=0)
br['uname'] = "USERNAME"
br['pword'] = "PASSWORD"
br.submit()
I assume that under the hood, this is being sent to the server as a 'GET' or 'POST' request and the information I type in is encoded in a URL. Is there a way for me to find out what the format of this URL is so that I can encode the information myself? I am using Chrome, and it would be great to be able to identify the structure of a form's submit request.

You can enable logging in mechanize. For a tutorial, see http://wwwsearch.sourceforge.net/mechanize/hints.html#logging
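To see what a form submission looks like on the wire, here is a minimal sketch using only the Python 3 standard library. It assumes the field names uname and pword from the question, placeholder values, and that the form uses the default application/x-www-form-urlencoded encoding and submits back to the login page itself. For a GET form the encoded pairs go into the query string; for a POST form the same string is sent as the request body.

```python
from urllib.parse import urlencode, parse_qs

# Field names taken from the question; the values are placeholders.
fields = {"uname": "USERNAME", "pword": "PASS WORD&1"}

# The encoded string a browser would send for these fields.
# Spaces and reserved characters such as "&" are escaped.
encoded = urlencode(fields)
print(encoded)

# GET form: the pairs are appended to the URL as a query string
# (assuming the form's action is the login page itself).
get_url = "http://www.example.org/login.html?" + encoded

# POST form: the same string is sent as the request body, as bytes.
post_body = encoded.encode("ascii")

# Decoding recovers the original values.
decoded = {name: values[0] for name, values in parse_qs(encoded).items()}
```

In Chrome, the Network tab of the developer tools should show the same name/value pairs for the real request, which is a quick way to check the field names against what the script sends.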

Related

Can't Login to Website Using Python Session Request

I'm new to web scraping, and I'm attempting to log in to imagingrewardsprogram.com using requests.Session(). I've been able to successfully log in to other websites, and I'm stumped why I haven't been able to log into this one.
When I login to the site in Google Chrome and view the form data in developer tools, I'm able to see that the form data I'm passing in to my code is identical to the form data I pass in to the web browser ("user" and "password"). I'm sure there's something else I should be passing in that I'm missing, but I'm not sure what it is.
Here is my code:
import requests

loginURL = 'https://imagingrewardsprogram.com'
requestURL = 'https://imagingrewardsprogram.com/merlin/pnaimaging?command=get&style=home'
payload = {
    'user': myusername,  # myusername and mypassword are defined elsewhere
    'password': mypassword,
    'command': 'get',
    'style': 'home'
}
with requests.Session() as session:
    post = session.post(loginURL, data=payload)
    r = session.get(requestURL)
    print(r.text)
The output I get is a page that says, "Either your session has expired or an error occurred while obtaining your account information."
Any guidance is appreciated!
One possible reason is that the website you are trying to access uses security measures that do not allow an automated process to log in.
That would explain why you are unable to create a session from a script.
Measures such as CAPTCHA and reCAPTCHA are used to prevent automated logins.

Accepting and Sending Cookies with Mechanize

I need to fill in a login form on a webpage that requires cookies and get some information about the resultant page. Since this needs to be done at very weird hours at night, I'd like to automate the process and am therefore using mechanize (any other suggestions are welcome - note that I have to run my script on a school server, on which I cannot install new software. Mechanize is pure python so I am able to get around this problem).
The problem is that the page that hosts the login form requires that I be able to accept and send cookies. Ideally, I'd like to be able to accept and send all cookies that the server sends me, rather than hard-code my own cookies.
So, I set out to write my script with mechanize, but I seem to be handling cookies wrong. Since I can't find helpful documentation anywhere (please point it out if I'm blind), I am asking here.
Here is my mechanize script:
import mechanize as mech
br = mech.Browser()
br.set_handle_robots(False)
print "No Robots"
br.set_handle_redirect(True)
br.open("some internal uOttawa website")
br.select_form(nr=0)
br.form['j_username'] = 'my username'
print "Login: ************"
br.form['j_password'] = 'my password'
print "Password: ************"
response = br.submit()
print response.read()
This prints the following
No Robots
Login: ************
Password: ************
<html>
<body>
<img src="/idp/images/uottawa-logo-dark.png" />
<h3>ERROR</h3>
<p>
An error occurred while processing your request. Please contact your helpdesk or
user ID office for assistance.
</p>
<p>
This service requires cookies. Please ensure that they are enabled and try your
going back to your desired resource and trying to login again.
</p>
<p>
Use of your browser's back button may cause specific errors that can be resolved by
going back to your desired resource and trying to login again.
</p>
<p>
If you think you were sent here in error,
please contact technical support
</p>
</body>
</html>
This is indeed the page that I would get if I disabled cookies on my Chrome browser and attempted the same thing.
I've tried adding a cookie jar as follows, with no luck.
import cookielib
import mechanize as mech
br = mech.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
I took a look at multiple mechanize documentation sources. One of them mentions:
A common mistake is to use mechanize.urlopen(), and the .extract_cookies() and
.add_cookie_header() methods on a cookie object themselves.
If you use mechanize.urlopen() (or OpenerDirector.open()),
the module handles extraction and adding of cookies by itself,
so you should not call .extract_cookies() or .add_cookie_header().
This seems to say that my first method should work, but it doesn't.
I'd appreciate any help with this - it's confusing, and there seems to be a severe lack of documentation.
I came across the exact same message while authenticating to a Shibboleth website with Mechanize, because I made the same mistake as you. It looks like I figured it out.
Short answer
The link you need to open is:
br.open("https://web30.uottawa.ca/Shibboleth.sso/Login?target=https://web30.uottawa.ca/hr/web/post-register")
Instead of:
br.open("https://idp.uottawa.ca/idp/login.jsp?actionUrl=%2Fidp%2FAuthn%2FUserPassword")
Why?
Shibboleth: Connect easily and securely to a variety of services with
one simple login.
The Shibboleth login itself is useless if you don't tell it which service you want to log in to. Let's analyse the HTTP headers and compare the cookies you get for both queries.
1. Opening https://idp.uottawa.ca/idp/login.jsp?actionUrl=%2Fidp%2FAuthn%2FUserPassword
Cookie: JSESSIONID=C2D4A19B2994BFA287A328F71A281C49; _ga=GA1.2.1233451770.1401374115; arp_scroll_position=-1; tools-resize=tools-resize-small; lang-prev-page=en; __utma=251309913.1233451770.1401374115.1401375882.1401375882.1; __utmb=251309913.14.9.1401376471057; __utmz=251309913.1401375882.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); lang=en
2. Opening https://web30.uottawa.ca/Shibboleth.sso/Login?target=https://web30.uottawa.ca/hr/web/post-register
Cookie: JSESSIONID=8D6BEA53823CC1C3045B2CE3B1D61DB0; _idp_authn_lc_key=fc18251e-e5aa-4f77-bb17-5e893d8d3a43; _ga=GA1.2.1233451770.1401374115; arp_scroll_position=-1; tools-resize=tools-resize-small; lang-prev-page=en; __utma=251309913.1233451770.1401374115.1401375882.1401375882.1; __utmb=251309913.16.9.1401378064938; __utmz=251309913.1401375882.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); lang=en
What's the difference? You got one more cookie: _idp_authn_lc_key=1c21128c-2fd7-45d2-adac-df9db4d0a9ad;. I suppose it is the cookie saying "I want to login there".
During the authentication process, the IdP will set a cookie named
_idp_authn_lc_key. This cookie contains only information necessary to identify the current authentication process (which usually spans
multiple requests/responses) and is deleted after the authentication
process completes.
Source: https://wiki.shibboleth.net/confluence/display/SHIB2/IdPCookieUsage
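The comparison above can also be done programmatically. The following is a minimal Python 3 sketch that splits two Cookie header values into names and reports the ones present only in the second. The helper name cookie_names is made up for illustration, and the two headers are shortened versions of the ones shown above.

```python
def cookie_names(header_value):
    """Return the set of cookie names in a Cookie header value."""
    names = set()
    for pair in header_value.split(";"):
        # Split on the first "=" only; values may contain "=" themselves.
        name, _, _value = pair.strip().partition("=")
        if name:
            names.add(name)
    return names

# Shortened versions of the two headers shown above.
direct_idp = "JSESSIONID=C2D4A19B2994BFA287A328F71A281C49; lang=en"
via_service = (
    "JSESSIONID=8D6BEA53823CC1C3045B2CE3B1D61DB0; "
    "_idp_authn_lc_key=fc18251e-e5aa-4f77-bb17-5e893d8d3a43; lang=en"
)

# Cookies that only appear when the login starts from the service URL.
extra = cookie_names(via_service) - cookie_names(direct_idp)
print(extra)
```

The split is done by hand rather than with a cookie-parsing class because analytics cookies like the __utmz value above contain characters that stricter parsers may reject.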
How did I find that link? I dug around the web and found that https://web30.uottawa.ca/hr/web/en/user/registration redirects to the login form with the following link:
<a href="https://web30.uottawa.ca/Shibboleth.sso/Login?target=https://web30.uottawa.ca/hr/web/post-register"
class="button standard"><span>Create your account using infoweb</span></a>
So that was not a problem with Mechanize; rather, Shibboleth is a little hard to understand at first glance. You will find more information on the Shibboleth authentication flow here.
The website you're submitting your form data to probably needs a CSRF token (typically a hidden form field or a cookie issued along with the form page, which you're skipping the download of).
Try using Requests:
http://docs.python-requests.org/en/latest/user/quickstart/#cookies
Look for the cookies and/or hidden form fields and then fire away.
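As a rough illustration of looking for hidden form fields, here is a Python 3 sketch that uses the standard library's html.parser to collect them from a login page. The page content and the csrf_token field name are invented for the example; j_username and j_password are the field names from the question. A real page would be downloaded first, in the same session that later submits the form.

```python
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""

# Made-up login page; "csrf_token" is a hypothetical field name.
page = """
<form method="POST" action="/login">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="j_username">
  <input type="password" name="j_password">
</form>
"""

parser = HiddenFieldParser()
parser.feed(page)

# Merge the hidden fields with the credentials before posting.
payload = dict(parser.fields, j_username="my username", j_password="my password")
```

Sending payload with the same cookie jar that fetched the page keeps the token and the session cookie paired, which is usually what the server checks.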

Using Python to sign into website, fill in a form, then sign out

As part of my quest to become better at Python I am now attempting to sign in to a website I frequent, send myself a private message, and then sign out. So far, I've managed to sign in (using urllib, cookiejar and urllib2). However, I cannot work out how to fill in the required form to send myself a message.
The form is located at /messages.php?action=send. There are three things that need to be filled in for the message to send: three text fields named name, title and message. Additionally, there is a submit button (named "submit").
How can I fill in this form and send it?
import urllib
import urllib2
name = "name field"
data = {
"name" : name
}
encoded_data = urllib.urlencode(data)
content = urllib2.urlopen("http://www.abc.com/messages.php?action=send",
encoded_data)
print content.readlines()
Just replace http://www.abc.com/messages.php?action=send with the URL that your form is submitted to.
Reply to your comment: if that URL is the one where your form is located, and you only need to do this for one website, look at the source code of the page and find
<form method="POST" action="some_address.php">
and pass this address as the parameter to urllib2.urlopen.
You also have to understand what the submit button does.
It just sends an HTTP request to the URL defined by the form's action attribute.
So what you do is simulate this request with urllib2.
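Here is a hedged Python 3 sketch of that idea, using urllib.request in place of urllib2. It reads the form's method and action from the page source, resolves the action against the page URL, and builds (but does not send) the request. The HTML snippet, the FormActionParser class, the field values, and the value given to the submit button are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

class FormActionParser(HTMLParser):
    """Record the method and action of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.action is None:
            attr = dict(attrs)
            self.method = (attr.get("method") or "GET").upper()
            self.action = attr.get("action") or ""

page_url = "http://www.abc.com/messages.php?action=send"
page_html = '<form method="POST" action="some_address.php"></form>'

parser = FormActionParser()
parser.feed(page_html)

# A relative action is resolved against the URL of the page holding the form.
target = urljoin(page_url, parser.action)

# Placeholder values; the value of the "submit" button is a guess.
data = urlencode({
    "name": "name field",
    "title": "title field",
    "message": "message field",
    "submit": "submit",
}).encode("ascii")

# Supplying data makes urllib issue a POST; nothing is sent until
# the request is passed to urllib.request.urlopen().
request = Request(target, data=data)
print(request.get_method(), request.full_url)
```

To keep the login from the earlier step, the request would need to go through the same cookie-aware opener that was used to sign in rather than a bare urlopen call.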
You can use mechanize to make submitting the form easier. Don't forget to check the field names (name, title, message) against the source code of the HTML form.
import mechanize
br = mechanize.Browser()
br.open("http://mywebsite.com/messages.php?action=send")
br.select_form(nr=0)
br.form['name'] = 'Enter your Name'
br.form['title'] = 'Enter your Title'
br.form['message'] = 'Enter your message'
req = br.submit()
You want the mechanize library. This lets you easily automate the process of browsing websites and submitting forms/following links. The site I've linked to has quite good examples and documentation.
Try to work out the requests that are made (e.g. using the Chrome web developer tool or with Firefox/Firebug) and imitate the POST request containing the desired form data.
In addition to the great mechanize library mentioned by Andrew, I'd also suggest you use BeautifulSoup to parse the HTML.
If you don't want to use mechanize but still want an easy, clean solution to create HTTP requests, I recommend the excellent requests module.
To post data to a webpage, use cURL, something like this:
curl -d Name="Shrimant" -d title="Hello world" -d message="Hello, how are you" -d Form_Submit="Send" "http://www.example.com/messages.php?action=send"
The "-d" option tells cURL that the next item is data to be sent to the server at http://www.example.com/messages.php?action=send

HTML + AJAX + Python + Session?

I am trying to understand how to use AJAX with Python Sessions. I understand the basics of how sessions work. When I add AJAX to the mix, I have a difficult time understanding the technical details. Part of me thinks AJAX and Python Sessions are not possible together. I know web frameworks exist that probably do all this magic, but I want to learn how to do it myself before jumping into a framework.
I am trying to create a webpage with a login form using HTML, AJAX, Python, and Sessions. If the user is able to log in, a session should be created (I assume that's correct). Otherwise, an error message should be returned.
Here is the basic application layout:
login.html : HTML form with username & password input boxes and submit button
ajax.js : contains AJAX function that communicates with server-side script
check_user.py : checks if username & password are correct, creates session or returns error
welcome.html : only accessible if username & password are correct
welcome_1.html : only accessible if username & password are correct
I prefer to keep the HTML, Javascript, and Python code in separate files as opposed to creating HTML with Python or using inline Javascript in HTML.
Here is the application logic:
user visits login.html
enters username & password
clicks submit button
submit button calls ajax function
ajax function sends username & password to check_user.py
check_user.py checks if username & password are correct
    if not correct, return JSON formatted error message
    else, correct
        create session ID (SID)
        place SID in cookie
        return cookie to ajax function
        redirect user to welcome.html
welcome.html
    on page load, ajax function requests user's cookie
    if no cookie, redirect to login.html
    else, ajax function sends cookie to check_user.py
    check_user.py opens cookie & verifies the SID
        if not correct, redirect user to login.html
        else, correct
            redirect user to welcome.html
I think I am misunderstanding how ajax is supposed to handle the returned cookie information. It is also possible that I am misunderstanding other parts, too. Please clarify :-)
I think I will follow this document when writing my Python session code. Sound ok?
I am considering using jQuery for the AJAX stuff and other Javascript coding.
Thank you!
Remember that the AJAX request is the same as any other HTTP request to the server. The session is maintained on the server side, but as far as the server can tell, a request from the browser is a request from the browser. An AJAX request can get and set the cookie just like any other request can. The method you've outlined above should work fine. Alternately, you could check for the existence of the session on your front page, and write the cookie then.
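To make the server side concrete, here is a minimal Python 3 sketch of what a script like check_user.py might do with the session. It is an illustration under assumptions, not a complete login system: the cookie name sid, the in-memory SESSIONS dictionary, and both function names are made up, and password checking is left out entirely.

```python
import secrets
from http.cookies import SimpleCookie

# In-memory session store: SID -> username. A real application would
# persist this and expire old entries.
SESSIONS = {}

def create_session(username):
    """Create a session and return the Set-Cookie header value for it."""
    sid = secrets.token_hex(16)  # unguessable random session ID
    SESSIONS[sid] = username
    cookie = SimpleCookie()
    cookie["sid"] = sid
    cookie["sid"]["httponly"] = True  # hide the cookie from page scripts
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()

def user_for_request(cookie_header):
    """Return the username for the SID in a Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("sid")
    return SESSIONS.get(morsel.value) if morsel else None

# After a successful login the server answers with this header.
set_cookie = create_session("alice")

# The browser sends back only the name=value part on later requests.
cookie_header = set_cookie.split(";")[0]
```

Because the browser attaches the cookie to same-origin AJAX requests on its own, the JavaScript side does not need to read or forward the SID; it only has to react to the server's answer, for example by redirecting to login.html when the session is rejected.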

How can I pass my ID and my password to a website in Python using Google App Engine?

Here is a piece of code that I use to fetch a web page HTML source (code) by its URL using Google App Engine:
from google.appengine.api import urlfetch
url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
    print result.content
Everything is fine here, but sometimes I need to get an HTML source of a page from a site where I am registered and can only get an access to that page if I firstly pass my ID and password. (It can be any site, actually, like any mail-account-providing site like Yahoo: https://login.yahoo.com/config/mail?.src=ym&.intl=us or any other site where users get free accounts by firstly getting registered there).
Can I somehow do it in Python (through "Google App Engine")?
You can check for an HTTP status code of 401, "authorization required", and provide the kind of HTTP authorization (basic, digest, whatever) that the site is asking for -- see e.g. here for more details (there's not much that's GAE specific here -- it's a matter of learning HTTP details and obeying them!-).
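For the simplest of those schemes, HTTP Basic, the credentials travel in an Authorization header. The Python 3 sketch below builds that header with the standard library; the function name basic_auth_header and the sample credentials are made up for illustration. Digest authentication is a challenge-response scheme and needs more than this.

```python
import base64

def basic_auth_header(user_id, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

# Sample credentials for illustration only.
headers = {"Authorization": basic_auth_header("user", "pass")}
print(headers["Authorization"])
```

With the urlfetch API from the question, a dictionary like this would be passed through the headers argument of urlfetch.fetch. Since Basic credentials are only encoded, not encrypted, this should be used over HTTPS.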
As Alex said, you can check the status code and see what type of authorization the site wants, but you cannot generalize this: some sites will not give any hint, or will only allow login through a non-standard form. In those cases you may have to automate the login process using forms. For that you can use a library like twill (http://twill.idyll.org/)
or code a specific form submit for each site.
