MechanicalSoup not logging me in - python

I am trying to log in to a site with MechanicalSoup, but when I submit the form, it keeps me on the same page. I have done a lot of research on this and could not find an answer.
import mechanicalsoup

br = mechanicalsoup.StatefulBrowser()
br.open(domain + action)
form = br.select_form()
user_info = getUserInfo()
br["ff_login_id"] = user_info["eid"]
br["ff_password"] = user_info["password"]
br["empl-login_submit"] = "SUBMITTED"
br.get_current_form().print_summary()
res = br.submit(form, domain)
print(res)  # This is getting a response 200

If you have used Mechanize to do this in the past, then it should be possible with MechanicalSoup, unless the website has changed.
Depending on what action is, this could be an issue with the URL passed to submit. The preferred method to submit forms is:
res = br.submit_selected()
This will ensure that you are passing the correct URL to the submit function. Perhaps give that a try to see if it solves your problem.
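To make that concrete, here is a minimal sketch of the same login using submit_selected(); it reuses the names from the question (domain, action, user_info), so treat those as placeholders for your own values:
br.open(domain + action)
br.select_form()  # selects the first form on the page
br["ff_login_id"] = user_info["eid"]
br["ff_password"] = user_info["password"]
res = br.submit_selected()  # posts to the form's own action URL
print(res.status_code)  # a 200 alone does not prove a successful login
print(br.get_url())  # check whether we actually left the login page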

Related

Using robobrowser on ASP.net website

I have been working on this for a couple of days now and can't really figure out how to make it work. I am fairly new to aspx websites and fetching information out of them.
I am trying to log in/authenticate on a website that uses aspx pages, so I followed this thread, which really helped me get this in motion. (Last Answer)
Following those directions, I wrote:
url = "http://samplewebsite/Main/Index.aspx" # Logon page
username = "user"
password = "password"
browser = RoboBrowser(history=True)
# This retrieves __VIEWSTATE and friends
browser.open(url)
signin = browser.get_form(id='form1')
print(signin)
This is the outcome of that print statement:
<RoboForm __VIEWSTATE=/wEPDwULLTE5ODM2NTU1MzJkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQlidG5TdWJtaXRriD1xvrfrHuJ/0xbQM08yEjyoUg==, __VIEWSTATEGENERATOR=E78488FE, adminid=, btnSubmit=, pswd=>
So it is obvious that I am retrieving the information correctly. Now I have 3 input fields:
adminid
btnSubmit
pswd
Which I can use in the following manner:
signin["adminid"].value = username
signin["pswd"].value = password
signin["btnSubmit"].value = "btnSubmit.x=29&btnSubmit.y=22"
My only problem is the last field, btnSubmit, for which I do not know how to input a value, since it is of the following type:
<input type="image" name="btnSubmit" id="btnSubmit" tabindex="3" src="../image/login_btn.gif" style="height:41px;width:57px;border-width:0px;" />
When I submit on the website, Chrome's developer tools show the following form data:
__VIEWSTATE:/wEPDwULLTE5ODM2NTU1MzJkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQlidG5TdWJtaXRriD1xvrfrHuJ/0xbQM08yEjyoUg==
__VIEWSTATEGENERATOR:E78488FE
adminid:user
btnSubmit.x:23
btnSubmit.y:15
pswd:password
The x and y values are basically the position where I clicked on the page. I really do not know how to make this request through Python. I used this to no avail.
When you click on an input object of type image, two form values are set: the button name plus .x for the horizontal coordinate, and plus .y for the vertical one.
However, pressing Enter in a regular text input field will also submit a form, so you don't have to click on a submit button; I'd just leave the value empty altogether.
There is not much flexibility in the way RoboBrowser handles form submits, so to avoid using the submit button you'd have to delete it from the form outright:
del signin.fields['btnSubmit']
before submitting.
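A minimal sketch of that approach, reusing the names from the question:
signin["adminid"].value = username
signin["pswd"].value = password
del signin.fields['btnSubmit']  # drop the image button so it is not serialized
browser.submit_form(signin)  # POSTs the remaining fields, __VIEWSTATE included
print(browser.url)  # check where the submission landed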
If you must submit using the image button, then you'll have to teach RoboBrowser how to handle that type; currently it has no handling for these. The following adds that:
from functools import wraps
from robobrowser.forms import form
from robobrowser.forms.fields import Submit, Input

class ImageSubmit(Submit):
    def serialize(self):
        # an image button submits name.x and name.y click coordinates
        return {self.name + '.x': '0', self.name + '.y': '0'}

def include_image_submit(parse_field):
    @wraps(parse_field)
    def wrapper(tag, tags):
        field = parse_field(tag, tags)
        if type(field) is Input:  # not a subclass, exactly this class
            if field._parsed.get('type') == 'image':
                field = ImageSubmit(field._parsed)
        return field
    return wrapper

form._parse_field = include_image_submit(form._parse_field)
at which point you can use browser.submit_form(signin, signin['btnSubmit']) to submit the form and the correct fields will be included.
I've submitted a pull request to the robobrowser project to add image submit support.

get URL of any submission for subreddits

I am trying to use PRAW to get new posts from subreddits on Reddit. The following code snippet shows how I get new items on a particular subreddit.
Is there a way to also get the URL of the particular submission?
submissions = r.get_subreddit('todayilearned')
submission = submissions.get_new(limit=1)
sub = [str(x) for x in submission]
print sub
PRAW allows you to do this. To get the submitted link, you can use submission.url:
[submission] = submissions.get_new(limit=1)
print submission.url
Or if you're looking for the URL for the actual post to Reddit then you can use permalink
[submission] = submissions.get_new(limit=1)
print submission.permalink
The documentation lists a short_link property that returns a shortened version of the url to the submission. It does not appear that the full url is similarly provided, though it seems that it could be reconstructed from the subreddit name and the submission's id, which is stored in submission.id.
In summary, use:
[submission] = submissions.get_new(limit=1)
submission.short_link
to get a link to the submission.
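And as a hedged sketch of the reconstruction mentioned above (the /r/<subreddit>/comments/<id> path is an assumption about reddit's URL layout, not something PRAW guarantees):
[submission] = submissions.get_new(limit=1)
full_url = "http://www.reddit.com/r/todayilearned/comments/%s" % submission.id
print full_url
print submission.short_link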

I would like to automate a process of submitting jobs online and collecting job-ids using Python

I want to submit a protein sequence online for HMM comparison using the HHpred tool and collect its job id so that I can collect the output later and process it. But I have 1200 such sequences and badly need to automate the process. I tried to use the mechanize package in Python, but I couldn't understand it properly, as there is no documentation available.
import mechanize
ur = "http://toolkit.tuebingen.mpg.de/hhpred/"
request = mechanize.Request(ur)
response = mechanize.urlopen(request)
forms = mechanize.ParseResponse(response, backwards_compat=False)
print response.code
form = forms[0]
print form
original_text = form["jobid"]
form["jobid"] = '''MNDKSKNMMKNFIRTFAGLLLAILLILGFFLLVFPKAGDRFLADKKVSTLSAKNLTYAALGDSLTEGVGDATGQGGF VPLFAKDIENKTDSSVSSQNFGKAGDTSTQIYNRMMKSKKITDGLKKADIITITIGGNDVLKV
IRDNVSKLSSMTEKDFTKPEELYQARVKKLLDKIREDNPKAQIYVLGIYNPFYLNFPELTVMQNVIDSWNTATAGVVSQE KNTYFIPINDLLYKGSGDKQAVESGSTSDAVSNNLLYTEDHFHPNNVGYQLMADAVFASY
KEVNQK'''
control = form.find_control("jobid")
print control.name, control.value, control.type
control = form.find_control("showres")
print control.name, control.value, control.type
print control.disabled
request2 = form.click("showres")
response2 = mechanize.urlopen(request2)
forms2 = mechanize.ParseResponse(response2, backwards_compat=False)
form2 = forms2[0]
print form2
The website http://toolkit.tuebingen.mpg.de/hhpred/ has many input fields, but I could see only "jobid" and "showres" in the control list using the mechanize parser. The code above is what I tried to do, but it's totally incorrect.
What I actually want is to paste the sequence into the text box, hit submit, and if possible give my own job-id at the bottom, then save the URL of the resulting page after hitting submit.
Kindly help me. (I'm using Windows)
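As a first debugging step (a small sketch, not a full answer to the automation), it may help to enumerate every form and control mechanize can actually see; fields injected by JavaScript will never show up in this listing, which would explain why only "jobid" and "showres" are visible:
import mechanize

response = mechanize.urlopen("http://toolkit.tuebingen.mpg.de/hhpred/")
for form in mechanize.ParseResponse(response, backwards_compat=False):
    print form.name, form.action
    for control in form.controls:
        print "  ", control.type, control.name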

Python Mechanize does not list all forms?

I've been writing a program to log in to Facebook and update the status as a side project. I managed to get the program to log in. However, I'm having trouble selecting the textarea that ends up being the "Enter your status here" box. Using "Inspect Element" in Chrome, I'm able to see the form it's located under, but listing the forms in the program doesn't seem to list that form...
import mechanize
import re

br = mechanize.Browser()
usernamecorrect = 0
while usernamecorrect == 0:
    username = raw_input("What is the username for your Facebook Account? ")
    matchmail = re.search(r'[\w.-]+@[\w.-]+', username)
    if matchmail:
        print matchmail.group()
        usernamecorrect = 1
    else:
        print "That is not a valid username; please enter the e-mail address registered with your account.\n"
password = raw_input("What is the password for your account?")
print "Logging in..."
br.set_handle_robots(False)
br.open("https://www.facebook.com/")
br.select_form(nr = 0)
br['email'] = username
br['pass'] = password
br.submit()
raw_input("Login successful!")
print "Forms: \n"
for f in br.forms():
    print f.name
The full output is as follows:
What is the username for your Facebook Account? myemail@website.com
What is the password for your account? thisisapassword
Logging in...
Login successful!
Forms:
navSearch
None
I took another look through the source of Facebook via Inspect Element, and "navSearch" is the "Find People, things, etc." search bar, while the unnamed form appears to belong to the logout button. However, Inspect Element shows at least 2 more forms, one of which holds the status update box. I haven't been able to determine whether JavaScript is the cause (whatever markup encapsulates the status update box also encapsulates the navSearch and logout forms). The most relevant thing I've been able to find is that navSearch and the logout forms are in a separate div, but I somehow feel that shouldn't be much of a problem for mechanize. Is there just something wrong with my code, or is it something else entirely?
Is there just something wrong with my code, or is it something else entirely?
Your whole approach is wrong:
I've been writing a program to log in to Facebook and update the status
That’s what the Graph API is for.
Scraping FB pages and trying to act as a “browser” is not the way to go. Apart from the fact that FB policies do not allow it, you can see how difficult it gets on a page that uses JavaScript/AJAX so heavily.
Go with the API, it’s the easy way.
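For illustration, a minimal sketch of what a status update through the Graph API looks like; the access token below is a hypothetical placeholder you would obtain through Facebook's OAuth flow:
import requests

ACCESS_TOKEN = "your-user-access-token"  # hypothetical placeholder
resp = requests.post(
    "https://graph.facebook.com/me/feed",
    data={"message": "Posted via the Graph API", "access_token": ACCESS_TOKEN},
)
print resp.json()  # the new post's id on success, an error message otherwise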

pycurl script can't login to website

I'm currently trying to get a grasp on pycurl. I'm attempting to log in to a website; after logging in, the site should redirect to the main page. However, when I try this script, it just returns me to the login page. What might I be doing wrong?
import pycurl
import urllib
import StringIO
pf = {'username' : 'user', 'password' : 'pass' }
fields = urllib.urlencode(pf)
pageContents = StringIO.StringIO()
p = pycurl.Curl()
p.setopt(pycurl.FOLLOWLOCATION, 1)
p.setopt(pycurl.COOKIEFILE, './cookie_test.txt')
p.setopt(pycurl.COOKIEJAR, './cookie_test.txt')
p.setopt(pycurl.POST, 1)
p.setopt(pycurl.POSTFIELDS, fields)
p.setopt(pycurl.WRITEFUNCTION, pageContents.write)
p.setopt(pycurl.URL, 'http://localhost')
p.perform()
pageContents.seek(0)
print pageContents.readlines()
EDIT: As pointed out by Peter, the URL should point to a login URL, but the site I'm trying to get this to work for doesn't reveal what that URL would be. The form's action just points to the home page ( /index.html )
As you're troubleshooting this problem, I suggest getting a browser plugin like FireBug or LiveHTTPHeaders (I suggest Firefox plugins, but there are similar plugins for other browsers as well). Then you can exercise a request to the site and see what action (URL), method, and form parameters are being passed to the target server. This will likely help elucidate the crux of the problem.
If that's no help, you may consider using a different tool for your mechanization. I've used ClientForm and BeautifulSoup to perform similar operations. Based on what I've read in the pycURL docs and your code above, ClientForm might be a better tool to use. ClientForm will parse your HTML page, locate the forms on it (including login forms), and construct the appropriate request for you based on the answers you supply to the form. You could even use ClientForm with pycURL... but at least ClientForm will provide you with the appropriate action to which to POST, and construct all of the appropriate parameters.
Be aware, though, that if there is JavaScript handling any necessary part of the login form, even ClientForm can't help you there. You will need something that interprets the JavaScript to effectively automate the login. In that case, I've used SeleniumRC to control a browser (and I let the browser handle the JavaScript).
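For example, a rough ClientForm sketch, assuming the login form is the first on the page and that the field names are username and password (both assumptions; substitute whatever the target site really uses):
import urllib2
import ClientForm

response = urllib2.urlopen("http://localhost/index.html")
forms = ClientForm.ParseResponse(response, backwards_compat=False)
form = forms[0]  # assumption: the login form is the first one on the page
form["username"] = "user"
form["password"] = "pass"
# form.click() builds a urllib2 Request from the form's own action and method
result = urllib2.urlopen(form.click())
print result.geturl()  # see where the POST actually landed
You would still want cookie handling (e.g. urllib2 with cookielib) for the session to persist, just as the pycurl version uses a cookie jar.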
One of the golden rules when trying to solve a pycurl problem is to enable debugging first, so you can see what is actually going on:
Note: don't forget to use p.close() after p.perform()
def test(debug_type, debug_msg):
    if len(debug_msg) < 300:
        print "debug(%d): %s" % (debug_type, debug_msg.strip())

p.setopt(pycurl.VERBOSE, True)
p.setopt(pycurl.DEBUGFUNCTION, test)
With debugging enabled, you can now see exactly what your code is doing:
import pycurl
import urllib
import StringIO
def test(debug_type, debug_msg):
    if len(debug_msg) < 300:
        print "debug(%d): %s" % (debug_type, debug_msg.strip())
pf = {'username' : 'user', 'password' : 'pass' }
fields = urllib.urlencode(pf)
pageContents = StringIO.StringIO()
p = pycurl.Curl()
p.setopt(pycurl.FOLLOWLOCATION, 1)
p.setopt(pycurl.COOKIEFILE, './cookie_test.txt')
p.setopt(pycurl.COOKIEJAR, './cookie_test.txt')
p.setopt(pycurl.POST, 1)
p.setopt(pycurl.POSTFIELDS, fields)
p.setopt(pycurl.WRITEFUNCTION, pageContents.write)
p.setopt(pycurl.VERBOSE, True)
p.setopt(pycurl.DEBUGFUNCTION, test)
p.setopt(pycurl.URL, 'http://localhost')
p.perform()
p.close() # This is mandatory.
pageContents.seek(0)
print pageContents.readlines()
