Mechanize difference between br.click_link() and br.follow_link() - python

I am using mechanize to automate some form submissions.
To do that, I need to go to the home page of a website, click a link with particular text that redirects me to another page, and fill in the form on the new page.
I tried using:
response = br.follow_link(text_regex="sometext")
for f in response.forms():
    print f.name
The error message I got was:
AttributeError: closeable_response instance has no attribute 'forms'
When I tried:
for f in br.forms():
    print f.name
it prints the names of the forms on the homepage, not the new page I am redirected to.
How can I find the names of the forms on the new page?
What does 'response' contain?
And what is the difference between click_link() and follow_link()? The mechanize doc doesn't explain this clearly.
Thanks

For the difference between click_link() and follow_link():
Both methods take the same keyword parameters.
click_link() returns a Request object, which can then be opened in a separate step:
req = br.click_link(text='Sample Text')
br.open(req)
follow_link() performs the same action as .open(), building the same request and opening the link directly, and returns the response.
This information has been taken from the following documentation:
http://joesourcecode.com/Documentation/mechanize0.2.5/mechanize._mechanize.Browser-class.html#click_link
follow_link() behaviour can be observed in the examples given at wwwsearch:
http://wwwsearch.sourceforge.net/mechanize/

Related

How to check if there is a next page or not from the response of requests without getting a KeyError (Python)

So I am calling the GitHub API to get the commit messages from some users' repositories, and I am creating files to write the messages to...
I want to check whether there is another page in my response so that I can call the API again with page = page + 1.
I saw that to check if there is another page you can use:
response.links['next']
but if there is no 'next' you get an error and your code stops; you don't get None or something you can use in an if statement:
KeyError: 'next'
So my question is: how can I check whether there is another page, so I can loop my GETs over all the pages and continue my code when there is no next page?
Thank you in advance...
You can use response.links.get('next'), which returns None instead of raising a KeyError when there is no 'next' link.
Explanation:
This is the standard dict get-or-default behaviour: dict.get(key, default) returns default (None if omitted) when the key is missing, and requests exposes the parsed Link header as a plain dict.
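A minimal sketch of the resulting loop (the commits endpoint URL is a placeholder; requests parses the Link header into the response.links dict for you):

```python
import requests

def fetch_commit_messages(url):
    """Collect commit messages across all pages of a GitHub commits endpoint."""
    messages = []
    while url:
        response = requests.get(url)
        response.raise_for_status()
        messages.extend(item["commit"]["message"] for item in response.json())
        # .get() returns None when there is no 'next' page, ending the loop
        next_link = response.links.get("next")
        url = next_link["url"] if next_link else None
    return messages
```

Nothing here is requests-specific beyond response.links itself: .get() is plain dict behaviour.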

Mechanicalsoup Not Logging Me In

I am trying to log in to a site with MechanicalSoup, but when I submit the form, it keeps me on the same page. I have done a lot of research on this and could not find an answer.
br.open(domain + action)
form = br.select_form()
user_info = getUserInfo()
br["ff_login_id"] = user_info["eid"]
br["ff_password"] = user_info["password"]
br["empl-login_submit"] = "SUBMITTED"
br.get_current_form().print_summary()
res = br.submit(form, domain)
print(res) #This is getting a response 200
If you have used Mechanize to do this in the past, then it should be possible with MechanicalSoup, unless the website has changed.
Depending on what action is, this could be an issue with the URL passed to submit. The preferred method to submit forms is:
res = br.submit_selected()
This will ensure that you are passing the correct URL to the submit function. Perhaps give that a try to see if it solves your problem.

Authentication issue using requests on aspx site

I am trying to use requests (python) to grab some pages from a website that requires me to be logged in.
I inspected the login page to check the username and password headers, but I found that the names of those fields are not the standard 'username' and 'password' used by most sites, as you can see from the screenshots below.
password field
I used them that way in my Python script, but each time I get a 'wrong syntax' error. Even Sublime Text displayed part of the name in orange, as you can see from the pic below.
From this I know there must be some problem with the name, but trying to escape the $ signs did not help.
Even the login.aspx header disappears before Google Chrome can register it on the network tab.
The site is www dot bncnetwork dot net
I'd be happy if someone could help me figure out what to do about this.
Here is the code:
import requests

def get_project_page(seed_page):
    username = "*******************"
    password = "*******************"
    bnc_login = dict(ctl00$MainContent$txtEmailID=username, ctl00$MainContent$txtPassword=password)
    sess_req = requests.Session()
    sess_req.get(seed_page)
    sess_req.post(seed_page, data=bnc_login, headers={"Referer": "http://www.bncnetwork.net/MyBNC.aspx"})
    page = sess_req.get(seed_page)
    return page.text
You need to use strings for the keys; the $ causes a syntax error when the names are used as dict() keyword arguments:
data = {"ctl00$MainContent$txtPassword": password, "ctl00$MainContent$txtEmailID": email}
There are also validation fields etc. to be filled in; follow the logic from this answer to fill them out. All the fields can be seen in the Chrome dev tools.
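ASP.NET pages also post back hidden state fields such as __VIEWSTATE and __EVENTVALIDATION. A sketch of collecting them with BeautifulSoup (assuming bs4 is available) so they can be merged into the login data:

```python
from bs4 import BeautifulSoup

def hidden_fields(html):
    """Return every hidden <input> on the page as a name -> value dict."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", type="hidden")
        if tag.has_attr("name")
    }
```

The result can be merged with the credential dict before the POST, e.g. data = dict(hidden_fields(page.text), **credentials).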

Python Mechanize write to TinyMCE Text Editor

I'm using Python Mechanize for adding an event to WordPress but I can't seem to figure out how to write to the TinyMCE Editor in the 'Add New' Event section.
I've been able to make a draft so far by just setting the Title with some value for testing purposes but I am stuck here. What I've done so far is...
br = mechanize.Browser()
response = br.open(url)
Intermediate steps to get to the correct page that don't need to be listed...
Once on the correct page I choose the form that I want to work with, select it and set the title. Once I submit I can actually travel to my drafts section in my normal chrome/firefox browser to see a draft has been created.
for f in br.forms():
    if f.name == postForm:
        print f
        br.select_form(f.name)
        br.form['post_title'] = 'Creating from MECHANIZE'
        br.submit(name='save', label='Save Draft')
What would be the intermediary steps to input data into the TinyMCE editor?
I realized that by writing:
br.form['content'] = "some content"
you are able to write to the custom textarea. Any HTML content you place in triple double-quotes will show up as you want once you submit the post.

pycurl script can't login to website

I'm currently trying to get a grasp on pycurl. I'm attempting to login to a website. After logging into the site it should redirect to the main page. However when trying this script it just gets returned to the login page. What might I be doing wrong?
import pycurl
import urllib
import StringIO
pf = {'username' : 'user', 'password' : 'pass' }
fields = urllib.urlencode(pf)
pageContents = StringIO.StringIO()
p = pycurl.Curl()
p.setopt(pycurl.FOLLOWLOCATION, 1)
p.setopt(pycurl.COOKIEFILE, './cookie_test.txt')
p.setopt(pycurl.COOKIEJAR, './cookie_test.txt')
p.setopt(pycurl.POST, 1)
p.setopt(pycurl.POSTFIELDS, fields)
p.setopt(pycurl.WRITEFUNCTION, pageContents.write)
p.setopt(pycurl.URL, 'http://localhost')
p.perform()
pageContents.seek(0)
print pageContents.readlines()
EDIT: As pointed out by Peter the URL should point to a login URL but the site I'm trying to get this to work for fails to show me what URL this would be. The form's action just points to the home page ( /index.html )
As you're troubleshooting this problem, I suggest getting a browser plugin like FireBug or LiveHTTPHeaders (I suggest Firefox plugins, but there are similar plugins for other browsers as well). Then you can exercise a request to the site and see what action (URL), method, and form parameters are being passed to the target server. This will likely help elucidate the crux of the problem.
If that's no help, you may consider using a different tool for your mechanization. I've used ClientForm and BeautifulSoup to perform similar operations. Based on what I've read in the pycURL docs and your code above, ClientForm might be a better tool to use. ClientForm will parse your HTML page, locate the forms on it (including login forms), and construct the appropriate request for you based on the answers you supply to the form. You could even use ClientForm with pycURL... but at least ClientForm will provide you with the appropriate action to which to POST, and construct all of the appropriate parameters.
Be aware, though, that if there is JavaScript handling any necessary part of the login form, even ClientForm can't help you there. You will need something that interprets the JavaScript to effectively automate the login. In that case, I've used SeleniumRC to control a browser (and I let the browser handle the JavaScript).
One golden rule to break the ice when solving a pycurl problem: have debugging enabled, so you can see exactly what is sent and received:
Note: don't forget to use p.close() after p.perform()
def test(debug_type, debug_msg):
    if len(debug_msg) < 300:
        print "debug(%d): %s" % (debug_type, debug_msg.strip())

p.setopt(pycurl.VERBOSE, True)
p.setopt(pycurl.DEBUGFUNCTION, test)
With debugging enabled, you can now see exactly what your code is doing:
import pycurl
import urllib
import StringIO
def test(debug_type, debug_msg):
    if len(debug_msg) < 300:
        print "debug(%d): %s" % (debug_type, debug_msg.strip())
pf = {'username' : 'user', 'password' : 'pass' }
fields = urllib.urlencode(pf)
pageContents = StringIO.StringIO()
p = pycurl.Curl()
p.setopt(pycurl.FOLLOWLOCATION, 1)
p.setopt(pycurl.COOKIEFILE, './cookie_test.txt')
p.setopt(pycurl.COOKIEJAR, './cookie_test.txt')
p.setopt(pycurl.POST, 1)
p.setopt(pycurl.POSTFIELDS, fields)
p.setopt(pycurl.WRITEFUNCTION, pageContents.write)
p.setopt(pycurl.VERBOSE, True)
p.setopt(pycurl.DEBUGFUNCTION, test)
p.setopt(pycurl.URL, 'http://localhost')
p.perform()
p.close() # This is mandatory.
pageContents.seek(0)
print pageContents.readlines()
