Can't get proper results with mechanize when using br.submit() - python

I am trying to submit a form and read the page that the browser lands on after submission. I'm using mechanize.
1) When I use the code below to click the first button, I get a response. But when I read it, it contains the source of the same page (the page where the form is located), not the page that the browser is redirected to after the form is submitted.
from mechanize import Browser
br = Browser()
br.open("http://link.net/form_page.php")
br.select_form(nr=0)
br.form['number'] = '0123456789'
response = br.submit(nr=0)
print response.read()
Now, when I do this, the source of the same page (i.e. form_page.php) is showing up. But, it should have shown the source of "results.php" (that is where the browser leads to when I do it manually)
2) There are multiple submit buttons in the page. I am clicking only the first one. But when I'm trying to click other submit buttons using nr=1 or nr=2, it is showing this error.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 524, in select_form
    raise FormNotFoundError("no form matching "+description)
mechanize._mechanize.FormNotFoundError: no form matching nr 1
Can you please help me?

Make sure you are selecting the right form, and that the form you are selecting actually exists on the page. You can check with code like this:
for form in br.forms():
    print form
and see what it prints.

This looks similar to this issue, where submit was calling some Javascript to validate the inputs before redirecting. It may be worth having a look at the HTML of the page and checking what it does on submit.
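One way to check what the page does on submit is to scan its HTML for JavaScript event handlers on the form and its buttons. The following is only a rough sketch in Python 3 using the standard library; the markup in `SAMPLE_HTML` and the function names inside it (`validate`, `go`) are hypothetical stand-ins, not the real page.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the real form page.
SAMPLE_HTML = """
<form action="form_page.php" method="post" onsubmit="return validate(this);">
  <input type="text" name="number">
  <input type="submit" name="search" value="Search">
  <input type="button" name="lookup" value="Lookup" onclick="go('results.php');">
</form>
"""


class HandlerFinder(HTMLParser):
    """Collect form/input/button tags that carry on* event attributes."""

    def __init__(self):
        super().__init__()
        self.handlers = []  # tuples of (tag, name, attribute, script)

    def handle_starttag(self, tag, attrs):
        if tag not in ("form", "input", "button"):
            return
        attr_map = dict(attrs)
        for key, value in attrs:
            # onsubmit, onclick, etc. indicate that JavaScript is involved
            if key.startswith("on"):
                self.handlers.append((tag, attr_map.get("name"), key, value))


finder = HandlerFinder()
finder.feed(SAMPLE_HTML)
for tag, name, attribute, script in finder.handlers:
    print(tag, name, attribute, script)
```

If this kind of scan turns up an `onsubmit` or `onclick` handler, a browser without JavaScript support will most likely not follow the same path as a manual submission.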

Try the following:
import mechanize
br = mechanize.Browser()
br.open("http://link.net/form_page.php")
br.select_form(nr=0)
br['number'] = '0123456789' ### try instead of 'br.form[]'
response = br.submit() ### no need to specify form again
text = response.read()
Don't forget about 'br.set_handle_robots(False)', 'br.set_all_readonly(False)', etc...

Related

Python code to fill and submit Stulish Form

I'm trying to write a Python code to submit a simple form.
http://stulish.com/soumalya01
[Edit : http://travelangkawi.com/soumalya01/
When you use this link it returns a different page on form submit. Is good for debugging]
Any code would do
I tried both mechanize and MechanicalSoup. Neither can handle the text field: it has no name, only an ID, and we are unable to get the element by ID.
Any Code would do as long as it works. (Fill ABC in the text box and hit submit)
I just followed the documentations of mechanize. See sample code below:
from mechanize import Browser
br = Browser()
br.open('http://stulish.com/soumalya01')
br.select_form(nr=0)
br.form.set_all_readonly(False)  # add this
br.form.set_value('ABC', nr=1)
print(br.form.controls[1])
br.submit()

Scraping a react.js webpage with dryscrape

I have trouble scraping the homepage http://www.jobs.ch which is programmed with react.js.
I want to put the term Business in the search box and execute the search.
Dryscrape worked for another example which was not a react.js page.
How can I write the term Business in this search field?
The error message when my script is executed:
ubuntu@ubuntu:~/scripts$ python jobs.py
Traceback (most recent call last):
File "jobs.py", line 30, in <module>
name.set("Business")
AttributeError: 'NoneType' object has no attribute 'set'
Here is my script:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# We will write a Python script to visit a webpage, fill in the form and submit the form.
import dryscrape
# make sure you have xvfb installed
dryscrape.start_xvfb()
root_url = 'http://www.jobs.ch/en/vacancies/'
if __name__ == '__main__':
    # set up a web scraping session
    session = dryscrape.Session(base_url = root_url)
    # we don't need images
    session.set_attribute('auto_load_images', False)
    session.set_header('User-agent', 'Google Chrome')
    # visit exact webpage which is the form in this example
    session.visit('http://www.jobs.ch/en/vacancies/')
    # fill in the form by taking ID of field from webdev tool
    #name = session.at_xpath('//*[@data-reactid="107"]')
    name = session.at_xpath('//*[@data-reactid="107"]//*[@class="search-input col-sm-4 col-md-5"]')
    name.set("Business")
    # submit form
    name.form().submit()
    # save a screenshot of the web page
    session.render("jobs.png")
    print("Session rendered successfully!")
I think your xpath has an issue but apart from that, your session itself has been configured incorrectly.
This line
session = dryscrape.Session(base_url = root_url)
sets the base of the URL to your root_url so when you do session.visit('http://www.jobs.ch/en/vacancies/') you are in fact visiting the concatenation of your root_url and the URL provided in session.visit.
If you print session.url() you would be able to see that the URL you actually visited was http://www.jobs.ch/en/vacancies/http://www.jobs.ch/en/vacancies/
The xpath of the page which I got from Chrome -> Inspect -> Right Click -> Copy XPath is //*[@id="react-root"]/div/div[1]/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/div/div[1]/div/input
Please verify that you are using the correct xpath.
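To illustrate why a wrong xpath leads to the `'NoneType' object has no attribute 'set'` error, here is a small sketch in Python 3. It uses the standard library's `xml.etree.ElementTree` rather than dryscrape, and the `PAGE` markup is a hypothetical, simplified stand-in for the rendered page. The point is only that a lookup which matches nothing returns `None`, so it is worth checking the result before calling methods on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the rendered page.
PAGE = """
<html><body>
  <div id="react-root">
    <div class="search"><input type="text" placeholder="Job title"/></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def first_match(path):
    """Return the first element matching path, or None when nothing matches."""
    return root.find(path)


# Attribute tests are written with '@', e.g. [@id='react-root'].
hit = first_match(".//*[@id='react-root']//input")
miss = first_match(".//*[@data-reactid='107']//input")

for label, element in (("react-root", hit), ("data-reactid", miss)):
    if element is None:
        print(label, "-> no match, so calling a method on it would fail")
    else:
        print(label, "->", element.tag, element.attrib)
```

The same kind of `is None` check on the result of `session.at_xpath(...)` would give a clearer message than the AttributeError in the question.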

Input html form data from python script

I am working on a project and I need to validate a piece of data using a third party site. I wrote a python script using the lxml package that successfully checks if a specific piece of data is valid.
Unfortunately, the site does not have a convenient url scheme for their data and therefor I can not predict the specific url that will contain the data for each unique request. Instead the third party site has a query page with a standard html text input that redirects to the proper url.
My question is this: is there a way to input a value into the html input and submit it all from my python script?
Yes there is.
Mechanize
Forms
List the forms
import mechanize
br = mechanize.Browser()
br.open(url)
for form in br.forms():
    print "Form name:", form.name
    print form
Select a form
br.select_form("form1")  # select by name
br.form = list(br.forms())[0]  # or take the first form directly
login form example
br.select_form("login")
br['login:loginUsernameField'] = user
br['login:password'] = password
br.method = "POST"
response = br.submit()
Selenium
Sending input
Given an element defined as:
<input type="text" name="passwd" id="passwd-id" />
you could find it using any of:
element = driver.find_element_by_id("passwd-id")
element = driver.find_element_by_name("passwd")
element = driver.find_element_by_xpath("//input[@id='passwd-id']")
You may want to enter some text into a text field:
element.send_keys("some text")
You can simulate pressing the arrow keys by using the “Keys” class:
from selenium.webdriver.common.keys import Keys
element.send_keys("and some", Keys.ARROW_DOWN)
These are the two packages I'm aware of that can do what you've asked.

Mechanize to submit and read response

I am using mechanize in python to submit a form and print out the response but it does not seem to work
import mechanize
# The URL to this service
URL = 'http://sppp.rajasthan.gov.in/bidsearch.php'
def main():
    # Create a Browser instance
    b = mechanize.Browser()
    # Load the page
    b.open(URL)
    # Select the form
    b.select_form(nr=0)
    # Fill out the form
    b['ddlfinancialyear'] = '2015-2016'
    b.submit()
    b.response().read()
What I am trying to do is submit the form at 'sppp.rajasthan.gov.in/bidsearch.php', passing the value '2015-2016' to the 'ddlfinancialyear' control. Another page should be returned as the response, but I am not getting any output.
Try assigning the result of b.submit() to a variable, then read and print it:
S = b.submit()
print S.read()

Using Python and Mechanize with ASP Forms

I'm trying to submit a form on an .asp page but Mechanize does not recognize the name of the control. The form code is:
<form id="form1" name="frmSearchQuick" method="post">
....
<input type="button" name="btSearchTop" value="SEARCH" class="buttonctl" onClick="uf_Browse('dledir_search_quick.asp');" >
My code is as follows:
br = mechanize.Browser()
br.open(BASE_URL)
br.select_form(name='frmSearchQuick')
resp = br.click(name='btSearchTop')
I've also tried the last line as:
resp = br.submit(name='btSearchTop')
The error I get is:
raise ControlNotFoundError("no control matching "+description)
ControlNotFoundError: no control matching name 'btSearchTop', kind 'clickable'
If I print br I get this: IgnoreControl(btSearchTop=)
But I don't see that anywhere in the HTML.
Any advice on how to submit this form?
The button doesn't submit the form - it calls some javascript function.
Mechanize can't run javascript, so you can't use it to click that button.
The easy way out is to read that function yourself, and see what it does - if it just submits the form, then maybe you can get around it by submitting the form without clicking on anything.
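As a rough illustration of submitting without clicking, the sketch below (Python 3, standard library only) builds the POST request directly. It assumes, without having seen the function, that `uf_Browse('dledir_search_quick.asp')` merely points the form at that page and submits it. `BASE_URL`, the field name `txtName` and its value are hypothetical placeholders; the real ones have to be read from the page.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical values: the real base URL and field names must come from the page.
BASE_URL = "http://example.com/directory/"
FORM_FIELDS = {"txtName": "Smith"}


def build_search_request(base_url, fields):
    """Build, but do not send, the POST that the button's JavaScript is assumed to trigger."""
    # Target page taken from the onClick attribute in the question.
    target = urljoin(base_url, "dledir_search_quick.asp")
    body = urlencode(fields).encode("ascii")
    return Request(target, data=body, method="POST")


request = build_search_request(BASE_URL, FORM_FIELDS)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the request to `urllib.request.urlopen` would send it. If the JavaScript function does more than that (sets hidden fields, for example), those values would need to be added to the fields as well.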
You need to inspect the form first. Did mechanize recognize it?
for form in br.forms():
    print form
