Fill an HTML form using Python 3.4

I am trying to fill in an HTML form using Python code. The idea is that an agent sends data to the HTML web page, and another Python script (a CGI script) then collects the data. I have tried quite a few options, but I am not able to change the value of the form field named 'os' from the agent script. When I fill in the form by hand, everything works and the collector takes over.
If more information is needed, I will happily provide it.
I hope someone can help me.
Agent script:
import requests
url = 'http://127.0.0.1/PythonApp/website.html'
data = {'os':'test12345'}
session = requests.session()
r = requests.post(url, data)
print(r.status_code, r.reason)
print(r.text)
HTML page:
<html>
<head>Text</head>
<body>
<form action="postmethod.py" method="POST">
<input type="text" name="os" value="abc">
</form>
</body>
</html>
Collector (postmethod.py):
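import cgi
import html
# print the header; print() adds its own newline, so one "\n" gives the blank line that ends the headers
print("Content-type: text/html\n")
print("<h2>Arguments</h2>")
form = cgi.FieldStorage()
arg1 = str(form.getvalue('os'))
# escape the submitted value before echoing it back into HTML
miauw = ("os: " + html.escape(arg1) + "<br>")
print(miauw)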
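A likely cause, in line with the related answer further down ("The form action attribute tells you where to post to"), is that the agent posts to website.html. That is a static page that ignores POST data, so the value 'abc' in it never changes. The collector only sees what is POSTed to the form's action, postmethod.py. The following is a minimal standard-library sketch of posting to the action URL instead. It assumes the collector is reachable at the URL obtained by resolving the action against the page URL, and the helper names are hypothetical. With requests, the equivalent fix is to post to 'http://127.0.0.1/PythonApp/postmethod.py' instead of the .html page.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

# Values taken from the question; adjust them to your own server layout.
PAGE_URL = 'http://127.0.0.1/PythonApp/website.html'
FORM_ACTION = 'postmethod.py'  # the <form action="..."> attribute


def build_form_request(page_url, action, fields):
    """Build a POST request aimed at the form's action, not at the HTML page."""
    # A relative action is resolved against the page URL, as a browser would do.
    target = urljoin(page_url, action)
    # Encode the fields the same way a browser encodes an ordinary form submission.
    body = urlencode(fields).encode('ascii')
    return Request(target, data=body, method='POST',
                   headers={'Content-Type': 'application/x-www-form-urlencoded'})


def submit_form(page_url, action, fields):
    """Send the form data and return the collector's response body as text."""
    with urlopen(build_form_request(page_url, action, fields)) as resp:
        return resp.read().decode('utf-8', errors='replace')
```

Usage: print(submit_form(PAGE_URL, FORM_ACTION, {'os': 'test12345'})) should print the HTML that postmethod.py generates, with "os: test12345" in it, assuming the CGI script is served at that address.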

Related

Python 3, filling out a form with request (library) returns same page HTML without inputting parameters

I am trying to use requests to fill out a form on https://www.doleta.gov/tradeact/taa/taa_search_form.cfm and return the HTML of the new page that this opens and extract information from the new page.
Here is the relevant HTML:
<form action="taa_search.cfm" method="post" name="number_search" id="number_search" onsubmit="return validate(this);">
<label for="input">Petition number</label>:
<input name="input" type="text" size="7" maxlength="7" id="input">
<input type="hidden" name="form_name" value="number_search" />
<input type=submit value="Get TAA information" />
</form>
Here is the Python code I am trying to use:
import requests
url = 'https://www.doleta.gov/tradeact/taa/taa_search.cfm'
payload = {'number_search':'11111'}
r = requests.get(url, params=payload)
with open("requests_results1.html", "wb") as f:
    f.write(r.content)
When you perform the query manually, this page opens: https://www.doleta.gov/tradeact/taa/taa_search.cfm.
However, when I use the above Python code, it returns the HTML of https://www.doleta.gov/tradeact/taa/taa_search_form.cfm (the first page) and nothing is different.
I cannot run similar code against https://www.doleta.gov/tradeact/taa/taa_search.cfm because it redirects to the first URL, so the code again returns the HTML of the first URL.
Because of the permission settings on my computer, I cannot change the PATH on my PC (which rules out Selenium) and I cannot install Python 2 (which rules out mechanize). I am open to using urllib but do not know the library very well.
I need to perform this action about 10,000 times to scrape the information. I can build the iteration part myself, but I cannot figure out how to get the base function to work.
The first observation is that you seem to be using a get request in your example code instead of a post request.
<form action="taa_search.cfm" method="post" ...>
^^^^^^^^^^^^^
After changing to a post request, I was still getting the same results as you though (html from the main search form page). After a bit of experimentation, I seem to be able to get the proper html results by adding a referer to the header.
Here is the code (I only commented out the writing to file part for example purposes):
import requests
BASE_URL = 'https://www.doleta.gov/tradeact/taa'
def get_case_decision(case_number):
    headers = {
        'referer': '{}/taa_search_form.cfm'.format(BASE_URL)
    }
    payload = {
        'form_name': 'number_search',
        'input': case_number
    }
    r = requests.post(
        '{}/taa_search.cfm'.format(BASE_URL),
        data=payload,
        headers=headers
    )
    r.raise_for_status()
    return r.text
    # with open('requests_results_{}.html'.format(case_number), 'wb') as f:
    #     f.write(r.content)
Testing:
>>> result = get_case_decision(10000)
>>> 'MODINE MFG. COMPANY' in result
True
>>> '9/12/1980' in result
True
>>> result = get_case_decision(10001)
>>> 'MUSKIN CORPORATION' in result
True
>>> '2/27/1981' in result
True
Since you mentioned that you need to perform this ~10,000 times, you will probably want to look into using requests.Session as well.
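Following the suggestion to use requests.Session, here is a sketch that reuses one session, and therefore its connection pool and headers, for many lookups. The session is passed in rather than created inside, so the loop can be exercised without network access. The names fetch_all and delay, and the one-second pause between requests, are illustrative assumptions and not part of the answer.

```python
import time

BASE_URL = 'https://www.doleta.gov/tradeact/taa'


def get_case_decision(session, case_number):
    """Post one petition number using an existing session (e.g. requests.Session)."""
    payload = {'form_name': 'number_search', 'input': case_number}
    r = session.post('{}/taa_search.cfm'.format(BASE_URL), data=payload)
    r.raise_for_status()
    return r.text


def fetch_all(session, case_numbers, delay=1.0):
    """Yield (case_number, html) pairs, pausing `delay` seconds between requests."""
    # Set the Referer header that the answer found necessary once, on the session.
    session.headers['Referer'] = '{}/taa_search_form.cfm'.format(BASE_URL)
    for case_number in case_numbers:
        yield case_number, get_case_decision(session, case_number)
        time.sleep(delay)  # assumed politeness delay; tune or remove
```

Usage, assuming requests is installed: open `with requests.Session() as session:`, iterate over `fetch_all(session, range(10000, 20000))`, and write each returned page to a file as in the commented-out lines above.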

Methods for linking an HTML page, a Tornado server and a Python file

This is my sample HTML file:
<html>
<head>
<title>
</title>
</head>
<body>
<form action="">
Value a:<br>
<input type="text" name="Va">
<br>
Value b:<br>
<input type="text" name="Vb">
<br><br>
<input type="submit">
</form>
<textarea rows="4" cols="10">
</textarea>
<p>
</p>
</body>
</html>
And here is the template Tornado server code I was given (I would also appreciate an explanation of each section):
import tornado.ioloop
import tornado.web
import tornado.httpserver
import tornado.gen
import tornado.options
tornado.options.parse_command_line()
class APIHandler(tornado.web.RequestHandler):
    #tornado.web.asynchronous
    def get(self):
        self.render('template.html')
    #tornado.gen.engine
    def post(self):
        try:
            num = int(self.get_argument('num'))
        except:
            num = 5
        self.render('template.html')
app = tornado.web.Application([(r"/next_rec",APIHandler),])
if __name__ == "__main__":
    server = tornado.httpserver.HTTPServer(app)
    server.bind(48763)
    server.start(5)
    tornado.ioloop.IOLoop.current().start()
And finally, my Python code:
if __name__ == '__main__':
    a = int(raw_input())
    b = int(raw_input())
    print a+b
I am using a simple 'a+b' function to test this feature, but I can't figure out a way to link the pieces together. My ultimate goal is to click the "Submit" button in the HTML page, pass the two values to the Tornado server, use them as input to my Python script, and finally show the output in the HTML text area or on another page. I know there is a lot of information on the web, but I'm completely new to Tornado (near-zero knowledge) and can't really follow most of it. Help with methods or search keywords is much appreciated, thank you very much. (Please keep answers as basic as possible; it will help a lot, thanks!)
First of all, you should check the official documentation. It is quite simple and aimed at newcomers.
This short guide also explains, section by section, code very similar to yours.
Now for your code:
In your template, specify that the form should send a POST request on submit: <form method="post" id="sum_form">
Also make sure the form data is actually submitted on some event, either with the submit button or from JavaScript, e.g. with jQuery: $("#sum_form").submit();
In your post method, read the numbers passed from the client's form, add them, and then send the result back to the template as a parameter.
For example:
def post(self):
    numA = int(self.get_argument('Va'))
    numB = int(self.get_argument('Vb'))  # names are case-sensitive: the form uses "Vb"
    sumAB = numA + numB
    self.render('template.html', sumAB=sumAB)
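In your template.html, add a field that displays the passed sum with a template expression: {{ sumAB }} (Tornado's template syntax looks like Jinja's). Because get() renders the same template, pass a value there too, e.g. self.render('template.html', sumAB=''); otherwise the template raises a NameError.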
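Putting the answer's pieces together, here is a minimal single-file sketch. To stay self-contained, it compiles the page with tornado.template.Template and sends it with self.finish() instead of loading a separate template.html through self.render(). With a real template file, you would keep self.render('template.html', sumAB=...). The helper name add_form_values, the make_app() factory and the error message are illustrative assumptions.

```python
import tornado.template
import tornado.web

# Inline stand-in for template.html; {{ sumAB }} is Tornado template syntax.
PAGE = tornado.template.Template("""<html><body>
<form method="post" action="/next_rec">
Value a:<br><input type="text" name="Va"><br>
Value b:<br><input type="text" name="Vb"><br><br>
<input type="submit">
</form>
<textarea rows="4" cols="10">{{ sumAB }}</textarea>
</body></html>""")


def add_form_values(va, vb):
    """Convert the two submitted strings to integers and add them."""
    return int(va) + int(vb)


class APIHandler(tornado.web.RequestHandler):
    def get(self):
        # First visit: nothing submitted yet, so render an empty result.
        self.finish(PAGE.generate(sumAB=''))

    def post(self):
        # A missing field makes get_argument() answer with HTTP 400;
        # a non-numeric value raises ValueError, handled here.
        try:
            result = add_form_values(self.get_argument('Va'), self.get_argument('Vb'))
        except ValueError:
            result = 'Please enter two integers'
        self.finish(PAGE.generate(sumAB=result))


def make_app():
    return tornado.web.Application([(r"/next_rec", APIHandler)])
```

To run it, call make_app().listen(48763) and then tornado.ioloop.IOLoop.current().start(). The original bind()/start(5) pair also works, but start(5) forks five processes, which makes a first experiment harder to debug.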

Use Python requests to post an HTML form to a server and save the response to a file

I have exactly the same problem as this post:
Python submitting webform using requests
but the answers there do not solve it. When I open this HTML file, called api.htm, in the browser, I see its page for a second or so.
Then the browser shows the data I want, at the URL https://api.someserver.com/api/ as per the form action below. But I want the data written to a file, so I tried the Python 2.7 script below.
All I get is the source code of api.htm. Please put me on the right track!
<html>
<body>
<form id="ID" method="post" action="https://api.someserver.com/api/ ">
<input type="hidden" name="key" value="passkey">
<input type="text" name="start" value ="2015-05-01">
<input type="text" name="end" value ="2015-05-31">
<input type="submit" value ="Submit">
</form>
<script type="text/javascript">
document.getElementById("ID").submit();
</script>
</body>
</html>
The code:
import urllib
import requests
def main():
    values = {'start' : '2015-05-01',
              'end' : '2015-05-31'}
    req=requests.post("http://my-api-page.com/api.htm",
                      data=urllib.urlencode(values))
    filename = "datafile.csv"
    output = open(filename,'wb')
    output.write(req.text)
    output.close()
    return
main()
I can see several problems:
Your post target URL is incorrect. The form action attribute tells you where to post to:
<form id="ID" method="post" action="https://api.someserver.com/api/ ">
You are not including all the fields; type=hidden fields need to be posted too, but you are ignoring this one:
<input type="hidden" name="key" value="passkey">
Do not URL-encode your POST variables yourself; leave that to requests. If you encode them yourself, requests won't recognise that the body is application/x-www-form-urlencoded and won't set that Content-Type header. Just pass in the dictionary as the data parameter; it'll be encoded for you and the header will be set.
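To see the difference described in the previous point without contacting a server, the following sketch prepares both requests but does not send them, using requests.Request(...).prepare(). It uses Python 3's urllib.parse.urlencode in place of Python 2's urllib.urlencode. The URL and values are the ones from the question.

```python
from urllib.parse import urlencode

import requests

VALUES = {'start': '2015-05-01', 'end': '2015-05-31', 'key': 'passkey'}
ACTION = 'https://api.someserver.com/api/'

# Passing the dict: requests builds the body and sets the form Content-Type.
form_req = requests.Request('POST', ACTION, data=VALUES).prepare()

# Passing a pre-encoded string: same body, but no Content-Type header is added.
string_req = requests.Request('POST', ACTION, data=urlencode(VALUES)).prepare()

print(form_req.headers.get('Content-Type'))    # application/x-www-form-urlencoded
print(string_req.headers.get('Content-Type'))  # None
```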
You can also stream the response straight to a file object; this helps when the response is large. Switch on response streaming, make sure the underlying raw urllib3 file-like object decodes any gzip/deflate content encoding, and use shutil.copyfileobj to write to disk:
import requests
import shutil
def main():
    values = {
        'start': '2015-05-01',
        'end': '2015-05-31',
        'key': 'passkey',
    }
    # post to the form's action URL, not to the page that contains the form
    req = requests.post("https://api.someserver.com/api/",
                        data=values, stream=True)
    if req.status_code == 200:
        with open("datafile.csv", 'wb') as output:
            # let urllib3 undo any gzip/deflate content encoding while copying
            req.raw.decode_content = True
            shutil.copyfileobj(req.raw, output)
main()
There may still be issues with that key value however; perhaps the server sets a new value for each session, coupled with a cookie, for example. In that case you'd have to use a Session() object to preserve cookies, first do a GET request to the api.htm page, parse out the key hidden field value and only then post. If that is the case then using a tool like robobrowser might just be easier.
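For the case described in the last paragraph, here is a sketch of the "GET first, parse the hidden key, then POST" flow using the standard library's html.parser. The session argument is assumed to behave like requests.Session, with get/post methods and cookies kept between calls. The helper names are hypothetical, and whether this server actually issues a new key per session is not known from the question.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here via handle_startendtag.
        attrs = dict(attrs)
        if tag == 'input' and (attrs.get('type') or '').lower() == 'hidden' and 'name' in attrs:
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    """Return a dict of the hidden input fields found in an HTML page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def post_with_fresh_key(session, page_url, action_url, values):
    """GET the form page, copy its hidden fields, then POST with the same session."""
    page = session.get(page_url)
    page.raise_for_status()
    data = hidden_fields(page.text)  # e.g. the current 'key' value
    data.update(values)              # then the visible fields you want to send
    return session.post(action_url, data=data)
```

Usage, assuming requests is installed: inside `with requests.Session() as s:`, call `post_with_fresh_key(s, 'http://my-api-page.com/api.htm', 'https://api.someserver.com/api/', {'start': '2015-05-01', 'end': '2015-05-31'})`.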

Display Python output in HTML

What is the simplest way to display the Python ystockquote (http://goldb.org/ystockquote.html) module output in HTML? I am creating an HTML dashboard which will be run locally on my computer and want to insert the stock output results into the designated HTML placeholders. I am hoping that because it is local I can avoid many CGI and server requirements.
I would use a templating system (see the Python wiki article). jinja is a good choice if you don't have any particular preferences. This would allow you to write HTML augmented with expansion of variables, control flow, etc. which greatly simplifies producing HTML automatically.
You can simply write the rendered HTML to a file and open it in a browser, so you don't need a web server. If you do want one, running python -m SimpleHTTPServer (Python 2) or python3 -m http.server (Python 3) in the directory containing the HTML documents will make them available under http://localhost:8000.
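As a stand-in for jinja, here is a sketch that uses only the standard library (html.escape and str.format). It renders rows into a static page that can be opened directly in a browser, without a server. The row layout, a header row followed by data rows, mirrors the table the web.py answer builds. Feeding real ystockquote output into it is an assumption, since this sketch does not depend on that module.

```python
import html
from pathlib import Path

# Page skeleton; {rows} is filled in by write_dashboard().
PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Dashboard</title></head>
<body>
<table>
{rows}
</table>
</body></html>
"""


def render_table(rows):
    """Render a header row plus data rows as HTML <tr> lines, escaping every cell."""
    header, *body = rows
    lines = ['<tr>' + ''.join('<th>{}</th>'.format(html.escape(str(c))) for c in header) + '</tr>']
    for row in body:
        lines.append('<tr>' + ''.join('<td>{}</td>'.format(html.escape(str(c))) for c in row) + '</tr>')
    return '\n'.join(lines)


def write_dashboard(rows, path):
    """Write a complete static HTML page to `path`; open it directly in a browser."""
    Path(path).write_text(PAGE.format(rows=render_table(rows)), encoding='utf-8')
```

For example, write_dashboard([['Symbol', 'Price'], ['GOOG', 'n/a']], 'dashboard.html') writes a page with one placeholder row.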
Here is a simple server built using web.py (I have been playing with this for a while now, so this was a fun question to answer)
import web
import ystockquote
urls = (
    '/', 'index'
)
app = web.application(urls, globals())
class index:
    def POST(self):
        form = web.input()  # parse the submitted form fields once
        history = ystockquote.get_historical_prices(form['stock'], form['start'], form['end'])
        head = history[0]
        html = '<html><head><link href="//netdna.bootstrapcdn.com/twitter-bootstrap/2.3.1/css/bootstrap-combined.min.css" rel="stylesheet"><body><table class="table table-striped table-bordered table-hover"><thead><tr><th>{}<th>{}<th>{}<th>{}<th>{}<th>{}<th>{}<tbody>'.format(*head)
        for row in history[1:]:
            html += "<tr><td>{}<td>{}<td>{}<td>{}<td>{}<td>{}<td>{}".format(*row)
        return html
    def GET(self):
        return """<html>
<head><link href='//netdna.bootstrapcdn.com/twitter-bootstrap/2.3.1/css/bootstrap-combined.min.css' rel='stylesheet'>
<body>
<form method='POST' action='/'><fieldset>
Symbol <input type='text' name='stock' value='GOOG'/><br/>
From <input type='text' name='start' value='20130101'/><br/>
To <input type='text' name='end' value='20130506'/><br/>
<input type='submit' class='btn'/></fieldset></form>"""
if __name__ == "__main__":
    app.run()

Python script for SVG to PNG conversion with Extjs

I'm trying to save a chart by converting SVG to PNG with a Python script.
So I start by storing the SVG data in a variable:
var svgdata = Ext.draw.engine.SvgExporter.generate(chart.surface);
When I do alert(svgdata), I can see that this output is correct.
But when I send it to the server like this:
Ext.draw.engine.ImageExporter.defaultUrl = "data/svg_to_png.py?svgdata="+svgdata;
the svgdata that arrives looks like this:
<?xml version=
I'm new to ExtJS. What is the right way to send the SVG data to my Python script and render a PNG image?
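This is my Python script:
import sys
import cairo
import cgi
import rsvg
# print adds its own newline, so one "\n" ends the headers; an extra one would corrupt the PNG data
print "Content-type: image/png\n"
arguments = cgi.FieldStorage()
img = cairo.ImageSurface(cairo.FORMAT_ARGB32, 640,480)
ctx = cairo.Context(img)
# getvalue() returns the submitted string; str(arguments["svgdata"]) gives the field object's repr
handler= rsvg.Handle(None, arguments.getvalue("svgdata"))
handler.render_cairo(ctx)
# send the PNG back to the browser instead of only saving it on the server
img.write_to_png(sys.stdout)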
First, put a hidden iframe somewhere on the page:
<div style="display:none;">
<iframe id="file_download_iframe" src="blank.html"></iframe>
</div>
You will need a blank HTML page on your server for this to work properly in all browsers; blank.html is simply an empty page so that the iframe always has a page loaded.
Then you need a basic form hidden somewhere too:
<div style="display:none;">
<form
id = "file_download_iframe_form"
name = "file_download_iframe_form"
target = "file_download_iframe"
method = "post"
action = "data/svg_to_png.py"
>
<input type="hidden" id="svgdata" name="svgdata"/>
</form>
</div>
Then have a javascript function like this:
function getImage(svgdata){
var form = document.getElementById("file_download_iframe_form");
document.getElementById("svgdata").value = svgdata;
form.submit();
};
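The question's script receives the SVG through a URL built by plain string concatenation. Here is a small standard-library illustration, in Python to match the server side, of one way that goes wrong. A '#' in the markup starts a URL fragment, and a fragment is never sent to the server, so the value arrives cut short. Percent-encoding the value keeps it intact: encodeURIComponent in JavaScript, urlencode in Python. The answer's POST-through-a-hidden-form approach avoids putting the markup in the URL at all. The sample SVG is made up for the demonstration. Whether the ExtJS output contains a '#' is an assumption, and the truncation shown in the question may have a different trigger.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Made-up SVG with the kinds of characters real exporter output contains.
SVG = ('<?xml version="1.0" standalone="yes"?>'
       '<svg xmlns="http://www.w3.org/2000/svg">'
       '<rect fill="#ff0000" width="10" height="10"/></svg>')


def received_value(url):
    """Return the svgdata value a server would parse from the URL's query string."""
    return parse_qs(urlsplit(url).query).get('svgdata', [''])[0]


# Concatenated raw: everything from '#' on is treated as a fragment, not query.
raw_url = 'data/svg_to_png.py?svgdata=' + SVG

# Percent-encoded: the value survives the round trip unchanged.
encoded_url = 'data/svg_to_png.py?' + urlencode({'svgdata': SVG})
```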
