I am trying to write some Python code to automate querying an online medical calculation tool. The resource is available at:
http://www.shef.ac.uk/FRAX/tool.aspx?lang=en
I am new to this type of thing, but I understand from my research that I should be able to use the Python requests package for this.
From my inspection of the page source I have identified the form element
<form method="post" action="tool.aspx?lang=en" id="form1">
And the elements that seem to directly correspond to the fields (e.g. age) look like this:
<input name="ctl00$ContentPlaceHolder1$toolage" type="text" id="ContentPlaceHolder1_toolage" maxlength="5" size="3" onkeypress="numericValidate(event)" style="width:40px;" />
My testing code so far looks like this (the only fields required to be filled in are age, sex, weight and height):
import requests
url="http://www.shef.ac.uk/FRAX/tool.aspx?lang=en"
payload ={'ctl00$ContentPlaceHolder1$toolage':'60',
'ctl00$ContentPlaceHolder1$year':'1954',
'ctl00$ContentPlaceHolder1$month':'01',
'ctl00$ContentPlaceHolder1$day':'01',
'ctl00$ContentPlaceHolder1$sex':'female',
'ctl00$ContentPlaceHolder1$weight':'70',
'ctl00$ContentPlaceHolder1$ht':'165',
'ctl00$ContentPlaceHolder1$facture':'no',
'ctl00$ContentPlaceHolder1facture_hip$':'no',
'ctl00$ContentPlaceHolder1$smoking':'no',
'ctl00$ContentPlaceHolder1$glu':'no',
'ctl00$ContentPlaceHolder1$rhe_art':'no',
'ctl00$ContentPlaceHolder1$sec_ost':'no',
'ctl00$ContentPlaceHolder1$alcohol':'no',
'ctl00$ContentPlaceHolder1$bmd_input':'',
'ctl00$ContentPlaceHolder1$btnCalculate':'Calculate',
}
req = requests.post(url, params=payload)
with open("requests_results.html", "w") as f:
f.write(req.content)
This, however, does not work. I don't get an error message, but the resulting saved HTML page (which I would later parse for the results) contains just the initial page with no result values. In addition to the fields in my current payload, the form also contains other elements that may be necessary, such as hidden elements for some of the same data types, like age:
<input name="ctl00$ContentPlaceHolder1$toolagehidden" type="hidden" id="ContentPlaceHolder1_toolagehidden"
I have tried different combinations of payloads, but the results are the same. Any help would be much appreciated.
You want to URL-encode the payload before the POST, like this:
import urllib
usefulpayload = urllib.urlencode(payload)
Then use usefulpayload in your request.
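Two further notes. urllib.urlencode is the Python 2 spelling; in Python 3 it lives at urllib.parse.urlencode. Also, requests will do this encoding itself if you hand the dict to data= rather than params= — params= puts the fields in the URL query string, while the form expects them in the POST body. A minimal sketch, reusing the url and payload from the question:
import requests

# data= URL-encodes the dict, sends it in the POST body, and sets the
# application/x-www-form-urlencoded content type; params= would append
# the fields to the URL query string instead.
req = requests.post(url, data=payload)
Even then, ASP.NET WebForms pages such as this one typically also expect their hidden __VIEWSTATE and __EVENTVALIDATION fields, which you would need to scrape from an initial GET of the form page before posting.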
Related
I have a university assignment in which I need to send and receive data as JSON between a Python script and a server, and then display the data in a browser with an add and a search field. I am adding a student's first name, surname and age to the dictionary. Please accept my apologies, as I'm not the best when it comes to coding.
Currently I can send the inputted information to the receiving script, where it shows up as a Python dictionary. I now need to look at getting this to display in a browser, e.g. Chrome, with a function that can add new students but also search the data dictionary.
I'm really struggling with how to get the data dictionary to display in a browser. Currently it shows in the receiving script, and I can output the information to a .txt file.
I'm probably describing this in a rubbish way, but any help would be great.
I tried to export as HTML instead of txt, but I can't find a way of formatting the data or adding a search function. I've added the data dictionary part below, along with where it outputs the data to the receiving script and a .txt file.
student[fname +" " + sname] = {#assign data to dictionary
'Student First name':fname,
'Student Last name':sname,
'Student Age':age,
'pass':passed
}
go = input("\n press x to exit OR any key to continue")
if go in ["x","X"]:
print ("\n data being sent")
jsonFile = json.dumps(student)#create json file from your dictionary
s.send(jsonFile.encode('utf-8'))
thing = False
print ("\n data sent")
time.sleep(5)
with open('student.txt', 'w') as json_file:
json.dump(student, json_file)
Make an HTML template that takes arguments and displays the data, like this one. Add the search fields and handle the search yourself.
<html>
<head>
  <title>User Data</title>
</head>
<body>
  <p>Available Student Data in the Database</p>
  <table>
    <tr>
      <td>fname</td>
      <td>sname</td>
      <td>age</td>
      <td>passed</td>
    </tr>
    {% for user in users %}
    <tr>
      <td>{{ user.fname }}</td>
      <td>{{ user.sname }}</td>
      <td>{{ user.age }}</td>
      <td>{{ user.passed }}</td>
    </tr>
    {% endfor %}
  </table>
</body>
</html>
Then render this HTML using the render_template function from the Flask library, like this:
render_template('user_data.html', users=your_user_data)
Make sure that your_user_data is a list of students with the attributes used in the HTML template.
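To make that concrete, here is a minimal sketch of how the pieces could fit together, assuming the template above is saved as templates/user_data.html (the route name and sample data are illustrative, not part of the assignment):
from flask import Flask, render_template

app = Flask(__name__)

# Illustrative data; in the assignment this would be built from the
# JSON dictionary your script receives.
your_user_data = [
    {'fname': 'Ada', 'sname': 'Lovelace', 'age': 36, 'passed': True},
]

@app.route('/students')
def students():
    # Flask looks for user_data.html inside the templates/ folder
    return render_template('user_data.html', users=your_user_data)

if __name__ == '__main__':
    app.run()
Jinja's {{ user.fname }} lookup works for plain dictionaries as well as objects, so a list of dicts like this renders fine.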
As you say:
send and receive data via json on a python script via a server
So basically, you're missing the server part. For your case, you'll need a Python web server that your Python script can then talk to.
Have a look at the different Python web servers yourself ;)
For a little project like that, I recommend Flask, but that's my opinion.
For sure, don't use Django for that.
I am trying to use requests to fill out a form on https://www.doleta.gov/tradeact/taa/taa_search_form.cfm, return the HTML of the new page that this opens, and extract information from that page.
Here is the relevant HTML
<form action="taa_search.cfm" method="post" name="number_search" id="number_search" onsubmit="return validate(this);">
<label for="input">Petition number</label>
:
<input name="input" type="text" size="7" maxlength="7" id="input">
<input type="hidden" name="form_name" value="number_search" />
<input type=submit value="Get TAA information" />
</form>
Here is the Python code I am trying to use:
url = 'https://www.doleta.gov/tradeact/taa/taa_search.cfm'
payload = {'number_search':'11111'}
r = requests.get(url, params=payload)
with open("requests_results1.html", "wb") as f:
f.write(r.content)
When you perform the query manually, this page opens https://www.doleta.gov/tradeact/taa/taa_search.cfm.
However, when I use the above Python code, it returns the HTML of https://www.doleta.gov/tradeact/taa/taa_search_form.cfm (the first page) and nothing is different.
I cannot point similar code at https://www.doleta.gov/tradeact/taa/taa_search.cfm because it redirects to the first URL, and thus running the code returns the HTML of the first URL.
Because of the permissions setup of my computer, I cannot change my PC's PATH (which means Selenium is off the table) and I cannot install Python 2 (which means mechanize is off the table). I am open to using urllib but do not know the library very well.
I need to perform this action ~10,000 times to scrape the information. I can build the iteration part myself, but I cannot figure out how to get the base function to work properly.
The first observation is that you seem to be using a GET request in your example code instead of a POST request:
<form action="taa_search.cfm" method="post" ...>
^^^^^^^^^^^^^
After changing to a POST request, though, I was still getting the same results as you (the HTML from the main search form page). After a bit of experimentation, I was able to get the proper HTML results by adding a referer to the headers.
Here is the code (I only commented out the writing-to-file part for example purposes):
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'

def get_case_decision(case_number):
    headers = {
        'referer': '{}/taa_search_form.cfm'.format(BASE_URL)
    }
    payload = {
        'form_name': 'number_search',
        'input': case_number
    }
    r = requests.post(
        '{}/taa_search.cfm'.format(BASE_URL),
        data=payload,
        headers=headers
    )
    r.raise_for_status()
    return r.text
    # with open('requests_results_{}.html'.format(case_number), 'wb') as f:
    #     f.write(r.content)
Testing:
>>> result = get_case_decision(10000)
>>> 'MODINE MFG. COMPANY' in result
True
>>> '9/12/1980' in result
True
>>> result = get_case_decision(10001)
>>> 'MUSKIN CORPORATION' in result
True
>>> '2/27/1981' in result
True
Since you mentioned that you need to perform this ~10,000 times, you will probably want to look into using requests.Session as well.
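A sketch of the same function reworked around a shared Session (the structure mirrors the code above; a Session reuses the underlying TCP connection, which adds up over thousands of requests):
import requests

BASE_URL = 'https://www.doleta.gov/tradeact/taa'

session = requests.Session()
# Headers set on the session are sent with every request it makes
session.headers.update({'referer': '{}/taa_search_form.cfm'.format(BASE_URL)})

def get_case_decision(case_number):
    r = session.post(
        '{}/taa_search.cfm'.format(BASE_URL),
        data={'form_name': 'number_search', 'input': case_number},
    )
    r.raise_for_status()
    return r.text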
I'm sure this is pretty easy, but I really can't work it out.
I need to write a script in Python. The script has to take a link and send it to http://archive.org/web/; to be more precise, the script has to put the link into the "Save page now" form:
<form id='wwmform_save' name="wwmform_save" method="get" action="#" onsubmit="if (''==$('#web_save_url').val()){$('#web_save_url').attr('placeholder', 'enter a web address')} else {document.location.href='//web.archive.org/save/'+$('#web_save_url').val();} return false;" style="display:inline;">
<input id='web_save_url' class="web_input web_text" type="text" name="url" placeholder="http://" />
<button id='web_save_button' type="submit" class="web_button web_text">SAVE PAGE</button>
</form>
And get back an archived link.
I would like to use the Requests library, but I can't understand how.
Should I make a request first?
I think I should use requests.post, but I don't know what parameters to use.
Edit:
I did what n1c9 suggested; it works and saves links, but I also need the link where the page was saved. When I send a request to http://web.archive.org/save/(link), it loads for a few seconds and then redirects to the needed link.
import requests

url = 'urlyouwanttoarchive.com'
archive = 'http://web.archive.org/save/'
requests.get(archive + url)
and if you want the URL to the newly archived page:
print(archive + url)
Edit: if you had a list of URLs you wanted to archive, this would work too:
urls = ['url1.com', 'url2.com', 'url3.com', 'url4.com']
for i in urls:
    requests.get(archive + i)
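On the edit in the question: since requests follows the redirect from web.archive.org/save/ by default, the final timestamped snapshot URL should be available on the response object itself. A small sketch (r.url is standard requests; the URL shape is what the Wayback Machine redirects to):
r = requests.get(archive + url)
# After the redirect, r.url holds the archived address, e.g.
# http://web.archive.org/web/20160101000000/http://urlyouwanttoarchive.com
print(r.url)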
I have a small .py program rendering 2 HTML pages. One of those HTML pages has a form in it: a basic form requesting a name and a comment. I cannot figure out how to take the name and the comment from the form and store them in the CSV file. I have got the code to the point where the little I manually put into the CSV file is printed/returned on the HTML page, which is one of the goals. But I can't get the data I enter into the form into the CSV file and then back onto the HTML page. I feel like this is a simple fix, but the Flask book makes absolutely no sense to me; I'm dyslexic and I find it impossible to make sense of the examples and the written explanations.
This is the code I have for reading the CSV back onto the page:
@app.route('/guestbook')
def guestbook():
    with open('nameList.csv', 'r') as inFile:
        reader = csv.reader(inFile)
        names = [row for row in reader]
    return render_template('guestbook.html', names=names[1:])
And this is my form code:
<h3 class="tab">Feel free to enter your comments below</h3>
<br />
<br />
<form action="" method="get" enctype="text/plain" name="Comments Form">
<input id="namebox" type="text" maxlength="45" size="32" placeholder="Name"
class="tab"/>
<br />
<textarea id="txt1" class="textbox tab" rows="6" placeholder="Your comment"
class="tab" cols="28"></textarea>
<br />
<button class="menuitem tab" onclick="clearComment()" class="tab">Clear
comment</button>
<button class="menuitem" onclick="saveComment()" class="tab">Save comment</button>
<br>
</form>
From what I understand, all you need is to save the data into the file and you don't know how to handle this in Flask. I'll try to explain it with code as clearly as possible:
# request is the part of Flask that handles HTTP requests
from flask import request
import csv

# methods is a list telling Flask which HTTP methods are allowed
# on this route.
@app.route('/save-comment', methods=['POST'])
def save_comment():
    # This is to make sure the HTTP method is POST and not any other
    if request.method == 'POST':
        # request.form is a dictionary that contains the form sent through
        # the HTTP request. This works by getting the name="xxx" attribute of
        # the HTML form field. So, if you want to get the name, your input
        # should be something like this: <input type="text" name="name" />.
        name = request.form['name']
        comment = request.form['comment']
        # This list holds the fields your csv file has; you'll see below
        # how it is used. Change it to your actual csv's fields.
        fieldnames = ['name', 'comment']
        # We repeat the same step as the reading, but opening with "a" so
        # each new comment is appended instead of overwriting the file.
        with open('nameList.csv', 'a') as inFile:
            # DictWriter will help you write the file easily by treating the
            # csv as a Python class, letting you work with dictionaries
            # instead of building each csv row manually.
            writer = csv.DictWriter(inFile, fieldnames=fieldnames)
            # writerow() writes one row to your csv file
            writer.writerow({'name': name, 'comment': comment})
        # And you return a text or a template; if you don't return
        # anything, this code will never work.
        return 'Thanks for your input!'
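For the form side, the inputs need name attributes matching the keys read above, and the form has to POST to that route. A minimal sketch (the action path and name values are assumptions chosen to match the code above):
<form action="/save-comment" method="post">
  <input id="namebox" type="text" name="name" maxlength="45" size="32" placeholder="Name" class="tab" />
  <textarea id="txt1" name="comment" class="textbox tab" rows="6" cols="28" placeholder="Your comment"></textarea>
  <button type="submit" class="menuitem tab">Save comment</button>
</form>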
I would like to use Mechanize (with Python) to submit a form, but unfortunately the page has been badly coded and the <select> element is not actually inside <form> tags.
So I can't use the traditional method via the form:
forms = [f for f in br.forms()]
mycontrol = forms[1].controls[0]
What can I do instead?
Here is the page I would like to scrape, and relevant bit of code - I'm interested in the la select item:
<fieldset class="searchField">
<label>By region / local authority</label>
<p id="regp">
<label>Region</label>
<select id="region" name="region"><option></option></select>
</p>
<p id="lap">
<label>Local authority</label>
<select id="la" name="la"><option></option></select>
</p>
<input id="byarea" type="submit" value="Go" />
<img id="regmap" src="/schools/performance/img/map_england.png" alt="Map of regions in England" border="0" usemap="#England" />
</fieldset>
This is actually more complex than you think, but still easy to implement. What is happening is that the webpage you linked pulls in the local authorities via JSON (which is why the name="la" select element doesn't fill in Mechanize, which lacks JavaScript). The easiest way around this is to ask for that JSON data directly with Python and use the results to go straight to each data page.
import urllib2
import json

# The URL where we get our array of LA data
GET_LAS = 'http://www.education.gov.uk/cgi-bin/schools/performance/getareas.pl?level=la&code=0'
# The URL which we interpolate the LA ID into to get individual pages
GET_URL = 'http://www.education.gov.uk/schools/performance/geo/la%s_all.html'

def get_performance(la):
    page = urllib2.urlopen(GET_URL % la)
    # print(page.read())

# get the local authority list
las = json.loads(urllib2.urlopen(GET_LAS).read())
for la in las:
    if la != 0:
        print('Processing LA ID #%s (%s)' % (la[0], la[1]))
        get_performance(la[0])
As you can see, you don't even need to load the page you linked or use Mechanize at all! However, you will still need a way to parse out the school names and the performance figures.
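One possible direction for that last step, e.g. with BeautifulSoup (not used elsewhere in this thread, and the assumption that the figures sit in table rows is mine; inspect the real markup first):
from bs4 import BeautifulSoup

def parse_performance(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Walk every table row and collect its cell text; adapt the
    # selectors once you know where the names and figures really live.
    for row in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:
            print(cells)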