I am trying to upload a file to a website using Python and HTTP requests. For this, I use the handy Requests library.
According to the documentation and some answers on StackOverflow, I just have to add a files parameter to my request, after studying the DOM of the web page.
The method is simple:
Look in the source code for the URL of the form ("action" attribute);
Look in the source code for the "name" attribute of the upload button;
Look in the source code for the "name" and "value" attributes of the form's submit button;
Complete the Python code with the required parameters.
Sometimes this works fine. Indeed, I managed to upload a file to this site: http://pastebin.ca/upload.php
After looking at the source code, the URL of the form is upload.php, the button names are file and s, and the value is Upload, so I get the following code:
url = "http://pastebin.ca/upload.php"
myFile = open("text.txt","rb")
r = requests.get(url,data={'s':'Upload'},files={'file':myFile})
print r.text.find("The uploaded file has been accepted.")
# ≠ -1
But now, let's look at that site: http://www.pictureshack.us/
The corresponding code is as follows:
url = "http://www.pictureshack.us/index2.php"
myFile = open("text.txt","rb")
r = requests.get(url,data={'Upload':'upload picture'},files={'userfile':myFile})
print r.text.find("Unsupported File Type!")
# = -1
In fact, the only difference I see between these two sites is that, for the first one, the URL that processes the form submission is the same as the page that contains the form.
But that does not solve my problem, because I still do not know how to submit my file in the second case.
I tried to send my request to the main page instead of the .php, but of course it does not work.
In addition, I have another question.
Suppose that some form elements do not have a "name" attribute. How am I supposed to refer to them in my request with Python?
For example, this site: http://imagesup.org/
The form's submit button looks like this: <input type="submit" value="Héberger !">
How can I use it in my data parameters?
The forms have another component you must honour: the method attribute. You are using GET requests, but the forms you are referring to use method="post". Use requests.post to send a POST request.
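To check which method a form expects before writing the request, the form can be inspected programmatically. The following is a minimal sketch using only the standard library. The FormInspector class and the SAMPLE_HTML markup are hypothetical: the markup is modelled on the field names quoted in the question (userfile, Upload, index2.php), not copied from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormInspector(HTMLParser):
    """Collect method, action and named inputs of the first <form> found."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = {}  # name -> (type, value)
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            # HTML forms default to GET when no method attribute is given.
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a name attribute are skipped: browsers do not
            # include unnamed controls in the submitted data.
            self.fields[attrs["name"]] = (
                attrs.get("type", "text"),
                attrs.get("value", ""),
            )

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Hypothetical markup, assumed for illustration only.
SAMPLE_HTML = """
<form action="index2.php" method="post" enctype="multipart/form-data">
  <input type="file" name="userfile">
  <input type="submit" name="Upload" value="upload picture">
  <input type="submit" value="no name here">
</form>
"""

inspector = FormInspector()
inspector.feed(SAMPLE_HTML)
target = urljoin("http://www.pictureshack.us/", inspector.action)
print(inspector.method, target, inspector.fields)
```

If the method turns out to be post, the upload from the question would become requests.post(target, data={'Upload': 'upload picture'}, files={'userfile': myFile}) instead of requests.get.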
I am fairly new to HTML & JSON and am struggling a little with extracting the data I am after in a usable format within Python on a Raspberry Pi project.
I am using a device which outputs some live data over a Wi-Fi link in the form of an HTML page. Although the data shown on the page can be changed, I am only really concerned with getting data from a single page for now. When viewed in Notepad++ the page looks like:
<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><style>.b{position:absolute;top:0;bottom:0;left:0;right:0;height:100%;background-color:#000;height:auto !important;}.f{border-radius: 10px;font-weight:bold;position:absolute;top:50%;left:0;right:0;margin:auto;background:#024d27;padding:50px;box-sizing:border-box;color:#FF0;margin:30px;box-shadow:0px 2px 18px -4px #0F0;transform:translateY(-50%);}#V{font-size:96px;}#U{font-size: 56px;}#N{font-size: 36px;}</style></head><body><div class="b"><div class="f"><span id="N">Voltage</span><br><span id="V">12.53</span> <span id="U">V</span><br></div></div><script>reqData();setInterval(reqData, 200);function reqData() {var xhr = new XMLHttpRequest();xhr.onload = function() {if (this.status == 200) {var data = JSON.parse(xhr.responseText);document.getElementById('N').innerHTML = data.n;document.getElementById('V').innerHTML = data.v;document.getElementById('U').innerHTML = data.u;} else {document.getElementById('N').innerHTML = "?";document.getElementById('V').innerHTML = "?";document.getElementById('U').innerHTML = "?";}};xhr.open('GET', 'readVal', true);xhr.send();}</script></body></html>
As you can see, it is a fairly simple page which just provides the information I am trying to extract, presented in a Green box with Yellow text on a black background.
From staring at the source a little, the information I am trying to extract is that associated with the span IDs 'V' (voltage), 'N' (name) and 'U' (units).
The data is displayed live on the webpage (i.e. it updates every 200 ms, I think, without refreshing the page) and I would like to extract the values as frequently as possible.
I have tried a few different blocks of code/methods and this seems to be the only one which I am currently able to gain any success with:
import urllib.request, json, html
data = urllib.request.urlopen("http://192.168.4.1").read()
print (data)
This correctly returns the HTML source code for the page (albeit with a delay of about 5 seconds, which may just be related to the low spec of the Pi Zero I am running it on).
However, I don't seem able to extract the JSON data from within this. I have tried:
data_json = json.loads(data)
but this gives me a JSONDecodeError: Expecting value: line 1 column 1 (char 0), which I assume is because 'data' is still a mix of HTML code and JSON. I have also noticed that the actual values I am trying to retrieve (Voltage, 12.53 & V in the example source page at the top) are just shown as '?' placeholders when I open the page using urllib, rather than the actual values shown on the page.
Is anyone able to offer me any pointers at all please?
Thanks in advance,
Steve
As you've noticed from the error message and the raw HTML code, the result you're getting from your device isn't JSON data, it's HTML with JavaScript. It looks like the HTML you posted does an Ajax request (a JavaScript GET request) to some local endpoint (/readVal perhaps?).
Try opening http://192.168.4.1 in your browser, open dev tools, and observe what network requests the page makes under the hood - specifically, look for some XHR requests. Look at the request URL and response - I bet you'll find some local endpoint that returns the raw json data you want.
Or just try http://192.168.4.1/readVal and see if that's it.
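If /readVal does turn out to be the endpoint, polling it from Python could look like the sketch below. It assumes the endpoint returns a JSON object with the keys n, v and u, which is what the page's own script reads from the response. The function names and the sample payload are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://192.168.4.1"  # device address from the question


def parse_reading(raw):
    """Decode a JSON payload and return (name, value, unit).

    Assumes the keys n, v and u, as used by the page's JavaScript.
    """
    data = json.loads(raw)
    return data["n"], data["v"], data["u"]


def fetch_reading(base_url=BASE_URL, timeout=5):
    """Request the assumed /readVal endpoint and parse its response."""
    with urllib.request.urlopen(base_url + "/readVal", timeout=timeout) as resp:
        return parse_reading(resp.read())


# Offline check with a made-up payload; fetch_reading() needs the real device.
sample = b'{"n": "Voltage", "v": "12.53", "u": "V"}'
name, value, unit = parse_reading(sample)
print(name, value, unit)
```

Calling fetch_reading() in a loop would then give the live values without having to parse any HTML.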
I am reading URLs such as "domain.xyz" from a .csv file. The goal is to use the requests module to send GET/HEAD requests to them. I am using this code as a workaround to prepend the missing part:
x = "http://www."+str('domain.com')
response = requests.head(x)
The problem is that not all "domain.com" entries in my .csv should start with the standard http://www. prefix. What's the best way to complete the URL before passing it to the requests module?
P.S. I am looking for something similar to what Chrome's address bar does to complete a URL. For instance, when we enter 'abc.com', it completes it to "http://www.abc.com".
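One possible approach, sketched here with the standard library only, is to add a scheme when the entry has none and leave everything else alone. The complete_url helper is hypothetical, and it deliberately does not guess a www. prefix, since whether a host needs one cannot be known from the string alone.

```python
from urllib.parse import urlsplit


def complete_url(entry, default_scheme="http"):
    """Return entry with a scheme prepended if it has none.

    Hypothetical helper: it does not add "www." and does not check
    that the host actually exists.
    """
    entry = entry.strip()
    if "://" not in entry:
        entry = default_scheme + "://" + entry
    if not urlsplit(entry).netloc:
        raise ValueError("no host found in %r" % entry)
    return entry


for raw in ["domain.com", " www.abc.com ", "https://www.abc.com/x"]:
    print(complete_url(raw))
```

The result can be passed to requests.head(url, allow_redirects=True); requests.head does not follow redirects by default, so enabling them lets the server point to its preferred host name.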
I'm working on a chess related project for which I have to download a very large quantity of files from ChessTempo.
When running the following code:
import urllib.request
url = "http://chesstempo.com/requests/download_game_pgn.php?gameids="
for i in range(3, 500):
    urllib.request.urlretrieve(url + str(i), 'Games/Game ' + str(i) + ".pgn")
    print("Downloaded file nº " + str(i))
I get the expected list of ~500 files, but they are all blank except the second and third files, which have the correct data in them.
When I open the URLs by hand, it all works perfectly. What am I missing?
In fact, I can only download files 2 & 3, all others are empty...
Were you logged in while accessing those files "manually"? (Which I assume to be using a web browser).
If so, FYI, an HTTP request does not consist only of the URL; a lot of other information is transferred. So if you are not getting the same response, you are almost certainly not making the same request.
In Chrome you can see the requests a page makes.
From Developer Tools go to Network > select a name from the list > Request Headers.
The most likely thing you are missing is the cookies.
Hope it helps.
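If cookies are indeed the difference, the Cookie header copied from the browser's Request Headers can be sent along with each download. This is a minimal sketch: the helper names are made up, and the cookie name PHPSESSID and its value are hypothetical placeholders for whatever the browser actually sends.

```python
import urllib.request

BASE = "http://chesstempo.com/requests/download_game_pgn.php?gameids="


def build_request(game_id, cookie_header):
    """Build a request for one game that carries the given Cookie header."""
    req = urllib.request.Request(BASE + str(game_id))
    req.add_header("Cookie", cookie_header)
    return req


def download(game_id, cookie_header, dest):
    """Fetch one game and write the response body to dest."""
    req = build_request(game_id, cookie_header)
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())


# Hypothetical cookie string; copy the real one from the browser's dev tools.
req = build_request(3, "PHPSESSID=hypothetical-session-id")
print(req.full_url)
print(req.get_header("Cookie"))
```

The loop from the question would then call download(i, cookie, 'Games/Game ' + str(i) + '.pgn') instead of urlretrieve.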
I have a file, gather.htm which is a valid HTML file with header/body and forms. If I double click the file on the Desktop, it properly opens in a web browser, auto-submits the form data (via <SCRIPT LANGUAGE="Javascript">document.forms[2].submit();</SCRIPT>) and the page refreshes with the requested data.
I want Python to make a requests.post(url) call using gather.htm. However, neither my research nor my trial and error has produced a solution.
How is this accomplished?
I've tried things along these lines (based on examples found on the web). I suspect I'm missing something simple here!
myUrl = 'www.somewhere.com'
filename='/Users/John/Desktop/gather.htm'
f = open (filename)
r = requests.post(url=myUrl, data = {'title':'test_file'}, files = {'file':f})
print r.status_code
print r.text
And:
htmfile = 'file:///Users/John/Desktop/gather.htm'
files = {'file':open('gather.htm')}
webbrowser.open(url,new=2)
response = requests.post(url)
print response.text
Note that in the 2nd example above, the webbrowser.open() call works correctly but the requests.post does not.
It appears that everything I tried failed in the same way - the URL is opened and the page returns default data. It appears the website never receives the gather.htm file.
Since your request is returning 200 OK, there is nothing wrong with getting your POST request to the server. It's hard to give you an exact answer, but the problem lies in how the server is handling the request. Either your POST request is formatted in a way that the server doesn't recognise, or the server hasn't been set up to deal with such requests at all. If you're managing the website yourself, some additional details would help.
Just as a final check, try the following:
r = requests.post(url=myUrl, data={'title':'test_file', 'file':f})
I am new to Python's Pyramid framework, so kindly help me.
I have a dynamically generated HTML file. It is produced by a Python script which extracts tags/tables from some 'xyz.html' (using BeautifulSoup) and writes them to another file, 'abc.html'.
Now I need to send this HTML page ('abc.html') back as a 'Response' object from 'pyramid.response'.
How can I do this? I tried the following:
_resp = Response()
_resp.headerlist = [('Content-type', "text/html; charset=UTF-8")]
_resp.app_iter = open('abc.html','r')
return _resp
and also
with open('abc.html','r') as f:
    data = f.read()
return Response(data,content_type='text/html')
Neither of them worked.
PS: I cannot use renderer="package:subpack/abc.html" or any similar renderer, as the generated HTML is stored in a dynamically generated location every time, so I cannot know the final storage location of this HTML file in advance.
Thanks in advance for your help.
I'm a little surprised your first example doesn't work. Check out this cookbook entry on it from the Pyramid docs and see if that helps.
http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/static_assets/files.html#serving-file-content-dynamically