Upload a large XML file with Python Requests library - python

I'm trying to replace curl with Python & the requests library. With curl, I can upload a single XML file to a REST server with the curl -T option. I have been unable to do the same with the requests library.
A basic scenario works:
payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))
When I change payload to a bigger string by opening an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66KB file:
xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))
I've been looking into using the multipart option (files), but the server doesn't seem to like that.
So I was wondering if there is a way to simulate curl -T behaviour in Python requests library.
UPDATE 1:
The program hangs in TextMate, but throws a UnicodeEncodeError on the command line. That seems to be the problem. So the question becomes: is there a way to send unicode strings to a server with the requests library?
UPDATE 2:
Thanks to the comment of Martijn Pieters the UnicodeEncodeError went away, but a new issue turned up.
With a literal (ASCII) XML string, logging shows the following lines:
2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0
Seems the server always bounces the first authentication attempt (?) but then accepts the second one.
With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the logfile shows:
2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml
So, first attempt is blocked as before, but no second attempt is made.
According to Martijn Pieters (below), the second issue can be explained by a faulty server (empty line).
I will look into this, but if someone has a workaround (apart from using curl) I wouldn't mind hearing it.
And I am still surprised that the requests library behaves so differently for small string and file object. Isn't the file object serialized before it gets to the server anyway?

To PUT large files, don't read them into memory. Simply pass the file as the data keyword:
xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))
Moreover, you were opening the file as unicode (decoding it from UTF-8). As you'll be sending it to a remote server, you need raw bytes, not unicode values, and you should open the file as a binary instead.
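A minimal sketch of what passing an open binary file as data does, assuming requests is installed. It only prepares the request and never sends it. The URL and the temporary file are placeholders invented for the illustration, standing in for the real server and for trb-1996-219.xml.

```python
import os
import tempfile

import requests

# Hypothetical stand-in for the XML file, with one non-ASCII character.
xml_bytes = '<person><first>Carl</first><last>Sagán</last></person>'.encode('utf-8')
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

with open(path, 'rb') as xmlfile:
    # Placeholder URL; prepare() builds the request without network traffic.
    prepared = requests.Request(
        'PUT',
        'http://localhost/v1/documents',
        data=xmlfile,
        headers={'content-type': 'application/xml'},
    ).prepare()
    # The body is the file object itself, so it is streamed, not copied.
    body_is_file = prepared.body is xmlfile
    # Content-Length is measured in bytes, not in characters.
    content_length = prepared.headers['Content-Length']

os.remove(path)
print(body_is_file, content_length == str(len(xml_bytes)))
```

Because the length is counted in bytes, opening the file in binary mode keeps the announced Content-Length and the transmitted data consistent.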

Digest authentication always requires you to make at least two requests to the server. The first request doesn't contain any authentication data. It fails with a 401 "Authorization required" response code and a digest challenge (called a nonce) to be used for hashing your password etc. (the exact details don't matter here). The challenge is then used to make a second request to the server containing your credentials hashed with it.
The problem is in this two-step authentication: your large file was already sent with the first, unauthorized request (sent in vain), so by the second request the file object is already at the EOF position. Since the file size is still announced in the Content-Length header of the second request, the server waits for a body that will never be sent.
You could solve it by using a requests Session and first making a simple request for authentication purposes (say a GET request). Then make a second PUT request containing the actual payload, using the same digest challenge from the first request.
sess = requests.Session()
sess.auth = HTTPDigestAuth("*", "*")
sess.get(url)  # throwaway request that picks up the digest challenge
headers = {'content-type': 'application/xml'}
with open('trb-1996-219.xml', 'rb') as xmlfile:  # binary mode: send raw bytes
    sess.put(url, data=xmlfile, headers=headers)
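The EOF problem described above can be reproduced without any server. This is only a sketch: an in-memory io.BytesIO object stands in for the opened XML file, and the two read() calls stand in for the two request attempts.

```python
import io

# In-memory stand-in for the opened XML file.
xmlfile = io.BytesIO(b'<person><first>Carl</first></person>')

first_attempt = xmlfile.read()    # consumed by the request that gets the 401
second_attempt = xmlfile.read()   # the file is at EOF, nothing is left to send

xmlfile.seek(0)                   # rewinding makes the content available again
after_rewind = xmlfile.read()

print(len(first_attempt), len(second_attempt), len(after_rewind))
```

The second read returns an empty bytes object, which is why the retried request has a Content-Length but no body.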

I used requests in Python to upload an XML file with the following commands.
First, open the file with open() and read its contents:
file = open("PIR.xsd")
fragment = file.read()
file.close()
Then copy the contents of the XML file into the payload of the request and POST it:
payload = {'key':'PFAkrzjmuZR957','xmlFragment':fragment}
r = requests.post(URL,data=payload)
To check the validation response (HTML) returned by the server:
print (r.text)

Related

Error when parsing body of an HTTP request using Python Request lib

I have tested some requests inside the Postman app. First, I want to get the body information of an HTTP request inside Python (package requests used). The response appears positive with 200 OK.
response = session.request("POST", url, headers=headers, data=payload, verify ='custom-proxy-ca.crt')
Now I would like to get the body with
body = response.content
print(body) delivers
b'\x83\x84\x01\x00\xc4\xff\xd4\xe9\xb4\xf6\xde,\x13\xa9\xc0(\xc7_\x8dL\x90\xf0\xb4K\xc4<\xe7\xb1M\x02)\xe0\x80z\xd0\xdf>\xcf\xd7\xd2\xec\x8d\x1e\xe4un\x0c\x83\xa1\x88g\xe7fah\x89\xbe\xca\xa8\x04_\xa2W\xbd\xfe]W\xd1\x06\x1f\xef~ZN\xa6\x0bq\xfa\x18\xc4\x1f\xb3\xf8\xc2\x9dF\xc5\xf0\xe6\x8d\xb6\xc1\xa0\xab\x7f\xfbyM\xe0\x88I\xb4\xd4\x82\xa1%\xd9R7Nt\xa4~<\x8c\x8e\xdb\xe7<xx-.\xab\xa7|16\xcb"\xba\x89\xbc\xe7\xcaF\xd1\xacV-u\xbf\xaa\x04\xf7\xa2\x88\xa1\x1bUI\xdfkI$`\x18:j\x7fU\x02\x0e\xcb\x97\x8em\xc6\x81\xe6\x85\xbe\xa5\xb9vbjQ$}M&n\xe0$A\xe0\xd9\xd2\xc6\x9aA\xf4\x12\x81/1\x0c\xf0(\x0cy\xf5\xaf\xca\x1bQ\x1082\xa1\xb4n4VRR\xbb7\xa5XO\x08\x0c\x13\xf2:\xc0-\x06\xa9\xda\xaeGX\x97B\x81!\x17\x87\xfa\xd1\x1b\xc0\xd0\x89|\xe8E\x0f\rp\xfd\x00\x96\xeaI\xbe\xda\xbb\xe3\x87\xc7\xdb\x9b\xfd\xab\xe8\xc7\xdd\x0cEL-x\xe0\x9bVhY\x0cT\x08\x95S\xa3\xfd\xdc\xe3\x81/1\x9d\x9e\'T\xf6\xe0pl\xd33#0,T}X%\x04\x0e\xd7r\xfd\x10\x0cs\xe90\x05\xe8\xe8\xf8\xea\xfc\xe5\xf8\xe1\xfd\xb9\xea\xe7\xe0\xc0\x9a!\xa1\\M\xa8\x9d\x9f\xe4\xa2\x07_\xae\xd7\x0c\xdd\xb8\xaa\xbf\xe9\xfc\x1a|\x89^\xf59\x81\xe3J\x91\xa4v(\xff7J1\x1ao\x9c\x89\xa1#0\xf4\xaa\xa0\xc7\xbc\xea\x9f\xae\xa6\xe8\xa9-T\xc9#\xd1\x81\x7f\xee\x9a\xbb\xfd\x87\xc3\xe3+|K\xe2\xfdPe\xa0\xaa\x9d\x18\xf0\xcc\xc0\xf10\x80\xca\xb0XuW\x9d\xcc\xc0\xa5\xc8;bP\xdd\x9d\x1aeC\xfd\xf84\xa6\x14yG\xeb\xb5\x01\x03'
Now I try to search a token in the body, but it seems to be encrypted.
If I want to get the result of the JSON parser with
json.loads(body)
it returns
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte
Okay, it seems that the encoding is done in a different way than expected. But how did the Postman app do the decoding of the body? For example, I can read it there parsed as JSON (see the figure below). What am I doing wrong in Python?
Okay, the problem is solved, but I want to share how to deal with this kind of problem.
The root cause was that I sent the HTTP POST request with an Accept-Encoding header like
'Accept-Encoding': 'gzip, deflate, br'
This header tells the server that the client can accept the response in a compressed format.
The server then compresses the response body before sending it back, and the client is expected to decompress it locally after receiving it.
The reason for the error: the program did not decompress the body. requests decodes gzip and deflate transparently, but handles br (Brotli) only when a Brotli package is installed, so this is most likely a Brotli-compressed body.
Solution: delete this header line and it works.
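To illustrate why a still-compressed body cannot be parsed, here is a small standard-library sketch. It uses gzip as a stand-in, since Brotli needs a third-party package, and the JSON payload is a made-up example rather than the real response.

```python
import gzip
import json

# Hypothetical JSON payload standing in for the real response body.
original = json.dumps({'token': 'abc123'}).encode('utf-8')
compressed = gzip.compress(original)   # what a server may send for "gzip"

# Parsing the compressed bytes directly fails, as in the question.
try:
    json.loads(compressed)
    parsed_raw = True
except ValueError:                     # UnicodeDecodeError is a ValueError
    parsed_raw = False

# After decompression the same bytes parse fine.
decoded = json.loads(gzip.decompress(compressed))
print(parsed_raw, decoded['token'])
```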

How to send entire http request from file?

I am writing a small application that interprets the http response of a request. I am writing the application in python. I have not found anything that allows me to send the body + headers stored in one file. I can send certain parts like the headers but not the entire request.
For example, if the request is:
GET /index.html HTTP/1.1
Host: localhost
Cookie: bob=lemon
I want to send this entire request in one go. How would I do this in python?
Check out the python requests library. https://requests.readthedocs.io/en/master/user/quickstart/#make-a-request
For the request above it would look something like
import requests
url = 'http://localhost:[YOUR PORT HERE]/'
cookies = {'bob': 'lemon'}
r = requests.get(url, cookies=cookies)
To check whether the request succeeded, look at the status code, which should be 200:
r.status_code
Check out the library for more, it is very extensive.
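If the goal is to start from the raw request text stored in a file, one possible approach is to split it into its parts first. The sketch below uses only the standard library. parse_raw_request is a hypothetical helper name, and RAW_REQUEST stands in for the bytes read from the file.

```python
import io
from http.client import parse_headers

# Stand-in for the contents of the request file.
RAW_REQUEST = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Cookie: bob=lemon\r\n"
    b"\r\n"
)


def parse_raw_request(raw):
    """Split raw request bytes into method, target, version, headers, body."""
    stream = io.BytesIO(raw)
    method, target, version = stream.readline().decode('iso-8859-1').split()
    # parse_headers reads header lines up to the blank separator line.
    headers = dict(parse_headers(stream).items())
    body = stream.read()
    return method, target, version, headers, body


method, target, version, headers, body = parse_raw_request(RAW_REQUEST)
print(method, target, headers)
```

The parts could then be handed to a client library, for example requests.request(method, 'http://' + headers['Host'] + target, headers=headers, data=body).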

Python Post Getting couldn't determine the boundary from the message

I have a python program in which I am trying to do a post sending file contents as a string. I am doing the following:
headers = {'content-type': 'multipart/form-data'}
data = {'file1': filedata}
filedata is a string I build by reading a file and putting its contents in the string.
When I make the POST call I get a 500 back, and looking in the catalina log I see the error "couldn't determine the boundary from the message".
The thing is, when I put this in Insomnia it works properly, and the catalina log shows the header had ; boundary=X-INSOMNIA-BOUNDARY appended to the content-type I defined.
Why would this work for Insomnia but not when I do it in Python using requests? This is my requests call (auth is set to None):
response = requests.post(url, data=data, headers=headers, auth=auth, timeout=REQUEST_TIMEOUT, verify=False)
headers = {'content-type': 'multipart/form-data'}
data = {'timepunch': 'datastring'}
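I found my problem: when sending a file, the request needs files=data instead of data=data. With files=, requests builds the multipart body and appends the boundary to the Content-Type header itself, so the content-type header should not be set by hand. This resolved my issue.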
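The boundary handling can be checked offline, assuming requests is installed. This sketch only prepares the request. The URL and the file name timepunch.txt are placeholders chosen for the illustration.

```python
import requests

# Placeholder URL; prepare() builds the request without sending it.
url = 'http://localhost/upload'

# files= makes requests encode the body as multipart/form-data and
# generate the boundary itself, so no content-type header is set by hand.
prepared = requests.Request(
    'POST', url, files={'timepunch': ('timepunch.txt', 'datastring')}
).prepare()

content_type = prepared.headers['Content-Type']
boundary = content_type.split('boundary=')[1]
print(content_type)
```

The printed header carries the boundary parameter that the server complained about, and the same boundary string separates the parts in prepared.body.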

Send data through POST request

I have been using sockets with Python for some time, and I'm trying to understand why this POST, which should send some data in the fields data1 and data2, does not work.
POST /method.php HTTP/1.1\r\nHost: localhost\r\nContent-Type: multipart/form-data\r\n\r\ndata1=something&data2= otherthing\r\n\r\n
What is the problem with this request?
There are several things wrong with your request:
POST /method.php HTTP/1.1
Host: localhost
Content-Type: multipart/form-data
data1=something&data2= otherthing
First, whenever a body is used within an HTTP request, the length of the body must be known. This is typically done by giving the length up front with Content-Length in the HTTP header, although chunked encoding can be used instead if the full length is not known up front. Your request does neither, which makes it an invalid HTTP request.
Additionally, you claim a Content-Type of multipart/form-data although your body is not of this type. With multipart/form-data your body would consist of several MIME parts separated by a text boundary, and this boundary would need to be declared in your Content-Type header. The correct type for the body you show would instead be application/x-www-form-urlencoded.
Even with application/x-www-form-urlencoded the body is partly wrong. This type of body should consist only of key=value pairs joined by &, i.e. there should be neither a space after the = as you have after data2=, nor should there be new lines added after the end of the data as you have.
When removing all these problems you should probably send the following request:
body = "data1=something&data2=otherthing"
request = ("POST /method.php HTTP/1.1\r\n"
           "Host: localhost\r\n"
           "Content-Type: application/x-www-form-urlencoded\r\n"
           "Content-Length: %d\r\n"
           "\r\n%s") % (len(body), body)
But once you have sent this request the trouble continues, since reading the response correctly is complex too. Generally I recommend not coding your own HTTP handling unless you really know what you are doing, and using existing libraries instead. While HTTP might look simple when looking at just a few example requests, it is way more complex than it initially seems. And while your code might seem to work against specific servers, it might fail with other servers.
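For experimenting with the corrected request, the body and its Content-Length can be derived from the field values instead of being typed by hand. This is a sketch using only the standard library, and build_post_request is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the raw bytes of a form-encoded HTTP/1.1 POST request."""
    # urlencode joins key=value pairs with & and percent-encodes them.
    body = urlencode(fields).encode('ascii')
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # length in bytes
        "\r\n"
    ).encode('ascii')
    return head + body


raw = build_post_request('localhost', '/method.php',
                         {'data1': 'something', 'data2': 'otherthing'})
print(raw.decode('ascii'))
```

The resulting bytes could be passed to socket.sendall(), although reading the response is still left to the caller.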
It might be easier to use the requests library, so your code would look something like this:
import requests
# Form fields
data = {
    'data1': 'something',
    'data2': 'otherthing'
}
# No custom content-type header: for a dict passed as data, requests
# sets application/x-www-form-urlencoded itself
# Get response from server
response = requests.post('http://localhost/', data=data)
# If you care about the response (use response.json() if the server returns JSON)
print(response.text)
You can also send files and a whole lot of other stuff
Have you tried using the Requests library instead? An example of a POST request is below:
import requests
data1 = "something"
data2 = "otherthing"
session_requests = requests.session()
# Passing a dict as data makes requests send application/x-www-form-urlencoded
result = session_requests.post("http://localhost/", data=dict(data1=data1, data2=data2))

Python URLLib / URLLib2 POST

I'm trying to create a super-simplistic Virtual In / Out Board using wx/Python. I've got the following code in place for one of my requests to the server where I'll be storing the data:
data = urllib.urlencode({'q': 'Status'})
u = urllib2.urlopen('http://myserver/inout-tracker', data)
for line in u.readlines():
    print line
Nothing special going on there. The problem I'm having is that, based on how I read the docs, this should perform a Post Request because I've provided the data parameter and that's not happening. I have this code in the index for that url:
if (!isset($_POST['q'])) { die ('No action specified'); }
echo $_POST['q'];
And every time I run my Python App I get the 'No action specified' text printed to my console. I'm going to try to implement it using the Request Objects as I've seen a few demos that include those, but I'm wondering if anyone can help me explain why I don't get a Post Request with this code. Thanks!
-- EDITED --
This code does work and Posts to my web page properly:
data = urllib.urlencode({'q': 'Status'})
h = httplib.HTTPConnection('myserver:8080')
headers = {"Content-type": "application/x-www-form-urlencoded",
           "Accept": "text/plain"}
h.request('POST', '/inout-tracker/index.php', data, headers)
r = h.getresponse()
print r.read()
I am still unsure why the urllib2 library doesn't Post when I provide the data parameter - to me the docs indicate that it should.
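Compare the URLs in the two calls:
u = urllib2.urlopen('http://myserver/inout-tracker', data)
h.request('POST', '/inout-tracker/index.php', data, headers)
Using the path /inout-tracker without a trailing / doesn't fetch index.php directly. Instead the server issues a 302 redirect to the version with the trailing /.
A 302 will typically cause clients, urllib2 included, to convert the POST into a GET request and drop the body, which is why $_POST['q'] is never set. Requesting http://myserver/inout-tracker/ (or /inout-tracker/index.php) avoids the redirect.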
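The redirect behaviour can be observed without a server. The question uses Python 2 (urllib2); this sketch uses the Python 3 equivalent, urllib.request, and calls the stock redirect handler directly purely for illustration. The 'Found' message and the empty headers dict are dummy arguments.

```python
from urllib.parse import urlencode
from urllib.request import HTTPRedirectHandler, Request

data = urlencode({'q': 'Status'}).encode('ascii')

# Supplying data makes the request a POST.
original = Request('http://myserver/inout-tracker', data=data)

# Ask the stock redirect handler what it would send after a 302
# pointing at the URL with the trailing slash.
redirected = HTTPRedirectHandler().redirect_request(
    original, None, 302, 'Found', {}, 'http://myserver/inout-tracker/'
)

print(original.get_method(), redirected.get_method(), redirected.data)
```

The follow-up request is a GET without a body, which matches the 'No action specified' output seen in the question.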
