I'm using the Python poster library to try to upload a form containing an image to a servlet. Locally, it runs fine, but when I deploy to App Engine, the servlet doesn't recognize the request as multipart content.
ServletFileUpload.isMultipartContent(request) returns false
Here's how I'm using the poster library:
register_openers()
datagen, headers = multipart_encode({"image": open(filename)})
request = urllib2.Request(url, datagen, headers)
The servlet checks to make sure it is Multipart, but it fails that check. What can I do to further debug?
Thanks,
jean
*******update*********
Printing out the stack trace, here's what I get. It complains that the content type header is null:
org.apache.commons.fileupload.FileUploadBase$InvalidContentTypeException: the request doesn't contain a multipart/form-data or multipart/mixed stream, content type header is null
at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:885)
at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
If you're on Windows (or a pedant;-), open(filename) is the wrong way to open a binary file and might mess things up -- use open(filename, 'rb'). Apart from that, assuming of course that you continue with a urllib2.urlopen(request) which you've omitted, that your imports are correct, and that filename and url are properly set previously, then your code seems legit.
TL;DR
I want to send a file with requests.post() as a multipart/form-data request without storing the file on a hard drive. Basically, I'm looking for an alternative to open() that works on a bytes object.
Hello, I'm currently trying to send multipart/form-data request and pass in-memory files in it, but I can't figure out how to do that.
My app receives images from one source and sends them to another. Currently it sends a GET request for the file (e.g. requests.get('https://service.com/test.jpeg')), reads the image's bytes and writes them into a new file on the hard drive. The sending code that works looks like this:
def send_file(path_to_image: str):
    url = get_upload_link()
    data = {'photo': open(path_to_image, 'rb')}
    r = requests.post(url, files=data)
send_file("test.jpeg")
The main issue I have with this approach is that I have to keep files on my hard drive. Sure, I can use my drive as some sort of "temporary buffer" and just delete the files once I no longer need them, but I believe there's a much simpler way to do that.
I want my function to receive a bytes object and then send it. I actually tried doing that, but the backend doesn't accept them. Here's what I tried:
Attempt 1
def send_file(image: bytes):
    url = get_upload_link()
    data = {'photo': open(image, 'rb')}
    r = requests.post(url, files=data)
I get "ValueError: embedded null byte"
Attempt 2
def upload_photo(image: bytes):
    url = get_upload_link()
    file = BytesIO(image)
    data = {'photo': file}
    r = requests.post(url, files=data)
Backend server doesn't process my files correctly. It's like passing files=None, same response
I also tried:
sending the returning value of the methods: file.getbuffer() and file.read()
file.write(image) and then sending file
StringIO object
etc.
Final notes
I noticed that open() returns an _io.BufferedReader object. I also looked for a way to construct an instance of it, but couldn't find one. Can someone help me, please?
UPD:
If anyone is interested, the receiving api is this
From the official documentation:
POST a Multipart-Encoded File
...
If you want, you can send strings to be received as files:
url = 'https://httpbin.org/post'
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
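Applied to the question's case, a minimal sketch might look like the following. The field name photo comes from the question; the filename test.jpeg, the image/jpeg content type, the placeholder bytes and the https://example.invalid/upload URL are assumptions for illustration. The request is only prepared, not sent, so the multipart body can be inspected without a server.

```python
import requests

# Placeholder bytes standing in for an image already held in memory (assumption).
image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

# A (filename, content, content_type) tuple gives the part an explicit filename
# and content type, neither of which a bare BytesIO object carries.
files = {"photo": ("test.jpeg", image_bytes, "image/jpeg")}

# Hypothetical URL; prepare() builds the request without touching the network.
prepared = requests.Request(
    "POST", "https://example.invalid/upload", files=files
).prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type.split(";")[0])
print(b'filename="test.jpeg"' in body)
```

To actually send it, requests.post(url, files=files) takes the same files mapping. Whether the missing filename is what the backend in the question objected to is a guess; it is simply the most visible difference between a file opened from disk and a BytesIO.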
Here is my FileUploadView class, which handles the POST request for an uploaded file. The files I am expecting are XML files, which I parse with ElementTree in fileHandler(). However, when using Postman to send a file as 'form-data', I realized that some kind of header is being attached to my uploaded file, which in turn causes the tree parse() to raise a syntax error, since it's reading something that is not valid XML.
I tried using HTTPie to send the file through, which worked with no issue: the XML parser parsed it correctly and entered the data into the expected object.
I then wrote some test cases with Django to test the file upload. The parser raised a syntax error again, because a header was once more attached to the file.
class UploadTest(APITestCase):
    def test_file_upload(self):
        c = Client()
        with open("/Users/Ren/Desktop/Capstone/Backend/projectB/VMA/testing/Test.xml") as fp:
            c.post('/upload/TestXML.xml', {'filename' : 'Test.xml', 'attachment': fp})
My question is: What is causing that header to pop up/be added onto the uploaded file. I'm guessing it has something to do with how I am sending the post request through Postman and the Django TestCase which is different to HTTPie
view.py
class FileUploadView(APIView):
    parser_classes = (FileUploadParser,)
    def post(self, request, filename, format=None):
        print(request.FILES)
        file_obj = request.FILES['file']
        fileHandler(file_obj)
        return Response(status=204)
FileReader.py
def fileHandler(file):
    filepath = file.temporary_file_path()
    print(file.read())
    tree = ET.parse(filepath)
    root = tree.getroot()
XML File and output when calling file.read()
XML I need to read in (Expected Output):
<site host="192.168.212.4" name="http://192.168.212.4" port="80" ssl="false"><alerts><alertitem>\n <pluginid>10021</pluginid>\n <alert>X-Content-Type-Options header missing</alert>\n <riskcode>1</riskcode>\n <reliability>2</reliability>\n <riskdesc>Low (Warning)</riskdesc>\n <desc>The Anti-MIME-Sniffing header X-Content-Type-Options was not set to \'nosniff\'.\n\tThis allows older versions of Internet Explorer and Chrome to perform MIME-sniffing on the response body, potentially causing the response body to be interpreted and displayed as a content type other than the declared content type.\n\tCurrent (early 2014) and legacy versions of Firefox will use the declared content type (if one is set), rather than performing MIME-sniffing.\n\t</desc>\n <uri>http://192.168.212.4/</uri>\n <param/>\n <attack/>\n <otherinfo/>\n <solution>Ensure that the application/web server sets the Content-Type header appropriately, and that it sets the X-Content-Type-Options header to \'nosniff\' for all web pages.\n\tIf possible, ensure that the end user uses a standards-compliant and modern web browser that does not perform MIME-sniffing at all, or that can be directed by the web application/web server to not perform MIME-sniffing.\n\t</solution>\n <reference>\n\t</reference>\n</alertitem>
The Output when running request.FILES['file'].read() --- Current Output
b'----------------------------507481440966899800347275\r\nContent-Disposition: form-data; name=""; filename="sampleXML.xml"\r\nContent-Type: application/xml\r\n\r\n<site host="192.168.212.4" name="http://192.168.212.4" port="80" ssl="false"><alerts><alertitem>\n <pluginid>10021</pluginid>\n <alert>X-Content-Type-Options header missing</alert>\n <riskcode>1</riskcode>\n <reliability>2</reliability>\n <riskdesc>Low (Warning)</riskdesc>\n <desc>The Anti-MIME-Sniffing header X-Content-Type-Options was not set to \'nosniff\'.\n\tThis allows older versions of Internet Explorer and Chrome to perform MIME-sniffing on the response body, potentially causing the response body to be interpreted and displayed as a content type other than the declared content type.\n\tCurrent (early 2014) and legacy versions of Firefox will use the declared content type (if one is set), rather than performing MIME-sniffing.\n\t</desc>\n <uri>http://192.168.212.4/</uri>\n <param/>\n <attack/>\n <otherinfo/>\n <solution>Ensure that the application/web server sets the Content-Type header appropriately, and that it sets the X-Content-Type-Options header to \'nosniff\' for all web pages.\n\tIf possible, ensure that the end user uses a standards-compliant and modern web browser that does not perform MIME-sniffing at all, or that can be directed by the web application/web server to not perform MIME-sniffing.\n\t</solution>\n <reference>\n\t</reference>\n</alertitem>\n\n \r\n----------------------------507481440966899800347275--\r\n'
Contains the unnecessary: b'----------------------------507481440966899800347275\r\nContent-Disposition: form-data; name=""; filename="sampleXML.xml"\r\nContent-Type: application/xml\r\n\r\n
I played around with code for a bit and made a tiny change into the testCase:
class UploadTest(APITestCase):
    def test_file_upload(self):
        c = Client()
        with open("/Users/Ren/Desktop/Capstone/Backend/projectB/VMA/testing/Test.xml") as fp:
            c.post('/upload/TestXML.xml', {'filename' : 'Test.xml', 'attachment': fp})
I changed the
{'filename' : 'Test.xml', 'attachment': fp}
to
{'filename' : b'Test.xml', 'attachment': fp}
I remember reading it somewhere, unfortunately I do not remember where but... turning the file into "bytes" fixed it...
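A possible explanation for the header, offered as a sketch rather than a confirmed diagnosis: Postman's form-data mode and Django's test client both send a multipart/form-data body, while DRF's FileUploadParser is documented as parsing raw file upload content, so it would hand the whole body, boundary lines included, to fileHandler(). The stdlib-only example below rebuilds a body shaped like the "Current Output" shown earlier (the boundary string and the shortened XML are made up for illustration) and shows that the wrapped body fails to parse as XML, while the part extracted by a multipart-aware parser parses fine.

```python
import xml.etree.ElementTree as ET
from email.parser import BytesParser
from email.policy import default

# Shortened stand-in for the uploaded XML file (assumption).
xml_payload = b'<site host="192.168.212.4"><alerts/></site>'
boundary = "example-boundary"  # made-up boundary for illustration

# A multipart/form-data body as a client such as Postman would build it.
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="file"; filename="sampleXML.xml"\r\n'
    "Content-Type: application/xml\r\n"
    "\r\n"
).encode() + xml_payload + f"\r\n--{boundary}--\r\n".encode()

# Treating the whole body as the file: the boundary line is not XML.
try:
    ET.fromstring(body)
    wrapped_parses = True
except ET.ParseError:
    wrapped_parses = False

# A multipart-aware parser strips the envelope and returns only the file part.
message = BytesParser(policy=default).parsebytes(
    f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n".encode() + body
)
part = next(message.iter_parts())
filename = part.get_filename()
root = ET.fromstring(part.get_payload(decode=True))

print(wrapped_parses, filename, root.tag)
```

If clients are meant to send form-data, DRF's MultiPartParser is the parser intended for that; if FileUploadParser stays, the client needs to send the file as the raw request body.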
I am trying to use the requests library in Python to upload a file into Fedora commons repository on localhost. I'm fairly certain my main problem is not understanding open() / read() and what I need to do to send data with an http request.
def postBinary(fileName,dirPath,url):
    path = dirPath+'/'+fileName
    print('to ' + url + '\n' + path)
    openBin = {'file':(fileName,open(path,'rb').read())}
    headers = {'Slug': fileName} #not important
    r = requests.put(url, files=openBin,headers=headers, auth=HTTPBasicAuth('username', 'pass'))
    print(r.text)
    print("and the url used:")
    print(r.url)
This will successfully upload a file in the repository, but it will be slightly larger and corrupted after. For example an image that was 6.6kb became 6.75kb and was not openable anymore.
So how should I properly open and upload a file using put in python?
###Extra details:###
When I replace files=openBin with data=openBin I end up with my dictionary and I presume the data as a string. I don't know if that information is helpful or not.
"file=FILE_NAME.extension&file=TYPE89a%24%02Q%03%E7%FF%00E%5B%19%FC%....
and the size of the file increases to a number of megabytes
I am using specifically put because the Fedora RESTful HTTP API end point says to use put.
The following command does work:
curl -u username:password -H "Content-Type: text/plain" -X PUT -T /path/to/someFile.jpeg http://localhost:8080/fcrepo/rest/someFile.jpeg
Updated
Using requests.put() with the files parameter sends a multipart/form-data encoded request which the server does not seem to be able to handle without corrupting the data, even when the correct content type is declared.
The curl command simply performs a PUT with the raw data contained in the body of the request. You can create a similar request by passing the file data in the data parameter. Specify the content type in the header:
headers = {'Content-type': 'image/jpeg', 'Slug': fileName}
r = requests.put(url, data=open(path, 'rb'), headers=headers, auth=('username', 'pass'))
You can vary the Content-type header to suit the payload as required.
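To see why the stored file grows slightly and becomes unreadable, the two request styles can be compared without sending anything. This is a minimal sketch: the payload bytes are a made-up stand-in for the image, and the URL is the one from the curl command above.

```python
import requests

# Stand-in for the image bytes (assumption).
payload = b"\xff\xd8\xff\xe0" + bytes(range(256))
url = "http://localhost:8080/fcrepo/rest/someFile.jpeg"

# files= wraps the payload in a multipart/form-data envelope.
multipart = requests.Request(
    "PUT", url, files={"file": ("someFile.jpeg", payload)}
).prepare()

# data= sends the bytes as the request body, unchanged, like curl -T.
raw = requests.Request(
    "PUT", url, data=payload, headers={"Content-Type": "image/jpeg"}
).prepare()

# Extra bytes the server would store if it saved the multipart body verbatim.
overhead = len(multipart.body) - len(raw.body)

print(raw.body == payload)
print(b"Content-Disposition" in multipart.body)
print(overhead > 0)
```

The overhead is the boundary lines and part headers, which matches the symptom of a file that is a little larger than the original and no longer opens.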
Try setting the Content-type for the file.
If you are sure that it is a text file, then try text/plain, which you used in your curl command (even though you appear to be uploading a JPEG file). For a JPEG image, however, you should use image/jpeg.
Otherwise, for arbitrary binary data, you can use application/octet-stream. For the JPEG case:
openBin = {'file': (fileName, open(path,'rb'), 'image/jpeg' )}
Also it is not necessary to explicitly read the file contents in your code, requests will do that for you, so just pass the open file handle as shown above.
I have the following code:
r = requests.put(
    config.get('webdav', 'url') + file_name,
    auth=(
        config.get('webdav', 'username'),
        config.get('webdav', 'password')
    ),
    files={
        "files": open(os.path.expanduser(charges_file_path), 'rb')
    }
)
Which is fairly straightforward. It simply calls a PUT request to a webdav server, and pushes the data that is in files (plain text) to the server.
It works, except for a strange (or maybe not so strange if I am just missing something small) issue. When I do a GET on the file, or the file is viewed on the server directly, the file itself contains header information:
--55e72d74a10b423590cd4faa68212192
Content-Disposition: form-data; name="files"; filename="test_file6.txt"
(file_data)
--55e72d74a10b423590cd4faa68212192--
I haven't been able to find a reason or way around this. When I cURL the file from command line, it works fine.
Any ideas?
I am not really familiar with how Python requests works, but after reading through some docs and finding a similar issue someone had with sending files to Zendesk (this post), you might want to try using the data (or json) parameter instead of files in your request. Also, maybe attaching a params with filename if that's applicable here as well similar to the post I linked.
Another thing to do would be to put a Content-Type header on this request.
i.e.
requests.put(
    ...,
    headers={'Content-Type': 'application/binary'},
    data=open(os.path.expanduser(charges_file_path), 'rb').read()
)
I'm trying to replace curl with Python & the requests library. With curl, I can upload a single XML file to a REST server with the curl -T option. I have been unable to do the same with the requests library.
A basic scenario works:
payload = '<person test="10"><first>Carl</first><last>Sagan</last></person>'
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=payload, headers=headers, auth=HTTPDigestAuth("*", "*"))
When I change payload to a bigger string by opening an XML file, the .put method hangs (I use the codecs library to get a proper unicode string). For example, with a 66KB file:
xmlfile = codecs.open('trb-1996-219.xml', 'r', 'utf-8')
headers = {'content-type': 'application/xml'}
content = xmlfile.read()
r = requests.put(url, data=content, headers=headers, auth=HTTPDigestAuth("*", "*"))
I've been looking into using the multipart option (files), but the server doesn't seem to like that.
So I was wondering if there is a way to simulate curl -T behaviour in Python requests library.
UPDATE 1:
The program hangs in TextMate, but throws a UnicodeEncodeError on the command line. That seems to be the problem. So the question would be: is there a way to send unicode strings to a server with the requests library?
UPDATE 2:
Thanks to the comment of Martijn Pieters the UnicodeEncodeError went away, but a new issue turned up.
With a literal (ASCII) XML string, logging shows the following lines:
2012-11-11 15:55:05,154 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:55:05,294 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:55:05,430 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 201 0
Seems the server always bounces the first authentication attempt (?) but then accepts the second one.
With a file object (open('trb-1996-219.xml', 'rb')) passed to data, the logfile shows:
2012-11-11 15:50:54,309 INFO Starting new HTTP connection (1): my.ip.address
2012-11-11 15:50:55,105 DEBUG "PUT /v1/documents?uri=/example/test.xml HTTP/1.1" 401 211
2012-11-11 15:51:25,603 WARNING Retrying (0 attempts remain) after connection broken by 'BadStatusLine("''",)': /v1/documents?uri=/example/test.xml
So, first attempt is blocked as before, but no second attempt is made.
According to Martijn Pieters (below), the second issue can be explained by a faulty server (empty line).
I will look into this, but if someone has a workaround (apart from using curl) I wouldn't mind hearing it.
And I am still surprised that the requests library behaves so differently for small string and file object. Isn't the file object serialized before it gets to the server anyway?
To PUT large files, don't read them into memory. Simply pass the file as the data keyword:
xmlfile = open('trb-1996-219.xml', 'rb')
headers = {'content-type': 'application/xml'}
r = requests.put(url, data=xmlfile, headers=headers, auth=HTTPDigestAuth("*", "*"))
Moreover, you were opening the file as unicode (decoding it from UTF-8). As you'll be sending it to a remote server, you need raw bytes, not unicode values, and you should open the file as a binary instead.
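As a rough check of what passing a file object to data does, the request can be prepared without sending it. In this sketch the temporary file, its contents and the https://example.invalid URL are assumptions for illustration.

```python
import os
import tempfile

import requests

# Write a throwaway file so the example is self-contained (assumption).
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(b"<person><first>Carl</first><last>Sagan</last></person>")
    path = tmp.name

with open(path, "rb") as xmlfile:
    # Hypothetical URL; the request is prepared but never sent.
    prepared = requests.Request(
        "PUT",
        "https://example.invalid/v1/documents",
        data=xmlfile,
        headers={"content-type": "application/xml"},
    ).prepare()
    # The body is the file object itself, not a copy of its contents.
    body_is_file = prepared.body is xmlfile
    content_length = prepared.headers["Content-Length"]

file_size = os.path.getsize(path)
os.remove(path)

print(body_is_file, content_length == str(file_size))
```

The file is read only when the request is sent, so the whole document never has to sit in memory as one string.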
Digest authentication always requires you to make at least two requests to the server. The first request doesn't contain any authentication data. It fails with a 401 "Authorization required" response code and a digest challenge (called a nonce) to be used for hashing your password etc. (the exact details don't matter here). The challenge is then used to make a second request to the server containing your credentials hashed with it.
The problem is in this two-step authentication: your large file was already sent with the first, unauthorized request (sent in vain), but on the second request the file object is already at the EOF position. Since the file size was also sent in the Content-Length header of the second request, the server waits for a file that will never be sent.
You could solve it by using a requests Session and first making a simple request for authentication purposes (say, a GET request). Then make a second PUT request containing the actual payload, using the same digest challenge from the first request.
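import requests
from requests.auth import HTTPDigestAuth

sess = requests.Session()
sess.auth = HTTPDigestAuth("*", "*")
sess.get(url)  # cheap request that obtains the digest challenge first
headers = {'content-type': 'application/xml'}
# Open in binary mode: the server needs raw bytes, not decoded text
with open('trb-1996-219.xml', 'rb') as xmlfile:
    sess.put(url, data=xmlfile, headers=headers)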
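The EOF effect described above can be reproduced without any HTTP at all. This small sketch uses an in-memory io.BytesIO as a stand-in for the opened XML file (an assumption made to keep it self-contained):

```python
import io

# Stand-in for the opened XML file (assumption).
body = io.BytesIO(b"<person><first>Carl</first></person>")

first_attempt = body.read()   # what the first, unauthenticated request consumes
second_attempt = body.read()  # what a retry gets if nothing rewinds the file

body.seek(0)                  # rewinding restores the full payload
after_rewind = body.read()

print(len(first_attempt), len(second_attempt), len(after_rewind))
```

The second read returns nothing, which is the situation in which the server keeps waiting for the number of bytes announced in Content-Length.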
I used requests in Python to upload an XML file with the following commands.
First, open the file with open() and read it:
file = open("PIR.xsd")
fragment = file.read()
file.close()
Then copy the XML data into the payload of the request and post it:
payload = {'key':'PFAkrzjmuZR957','xmlFragment':fragment}
r = requests.post(URL,data=payload)
To check the response (the HTML validation result), print it:
print (r.text)