I am trying to use the requests library in Python to upload a file into a Fedora Commons repository on localhost. I'm fairly certain my main problem is not understanding open() / read() and what I need to do to send data with an HTTP request.
import requests
from requests.auth import HTTPBasicAuth

def postBinary(fileName, dirPath, url):
    path = dirPath + '/' + fileName
    print('to ' + url + '\n' + path)
    openBin = {'file': (fileName, open(path, 'rb').read())}
    headers = {'Slug': fileName}  # not important
    r = requests.put(url, files=openBin, headers=headers, auth=HTTPBasicAuth('username', 'pass'))
    print(r.text)
    print("and the url used:")
    print(r.url)
This successfully uploads a file to the repository, but the stored file ends up slightly larger and corrupted. For example, a 6.6 KB image became 6.75 KB and could no longer be opened.
So how should I properly open and upload a file using PUT in Python?
Extra details:
When I replace files=openBin with data=openBin, the uploaded file ends up containing my dictionary, form-encoded, with (I presume) the file data as a string. I don't know whether that information is helpful:
"file=FILE_NAME.extension&file=TYPE89a%24%02Q%03%E7%FF%00E%5B%19%FC%....
and the file size increases to several megabytes.
I am specifically using PUT because the Fedora RESTful HTTP API endpoint says to use PUT.
The following command does work:
curl -u username:password -H "Content-Type: text/plain" -X PUT -T /path/to/someFile.jpeg http://localhost:8080/fcrepo/rest/someFile.jpeg
Updated
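Using requests.put() with the files parameter sends a multipart/form-data encoded request. The server apparently does not unpack that envelope and instead stores the whole body (boundary lines and part headers included) as the file's content, even when the correct content type is declared. That would explain both the small size increase and the corruption.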
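To see the wrapping concretely, the sketch below prepares (but never sends) both kinds of request with requests and compares their bodies. The payload and the URL are placeholders, not part of the original question; nothing is transmitted.

```python
import requests

# Placeholder standing in for the image bytes; any binary payload will do.
payload = bytes(range(256)) * 4
url = "http://localhost:8080/fcrepo/rest/example.jpeg"  # placeholder, never contacted

# files= wraps the payload in a multipart/form-data envelope.
multipart = requests.Request(
    "PUT", url, files={"file": ("example.jpeg", payload)}
).prepare()

# data= sends the bytes as the request body unchanged, like curl -T.
raw = requests.Request(
    "PUT", url, data=payload, headers={"Content-Type": "image/jpeg"}
).prepare()

print(multipart.headers["Content-Type"])  # multipart/form-data; boundary=<random>
print("multipart overhead:", len(multipart.body) - len(payload), "bytes")
print("raw body identical:", raw.body == payload)  # True
```

A server that saves the multipart body verbatim would store those extra bytes along with the image, which matches the symptom described in the question.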
The curl command simply performs a PUT with the raw data contained in the body of the request. You can create a similar request by passing the file data in the data parameter. Specify the content type in the header:
headers = {'Content-Type': 'image/jpeg', 'Slug': fileName}
with open(path, 'rb') as f:
    r = requests.put(url, data=f, headers=headers, auth=('username', 'pass'))
You can vary the Content-Type header to suit the payload as required.
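A slightly more general sketch along the same lines, assuming the endpoint stores the request body verbatim as curl -T does. upload_headers and put_binary are hypothetical helper names; the Content-Type is guessed from the file extension with the standard mimetypes module, falling back to application/octet-stream.

```python
import mimetypes
import os

import requests


def upload_headers(path):
    """Headers for a raw-body upload, with the MIME type guessed from the extension."""
    content_type, _ = mimetypes.guess_type(path)
    return {
        # Fall back to generic binary data when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        # Slug is kept from the question; the server may use it as a name hint.
        "Slug": os.path.basename(path),
    }


def put_binary(url, path, auth):
    """PUT the file's bytes as the request body (the requests analogue of curl -T)."""
    with open(path, "rb") as f:
        # Passing the open file lets requests stream it; `with` closes it afterwards.
        response = requests.put(url, data=f, headers=upload_headers(path), auth=auth)
    response.raise_for_status()
    return response
```

For example, put_binary('http://localhost:8080/fcrepo/rest/someFile.jpeg', '/path/to/someFile.jpeg', auth=('username', 'pass')) mirrors the curl command from the question, except that it declares image/jpeg instead of text/plain.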
Try setting the Content-Type for the file.
If you are sure that it is a text file, try text/plain, which you used in your curl command, even though you appear to be uploading a JPEG file. For a JPEG image, however, you should use image/jpeg:
openBin = {'file': (fileName, open(path, 'rb'), 'image/jpeg')}
For arbitrary binary data you can use application/octet-stream instead.
Also, it is not necessary to explicitly read the file contents in your code; requests will do that for you, so just pass the open file handle as shown above.
Related
TL;DR
I want to send a file with requests.post() in a multipart/form-data request without storing the file on the hard drive. Basically, I'm looking for an alternative to the open() function for a bytes object.
Hello, I'm currently trying to send a multipart/form-data request that includes in-memory files, but I can't figure out how to do that.
My app receives images from one source and sends them to another. Currently it sends a GET request directly to the file (e.g. requests.get('https://service.com/test.jpeg')), reads the image's bytes and writes them into a new file on the hard drive. The sending code that works looks like this:
import requests

def send_file(path_to_image: str):
    url = get_upload_link()  # defined elsewhere in the app
    data = {'photo': open(path_to_image, 'rb')}
    r = requests.post(url, files=data)

send_file("test.jpeg")
The main issue I have with this approach is that I have to keep files on my hard drive. Sure, I can use my drive as a sort of "temporary buffer" and delete the files once I no longer need them, but I believe there's a much simpler way to do this.
I want my function to receive a bytes object and then send it. I actually tried doing that, but the backend doesn't accept it. Here's what I tried:
Attempt 1
def send_file(image: bytes):
    url = get_upload_link()
    data = {'photo': open(image, 'rb')}
    r = requests.post(url, files=data)
I get "ValueError: embedded null byte"
Attempt 2
from io import BytesIO

def upload_photo(image: bytes):
    url = get_upload_link()
    file = BytesIO(image)
    data = {'photo': file}
    r = requests.post(url, files=data)
The backend server doesn't process my files correctly: I get the same response as if I had passed files=None.
I also tried:
sending the return values of file.getbuffer() and file.read()
calling file.write(image) and then sending file
a StringIO object
etc.
Final notes
I noticed that open() returns an _io.BufferedReader object. I also looked for a way to construct an instance of it myself, but couldn't find one. Can someone help me, please?
Update:
If anyone is interested, the receiving API is this
From the official documentation:
POST a Multipart-Encoded File
...
If you want, you can send strings to be received as files:
import requests

url = 'https://httpbin.org/post'
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
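Applied to the question, that means passing the bytes object directly inside a (filename, data, content_type) tuple; no BytesIO or temporary file is needed. A plausible reason Attempt 2 failed is that a bare BytesIO carries no file name, so the part is not sent with the image's real filename, and some upload backends ignore parts without a plausible filename or content type; that diagnosis is an assumption. The helper names, the default filename photo.jpg and the image/jpeg type below are also assumptions to adjust for the real API.

```python
import requests


def build_photo_upload(url, image, filename="photo.jpg", content_type="image/jpeg"):
    """Prepare a multipart POST that sends in-memory bytes as the 'photo' file field.

    filename and content_type are assumptions; adjust them to what the
    receiving API expects. No temporary file is involved.
    """
    files = {"photo": (filename, image, content_type)}
    return requests.Request("POST", url, files=files).prepare()


def send_photo(url, image, **kwargs):
    """Send the prepared request and return the server's response."""
    with requests.Session() as session:
        return session.send(build_photo_upload(url, image, **kwargs))
```

send_photo(get_upload_link(), image_bytes) would then replace the file-based send_file, with get_upload_link() being the question's own helper.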
I have a GitLab API (v4) that I need to call to get a project sub-directory (apparently new in v14.4 and seemingly not yet included in the python-gitlab library). In curl this can be done with the following command:
curl --header "PRIVATE-TOKEN: A_Token001" http://192.168.156.55/api/v4/projects/10/repository/archive?path=ProjectSubDirectory --output ~./temp/ProjectSubDirectory.tar.gz
The issue is in the last part, the --output ~./GitLab/some_project_files/ProjectSubDirectory.tar.gz
I tried different attributes (.content, .text), which failed, for example:
...
response = requests.get(url=url, headers=headers, params=params).content
# and save the response content with open(...)
but in every case it either saved an invalid tar.gz file or ran into other issues.
I even tried https://curlconverter.com/, but the code it generates does not work either: it seems to ignore precisely the --output parameter and shows nothing about saving the file itself:
import requests

headers = {'PRIVATE-TOKEN': 'A_Token001'}
params = (('path', 'ProjectSubDirectory'),)
response = requests.get('http://192.168.156.55/api/v4/projects/10/repository/archive', headers=headers, params=params)
For now, I just wrote a script and call it with subprocess, but I don't like this approach much, since Python has libraries such as requests that I guess should offer some way to do the same...
Two key things:
Allow redirects. requests.get() already follows redirects by default, but passing allow_redirects=True makes the intent explicit.
Use raise_for_status() to make sure the request was successful before writing the file. This will help uncover other potential issues, like failed authentication.
After that, write response.content to a file opened in binary mode for writing ('wb').
import requests

url = "https://..."
headers = {}  # ...
params = {}  # ...
output_path = 'path/to/local/file.tar.gz'

response = requests.get(url, headers=headers, params=params, allow_redirects=True)
response.raise_for_status()  # make sure the request is successful

with open(output_path, 'wb') as f:
    f.write(response.content)
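For large archives, response.content holds the whole body in memory. Below is a sketch of a streaming variant, plus a quick validity check that tells a real tar.gz apart from, say, a saved JSON or HTML error page. download_archive and is_valid_archive are hypothetical names; the check relies on tarfile.is_tarfile, which also recognises gzip-compressed tar files.

```python
import tarfile

import requests


def download_archive(url, output_path, headers=None, params=None, chunk_size=64 * 1024):
    """Stream the response body to output_path in binary mode."""
    # stream=True defers fetching the body so it can be written chunk by chunk.
    with requests.get(url, headers=headers, params=params, stream=True) as response:
        response.raise_for_status()  # fail on 401/404 instead of saving an error page
        with open(output_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return output_path


def is_valid_archive(path):
    """True if path opens as a tar archive; tarfile detects gzip compression itself."""
    return tarfile.is_tarfile(path)
```

With the values from the question this would be download_archive('http://192.168.156.55/api/v4/projects/10/repository/archive', 'ProjectSubDirectory.tar.gz', headers={'PRIVATE-TOKEN': 'A_Token001'}, params={'path': 'ProjectSubDirectory'}), followed by is_valid_archive('ProjectSubDirectory.tar.gz') as a sanity check.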
I'm trying to upload a zip file to a server using Python requests. The upload works fine. However, the uploaded file cannot be opened using Windows Explorer or Ark. I suppose there's some problem with the MIME type or the Content-Length.
Oddly, uploading the file using curl does not seem to cause the same problem.
Here is my python code for the request:
import requests

s = requests.Session()
headers = {'Content-Type': 'application/zip'}
zip = open('file.zip', 'rb')
files = {'file': ('file.zip', zip, 'application/zip')}
fc = {'Content-Disposition': 'attachment; filename=file.zip'}
headers.update(fc)
r = requests.Request('POST', url, files=files, headers=headers, auth=(user, password))
prepared = r.prepare()
resp = s.send(prepared)
This is the curl code, which works flawlessly:
curl -X POST \
-ik \
-u user:password \
--data-binary '@file.zip' \
-H 'Content-Type: application/zip' \
-H "Content-Disposition: attachment; filename=file.zip" \
url
Uploading the file works in both cases, and the server also seems to recognize the content type. However, the file is rendered invalid when re-downloading. The zip file is readable before sending via requests, and after sending with plain curl using --data-binary.
Opening the downloaded zip file with unzip or file-roller works either way.
EDIT:
I was uploading two files successively. Oddly the error was fixed when uploading the exact same files in reverse order.
This has NOT been a Python problem: when trying with standard curl, I must have accidentally reversed the order, which is why it was working.
I can not explain this behavior nor do I have a fix for it.
In conclusion: Uploading the bigger file first did the trick.
All of the above seems to apply to curl, pycurl and Python requests, so I assume it's some kind of bug in one of the curl libraries.
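Independently of the ordering issue described in the edit, the requests snippet is not equivalent to the curl command. It passes files=, which produces a multipart/form-data body, while also setting Content-Type: application/zip explicitly. requests keeps an explicitly supplied Content-Type, so the server is told the body is a zip when it is actually a multipart envelope, and a server that stores the body verbatim would then save a broken zip. The closer analogue of curl --data-binary is to send the raw bytes with data=. A sketch, with build_zip_upload as a hypothetical name:

```python
import os

import requests


def build_zip_upload(url, path, auth):
    """Prepare a POST whose body is the zip file's raw bytes, like curl --data-binary @file.zip."""
    with open(path, "rb") as f:
        payload = f.read()
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "attachment; filename=" + os.path.basename(path),
    }
    # data= with bytes adds no multipart envelope, so the declared type matches the body.
    return requests.Request("POST", url, data=payload, headers=headers, auth=auth).prepare()
```

It can be sent with requests.Session().send(build_zip_upload(url, 'file.zip', (user, password))), where url, user and password are the question's own variables.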
I would like to make a POST request to upload a file to a web service (and get response) using Python. For example, I can do the following POST request with curl:
curl -F "file=@style.css" -F output=json http://jigsaw.w3.org/css-validator/validator
How can I make the same request with python urllib/urllib2? The closest I got so far is the following:
import urllib
import urllib2

with open("style.css", 'r') as f:
    content = f.read()
post_data = {"file": content, "output": "json"}
request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator",
                          data=urllib.urlencode(post_data))
response = urllib2.urlopen(request)
I got an HTTP Error 500 from the code above. But since my curl command succeeds, something must be wrong with my Python request?
I am quite new to this topic and my question may have very simple answers or mistakes.
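Personally, I think you should consider the requests library to post files.
import requests

url = 'http://jigsaw.w3.org/css-validator/validator'
with open('style.css', 'rb') as f:
    files = {'file': f}
    # output=json is sent as an ordinary form field, like -F output=json in curl
    response = requests.post(url, files=files, data={'output': 'json'})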
Uploading files using urllib2 is not impossible but quite a complicated task: http://pymotw.com/2/urllib2/#uploading-files
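For reference, here is a Python 3 sketch using only the standard library (urllib.request, the successor of urllib2) that builds the multipart body by hand. encode_multipart and validate_css are hypothetical names; the field names file and output come from the curl command, and text/css as the part's type is an assumption. The random uuid4 boundary is assumed not to occur in the file data, which is overwhelmingly likely but not checked.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: {name: str}; files: {name: (filename, bytes, content_type)}.
    Returns (body, content_type_header).
    """
    # Random boundary; assumed (not checked) not to occur in the data.
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data, content_type) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            data,
        ]
    # Closing delimiter followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def validate_css(path, url="http://jigsaw.w3.org/css-validator/validator"):
    """POST a CSS file the way curl -F file=@style.css -F output=json does."""
    with open(path, "rb") as f:
        css = f.read()
    body, content_type = encode_multipart(
        {"output": "json"},
        {"file": (os.path.basename(path), css, "text/css")},
    )
    request = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

validate_css('style.css') would return the raw response bytes; it contacts the live validator, so treat it as untested against that service.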
After some digging around, it seems this post solved my problem. It turns out I needed to set up the multipart encoder properly.
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

register_openers()
with open("style.css", 'rb') as f:
    # datagen reads f lazily, so send the request while the file is still open
    datagen, headers = multipart_encode({"file": f})
    request = urllib2.Request("http://jigsaw.w3.org/css-validator/validator",
                              datagen, headers)
    response = urllib2.urlopen(request)
Well, there are multiple ways to do it. As mentioned above, you can send the file as "multipart/form-data". However, the target service may not expect this type, in which case you can try other approaches.
Pass the file object
urllib2 can accept a file object as data. When you pass one, the library reads the file as a binary stream and sends it. However, it will not set a proper Content-Type header. Moreover, if the Content-Length header is missing, it will try to call len() on the object, which file objects don't support. So you must provide both the Content-Type and the Content-Length headers for this method to work:
import os
import urllib2

filename = '/var/tmp/myfile.zip'
headers = {
    'Content-Type': 'application/zip',
    'Content-Length': str(os.stat(filename).st_size),
}
request = urllib2.Request('http://localhost', open(filename, 'rb'),
                          headers=headers)
response = urllib2.urlopen(request)
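In Python 3, urllib2 became urllib.request. According to its documentation, when no Content-Length is given it computes one for a bytes body but switches to Transfer-Encoding: chunked for file objects, which not every server accepts. Reading the file into bytes therefore sidesteps the length problem. A sketch; build_raw_upload is a hypothetical name and application/zip is just the default taken from the example above:

```python
import urllib.request


def build_raw_upload(url, path, content_type="application/zip"):
    """Prepare a POST whose body is the file's raw bytes (no multipart wrapper)."""
    with open(path, "rb") as f:
        payload = f.read()
    # Content-Length is set explicitly to mirror the urllib2 example; for a
    # bytes body urllib.request would also compute it on its own.
    headers = {"Content-Type": content_type, "Content-Length": str(len(payload))}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")
```

Sending it is then urllib.request.urlopen(build_raw_upload('http://localhost', '/var/tmp/myfile.zip')), ideally inside a with statement so the response is closed.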
Wrap the file object
To avoid dealing with the length yourself, you can create a simple wrapper object and pass an instance of it as the data argument. With a small change, you can adapt it to take the content from a string if the file is already loaded in memory.
class BinaryFileObject:
    """Simple wrapper for a binary file for urllib2."""

    def __init__(self, filename):
        self.__size = int(os.stat(filename).st_size)
        self.__f = open(filename, 'rb')

    def read(self, blocksize):
        return self.__f.read(blocksize)

    def __len__(self):
        return self.__size
Encode the content as base64
Another way is to encode the data with base64.b64encode and send a Content-Transfer-Encoding: base64 header. However, this method requires support on the server side: HTTP itself does not use the MIME Content-Transfer-Encoding field, so handling it is entirely up to the service. Depending on the implementation, the service can either accept the file and store it incorrectly, or return HTTP 400. E.g. the GitHub API won't throw an error, but the uploaded file will be corrupted.
I'm trying to write a simple app in Python 3, with Tornado for the server and requests for the client, and I'm getting some headers in self.request.body that I can't get rid of. For instance, for a file containing 'blahblahblah', I get:
--cb5f6ba84bdf42d382dfd3204f6307c7\r\nContent-Disposition: form-data; name="file"; filename="1.bin"\r\n\r\nblahblahblah\n\r\n--cb5f6ba84bdf42d382dfd3204f6307c7--\r\n
Files are sent by
f = {'file': open(FILE, 'rb')}
requests.post(URL_UPLOAD, files=f)
and received by
class UploadHandler(tornado.web.RequestHandler):
    def post(self, filename):
        with open(Dir + filename, 'wb') as f:
            f.write(self.request.body)
My full code can be seen here
When I send the file with curl, using curl -X POST -d $(cat ./1.bin) http://localhost:8080/upload/1.bin, I get the correct file, but without the trailing \n.
There must be something I missed. Could someone please help me with that? Thank you.
There are two ways to upload files: simply using the file as the request body (usually, but not necessarily, with the HTTP PUT method), or using a multipart wrapper (usually with the HTTP POST method). If you upload the file from an HTML form, it will usually use the multipart wrapper. Your requests example is using a multipart wrapper and the curl one is not; your server is not expecting the wrapper.
To use a multipart wrapper: in requests, pass files= as you've done here. With curl, see this answer: Using curl to upload POST data with files. On the server, use self.request.files instead of self.request.body: http://www.tornadoweb.org/en/stable/httpserver.html#tornado.httpserver.HTTPRequest.files
To not use the multipart wrapper, use data=open(FILE, 'rb').read() from requests, and keep the other two components the same.
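It is possible to support both styles on the server at once: use self.request.files when the Content-Type header starts with multipart/form-data (the full header also carries a boundary parameter, so an exact comparison with 'multipart/form-data' would not match), and self.request.body otherwise.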
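A minimal sketch of that dispatch as a plain function, so it can be exercised without a running Tornado server. extract_upload_bytes is a hypothetical name. It assumes files has the shape of Tornado's self.request.files, a dict mapping field names to lists of HTTPFile objects (which also support item access such as f['body']). It compares only the media type because a multipart Content-Type carries a boundary parameter.

```python
def extract_upload_bytes(content_type, body, files, field="file"):
    """Return the uploaded bytes from a multipart or a raw-body request.

    content_type: the request's Content-Type header ('' if absent).
    body: the raw request body (self.request.body in Tornado).
    files: Tornado's self.request.files, {field name: [HTTPFile, ...]}.
    field: the multipart field to read; "file" matches the question's client.
    """
    # "multipart/form-data; boundary=..." -> "multipart/form-data"
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/form-data":
        parts = files.get(field)
        if not parts:
            raise ValueError(f"no {field!r} part in the multipart request")
        return parts[0]["body"]
    # Raw upload (e.g. requests' data= or curl -T): the body is the file.
    return body
```

Inside UploadHandler.post() it would be called as extract_upload_bytes(self.request.headers.get('Content-Type', ''), self.request.body, self.request.files), and the result written to the output file instead of self.request.body.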