I am using FastAPI to upload a file according to the official documentation, as shown below:
@app.post("/create_file")
async def create_file(file: UploadFile = File(...)):
    file2store = await file.read()
    # some code to store the BytesIO(file2store) to the other database
When I send a request using the Python requests library, as shown below:
f = open(".../file.txt", 'rb')
files = {"file": (f.name, f, "multipart/form-data")}
requests.post(url="SERVER_URL/create_file", files=files)
the file2store variable is always empty. Sometimes (though rarely) it gets the file bytes, but almost all the time it is empty, so I can't restore the file to the other database.
I also tried using bytes rather than UploadFile, but I get the same results. Is there something wrong with my code, or is the way I am using FastAPI to upload a file wrong?
First, as per FastAPI documentation, you need to install python-multipart—if you haven't already—as uploaded files are sent as "form data". For instance:
pip install python-multipart
The examples below use the .file attribute of the UploadFile object to get the actual Python file (i.e., SpooledTemporaryFile), which allows you to call SpooledTemporaryFile's methods, such as .read() and .close(), without having to await them. It is important, however, to define your endpoint with def in this case—otherwise, if the endpoint were defined with async def, such operations would block the server until they completed. In FastAPI, a normal def endpoint is run in an external threadpool that is then awaited, instead of being called directly (as that would block the server).
The SpooledTemporaryFile used by FastAPI/Starlette has the max_size attribute set to 1 MB, meaning that the data are spooled in memory until the file size exceeds 1 MB, at which point the data are written to a temporary file on disk, under the OS's temp directory. Hence, if you uploaded a file larger than 1 MB, it wouldn't be stored in memory, and calling file.file.read() would actually read the data from disk into memory. Thus, if the file is too large to fit into your server's RAM, you should rather read the file in chunks and process one chunk at a time, as described in the "Read the File in chunks" section below. You may also have a look at this answer, which demonstrates another approach to uploading a large file in chunks, using Starlette's .stream() method and the streaming-form-data package that allows parsing streaming multipart/form-data chunks, which considerably minimises the time required to upload files.
If you have to define your endpoint with async def—as you might need to await some other coroutines inside your route—then you should rather use asynchronous reading and writing of the contents, as demonstrated in this answer. Moreover, if you need to send additional data (such as JSON data) together with the uploaded file(s), please have a look at this answer. I would also suggest you have a look at this answer, which explains the difference between def and async def endpoints.
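For completeness, below is a minimal sketch of what such an async def endpoint could look like, using the aiofiles library to read the upload and write it to disk in chunks without blocking the event loop (the endpoint path, chunk size and output location are arbitrary choices for illustration, not taken from the question):

from fastapi import FastAPI, File, UploadFile
import aiofiles

app = FastAPI()

@app.post("/upload-async")
async def upload_async(file: UploadFile = File(...)):
    try:
        # write the uploaded file to disk in 1MB chunks, without blocking the event loop
        async with aiofiles.open(file.filename, 'wb') as f:
            while contents := await file.read(1024 * 1024):
                await f.write(contents)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        await file.close()

    return {"message": f"Successfully uploaded {file.filename}"}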
Upload Single File
app.py
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/upload")
def upload(file: UploadFile = File(...)):
    try:
        contents = file.file.read()
        with open(file.filename, 'wb') as f:
            f.write(contents)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        file.file.close()

    return {"message": f"Successfully uploaded {file.filename}"}
Read the File in chunks
As described earlier and in this answer, if the file is too big to fit into memory—for instance, if you have 8GB of RAM, you can't load a 50GB file (not to mention that the available RAM will always be less than the total amount installed on your machine, as other applications will be using some of the RAM)—you should rather load the file into memory in chunks and process the data one chunk at a time. This method, however, may take longer to complete, depending on the chunk size you choose—in the example below, the chunk size is 1024 * 1024 bytes (i.e., 1MB). You can adjust the chunk size as desired.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/upload")
def upload(file: UploadFile = File(...)):
    try:
        with open(file.filename, 'wb') as f:
            while contents := file.file.read(1024 * 1024):
                f.write(contents)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        file.file.close()

    return {"message": f"Successfully uploaded {file.filename}"}
Another option would be to use shutil.copyfileobj(), which is used to copy the contents of a file-like object to another file-like object (have a look at this answer too). By default, the data is read in chunks, with the default buffer (chunk) size being 1MB (i.e., 1024 * 1024 bytes) for Windows and 64KB for other platforms, as shown in the source code here. You can specify the buffer size by passing the optional length parameter; a short example of that follows the snippet below. Note: if a negative length value is passed, the entire contents of the file will be read instead—see f.read() as well, which .copyfileobj() uses under the hood (as can be seen in the source code here).
from fastapi import FastAPI, File, UploadFile
import shutil

app = FastAPI()

@app.post("/upload")
def upload(file: UploadFile = File(...)):
    try:
        with open(file.filename, 'wb') as f:
            shutil.copyfileobj(file.file, f)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        file.file.close()

    return {"message": f"Successfully uploaded {file.filename}"}
test.py
import requests
url = 'http://127.0.0.1:8000/upload'
file = {'file': open('images/1.png', 'rb')}
resp = requests.post(url=url, files=file)
print(resp.json())
Upload Multiple (List of) Files
app.py
from fastapi import FastAPI, File, UploadFile
from typing import List

app = FastAPI()

@app.post("/upload")
def upload(files: List[UploadFile] = File(...)):
    for file in files:
        try:
            contents = file.file.read()
            with open(file.filename, 'wb') as f:
                f.write(contents)
        except Exception:
            return {"message": "There was an error uploading the file(s)"}
        finally:
            file.file.close()

    return {"message": f"Successfully uploaded {[file.filename for file in files]}"}
Read the Files in chunks
As described earlier in this answer, if you expect some rather large file(s) and don't have enough RAM to accommodate all the data from beginning to end, you should rather load the file into memory in chunks, thus processing the data one chunk at a time (Note: adjust the chunk size as desired; below, it is 1024 * 1024 bytes).
from fastapi import FastAPI, File, UploadFile
from typing import List

app = FastAPI()

@app.post("/upload")
def upload(files: List[UploadFile] = File(...)):
    for file in files:
        try:
            with open(file.filename, 'wb') as f:
                while contents := file.file.read(1024 * 1024):
                    f.write(contents)
        except Exception:
            return {"message": "There was an error uploading the file(s)"}
        finally:
            file.file.close()

    return {"message": f"Successfully uploaded {[file.filename for file in files]}"}
or, using shutil.copyfileobj():
from fastapi import FastAPI, File, UploadFile
from typing import List
import shutil

app = FastAPI()

@app.post("/upload")
def upload(files: List[UploadFile] = File(...)):
    for file in files:
        try:
            with open(file.filename, 'wb') as f:
                shutil.copyfileobj(file.file, f)
        except Exception:
            return {"message": "There was an error uploading the file(s)"}
        finally:
            file.file.close()

    return {"message": f"Successfully uploaded {[file.filename for file in files]}"}
test.py
import requests
url = 'http://127.0.0.1:8000/upload'
files = [('files', open('images/1.png', 'rb')), ('files', open('images/2.png', 'rb'))]
resp = requests.post(url=url, files=files)
print(resp.json())
from fastapi import FastAPI, File, UploadFile
from fastapi.encoders import jsonable_encoder
import os

app = FastAPI()

@app.post("/create_file/")
async def image(image: UploadFile = File(...)):
    print(image.file)
    # print('../' + os.path.isdir(os.getcwd() + "images"), "*************")
    try:
        os.mkdir("images")
        print(os.getcwd())
    except Exception as e:
        print(e)

    file_name = os.getcwd() + "/images/" + image.filename.replace(" ", "-")
    with open(file_name, 'wb+') as f:
        f.write(image.file.read())

    file = jsonable_encoder({"imagePath": file_name})
    new_image = await add_image(file)  # add_image() is the author's own helper for storing the record
    return {"filename": new_image}
Related
I am trying to download a large file (.tar.gz) from a FastAPI backend. On the server side, I simply validate the filepath, and I then use Starlette's FileResponse to return the whole file—just like what I've seen in many related questions on StackOverflow.
Server side:
return FileResponse(path=file_name, media_type='application/octet-stream', filename=file_name)
After that, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 149, in serialize_response
return jsonable_encoder(response_content)
File "/usr/local/lib/python3.10/dist-packages/fastapi/encoders.py", line 130, in jsonable_encoder
return ENCODERS_BY_TYPE[type(obj)](obj)
File "pydantic/json.py", line 52, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I also tried using StreamingResponse, but got the same error. Any other ways to do it?
The StreamingResponse in my code:
@x.post("/download")
async def download(file_name=Body(), token: str | None = Header(default=None)):
    file_name = file_name["file_name"]
    # should be something like xx.tar

    def iterfile():
        with open(file_name, "rb") as f:
            yield from f

    return StreamingResponse(iterfile(), media_type='application/octet-stream')
OK, here is an update on this problem. I found that the error did not occur in this API, but in the API that forwards requests to it:
#("/")
def f():
    req = requests.post(url="/download")
    return req.content
And here, when I returned a StreamingResponse of the .tar file, it led to (maybe) encoding problems. When using requests, remember to set the same media type; here it is media_type='application/octet-stream'. And it works!
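For illustration only, the fixed forwarding endpoint might look roughly like the sketch below, returning the downstream bytes with an explicit binary media type so they are not serialized as JSON/UTF-8 (the route path, app variable and downstream URL are placeholders, not taken from the original post):

import requests
from fastapi import FastAPI, Response

app = FastAPI()

@app.post("/forward-download")
def forward_download():
    # fetch the archive from the downstream service (placeholder URL)
    req = requests.post(url="http://localhost:8000/download")
    # return the raw bytes with the matching binary media type,
    # so FastAPI does not attempt to encode them as JSON/UTF-8
    return Response(content=req.content, media_type='application/octet-stream')

For very large archives, streaming the downstream response instead of buffering req.content in memory would be preferable, along the lines of the StreamingResponse examples below.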
If you find yield from f being rather slow when using StreamingResponse with file-like objects, you could instead create a generator where you read the file in chunks using a specified chunk size; hence, speeding up the process. Examples can be found below.
Note that StreamingResponse can take either an async generator or a normal generator/iterator to stream the response body. In case you used the standard open() method, which doesn't support async/await, you would have to declare the generator function with normal def. Regardless, FastAPI/Starlette will still work asynchronously, as it will check whether the generator you passed is asynchronous (as shown in the source code), and if it is not, it will then run the generator in a separate thread, using iterate_in_threadpool, which is then awaited.
You can set the Content-Disposition header in the response (as described in this answer, as well as here and here) to indicate if the content is expected to be displayed inline in the browser (if you are streaming, for example, a .mp4 video, .mp3 audio file, etc), or as an attachment that is downloaded and saved locally (using the specified filename).
As for the media_type (also known as MIME type), there are two primary MIME types (see Common MIME types):
text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
application/octet-stream is the default value for all other cases. An unknown file type should use this type.
For a file with .tar extension, as shown in your question, you can also use a different subtype from octet-stream, that is, x-tar. Otherwise, if the file is of unknown type, stick to application/octet-stream. See the linked documentation above for a list of common MIME types.
Option 1 - Using normal generator
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
def main():
    def iterfile():
        with open(some_file_path, 'rb') as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
Option 2 - Using async generator with aiofiles
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import aiofiles

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
async def main():
    async def iterfile():
        async with aiofiles.open(some_file_path, 'rb') as f:
            while chunk := await f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
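As a side note, if you wanted the content displayed inline in the browser instead of being downloaded as an attachment (for instance, when streaming a video), you could adjust the Content-Disposition header and media_type accordingly. A minimal sketch, assuming an .mp4 file (the path and filename are placeholders):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

CHUNK_SIZE = 1024 * 1024  # adjust as desired
video_path = 'some_video.mp4'  # placeholder path
app = FastAPI()

@app.get('/video')
def stream_video():
    def iterfile():
        with open(video_path, 'rb') as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    # 'inline' asks the browser to display/play the content instead of saving it
    headers = {'Content-Disposition': 'inline; filename="some_video.mp4"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='video/mp4')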
I am trying to upload an mp4 video file using UploadFile in FastAPI.
However, the uploaded format is not readable by OpenCV (cv2).
This is my endpoint:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()

@app.post("/video/test", response_class=PlainTextResponse)
async def detect_faces_in_video(video_file: UploadFile):
    contents = await video_file.read()
    print(type(video_file))  # <class 'starlette.datastructures.UploadFile'>
    print(type(contents))    # <class 'bytes'>
    return ""
and neither format (i.e., bytes or UploadFile) is readable by OpenCV.
You are trying to pass either the file contents (bytes) or the UploadFile object; however, VideoCapture() accepts either a video filename, a capturing device, or an IP video stream.
UploadFile is basically a SpooledTemporaryFile (a file-like object) that operates similarly to a TemporaryFile; however, it does not have a visible name in the file system. As you mentioned that you wouldn't be keeping the files on the server after processing them, you could copy the file contents to a NamedTemporaryFile, which "has a visible name in the file system, which can be used to open the file" (using the name attribute), as described here and here. As per the documentation:
Whether the name can be used to open the file a second time, while the
named temporary file is still open, varies across platforms (it can be
so used on Unix; it cannot on Windows). If delete is true (the
default), the file is deleted as soon as it is closed.
Hence, on Windows you need to set the delete argument to False when instantiating a NamedTemporaryFile, and once you are done with it, you can manually delete it, using the os.remove() or os.unlink() method.
Below are given two options on how to do that. Option 1 implements a solution using a def endpoint, while Option 2 uses an async def endpoint (utilising the aiofiles library). For the difference between def and async def, please have a look at this answer. If you are expecting users to upload rather large files in size that wouldn't fit into memory, have a look at this and this answer on how to read the uploaded video file in chunks instead.
Option 1 - Using def endpoint
from fastapi import FastAPI, File, UploadFile
from tempfile import NamedTemporaryFile
import os

app = FastAPI()

@app.post("/video/detect-faces")
def detect_faces(file: UploadFile = File(...)):
    temp = NamedTemporaryFile(delete=False)
    try:
        try:
            contents = file.file.read()
            with temp as f:
                f.write(contents)
        except Exception:
            return {"message": "There was an error uploading the file"}
        finally:
            file.file.close()

        res = process_video(temp.name)  # Pass temp.name to VideoCapture()
    except Exception:
        return {"message": "There was an error processing the file"}
    finally:
        # temp.close()  # the `with` statement above takes care of closing the file
        os.remove(temp.name)

    return res
Option 2 - Using async def endpoint
from fastapi import FastAPI, File, UploadFile
from fastapi.concurrency import run_in_threadpool
import aiofiles
import os

app = FastAPI()

@app.post("/video/detect-faces")
async def detect_faces(file: UploadFile = File(...)):
    try:
        async with aiofiles.tempfile.NamedTemporaryFile("wb", delete=False) as temp:
            try:
                contents = await file.read()
                await temp.write(contents)
            except Exception:
                return {"message": "There was an error uploading the file"}
            finally:
                await file.close()

        res = await run_in_threadpool(process_video, temp.name)  # Pass temp.name to VideoCapture()
    except Exception:
        return {"message": "There was an error processing the file"}
    finally:
        os.remove(temp.name)

    return res
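For reference, process_video() above stands for the author's own processing function. A minimal sketch of what it might look like with OpenCV, assuming you simply want to iterate over the frames of the temporary file, is given below (the function body is purely illustrative):

import cv2

def process_video(video_path: str) -> dict:
    # Open the file by its path, which is what VideoCapture() expects
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # ... run face detection or any other per-frame processing on `frame` here ...
        frame_count += 1
    cap.release()
    return {"frames_processed": frame_count}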
I'm using wave.open to open a file. If I give a local path, as shown below:
async def run_test(uri):
    async with websockets.connect(uri) as websocket:
        wf = wave.open('test.wav', "rb")
then it works, but if I give an external path (URL), it does not:
async def run_test(uri):
    async with websockets.connect(uri) as websocket:
        wf = wave.open('http://localhost:8000/storage/uploads/test.wav', "rb")
I am getting this error:
OSError: [Errno 22] Invalid argument:
'http://localhost:8000/storage/uploads/test.wav'
Yeah, wave.open() doesn't know anything about HTTP.
You'll need to download the file first, e.g. with requests (or aiohttp or httpx since you're in async land).
import io, requests, wave
resp = requests.get('http://localhost:8000/storage/uploads/test.wav')
resp.raise_for_status()
bio = io.BytesIO() # file-like memory buffer
bio.write(resp.content) # todo: if file can be large, maybe use streaming
bio.seek(0) # seek to the start of the file
wf = wave.open(bio, "rb") # wave.open accepts file-like objects
This assumes the file is small enough to fit in memory; if it's not, you'd want to use tempfile.NamedTemporaryFile instead.
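A rough sketch of that alternative, streaming the download straight into a tempfile.NamedTemporaryFile so the whole file never has to sit in memory, might look like the following (the URL and chunk size are placeholders; delete=False is used so the file can be reopened by name on Windows):

import tempfile
import os
import wave
import requests

resp = requests.get('http://localhost:8000/storage/uploads/test.wav', stream=True)
resp.raise_for_status()

with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
    for chunk in resp.iter_content(chunk_size=64 * 1024):  # download in 64KB chunks
        tmp.write(chunk)
    tmp_path = tmp.name

wf = wave.open(tmp_path, "rb")  # open the temporary file by its path
# ... use wf ...
wf.close()
os.remove(tmp_path)  # clean up the temporary file when done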
The browser uses the Vue element-ui el-upload component to upload a file, and aiohttp as the backend receives the form data and saves it. But aiohttp's request.multipart() is always blank, whereas request.post() works fine.
vue:
<!-- :action is the upload url, passed from the outer component -->
<el-upload class="image-uploader"
           :data="dataObj"
           drag
           name="aaa"
           :multiple="false"
           :show-file-list="false"
           :action="action"
           :on-success="handleImageSuccess">
  <i class="el-icon-upload"></i>
</el-upload>

export default {
  name: 'singleImageUpload3',
  props: {
    value: String,
    action: String
  },
  methods: {
    handleImageSuccess(file) {
      this.emitInput(file.files.file)
    }
  }
}
aiohttp: not working
async def post_image(self, request):
    reader = await request.multipart()
    image = await reader.next()
    print(image.text())
    filename = image.filename
    print(filename)
    size = 0
    with open(os.path.join('', 'aaa.jpg'), 'wb') as f:
        while True:
            chunk = await image.read_chunk()
            print("chunk", chunk)
            if not chunk:
                break
            size += len(chunk)
            f.write(chunk)
    return await self.reply_ok([])
aiohttp: working
async def post_image(self, request):
    data = await request.post()
    print(data)
    mp3 = data['aaa']
    filename = mp3.filename
    mp3_file = data['aaa'].file
    content = mp3_file.read()
    with open('aaa.jpg', 'wb') as f:
        f.write(content)
    return await self.reply_ok([])
browser console:
Is this a bug, or have I missed something? Please help me solve it. Thanks in advance.
I think you may have checked the example in the aiohttp documentation about a file upload server. But that snippet is ambiguous, and the documentation doesn't explain it very well.
After digging around its source code for a while, I found that request.multipart() actually yields a MultipartReader instance, which processes a multipart/form-data request one field at a time on each call to .next(), yielding another BodyPartReader instance.
In your non-working code, the line image = await reader.next() actually reads out one entire field from the form data, and you can't be sure which field it actually is. It might be the token field, the key field, the filename field, the aaa field... or any of them. So in your non-working example, that post_image coroutine function will only process one single field from your request data, and you cannot be sure that it is the aaa file field.
Here's my code snippet:
async def post_image(self, request):
    # Iterate through each field of the MultipartReader
    async for field in (await request.multipart()):
        if field.name == 'token':
            # Do something about the token
            token = (await field.read()).decode()
        if field.name == 'key':
            # Do something about the key
            pass
        if field.name == 'filename':
            # Do something about the filename
            pass
        if field.name == 'aaa':
            # Process any files you uploaded
            filename = field.filename
            # In your example, filename should be "2C80...jpg"

            # Deal with the actual file data
            size = 0
            with open(os.path.join('', filename), 'wb') as fd:
                while True:
                    chunk = await field.read_chunk()
                    if not chunk:
                        break
                    size += len(chunk)
                    fd.write(chunk)

    # Reply ok, all fields processed successfully
    return await self.reply_ok([])
The snippet above can also deal with multiple files in a single request that share a duplicate field name ('aaa' in your example). The filename in the Content-Disposition header should be automatically filled in by the browser itself, so there is no need to worry about the filename.
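For instance, a quick way to exercise that from Python (not part of the original post; the field name 'aaa', file paths and URL are only illustrative) could be:

import requests

# two files posted under the same field name 'aaa' in a single multipart request
files = [
    ('aaa', open('images/1.jpg', 'rb')),
    ('aaa', open('images/2.jpg', 'rb')),
]
resp = requests.post('http://localhost:8080/post_image', files=files)
print(resp.status_code)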
BTW, when dealing with file uploads in requests, data = await request.post() will eat up a considerable amount of memory to load the file data. So request.post() should be avoided when file uploads are involved; use request.multipart() instead.
I'm experimenting with Content-Disposition on Tornado. My code for reading and writing the file looks like this:
with open(file_name, 'rb') as f:
    while True:
        data = f.read(4096)
        if not data:
            break
        self.write(data)
self.finish()
I expected the memory usage to be consistent since it is not reading everything at once. But the resource monitor shows:
In use: 12.7 GB    Available: 2.5 GB
Sometimes it will even BSOD my computer...
How do I download a large file (say 12GB in size)?
Tornado 6.0 provides an API that you may use to download large files, like below:
import aiofiles

async def get(self):
    self.set_header('Content-Type', 'application/octet-stream')
    # note: aiofiles uses a thread pool, so it is not truly asynchronous
    async with aiofiles.open(r"F:\test.xyz", "rb") as f:
        while True:
            data = await f.read(1024)
            if not data:
                break
            self.write(data)
            # the flush() call is important: it keeps memory usage low,
            # because the data is sent out promptly instead of accumulating in the buffer
            self.flush()
Using aiofiles alone, without calling self.flush(), may not solve the problem. Just look at the self.write() method:
def write(self, chunk: Union[str, bytes, dict]) -> None:
    """Writes the given chunk to the output buffer.

    To write the output to the network, use the `flush()` method below.

    If the given chunk is a dictionary, we write it as JSON and set
    the Content-Type of the response to be ``application/json``.
    (if you want to send JSON as a different ``Content-Type``, call
    ``set_header`` *after* calling ``write()``).

    Note that lists are not converted to JSON because of a potential
    cross-site security vulnerability. All JSON output should be
    wrapped in a dictionary. More details at
    http://haacked.com/archive/2009/06/25/json-hijacking.aspx/ and
    https://github.com/facebook/tornado/issues/1009
    """
    if self._finished:
        raise RuntimeError("Cannot write() after finish()")
    if not isinstance(chunk, (bytes, unicode_type, dict)):
        message = "write() only accepts bytes, unicode, and dict objects"
        if isinstance(chunk, list):
            message += (
                ". Lists not accepted for security reasons; see "
                + "http://www.tornadoweb.org/en/stable/web.html#tornado.web.RequestHandler.write"  # noqa: E501
            )
        raise TypeError(message)
    if isinstance(chunk, dict):
        chunk = escape.json_encode(chunk)
        self.set_header("Content-Type", "application/json; charset=UTF-8")
    chunk = utf8(chunk)
    self._write_buffer.append(chunk)
At the end of the code, it just appends the data you want to send to the _write_buffer.
The data would only be sent when the get or post method has finished and the finish method is called.
The documentation about the Tornado handler's flush is:
http://www.tornadoweb.org/en/stable/web.html?highlight=flush#tornado.web.RequestHandler.flush
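Putting this together, a minimal self-contained handler along those lines might look like the sketch below (the file path, port and chunk size are placeholders; awaiting self.flush() is one common way to apply flow control so the buffer drains before the next chunk is read):

import aiofiles
import tornado.ioloop
import tornado.web

CHUNK_SIZE = 64 * 1024        # placeholder chunk size
FILE_PATH = "large_file.bin"  # placeholder file path


class DownloadHandler(tornado.web.RequestHandler):
    async def get(self):
        self.set_header('Content-Type', 'application/octet-stream')
        self.set_header('Content-Disposition', 'attachment; filename="large_file.bin"')
        async with aiofiles.open(FILE_PATH, "rb") as f:
            while True:
                data = await f.read(CHUNK_SIZE)
                if not data:
                    break
                self.write(data)
                # flush after each chunk so the write buffer does not grow;
                # awaiting the returned Future waits until the chunk has been written to the socket
                await self.flush()


def make_app():
    return tornado.web.Application([(r"/download", DownloadHandler)])


if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()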