Is there a way to download a file through FastAPI? The files we want are located in an Azure Datalake and retrieving them from the lake is not an issue, the problem occurs when we try to get the bytes we get from the datalake down to a local machine.
We have tried using different modules in FastAPI such as starlette.responses.FileResponse and fastapi.Response with no luck.
In Flask this is not an issue and can be done in the following manner:
from io import BytesIO
from flask import Flask
from werkzeug import FileWrapper
flask_app = Flask(__name__)
#flask_app.route('/downloadfile/<file_name>', methods=['GET'])
def get_the_file(file_name: str):
the_file = FileWrapper(BytesIO(download_file_from_directory(file_name)))
if the_file:
return Response(the_file, mimetype=file_name, direct_passthrough=True)
When running this with a valid file name the file automatically downloads. Is there equivalent way to this in FastAPI?
Solved
After some more troubleshooting I found a way to do this.
from fastapi import APIRouter, Response
router = APIRouter()
#router.get('/downloadfile/{file_name}', tags=['getSkynetDL'])
async def get_the_file(file_name: str):
# the_file object is raw bytes
the_file = download_file_from_directory(file_name)
if the_file:
return Response(the_file)
So after a lot of troubleshooting and hours of looking through documentation, this was all it took, simply returning the bytes as Response(the_file).
After some more troubleshooting I found a way to do this.
from fastapi import APIRouter, Response
router = APIRouter()
#router.get('/downloadfile/{file_name}', tags=['getSkynetDL'])
async def get_the_file(file_name: str):
# the_file object is raw bytes
the_file = download_file_from_directory(file_name)
if the_file:
return Response(the_file)
So after a lot of troubleshooting and hours of looking through documentation, this was all it took, simply returning the bytes as Response(the_file) with no extra parameters and no extra formatting for the raw bytes object.
As far as I know, you need to set media_type to the adequate type. I did that with some code a year ago and it worked fine.
#app.get("/img/{name}")
def read(name: str, access_token_cookie: str=Cookie(None)):
r = internal.get_data(name)
if r is None:
return RedirectResponse(url="/static/default.png")
else:
return Response(content=r["data"], media_type=r["mime"])
r is a dictionary with the data as raw bytes and mime the type of the data as given by PythonMagick.
To add a custom filename to #Markus's answer, in case your api's path doesn't end with a neat filename or you want to determine a custom filename from server side and give to the user:
from fastapi import APIRouter, Response
router = APIRouter()
#router.get('/downloadfile/{file_name}', tags=['getSkynetDL'])
async def get_the_file(file_name: str):
# the_file object is raw bytes
the_file = download_file_from_directory(file_name)
filename1 = make_filename(file_name) # a custom filename
headers1 = {'Content-Disposition': f'attachment; filename="{filename1}"'}
if the_file:
return Response(the_file, headers=headers1)
Related
I am trying to download a large file (.tar.gz) from FastAPI backend. On server side, I simply validate the filepath, and I then use Starlette.FileResponse to return the whole fileājust like what I've seen in many related questions on StackOverflow.
Server side:
return FileResponse(path=file_name, media_type='application/octet-stream', filename=file_name)
After that, I get the following error:
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 149, in serialize_response
return jsonable_encoder(response_content)
File "/usr/local/lib/python3.10/dist-packages/fastapi/encoders.py", line 130, in jsonable_encoder
return ENCODERS_BY_TYPE[type(obj)](obj)
File "pydantic/json.py", line 52, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I also tried using StreamingResponse, but got the same error. Any other ways to do it?
The StreamingResponse in my code:
#x.post("/download")
async def download(file_name=Body(), token: str | None = Header(default=None)):
file_name = file_name["file_name"]
# should be something like xx.tar
def iterfile():
with open(file_name,"rb") as f:
yield from f
return StreamingResponse(iterfile(),media_type='application/octet-stream')
Ok, here is an update to this problem.
I found the error did not occur on this api, but the api doing forward request of this.
#("/")
def f():
req = requests.post(url ="/download")
return req.content
And here if I returned a StreamingResponse with .tar file, it led to (maybe) encoding problems.
When using requests, remember to set the same media-type. Here is media_type='application/octet-stream'. And it works!
If you find yield from f being rather slow when using StreamingResponse with file-like objects, you could instead create a generator where you read the file in chunks using a specified chunk size; hence, speeding up the process. Examples can be found below.
Note that StreamingResponse can take either an async generator or a normal generator/iterator to stream the response body. In case you used the standard open() method that doesn't support async/await, you would have to declare the generator function with normal def. Regardless, FastAPI/Starlette will still work asynchronously, as it will check whether the generator you passed is asynchronous (as shown in the source code), and if is not, it will then run the generator in a separate thread, using iterate_in_threadpool, that is then awaited.
You can set the Content-Disposition header in the response (as described in this answer, as well as here and here) to indicate if the content is expected to be displayed inline in the browser (if you are streaming, for example, a .mp4 video, .mp3 audio file, etc), or as an attachment that is downloaded and saved locally (using the specified filename).
As for the media_type (also known as MIME type), there are two primary MIME types (see Common MIME types):
text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
application/octet-stream is the default value for all other cases. An unknown file type should use this type.
For a file with .tar extension, as shown in your question, you can also use a different subtype from octet-stream, that is, x-tar. Otherwise, if the file is of unknown type, stick to application/octet-stream. See the linked documentation above for a list of common MIME types.
Option 1 - Using normal generator
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
CHUNK_SIZE = 1024 * 1024 # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()
#app.get('/')
def main():
def iterfile():
with open(some_file_path, 'rb') as f:
while chunk := f.read(CHUNK_SIZE):
yield chunk
headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
Option 2 - Using async generator with aiofiles
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import aiofiles
CHUNK_SIZE = 1024 * 1024 # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()
#app.get('/')
async def main():
async def iterfile():
async with aiofiles.open(some_file_path, 'rb') as f:
while chunk := await f.read(CHUNK_SIZE):
yield chunk
headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
I am trying to upload an mp4 video file using UploadFile in FastAPI.
However, the uploaded format is not readable by OpencCV (cv2).
This is my endpoint:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import PlainTextResponse
#app.post("/video/test", response_class=PlainTextResponse)
async def detect_faces_in_video(video_file: UploadFile):
contents = await video_file.read()
print(type(video_file)) # <class 'starlette.datastructures.UploadFile'>
print(type(contents)) # <class 'bytes'>
return ""
and the two file formats (i.e., bytes and UploadFile) are not readable by OpenCV.
You are trying to pass either the file contents (bytes) or UploadFile object; however, VideoCapture() accepts either a video filename, capturing device or or an IP video stream.
UploadFile is basically a SpooledTemporaryFile (a file-like object) that operates similar to a TemporaryFile. However, it does not have a visible name in the file system. As you mentioned that you wouldn't be keeping the files on the server after processing them, you could copy the file contents to a NamedTemporaryFile that "has a visible name in the file system, which can be used to open the file" (using the name attribute), as described here and here. As per the documentation:
Whether the name can be used to open the file a second time, while the
named temporary file is still open, varies across platforms (it can be
so used on Unix; it cannot on Windows). If delete is true (the
default), the file is deleted as soon as it is closed.
Hence, on Windows you need to set the delete argument to False when instantiating a NamedTemporaryFile, and once you are done with it, you can manually delete it, using the os.remove() or os.unlink() method.
Below are given two options on how to do that. Option 1 implements a solution using a def endpoint, while Option 2 uses an async def endpoint (utilising the aiofiles library). For the difference between def and async def, please have a look at this answer. If you are expecting users to upload rather large files in size that wouldn't fit into memory, have a look at this and this answer on how to read the uploaded video file in chunks instead.
Option 1 - Using def endpoint
from fastapi import FastAPI, File, UploadFile
from tempfile import NamedTemporaryFile
import os
#app.post("/video/detect-faces")
def detect_faces(file: UploadFile = File(...)):
temp = NamedTemporaryFile(delete=False)
try:
try:
contents = file.file.read()
with temp as f:
f.write(contents);
except Exception:
return {"message": "There was an error uploading the file"}
finally:
file.file.close()
res = process_video(temp.name) # Pass temp.name to VideoCapture()
except Exception:
return {"message": "There was an error processing the file"}
finally:
#temp.close() # the `with` statement above takes care of closing the file
os.remove(temp.name)
return res
Option 2 - Using async def endpoint
from fastapi import FastAPI, File, UploadFile
from tempfile import NamedTemporaryFile
from fastapi.concurrency import run_in_threadpool
import aiofiles
import asyncio
import os
#app.post("/video/detect-faces")
async def detect_faces(file: UploadFile = File(...)):
try:
async with aiofiles.tempfile.NamedTemporaryFile("wb", delete=False) as temp:
try:
contents = await file.read()
await temp.write(contents)
except Exception:
return {"message": "There was an error uploading the file"}
finally:
await file.close()
res = await run_in_threadpool(process_video, temp.name) # Pass temp.name to VideoCapture()
except Exception:
return {"message": "There was an error processing the file"}
finally:
os.remove(temp.name)
return res
I am writing a python post request with a bytes body:
with open('srt_file.srt', 'rb') as f:
data = f.read()
res = requests.post(url='http://localhost:8000/api/parse/srt',
data=data,
headers={'Content-Type': 'application/octet-stream'})
And in the server part, I tried to parse the body:
app = FastAPI()
BaseConfig.arbitrary_types_allowed = True
class Data(BaseModel):
data: bytes
#app.post("/api/parse/{format}", response_model=CaptionJson)
async def parse_input(format: str, data: Data) -> CaptionJson:
...
However, I got the 422 error:
{"detail":[{"loc":["body"],"msg":"value is not a valid dict","type":"type_error.dict"}]}
So where is wrong with my code, and how should I fix it?
Thank you all in advance for helping out!!
FastAPI by default will expect you to pass json which will parse into a dict. It can't do that if it's isn't json, which is why you get the error you see.
You can use the Request object instead to receive arbitrary bytes from the POST body.
from fastapi import FastAPI, Request
app = FastAPI()
#app.get("/foo")
async def parse_input(request: Request):
data: bytes = await request.body()
# Do something with data
You might consider using Depends which will allow you to clean up your route function. FastAPI will first call your dependency function (parse_body in this example) and will inject that as an argument into your route function.
from fastapi import FastAPI, Request, Depends
app = FastAPI()
async def parse_body(request: Request):
data: bytes = await request.body()
return data
#app.get("/foo")
async def parse_input(data: bytes = Depends(parse_body)):
# Do something with data
pass
If the endgoal of your request is to only send bytes then please look at the documentation of FastAPI to accept bytes-like objects: https://fastapi.tiangolo.com/tutorial/request-files.
There is no need for the bytes to be enclosed into a model.
I'd like to http-send (e.g. requests.post) image files from servers/clients (with the requests lib) and receive/save these files with a django rest framework app. How do I do this?
Secondly I'd like to know how to extract parts from a QueryDict sent by requests.post in general. And more specific: How do I parse and save the _io-Object from this set of data?
# sending app
file = "path/to/image.jpg"
data = open(file, 'rb')
files = {
'json': (None, crawlingResultConnectionsJson, 'application/json'),
'file': (os.path.basename(file), open(file, 'rb'), 'application/octet-stream')
}
url = "http://127.0.0.1:8000/result/" # receiving/saving django rest framework app
r = requests.post(url, files=files)
I've tried quite a while now. Your help would be much appreciated! Thanks!
I came to a solution that perfectly fits my needs. As I only found contributions for either the sending OR the receiving part, I'll try to put everything together here.
Due to more flexibility my approach is to transfer json and images in seperated requests. The two following apps are completely independent.
The sending side does as follows (app with no server needed):
from django.core.serializers.json import DjangoJSONEncoder
import requests # http://docs.python-requests.org/en/master/
import datetime # in case...
import json
### send image stuff ###
urlImages = "http://127.0.0.1:8000/receive-images/"
file = "C:\\path\\to\\filename.jpg" # "\\" windows machine...
# this will probably run in a loop or so to process a bunch of images
with open(file, 'rb') as f:
filename = "filename.jpg"
files = {'file': (filename, f)}
r = requests.post(urlImages, files=files)
print(r) # some logging here
### send data stuff ###
data = data # python data - lists, dicts, whatever
json = json.dumps(data, cls=DjangoJSONEncoder) # DjangoJSONEncoder handles datetime fields
urlData = "http://127.0.0.1:8000/receive-data/"
headers = {'content-type': 'application/json'}
r = requests.post(urlData, json, headers=headers)
print(r) # some logging here
The receiving side needs to run a server (built-in django server for dev, apache with the WSGInterface in production) and it has this chum installed: http://www.django-rest-framework.org/
Finally we have two views to handle the requests:
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import status
from .api_controller import ApiController
from django.core.files.storage import default_storage
class ReceiveImages(APIView): # make sure to nail down corresponding url-confs
def post(self, request, format=None):
file = request.data.get('file')
filename = str(file)
with default_storage.open('images/' + filename, 'wb+') as destination:
for chunk in file.chunks():
destination.write(chunk)
return Response("ok", status=status.HTTP_200_OK)
class ReceiveData(APIView): # make sure to nail down corresponding url-confs
def post(self, request, format=None):
json = request.data
ApiController().createDataIfNotExists(json)
# As my json is quite complex,
# I've sourced out the DB-interactions to a controller-like class (coming
# from PHP symfony :)) with heavy use of the great
# MyModel.objects.get_or_create() method. Also the strptime() method is used
# to convert the datetime fields. This could also go here...
return Response("ok", status=status.HTTP_200_OK)
using chunk() in respect to https://stackoverflow.com/a/30195605/6522103
Please (!) comment/answer if you don't agree or think that this could be improved. Thanks!!
I am building a simple web service that requires all requests to be signed. The signature hash is generated using request data including the request body. My desire is to have a middleware component that validates the request signature, responding with an error if the signature is invalid. The problem is the middleware needs to read the request body using env['wsgi.input'].read(). This advances the pointer for the request body string to the end, which makes the data inaccessible to other components further down in the chain of execution.
Is there any way to make it so env['wsgi.input'] can be read twice?
Ex:
from myapp.lib.helpers import sign_request
from urlparse import parse_qs
import json
class ValidateSignedRequestMiddleware(object):
def __init__(self, app, secret):
self._app = app
self._secret = secret
def __call__(self, environ, start_response):
auth_params = environ['HTTP_AUTHORIZATION'].split(',', 1)
timestamp = auth_params[0].split('=', 1)[1]
signature = auth_params[1].split('=', 1)[1]
expected_signature = sign_request(
environ['REQUEST_METHOD'],
environ['HTTP_HOST'],
environ['PATH_INFO'],
parse_qs(environ['QUERY_STRING']),
environ['wsgi.input'].read(),
timestamp,
self._secret
)
if signature != expected_signature:
start_response('400 Bad Request', [('Content-Type', 'application/json')])
return [json.dumps({'error': ('Invalid request signature',)})]
return self._app(environ, start_response)
You can try seeking back to the beginning, but you may find that you'll have to replace it with a StringIO containing what you just read out.
The following specification deals with that exact problem, providing explanation of the problem as well as the solution including source code and special cases to take into account:
http://wsgi.readthedocs.org/en/latest/specifications/handling_post_forms.html