Basically, I'm trying to create an endpoint to Upload files to Amazon S3.
async def upload_files(filepath: str, upload_file_list: List[UploadFile] = File(...)):
    for upload_file in upload_file_list:
        abs_file_path = "/manual/path/works" + upload_file.path
        # Replace above line to get absolute file path from UploadFile
        response = s3_client.upload_file(abs_file_path, bucket_name,
                                         os.path.join(dest_path, upload_file.filename))
Above is my code to upload multiple files to the S3 bucket.
s3_client.upload_file() accepts an absolute file path of the file to upload.
It is working when I manually put the full path.
This, however, didn't work:
response = s3_client.upload_file(upload_file.filename, bucket_name,
os.path.join(dest_path, upload_file.filename))
Is there a way to get this absolute path in FastAPI? Or, any alternative with temp_path without copying or writing the file?
If not, then any alternative with boto3 to upload files to S3 using FastAPI?
UploadFile uses Python's SpooledTemporaryFile, which is a "file stored in memory", and "is destroyed as soon as it is closed". You can either read the file contents (i.e., using contents = file.file.read() or for async read/write have a look at this answer), and then upload these bytes to your server (if it permits), or copy the contents of the uploaded file into a NamedTemporaryFile, as explained here. Unlike SpooledTemporaryFile, a NamedTemporaryFile "is guaranteed to have a visible name in the file system" that "can be used to open the file". That name can be retrieved from the name attribute (i.e., temp.name). Example:
import os
from tempfile import NamedTemporaryFile

from fastapi import File, HTTPException, UploadFile

@app.post("/upload")
def upload(file: UploadFile = File(...)):
    temp = NamedTemporaryFile(delete=False)
    try:
        try:
            contents = file.file.read()
            with temp as f:
                f.write(contents)
        except Exception:
            raise HTTPException(status_code=500, detail='Error on uploading the file')
        finally:
            file.file.close()

        # Here, upload the file to your S3 service using `temp.name`
        s3_client.upload_file(temp.name, 'local', 'myfile.txt')
    except Exception:
        raise HTTPException(status_code=500, detail='Something went wrong')
    finally:
        # temp.close()  # the `with` statement above takes care of closing the file
        os.remove(temp.name)  # Delete temp file
Update
Additionally, one can access the actual Python file using the .file attribute. As per the documentation:
file: A SpooledTemporaryFile (a file-like object). This is the actual
Python file that you can pass directly to other functions or libraries
that expect a "file-like" object.
Thus, you could also try using upload_fileobj function and passing upload_file.file:
response = s3_client.upload_fileobj(upload_file.file, bucket_name, os.path.join(dest_path, upload_file.filename))
or, passing a file-like object using the ._file attribute of the SpooledTemporaryFile, which returns either an io.BytesIO or io.TextIOWrapper object (depending on whether binary or text mode was specified).
response = s3_client.upload_fileobj(upload_file.file._file, bucket_name, os.path.join(dest_path, upload_file.filename))
Update 2
You could even keep the bytes in an in-memory buffer (i.e., BytesIO), use it to upload the contents to the S3 bucket, and finally close it ("The buffer is discarded when the close() method is called."). Remember to call the seek(0) method to reset the cursor back to the beginning of the buffer after you finish writing to the BytesIO stream.
contents = upload_file.file.read()
temp_file = io.BytesIO()
temp_file.write(contents)
temp_file.seek(0)
s3_client.upload_fileobj(temp_file, bucket_name, os.path.join(dest_path, upload_file.filename))
temp_file.close()
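To see why the seek(0) call matters, here is a minimal sketch that uses only the standard library. The helper name to_buffer is hypothetical, and a plain read() stands in for whatever consumer (such as upload_fileobj) would receive the buffer.

```python
import io

def to_buffer(contents: bytes) -> io.BytesIO:
    """Copy raw bytes into an in-memory buffer positioned at the start."""
    buffer = io.BytesIO()
    buffer.write(contents)  # the cursor now sits at the end of the data
    buffer.seek(0)          # rewind so a consumer reads from the first byte
    return buffer

# Without rewinding, a reader starts at the end and gets nothing
unrewound = io.BytesIO()
unrewound.write(b"abc")
leftover = unrewound.read()

# With rewinding, the full payload is available to the consumer
buffer = to_buffer(b"hello world")
data = buffer.read()
buffer.close()  # closing discards the in-memory buffer
```

Passing the bytes to the constructor, as in io.BytesIO(contents), also leaves the cursor at position 0, so no explicit seek is needed in that case.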
Related
I am trying to give the user a "Save as" option when the user clicks the download button in my Django app. When the user clicks the button it kicks-off the following function. The function gets some CSVs from a blob container in Azure and adds them to a zip. That zip should then be offered to download and store in a location of the user's choice.
def create_downloadable_zip():
    container_client = az.container_client(container_name=blob_generator.container_name)
    blobs = container_client.list_blobs()
    zip_file = zipfile.ZipFile(f'{models.AppRun.client_name}.zip', 'w')
    for blob in blobs:
        if blob.name.endswith(".csv"):
            downloaded_blob = container_client.download_blob(blob)
            blob_data = downloaded_blob.readall()
            zip_file.writestr(blob.name, blob_data)
    zip_file.close()
    return zip_file
My views.py looks as follows:
def download_file(request):
    if request.method == 'POST':
        zip = create_downloadable_zip()
        response = HttpResponse(zip, content_type='application/zip')
        response['Content-Disposition'] = 'attachement;' f'filename={zip}.zip'
        return response
    #
    # else:
    #     # return a 404 response if this is a POST request
    #     return HttpResponse(status=404)
    return render(request, "download_file.html")
The functionality works, but it returns an empty non-zip file when the "Save as" window pops up. However, the actual zip file containing the files is being saved in the root folder of the Django project.
I really don't get why it doesn't return the zip file from memory, but rather stores that zip file directly in the root folder and returns an empty non-zip file through the download functionality.
Does anyone know what I am doing wrong?
zipfile is used to open a file, but it is not the actual file, simply a ZipFile object, as @b-remmelzwaal mentioned. You will need to create a file-like object and return that instead. This can be done using io.BytesIO.
from io import BytesIO
from zipfile import ZipFile

def create_zip():
    container_client = az.container_client(container_name=blob_generator.container_name)
    blobs = container_client.list_blobs()
    buffer = BytesIO()
    with ZipFile(buffer, 'w') as zip_file:
        for blob in blobs:
            if blob.name.endswith(".csv"):
                downloaded_blob = container_client.download_blob(blob)
                blob_data = downloaded_blob.readall()
                zip_file.writestr(blob.name, blob_data)
    return buffer.getvalue()
Note that we return the contents of the file-like object (buffer.getvalue()), not the ZipFile object. This is because buffer holds the actual bytes of the archive you've created.
You don't have to use a context manager, but I find them very useful.
Also, check your spelling for the line:
# attachment instead attachement
response['Content-Disposition'] = 'attachment;' f'filename={zip}.zip'
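The in-memory approach above can be tried without Azure or Django. The following is a minimal sketch using only the standard library; the helper name build_zip and the sample file names are hypothetical and stand in for the blob download loop.

```python
from io import BytesIO
from zipfile import ZipFile

def build_zip(files: dict) -> bytes:
    """Write each name/content pair into a zip archive held in memory."""
    buffer = BytesIO()
    with ZipFile(buffer, "w") as zip_file:
        for name, data in files.items():
            zip_file.writestr(name, data)
    # The archive is finalized when the `with` block exits,
    # so the bytes are only complete at this point.
    return buffer.getvalue()

# Hypothetical sample data in place of the downloaded CSV blobs
payload = build_zip({"a.csv": "x,y\n1,2\n", "b.csv": "x,y\n3,4\n"})

# Reopen the bytes to confirm they form a valid archive
with ZipFile(BytesIO(payload)) as archive:
    names = archive.namelist()
```

The resulting bytes can be handed to a response object as its body. Since the return value is bytes rather than a name, the download filename would need to come from a separate string.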
BytesIO Documentation
I have a FastAPI endpoint that receives a file, uploads it to s3, and then processes it. Everything works fine except for the processing, that fails with this message:
  File "/usr/local/lib/python3.9/site-packages/starlette/datastructures.py", line 441, in read
    return self.file.read(size)
  File "/usr/local/lib/python3.9/tempfile.py", line 735, in read
    return self._file.read(*args)
ValueError: I/O operation on closed file.
My simplified code looks like this:
async def process(file: UploadFile):
    reader = csv.reader(iterdecode(file.file.read(), "utf-8"), dialect="excel")  # This fails!
    datarows = []
    for row in reader:
        datarows.append(row)
    return datarows
How can I read the contents of the uploaded file?
UPDATE
I managed to isolate the problem a bit more. Here's my simplified endpoint:
import boto3
from loguru import logger
from botocore.exceptions import ClientError

UPLOAD = True

@router.post("/")
async def upload(file: UploadFile = File(...)):
    if UPLOAD:
        # Upload the file
        s3_client = boto3.client("s3", endpoint_url="http://localstack:4566")
        try:
            s3_client.upload_fileobj(file.file, "local", "myfile.txt")
        except ClientError as e:
            logger.error(e)
    contents = await file.read()
    return JSONResponse({"message": "Success!"})
If UPLOAD is True, I get the error. If it's not, everything works fine. It seems boto3 is closing the file after uploading it. Is there any way I can reopen the file? Or send a copy to upload_fileobj?
FastAPI's (actually Starlette's) UploadFile (see Starlette's documentation as well) uses Python's SpooledTemporaryFile, a "file stored in memory up to a maximum size limit, and after passing this limit it will be stored in disk.". It "operates exactly as TemporaryFile", which "is destroyed as soon as it is closed (including an implicit close when the object is garbage collected)". Hence, it seems that once the contents of the file have been read by boto3, the file gets closed, which, in turn, causes the file to be deleted.
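The "closed file" behavior described above can be reproduced with SpooledTemporaryFile alone. This is a minimal sketch, not what boto3 does internally: the explicit close() call is an assumption standing in for whatever closes the upload in the original endpoint.

```python
from tempfile import SpooledTemporaryFile

# Small payloads stay in memory until max_size is exceeded
spooled = SpooledTemporaryFile(max_size=1024)
spooled.write(b"a,b\n1,2\n")
spooled.seek(0)
first_read = spooled.read()  # works while the file is open

spooled.close()  # stands in for a consumer that closes the file

try:
    spooled.read()
    error = None
except ValueError as exc:
    # Reading after close fails, as in the traceback from the question
    error = str(exc)
```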
Option 1
If the server supports it, you could read the file contents, using contents = file.file.read() as shown in this answer (or for async reading/writing see here), and then upload these contents (i.e., bytes) to your server directly.
Otherwise, you can again read the contents and then move the file's reference point back to the beginning of the file. A file has an internal "cursor" (or "file pointer") denoting the position from which the file contents will be read (or written). Calling read() reads all the way to the end of the buffer, leaving zero bytes beyond the cursor. Thus, you can use the seek() method to set the current position of the cursor to 0 (i.e., rewind the cursor to the start of the file), which allows you to pass the file object (i.e., upload_fileobj(file.file), see this answer) after reading the file contents.
As per FastAPI's documentation:
seek(offset): Goes to the byte position offset (int) in the file.
E.g., await myfile.seek(0) would go to the start of the file.
This is especially useful if you run await myfile.read() once and then need to read the contents again.
Example
from fastapi import File, UploadFile, HTTPException

@app.post('/')
def upload(file: UploadFile = File(...)):
    try:
        contents = file.file.read()
        file.file.seek(0)
        # Upload the file to your S3 service
        s3_client.upload_fileobj(file.file, 'local', 'myfile.txt')
    except Exception:
        raise HTTPException(status_code=500, detail='Something went wrong')
    finally:
        file.file.close()

    print(contents)  # Handle file contents as desired
    return {"filename": file.filename}
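The effect of the cursor on a second consumer can be checked in isolation. In this minimal sketch, fake_upload_fileobj is a hypothetical stand-in for an uploader that reads from the current position; it is not part of boto3.

```python
import io

def fake_upload_fileobj(fileobj) -> int:
    """Hypothetical uploader: consumes the stream and returns the byte count."""
    return len(fileobj.read())

stream = io.BytesIO(b"col1,col2\n1,2\n")

contents = stream.read()  # the cursor is now at the end of the stream

# Without rewinding, the "uploader" sees an empty stream
sent_without_rewind = fake_upload_fileobj(stream)

# After seek(0), the same object can be consumed again in full
stream.seek(0)
sent_after_rewind = fake_upload_fileobj(stream)
```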
Option 2
Copy the contents of the file into a NamedTemporaryFile, which, unlike TemporaryFile, "has a visible name in the file system" that "can be used to open the file" (that name can be retrieved from the .name attribute). Additionally, it can remain accessible after it is closed if you set the delete argument to False, which allows the file to be reopened when needed. Once you are done with it, you can delete it using os.remove() or os.unlink(). Below is a working example (inspired by this answer):
from fastapi import FastAPI, File, UploadFile, HTTPException
from tempfile import NamedTemporaryFile
import os

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    temp = NamedTemporaryFile(delete=False)
    try:
        try:
            contents = file.file.read()
            with temp as f:
                f.write(contents)
        except Exception:
            raise HTTPException(status_code=500, detail='Error on uploading the file')
        finally:
            file.file.close()

        # Upload the file to your S3 service using `temp.name`
        s3_client.upload_file(temp.name, 'local', 'myfile.txt')
    except Exception:
        raise HTTPException(status_code=500, detail='Something went wrong')
    finally:
        # temp.close()  # the `with` statement above takes care of closing the file
        os.remove(temp.name)  # Delete temp file

    print(contents)  # Handle file contents as desired
    return {"filename": file.filename}
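The lifecycle of a NamedTemporaryFile created with delete=False can be verified without FastAPI or S3. This is a minimal sketch with a made-up payload; reopening the file by name stands in for a path-based consumer such as upload_file.

```python
import os
from tempfile import NamedTemporaryFile

temp = NamedTemporaryFile(delete=False)
try:
    with temp as f:
        f.write(b"example payload")

    # The file is closed here but, because delete=False, still on disk
    exists_after_close = os.path.exists(temp.name)

    # Any consumer that expects a path can reopen it by name
    with open(temp.name, "rb") as reopened:
        round_trip = reopened.read()
finally:
    os.remove(temp.name)  # cleanup is the caller's responsibility

exists_after_remove = os.path.exists(temp.name)
```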
Option 3
You could even keep the bytes in an in-memory buffer (BytesIO), use it to upload the contents to the S3 bucket, and finally close it ("The buffer is discarded when the close() method is called."). Remember to call the seek(0) method to reset the cursor back to the beginning of the buffer after you finish writing to the BytesIO stream.
contents = file.file.read()
temp_file = io.BytesIO()
temp_file.write(contents)
temp_file.seek(0)
s3_client.upload_fileobj(temp_file, "local", "myfile.txt")
temp_file.close()
From FastAPI ImportFile:
Import File and UploadFile from fastapi:
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/files/")
async def create_file(file: bytes = File(...)):
    return {"file_size": len(file)}

@app.post("/uploadfile/")
async def create_upload_file(file: UploadFile = File(...)):
    return {"filename": file.filename}
From FastAPI UploadFile:
For example, inside of an async path operation function you can get the
contents with:
contents = await myfile.read()
With your code, you should have something like this:
async def process(file: UploadFile = File(...)):
    content = await file.read()
    reader = csv.reader(iterdecode(content, "utf-8"), dialect="excel")
    datarows = []
    for row in reader:
        datarows.append(row)
    return datarows
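One caveat about the snippet above: codecs.iterdecode is documented as taking an iterator that yields bytes objects, so handing it the single bytes value returned by read() may not behave as intended. The sketch below takes an alternative route, decoding the whole payload first and wrapping it in io.StringIO. The helper name parse_csv_bytes and the sample data are hypothetical.

```python
import csv
import io

def parse_csv_bytes(content: bytes, encoding: str = "utf-8") -> list:
    """Decode uploaded bytes and parse them into a list of CSV rows."""
    text = content.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them
    reader = csv.reader(io.StringIO(text, newline=""), dialect="excel")
    return [row for row in reader]

# Hypothetical payload, as it might be returned by `await file.read()`
rows = parse_csv_bytes(b"name,qty\r\nwidget,3\r\n")
```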
For a project, I'm trying to get an uploaded image file, stored in a bucket. I'm trying to have Python save a copy temporarily, just to perform a few tasks on this file (read, decode and give the decoded file back as JSON). After this is done, the temp file needs to be deleted.
I'm using Python 3.8, if that helps at all.
If you want some snippets of what I tried, I'm happy to provide :)
#edit
So far, I tried just downloading the file from the bucket, which works. But I can't seem to figure out how to temporarily save it to just decode (I got an API that will decode the image and get data from that file). This is the code for downloading
def download_file_from_bucket(blob_name, file_path, bucket_name):
    try:
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(blob_name)
        with open(file_path, 'wb') as f:
            storage_client.download_blob_to_file(blob, f)
    except Exception as e:
        print(e)
        return False

bucket_name = 'white-cards-with-qr'
download_file_from_bucket('My first Blob Image', os.path.join(os.getcwd(), 'file2.jpg'), bucket_name)
For an object store in a cloud environment, you can sign your object to give access to people who don't have an account for that object; you may read this for Google Cloud.
You can use the tempfile library. This is a really basic snippet. You can also name the file or read it after writing it.
import tempfile
temp = tempfile.TemporaryFile()
try:
    temp.write(blob)
finally:
    temp.close()
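If the decoding step needs a file path rather than a file object, a named temporary file used as a context manager is one option. This is a minimal sketch under stated assumptions: the bytes are a placeholder for the downloaded blob, the ".jpg" suffix is illustrative, and whether another process can reopen the file by name while it is still open depends on the platform.

```python
import os
import tempfile

data = b"placeholder image bytes"  # stands in for the downloaded blob

with tempfile.NamedTemporaryFile(suffix=".jpg") as temp:
    temp.write(data)
    temp.flush()  # make sure the bytes reach the file before it is used
    path = temp.name
    exists_inside = os.path.exists(path)

    # The hypothetical decode step would go here; reading back stands in for it
    temp.seek(0)
    read_back = temp.read()

# Leaving the `with` block closes and deletes the file
exists_after = os.path.exists(path)
```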
I have written code on my backend (hosted on Elastic Beanstalk) to retrieve a file from an S3 bucket and save it back to the bucket under a different name. I am using boto3 and have created an s3 client called 's3'.
bucketname is the name of the bucket, keyname is name of the key. I am also using the tempfile module
tmp = tempfile.NamedTemporaryFile()
with open(tmp.name, 'wb') as f:
    s3.download_fileobj(bucketname, keyname, f)
s3.upload_file(tmp, bucketname, 'fake.jpg')
I was wondering if my understanding was off (still debugging why there is an error) - I created a tempfile and opened and saved within it the contents of the object with the keyname and bucketname. Then I uploaded that temp file to the bucket under a different name. Is my reasoning correct?
The upload_file() command is expecting a filename (as a string) in the first parameter, not a file object.
Instead, you should use upload_fileobj().
However, I would recommend something different...
If you simply wish to make a copy of an object, you can use copy_object:
response = client.copy_object(
    Bucket='destinationbucket',
    CopySource='/sourcebucket/HappyFace.jpg',
    Key='HappyFaceCopy.jpg',
)
I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data):
    tempname = str(uuid4()) + '.csv'
    data['text'].to_csv('samo/static/temp/{}'.format(tempname))
    return file

@projects.route('/getcsv/<file>')
def getcsv(file):
    @after_this_request
    def cleanup(response):
        os.remove('samo/static/temp/' + file)
        return response
    return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows; other platforms can mark a file as deleted and keep it around until it is no longer being accessed. However, it may still be useful to delete the file only once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it and then delete it.
@app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
    path = os.path.join(current_app.instance_path, filename)

    def generate():
        with open(path) as f:
            yield from f

        os.remove(path)

    r = current_app.response_class(generate(), mimetype='text/csv')
    r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
    return r
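The generator pattern above can be exercised outside Flask to confirm that the file is removed only once it has been read to the end. This is a minimal sketch; the function name stream_then_delete and the sample CSV are hypothetical.

```python
import os
import tempfile

def stream_then_delete(path):
    """Yield the file line by line, then remove it once fully consumed."""
    with open(path) as f:
        yield from f
    # Reached only after the consumer has exhausted the generator
    os.remove(path)

# Create a throwaway CSV to stream
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n")

lines = list(stream_then_delete(path))
deleted = not os.path.exists(path)
```

If the consumer stops iterating early (for example, a client disconnects), the code after the `with` block does not run, so the file would remain on disk and need separate cleanup.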