How can I serve temporary files from Python Pyramid


Currently, I'm just serving files like this:
from datetime import datetime
from pyramid.response import Response
# view callable
def export(request):
    response = Response(content_type='application/csv')
    # use datetime in filename to avoid collisions
    f = open('/temp/XML_Export_%s.xml' % datetime.now(), 'r')
    # this is where I usually put stuff in the file
    response.app_iter = f
    response.headers['Content-Disposition'] = ("attachment; filename=Export.xml")
    return response
The problem with this is that I can't close or, even better, delete the file after the response has been returned. The file gets orphaned. I can think of some hacky ways around this, but I'm hoping there's a standard way out there somewhere. Any help would be awesome.

You do not want to set a file pointer as the app_iter. This will cause the WSGI server to read the file line by line (same as for line in file), which is typically not the most efficient way to serve a file (imagine one character per line). Pyramid's supported way of serving files is via pyramid.response.FileResponse. You can create one of these by passing a file path.
from pyramid.response import FileResponse
response = FileResponse('/some/path/to/a/file.txt', request=request)
response.headers['Content-Disposition'] = ...
Another option is to pass a file pointer to app_iter but wrap it in the pyramid.response.FileIter object, which will use a sane block size to avoid just reading the file line by line.
The WSGI specification requires that if a response iterator has a close method, the server invokes it at the end of the response. Thus setting response.app_iter = open(...) should not leak file handles. Both FileResponse and FileIter also support a close method and will thus be cleaned up as expected.
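A minimal stdlib-only sketch of that close() contract. DeletingFileIter and run_like_a_server are hypothetical names for illustration: the first plays roughly the role of FileIter but also unlinks the file in close() (Pyramid's FileIter only closes it), and the second mimics the part of a WSGI server that iterates the body and then calls close().

```python
import os
import tempfile


class DeletingFileIter:
    """Hypothetical helper (not part of Pyramid): yields a file in blocks
    and deletes it from disk when the server calls close()."""

    def __init__(self, path, block_size=64 * 1024):
        self.path = path
        self.file = open(path, 'rb')
        self.block_size = block_size

    def __iter__(self):
        return self

    def __next__(self):
        data = self.file.read(self.block_size)
        if not data:
            raise StopIteration
        return data

    def close(self):
        # PEP 3333: the server calls close() when the response is finished,
        # even if it stopped iterating early.
        self.file.close()
        if os.path.exists(self.path):
            os.remove(self.path)


def app(environ, start_response):
    # Write the export into a temp file, then hand the server an iterator.
    fd, path = tempfile.mkstemp(suffix='.xml')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'<export/>')
    start_response('200 OK', [('Content-Type', 'application/xml')])
    return DeletingFileIter(path)


def run_like_a_server(app):
    """Minimal stand-in for a WSGI server's response loop."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    result = app({}, start_response)
    try:
        body = b''.join(result)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], body, result.path
```

Because deletion lives in close(), it happens only after the server is done with the body, which is exactly the moment the question asks for.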
As a minor update to this answer, I thought I'd explain why FileResponse takes a file path and not a file pointer. The WSGI protocol gives servers the optional ability to provide an optimized mechanism for serving static files via environ['wsgi.file_wrapper']. FileResponse will automatically use this if your WSGI server provides that support (and you pass the request to it). With this in mind, you may find it a win to save your data to a tmpfile on a ramdisk and provide FileResponse with the full path, instead of trying to pass a file pointer to FileIter.
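The same fallback logic can be sketched with the standard library alone. This is not Pyramid's code: serve_path is a hypothetical helper that uses environ['wsgi.file_wrapper'] when the server offers one and otherwise falls back to wsgiref.util.FileWrapper, which simply reads fixed-size blocks. The XML content type is an assumption.

```python
import os
import tempfile
from wsgiref.util import FileWrapper


def serve_path(path, environ, start_response, block_size=64 * 1024):
    """Serve a file, preferring the server's optimized wrapper if present."""
    # Servers may supply environ['wsgi.file_wrapper'] (e.g. backed by
    # sendfile); otherwise use wsgiref's plain block-reading wrapper.
    wrapper = environ.get('wsgi.file_wrapper', FileWrapper)
    f = open(path, 'rb')
    start_response('200 OK', [
        ('Content-Type', 'application/xml'),
        ('Content-Length', str(os.path.getsize(path))),
    ])
    # Both wrappers expose close(), so the server closes f when done.
    return wrapper(f, block_size)
```

Passing a path (rather than an already-open file) is what lets this kind of helper open the file itself and choose the fastest wrapper the server has.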
http://docs.pylonsproject.org/projects/pyramid/en/1.4-branch/api/response.html#pyramid.response.FileResponse

Update:
Please see Michael Merickel's answer for a better solution and explanation.
If you want to have the file deleted once the response is returned, you can try the following:
import os
from datetime import datetime
from tempfile import NamedTemporaryFile
from pyramid.response import FileResponse
# view callable
def export(request):
    with NamedTemporaryFile(prefix='XML_Export_%s' % datetime.now(),
                            suffix='.xml', delete=True) as f:
        # this is where I usually put stuff in the file
        f.flush()  # make sure the data is on disk before FileResponse reopens it by path
        response = FileResponse(os.path.abspath(f.name), request=request)
        response.headers['Content-Disposition'] = ("attachment; filename=Export.xml")
        return response
You can consider using NamedTemporaryFile:
NamedTemporaryFile(prefix='XML_Export_%s' % datetime.now(), suffix='.xml', delete=True)
Set delete=True so that the file is deleted as soon as it is closed.
Now, with the help of a with statement, you are guaranteed that the file will be closed, and hence deleted:
from tempfile import NamedTemporaryFile
from datetime import datetime
from pyramid.response import Response
# view callable
def export(request):
    response = Response(content_type='application/csv')
    with NamedTemporaryFile(prefix='XML_Export_%s' % datetime.now(),
                            suffix='.xml', delete=True) as f:
        # this is where I usually put stuff in the file
        response.app_iter = f
        response.headers['Content-Disposition'] = ("attachment; filename=Export.xml")
        # note: leaving this with block closes (and deletes) f before the
        # WSGI server gets to read app_iter; see the update above
        return response
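A quick stdlib check of what delete=True combined with a with statement does: once the block exits, the file is both closed and removed, so anything that still needs to read it afterwards (like a WSGI server iterating app_iter after the view returned) fails. The payload is illustrative.

```python
import os
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(prefix='XML_Export_', suffix='.xml', delete=True) as f:
    f.write(b'<export/>')
    f.flush()
    path = f.name
    exists_inside = os.path.exists(path)  # the file is on disk here

# Leaving the with block closed f, and delete=True removed it.
exists_after = os.path.exists(path)
closed_after = f.closed

# Reading the closed file now, as a server would when it iterates the
# response body, raises ValueError.
try:
    f.read()
    read_error = None
except ValueError as exc:
    read_error = exc
```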

The combination of Michael's and Kay's answers works great under Linux/Mac but won't work under Windows (for auto-deletion). Windows doesn't like the fact that FileResponse tries to open the already open file (see the description of NamedTemporaryFile).
I worked around this by creating a FileDescriptorResponse class which is essentially a copy of FileResponse, but takes the open NamedTemporaryFile object instead of a path. Just replace the open with a seek(0) and all the path-based calls (last_modified, content_length) with their fstat equivalents.
import mimetypes
from os import fstat
from pyramid.response import FileIter, Response
class FileDescriptorResponse(Response):
    """
    A Response object that can be used to serve a static file from an open
    file object. This is essentially identical to Pyramid's FileResponse
    but takes an open file instead of a path as a workaround for auto-delete
    not working with NamedTemporaryFile under Windows.
    ``file`` is an open file object (e.g. a NamedTemporaryFile).
    ``content_type``, if passed, is the content_type of the response.
    ``content_encoding``, if passed, is the content_encoding of the response.
    It's generally safe to leave this set to ``None`` if you're serving a
    binary file. This argument will be ignored if you don't also pass
    ``content_type``.
    """
    def __init__(self, file, content_type=None, content_encoding=None):
        super(FileDescriptorResponse, self).__init__(conditional_response=True)
        file.flush()  # make sure fstat sees everything written so far
        self.last_modified = fstat(file.fileno()).st_mtime
        if content_type is None:
            # guess the type from the temp file's name
            content_type, content_encoding = mimetypes.guess_type(file.name,
                                                                  strict=False)
            if content_type is None:
                content_type = 'application/octet-stream'
        self.content_type = content_type
        self.content_encoding = content_encoding
        content_length = fstat(file.fileno()).st_size
        file.seek(0)
        app_iter = FileIter(file)  # uses FileIter's default block size
        self.app_iter = app_iter
        # assignment of content_length must come after assignment of app_iter
        self.content_length = content_length
Hope that's helpful.
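The fstat substitutions can be checked on their own with the standard library. One detail worth knowing (an observation about Python's buffered files, not something the answer states): data written to a NamedTemporaryFile may still sit in Python's buffer, so flush before calling os.fstat, then rewind before handing the file to FileIter. The payload is illustrative.

```python
import os
from tempfile import NamedTemporaryFile

payload = b'<export>' + b'x' * 1000 + b'</export>'

with NamedTemporaryFile(suffix='.xml') as f:
    f.write(payload)
    # Python buffers writes; flush so the OS-level size is current.
    f.flush()
    st = os.fstat(f.fileno())
    size_from_fstat = st.st_size      # fstat equivalent of os.path.getsize
    mtime_from_fstat = st.st_mtime    # fstat equivalent of os.path.getmtime
    # Rewind before handing the open file to FileIter / app_iter.
    f.seek(0)
    first_bytes = f.read(8)
```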

There is also repoze.filesafe which will take care of generating a temporary file for you, and delete it at the end. I use it for saving files uploaded to my server. Perhaps it can be useful to you too.

Your response object is holding a file handle for the file '/temp/XML_Export_%s.xml'. Use the del statement to delete the handle response.app_iter:
del response.app_iter

Both Michael Merickel's and Kay Zhu's answers are fine.
I found out that I also need to reset the file position to the beginning of the NamedTemporaryFile before passing it to the response, as the response starts from the current position in the file and not from the beginning (that's fine, you just need to know it).
With NamedTemporaryFile with deletion enabled, you cannot close and reopen it, because closing would delete it (and you can't reopen it anyway), so you need to use something like this:
import tempfile
from pyramid.response import FileIter, Response
f = tempfile.NamedTemporaryFile()
# fill your file here
f.seek(0, 0)  # rewind to the start before handing the file to the response
# FileResponse expects a path, so wrap the open file in FileIter instead
response = Response(
    app_iter=FileIter(f),
    content_type='application/csv'
)
hope it helps ;)

Related

File corrupted when using send_file() from flask, data from pymongo gridfs

My English is not good, so the title may look weird.
Anyway, I'm now using Flask to build a website that can store files, and MongoDB is the database.
The file upload and document insert functions have no problems; the weird thing is that the file sent by Flask's send_file() is truncated for no apparent reason. Here's my code:
from flask import ..., send_file, ...
import pymongo
import gridfs
#...
@app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
    try:
        #...
        file = files_gridfs.find_one({"_id": record_id})
        file_ext = filetype.guess_extension(file.read(2048))
        filename = "{}-{}{}".format(
            app["name"],
            record["version"],
            ".{}".format(file_ext) if file_ext else "",
        )
        response = send_file(file, as_attachment=True, attachment_filename=filename)
        return response
    except ...
The original image file, for example, is 553KB, but the response body is only 549.61KB and the image is broken. However, if I write the file directly to my disk:
#...
with open('test.png', 'wb+') as file:
    file.write(files_gridfs.find_one({"_id": record_id}).read())
The image file size is 553KB and the image is readable.
When I compare the two files with VS Code's text editor, I found that the correct file starts with �PNG, but the corrupted file starts with �ϟ8���>�L�y
(Screenshot: searching for the corrupted file's header inside the correct file.)
I also tried using a Blob object and downloading it from the browser. No difference.
Is there anything wrong with my code, or did I misuse send_file()? Or should I use flask_pymongo?

Interestingly, I have found what was wrong with my code.
This is how I solved it:
...file.read(2048)
file.seek(0)
...
file.read(2048)
file.seek(0)
...
response = send_file(file, ...)
return response
And here's why:
For various reasons, I use filetype to detect the file's extension and MIME type, so I passed 2048 bytes to filetype for detection.
file_ext = filetype.guess_extension(file.read(2048))
file_mime = filetype.guess_mime(file.read(2048)) #this line wasn't copied in my question. My fault.
I have just learned from the pymongo API docs that the file object keeps a read cursor (something completely unknown to me before). When I checked the cursor's position with file.seek(), it returned 4096. So when send_file() calls file.read() again, it starts reading 4096 bytes past the beginning of the file. 549 + 4 = 553, and that's the problem.
Finally, I set the cursor back to position 0 after every read() operation, and it returns the correct file.
Hope this helps if you made the same mistake as me.
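The effect is easy to reproduce with the standard library. Here io.BytesIO stands in for pymongo's GridOut (an assumption made so the example runs without MongoDB; both keep a read position), and the sample data is arbitrary.

```python
import io

# Stand-in for the GridFS file: 5120 bytes of non-repeating-per-2KB data.
data = bytes(i % 251 for i in range(5120))
file = io.BytesIO(data)

header1 = file.read(2048)   # e.g. for filetype.guess_extension
header2 = file.read(2048)   # e.g. for filetype.guess_mime: different bytes!
position = file.tell()      # the cursor has moved 4096 bytes
truncated = file.read()     # what send_file() would see: only the tail

# Fix: read the header once, rewind, reuse it, and rewind again.
file.seek(0)
head = file.read(2048)
file.seek(0)
complete = file.read()
```

Reading the header once and passing the same bytes to every detector also avoids the second guess looking at the wrong part of the file.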

Flask - Delete zipfile after download [duplicate]

I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data):
    tempname = str(uuid4()) + '.csv'
    data['text'].to_csv('samo/static/temp/{}'.format(tempname))
    return tempname
@projects.route('/getcsv/<file>')
def getcsv(file):
    @after_this_request
    def cleanup(response):
        os.remove('samo/static/temp/' + file)
        return response
    return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)

after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows; other platforms can mark a file deleted and keep it around until it is no longer being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it, then delete it.
@app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
    path = os.path.join(current_app.instance_path, filename)
    def generate():
        with open(path) as f:
            yield from f
        os.remove(path)
    r = current_app.response_class(generate(), mimetype='text/csv')
    r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
    return r
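A variation on that generator, sketched with only the standard library: stream_then_remove (a hypothetical name) reads binary blocks instead of text lines and puts os.remove in a finally clause. That way the file is also removed when the server closes the generator early, for example after a client disconnect, provided iteration had already started.

```python
import os
import tempfile


def stream_then_remove(path, block_size=64 * 1024):
    """Yield a file in binary blocks, then delete it.

    The finally clause also runs if the consumer calls close() on the
    generator after the first chunk (as WSGI servers do on disconnect).
    """
    try:
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # The with block has already closed f, so this works on Windows too.
        os.remove(path)
```

Usage in the view above would be current_app.response_class(stream_then_remove(path), mimetype='text/csv').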

How to serve a created tempfile in django

I have a remote storage project that when the user requests his file, the django server retrieves and stores the file locally (for some processing) as a temporary file and then serves it to the user with mod x-sendfile. I certainly want the tempfile to be deleted after it is served to the user.
The documentation states that if NamedTemporaryFile's delete argument is set to False, the file is deleted after all references to it are gone. But when the user is served the tempfile, it doesn't get deleted. If I set delete=True, then on download I get "The requested URL /ServeSegment/Test.jpg/ was not found on this server."
Here is a view to list the user files:
def file_profile(request):
    obj = MainFile.objects.filter(owner=request.user)
    context = {'title': 'welcome',
               'obj': obj
               }
    return render(request, 'ServeSegments.html', context=context)
This is the view which retrieves, temporarily stores and serves the requested file:
def ServeSegment(request, segmentID):
    if request.method == 'GET':
        url = 'http://192.168.43.7:8000/foo/'+str(segmentID)
        r = requests.get(url, stream=True)
        if r.status_code == 200:
            with tempfile.NamedTemporaryFile(dir='/tmp/Files', mode='w+b') as f:
                for chunk in r.iter_content(1024):
                    f.write(chunk)
            response = HttpResponse()
            response['Content-Disposition'] = 'attachment; segmentID={0}'.format(f.name)
            response['X-Sendfile'] = "{0}".format(f.name)
            return response
        else:
            return HttpResponse(str(segmentID))
I guess if I could return the response inside the with statement after the last chunk was written, it would work as I want, but I found no non-hackish way to determine whether we are in the last loop iteration.
What should I do to serve the tempfile and have it deleted right after?

Adding a generalized answer (based on Cyrbil's) that avoids using signals by doing the cleanup in a finally block.
While the directory entry is deleted by os.remove on the way out, the underlying file remains open until FileResponse closes it. You can check this by inspecting response._closable_objects[0].fileno() in the finally block with pdb, and checking open files with lsof in another terminal while it's paused.
It looks like it's important that you're on a Unix system if you're going to use this solution (see os.remove docs)
https://docs.python.org/3/library/os.html#os.remove
import os
import tempfile
from django.http import FileResponse
def my_view(request):
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()  # only the name is needed; the file is reopened below
    try:
        with open(tmp.name, 'w') as fi:
            pass  # write to your tempfile here, mode may vary
        response = FileResponse(open(tmp.name, 'rb'))
        return response
    finally:
        os.remove(tmp.name)
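The Unix behavior this answer relies on can be seen directly with the standard library: after os.remove, the name is gone from the directory, but a handle opened earlier still reads the full contents. This sketch assumes a POSIX system; on Windows, removing a file that is still open raises PermissionError.

```python
import os
import tempfile

# POSIX only: Windows refuses to remove a file that is still open.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'hello from a deleted file')
tmp.close()

reader = open(tmp.name, 'rb')   # plays the role of FileResponse's handle
os.remove(tmp.name)             # the directory entry is gone...
name_still_listed = os.path.exists(tmp.name)
content = reader.read()         # ...but the open handle still reads the data
reader.close()                  # the OS frees the space only now
```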

Any file created by tempfile will be deleted once the file handle is closed; in your case, when you exit the with statement. The delete=False argument prevents this behavior and leaves the deletion up to the application. You can delete the file after it's been sent by registering a signal handler that will unlink the file once the response is sent.
Your example does nothing with the file, so you might want to stream the content directly with StreamingHttpResponse or FileResponse. But since you said you "store the file locally (for some processing)", I would suggest thinking about doing the processing without creating any temporary file and only working with streams.

Disposable files
The solution is to not use a with block for the NamedTemporaryFile and to handle exceptions yourself. Currently your file is deleted before it is read. At the end, return:
f.seek(0)
return FileResponse(f, as_attachment=True, filename=f.name)
The temporary file will be closed when the read is complete and therefore deleted.
Non-disposable files
For those who stumble across this and do not have an automatically disposable file handle.
From the other answers, signals seemed to be a reasonable solution; however, passing data required altering protected members, and I was unsure how well that would be supported in the future. I also found that whp's solution did not work in the current version of Django. The most future-proof version I could come up with was monkey patching the file object so that the file is deleted on close. Django closes the file handles at the end of sending the file, and I can't see that changing.
import os
import tempfile
from django.http import FileResponse
def my_view(request):
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        # write file tmp (remember to close if re-opening)
        # after write close the file (if not closed)
        tmp.close()
        stream_file = open(tmp.name, 'rb')
        # monkey patch the file so that closing it also deletes it
        original_close = stream_file.close
        def new_close():
            original_close()
            os.remove(tmp.name)
        stream_file.close = new_close
        # return the result; Django calls close() once the file has been sent
        return FileResponse(stream_file, as_attachment=True, filename='out.txt')
    except Exception:
        os.remove(tmp.name)
        raise
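An alternative to monkey patching close on an existing file object is a small subclass whose close() also unlinks the file. DeleteOnCloseReader is a hypothetical name, and this swaps in subclassing for the answer's patching; it relies on the same observation that Django closes the file after sending it. Since the handle is closed before os.remove runs, it should also work on Windows.

```python
import io
import os


class DeleteOnCloseReader(io.BufferedReader):
    """Hypothetical helper: a binary reader that unlinks its file on close()."""

    def __init__(self, path):
        super().__init__(io.FileIO(path, 'rb'))
        self._path = path

    def close(self):
        already_closed = self.closed
        super().close()  # close the OS handle first
        path = getattr(self, '_path', None)
        if not already_closed and path and os.path.exists(path):
            os.remove(path)
```

In the view, return FileResponse(DeleteOnCloseReader(tmp.name), as_attachment=True, filename='out.txt') after writing and closing tmp.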

Delete an uploaded file after downloading it from Flask

I am currently working on a small web interface which allows different users to upload files, convert the files they have uploaded, and download the converted files. The details of the conversion are not important for my question.
I am currently using flask-uploads to manage the uploaded files, and I am storing them in the file system. Once a user uploads and converts a file, there are all sorts of pretty buttons to delete the file, so that the uploads folder doesn't fill up.
I don't think this is ideal. What I really want is for the files to be deleted right after they are downloaded. I would settle for the files being deleted when the session ends.
I've spent some time trying to figure out how to do this, but I have yet to succeed. It doesn't seem like an uncommon problem, so I figure there must be some solution out there that I am missing. Does anyone have a solution?

There are several ways to do this.
send_file and then immediately delete (Linux only)
Flask has an after_this_request decorator which could work for this use case:
@app.route('/files/<filename>/download')
def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    file_handle = open(file_path, 'rb')
    @after_this_request
    def remove_file(response):
        try:
            os.remove(file_path)
            file_handle.close()
        except Exception as error:
            app.logger.error("Error removing or closing downloaded file handle: %s", error)
        return response
    return send_file(file_handle)
The issue is that this will only work on Linux (which lets the file be read even after deletion if there is still an open file pointer to it). It also won't always work (I've heard reports that sometimes send_file won't wind up making the kernel call before the file is already unlinked by Flask). It doesn't tie up the Python process to send the file though.
Stream file, then delete
Ideally though you'd have the file cleaned up after you know the OS has streamed it to the client. You can do this by streaming the file back through Python by creating a generator that streams the file and then closes it, like is suggested in this answer:
def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    file_handle = open(file_path, 'rb')
    # This *replaces* the `remove_file` + @after_this_request code above
    def stream_and_remove_file():
        yield from file_handle
        file_handle.close()
        os.remove(file_path)
    return current_app.response_class(
        stream_and_remove_file(),
        headers={'Content-Disposition': 'attachment; filename="%s"' % filename}
    )
This approach is nice because it is cross-platform. It isn't a silver bullet however, because it ties up the Python web process until the entire file has been streamed to the client.
Clean up on a timer
Run another process on a timer (using cron, perhaps) or use an in-process scheduler like APScheduler, and clean up files that have been on disk in the temporary location beyond your timeout (e.g. half an hour, one week, thirty days, or after they've been marked "downloaded" in an RDBMS).
This is the most robust way, but requires additional complexity (cron, in-process scheduler, work queue, etc.)
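The cleanup job itself can be a plain function that cron or a scheduler such as APScheduler calls periodically. remove_stale_files and its parameters are illustrative assumptions, and age is judged by file modification time rather than by a "downloaded" flag in a database.

```python
import os
import time


def remove_stale_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files it removed.
    """
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot the listing before deleting anything
    removed = []
    for entry in entries:
        if not entry.is_file():
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            try:
                os.remove(entry.path)
            except FileNotFoundError:
                continue  # another worker already removed it
            removed.append(entry.name)
    return sorted(removed)
```

Because it never runs inside a request, this approach does not care how or when the download finished, which is what makes it the most robust option.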

You can also store the file's data in memory, delete it, then serve what you have in memory.
For example, if you were serving a PDF:
import io
import os
from flask import send_file
@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()
    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    # (after writing, cursor will be at last byte, so move it to start)
    return_data.seek(0)
    os.remove(file_path)
    # download_name requires Flask 2.0+; older versions call it attachment_filename
    return send_file(return_data, mimetype='application/pdf',
                     download_name='download_filename.pdf')
(above I'm just assuming it's PDF, but you can get the mimetype programmatically if you need)

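Flask has an after_this_request decorator which could work in this case:
import os
from flask import after_this_request, request, send_file
from werkzeug.utils import secure_filename
@app.route('/', methods=['POST'])
def upload_file():
    uploaded_file = request.files['file']
    file = secure_filename(uploaded_file.filename)
    file_path = os.path.join(UPLOAD_FOLDER, file)  # UPLOAD_FOLDER: your own setting (hypothetical)
    uploaded_file.save(file_path)
    @after_this_request
    def delete(response):
        os.remove(file_path)
        return response
    return send_file(file_path, as_attachment=True, environ=request.environ)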

Based on @Garrett's comment, a better approach is not to block send_file while removing the file. IMHO, the better approach is to remove it in the background; something like the following:
import io
import os
from flask import send_file
from multiprocessing import Process
@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()
    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    return_data.seek(0)
    background_remove(file_path)
    return send_file(return_data, mimetype='application/pdf',
                     download_name='download_filename.pdf')
def background_remove(path):
    # pass rm and its arguments separately so rm runs in the child process
    task = Process(target=rm, args=(path,))
    task.start()
def rm(path):
    os.remove(path)
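For a single unlink, a thread is a lighter alternative to starting a whole process; this sketch swaps multiprocessing.Process for threading.Thread. The same rule applies to both: pass the callable and its arguments separately, because target=os.remove(path) would run the removal immediately and hand None to the thread.

```python
import os
import threading


def background_remove(path):
    """Delete `path` on a background thread without blocking the caller."""
    task = threading.Thread(target=os.remove, args=(path,), daemon=True)
    task.start()
    return task  # returned so callers (or tests) can join() if they need to
```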

How to use pyramid.response.FileIter

I have the following view code that attempts to "stream" a zipfile to the client for download:
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    tmpfile = tempfile.NamedTemporaryFile('w', dir=_temp_path, delete=True)
    tmpfile_path = tmpfile.name
    ## creating zipfile and adding files
    z = zipfile.ZipFile(tmpfile_path, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()
    ## renaming the zipfile
    new_zip_path = _temp_path + '/somefilegroup.zip'
    os.rename(tmpfile_path, new_zip_path)
    ## re-opening the zipfile with new name
    z = zipfile.ZipFile(new_zip_path, 'r')
    response = FileIter(z.fp)
    return response
However, this is the Response I get in the browser:
Could not convert return value of the view callable function newsite.static.zipper into a response object. The value returned was .
I suppose I am not using FileIter correctly.
UPDATE:
Since updating with Michael Merickel's suggestions, the FileIter function is working correctly. However, still lingering is a MIME type error that appears on the client (browser):
Resource interpreted as Document but transferred with MIME type application/zip: "http://newsite.local:6543/zipper?data=%7B%22ids%22%3A%5B6%2C7%5D%7D"
To better illustrate the issue, I have included a tiny .py and .pt file on Github: https://github.com/thapar/zipper-fix

FileIter is not a response object, just like your error message says. It is an iterable that can be used for the response body, that's it. Also the ZipFile can accept a file object, which is more useful here than a file path. Let's try writing into the tmpfile, then rewinding that file pointer back to the start, and using it to write out without doing any fancy renaming.
import os
import zipfile
import tempfile
from pyramid.response import FileIter
def zipper(request):
    _temp_path = request.registry.settings['_temp']
    fp = tempfile.NamedTemporaryFile('w+b', dir=_temp_path, delete=True)
    ## creating zipfile and adding files
    z = zipfile.ZipFile(fp, "w")
    z.write('somefile1.txt')
    z.write('somefile2.txt')
    z.close()
    # rewind fp back to start of the file
    fp.seek(0)
    response = request.response
    response.content_type = 'application/zip'
    response.app_iter = FileIter(fp)
    return response
I changed the mode on NamedTemporaryFile to 'w+b' as per the docs to allow the file to be written to and read from.
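The write, rewind, iterate sequence can be checked without Pyramid installed. In this stdlib sketch, iter_blocks is a hypothetical stand-in for FileIter, zipfile.ZipFile.writestr replaces z.write so no files need to exist on disk, and an anonymous tempfile.TemporaryFile stands in for the named one.

```python
import tempfile
import zipfile


def iter_blocks(fp, block_size=64 * 1024):
    """Hypothetical stand-in for FileIter: yield fp in fixed-size blocks."""
    while True:
        block = fp.read(block_size)
        if not block:
            break
        yield block


def build_zip(members):
    """Write {name: bytes} into a temp file and rewind it for reading."""
    fp = tempfile.TemporaryFile('w+b')
    # Closing a ZipFile built on a passed-in file object leaves fp open.
    with zipfile.ZipFile(fp, 'w', zipfile.ZIP_DEFLATED) as z:
        for name, data in members.items():
            z.writestr(name, data)
    fp.seek(0)  # without this, the response body would be empty
    return fp
```

Passing the file object to ZipFile (instead of its path) is what removes the need for the rename and reopen in the question.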

The current Pyramid version has two convenience classes for this use case: FileResponse and FileIter. The snippet below will serve a static file. I ran this code, and the downloaded file is named "download", like the view name. To change the file name and more, set the Content-Disposition header or have a look at the arguments of pyramid.response.Response.
from pyramid.response import FileResponse
from pyramid.view import view_config
@view_config(name="download")
def zipper(request):
    path = 'path_to_file'
    return FileResponse(path, request)  # passing request lets FileResponse use the server's wsgi.file_wrapper
docs:
http://docs.pylonsproject.org/projects/pyramid/en/latest/api/response.html#
hint: extract the Zip logic from the view if possible
