Django 1.11 download file chunk by chunk - python

In my case, a Django 1.11 server acts as a proxy. When the user clicks "download" in the browser, the request goes to the Django proxy, which downloads files from another server and processes them; it must then send them on to the browser so that the user can download them. My proxy downloads and processes the files chunk by chunk.
How can I send chunks to the browser as they become ready, so that the user ends up downloading a single file?
In practice, I have to let the user download a file that is not yet complete, like a stream.
def my_download(self, res):
    # some code
    file_handle = open(local_path, 'wb', self.chunk_size)
    for chunk in res.iter_content(self.chunk_size):
        i = i+1
        print("index: ", i, "/", chunks)
        if i > chunks-1:
            is_last = True
        # some code on the chunk
        # Here, instead of saving the chunk locally, I would like to send it to the browser directly.
        file_handle.write(chunk)
    file_handle.close()
    return True
Thank you in advance, greetings.

This question should be flagged as duplicate of this post: Serving large files ( with high loads ) in Django
Always try to find the answer before you create a question in SO, please!
Essentially, the answer is in Django's documentation, in the "Streaming large CSV files" example, and we will apply that example to the question above:
You can use Django's StreamingHttpResponse and Python's wsgiref.util.FileWrapper to serve a large file in chunks efficiently, without loading it into memory.
import os
from wsgiref.util import FileWrapper

from django.http import StreamingHttpResponse


def my_download(request):
    file_path = 'path/to/file'
    chunk_size = DEFINE_A_CHUNK_SIZE_AS_INTEGER
    filename = os.path.basename(file_path)
    response = StreamingHttpResponse(
        FileWrapper(open(file_path, 'rb'), chunk_size),
        content_type="application/octet-stream"
    )
    response['Content-Length'] = os.path.getsize(file_path)
    response['Content-Disposition'] = "attachment; filename=%s" % filename
    return response
Now, if you want to apply some processing to the file chunk by chunk, you can use the iterator that FileWrapper provides.
Place your chunk processing code in a function which MUST return the chunk:
def process_chunk(chunk):
    # Process your chunk here
    # Be careful to preserve the chunk's initial size.
    processed_chunk = chunk
    return processed_chunk
Now apply the function inside the StreamingHttpResponse:
response = StreamingHttpResponse(
    (
        process_chunk(chunk)
        for chunk in FileWrapper(open(file_path, 'rb'), chunk_size)
    ),
    content_type="application/octet-stream"
)
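The chunk-by-chunk iteration described in this answer can be tried without Django. The following is a minimal sketch using only the standard library; `process_chunk` here is a stand-in transformation and `processed_chunks` is a hypothetical helper name, neither of which comes from the answer.

```python
import io
from wsgiref.util import FileWrapper


def process_chunk(chunk):
    # Stand-in processing step: upper-case ASCII bytes (size is unchanged).
    return chunk.upper()


def processed_chunks(file_obj, chunk_size):
    # Lazily read fixed-size blocks and process each one as it is read.
    for chunk in FileWrapper(file_obj, chunk_size):
        yield process_chunk(chunk)


source = io.BytesIO(b"abcdefghij")
chunks = list(processed_chunks(source, 4))
print(chunks)
```

In a view, a generator like this would presumably be what gets passed to StreamingHttpResponse. If the processing changes the size of the data, a Content-Length taken from os.path.getsize would no longer match the body, so it is probably safer to leave that header out in that case.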

Related

Flask - Delete zipfile after download [duplicate]

I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data):
    tempname = str(uuid4()) + '.csv'
    data['text'].to_csv('samo/static/temp/{}'.format(tempname))
    return tempname

@projects.route('/getcsv/<file>')
def getcsv(file):
    @after_this_request
    def cleanup(response):
        os.remove('samo/static/temp/' + file)
        return response
    return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows; other platforms can mark a file as deleted and keep it around until it is no longer being accessed. However, it may still be useful to delete the file only once you're sure it has been sent, regardless of platform.
Read the file into memory and serve it, so that it's not being read when you delete it later. If the file is too big to read into memory, use a generator to serve it, then delete it.
@app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
    path = os.path.join(current_app.instance_path, filename)

    def generate():
        with open(path) as f:
            yield from f

        os.remove(path)

    r = current_app.response_class(generate(), mimetype='text/csv')
    r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
    return r
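The serve-then-delete generator above can be exercised outside Flask. This is a sketch under a few assumptions: `stream_then_delete` is a hypothetical name, the file is read in binary chunks rather than text lines, and the cleanup is placed in a `finally` block so it also runs if the consumer stops reading early.

```python
import os
import tempfile


def stream_then_delete(path, chunk_size=8192):
    # Yield the file in binary chunks, then remove it once iteration ends.
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs after the file has been closed, including when the consumer
        # stops early and the generator is closed.
        os.remove(path)


# Demonstration with a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"a,b\n1,2\n")

body = b"".join(stream_then_delete(demo_path))
file_still_exists = os.path.exists(demo_path)
```

Because the `with` block sits inside the `try`, the file handle is closed before `os.remove` runs, which is the ordering that matters on Windows.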

how to upload chunks of a string longer than 2147483647 bytes?

I am trying to upload a file around ~5GB size as below but, it throws the error string longer than 2147483647 bytes. It sounds like there is a limit of 2 GB to upload. Is there a way to upload data in chunks? Can anyone provide guidance?
logger.debug(attachment_path)
currdir = os.path.abspath(os.getcwd())
os.chdir(os.path.dirname(attachment_path))
headers = self._headers
headers['Content-Type'] = content_type
headers['X-Override-File'] = 'true'
if not os.path.exists(attachment_path):
    raise Exception, "File path was invalid, no file found at the path %s" % attachment_path
filesize = os.path.getsize(attachment_path)
fileToUpload = open(attachment_path, 'rb').read()
logger.info(filesize)
logger.debug(headers)
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
                 headers=headers,data=fileToUpload,timeout=300)
ERROR:
string longer than 2147483647 bytes
UPDATE:
def read_in_chunks(file_object,chunk_size=30720*30720):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

f = open(attachment_path)
for piece in read_in_chunks(f):
    r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)),
                     headers=headers,data=piece,timeout=300)
Your question has been asked on the requests bug tracker; their suggestion is to use streaming upload. If that doesn't work, you might see if a chunk-encoded request works.
[edit]
Example based on the original code:
# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large file upload
        timeout=300,
    )
I can't guarantee this would run as-is, since I can't realistically test it, but it should be close. The bug tracker comments I linked to also mention that sending multiple headers may cause issues, so if the headers you're specifying are actually necessary, this may not work.
Regarding chunk encoding: This should be your second choice. Your code was not specifying 'rb' as the mode for open(...), so changing that should probably make the code above work. If not, you could try this.
def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than this would be a good idea?
    chunk_size = 30720 * 30720
    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data

# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
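To put numbers on the chunk-size remark in the code comments: the limit in the error message is 2**31 - 1, and 30720 * 30720 works out to 900 MiB per chunk rather than the "1k" the question's docstring mentions. The sketch below checks that arithmetic and shows the same generator pattern with a smaller chunk size; the 1 MiB default is an assumption chosen for illustration, not a value from the thread.

```python
import io

INT32_MAX = 2**31 - 1          # 2147483647, the limit named in the error
ASKER_CHUNK = 30720 * 30720    # chunk size used in the question's UPDATE
ONE_MIB = 1024 * 1024          # assumed smaller chunk size for illustration


def read_in_chunks(file_object, chunk_size=ONE_MIB):
    """Yield successive chunks from a binary file object."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


payload = b"x" * (2 * ONE_MIB + 5)
pieces = list(read_in_chunks(io.BytesIO(payload)))
print(INT32_MAX, ASKER_CHUNK, [len(p) for p in pieces])
```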

How to serve a created tempfile in django

I have a remote storage project in which, when the user requests a file, the Django server retrieves it and stores it locally (for some processing) as a temporary file, then serves it to the user with mod x-sendfile. I want the tempfile to be deleted after it is served to the user.
The documentation states that the NamedTemporaryFile delete argument, if set to False, leads to deletion of the file once all the references to it are gone. But when the user is served the tempfile, it doesn't get deleted. If I set delete=True when downloading, I get "The requested URL /ServeSegment/Test.jpg/ was not found on this server."
Here is a view to list the user files:
def file_profile(request):
    obj = MainFile.objects.filter(owner=request.user)
    context = {'title': 'welcome',
               'obj': obj
               }
    return render(request, 'ServeSegments.html', context=context)
This is the view which retrieves, stores temporarily and serve the requested file:
def ServeSegment(request, segmentID):
    if request.method == 'GET':
        url = 'http://192.168.43.7:8000/foo/'+str(segmentID)
        r = requests.get(url, stream=True)
        if r.status_code == 200:
            with tempfile.NamedTemporaryFile(dir='/tmp/Files', mode='w+b') as f:
                for chunk in r.iter_content(1024):
                    f.write(chunk)
            response = HttpResponse()
            response['Content-Disposition'] = 'attachment; segmentID={0}'.format(f.name)
            response['X-Sendfile'] = "{0}".format(f.name)
            return response
        else:
            return HttpResponse(str(segmentID))
I guess that if I could return the response inside the with statement, after the last chunk was written, it would work as I want, but I found no way to determine whether we are in the last iteration of the loop (without being hackish).
What should I do to serve the tempfile and have it deleted right after?
Adding a generalized answer (based on Cyrbil's) that avoids using signals by doing the cleanup in a finally block.
While the directory entry is deleted by os.remove on the way out, the underlying file remains open until FileResponse closes it. You can check this by inspecting response._closable_objects[0].fileno() in the finally block with pdb, and checking open files with lsof in another terminal while it's paused.
It looks like it's important that you're on a Unix system if you're going to use this solution (see os.remove docs)
https://docs.python.org/3/library/os.html#os.remove
import os
import tempfile
from django.http import FileResponse

def my_view(request):
    # Create the file before the try block so tmp is always defined in finally
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        with open(tmp.name, 'w') as fi:
            # write to your tempfile, mode may vary
            pass
        response = FileResponse(open(tmp.name, 'rb'))
        return response
    finally:
        os.remove(tmp.name)
Any file created by tempfile will be deleted once the file handle is closed; in your case, that happens when you exit the with statement. The delete=False argument prevents this behavior and leaves the deletion up to the application. You can delete the file after it has been sent by registering a signal handler that will unlink the file once the response is sent.
Your example does nothing with the file, so you might want to stream the content directly with StreamingHttpResponse or FileResponse. But as you said you "stores the file locally (for some processing)", I would suggest considering doing the processing without creating any temporary file and working only with streams.
Disposable files
The solution to the question is to not use with for the NamedTemporaryFile, and to handle exceptions yourself. Currently your file is being deleted before you read it. At the end, return:
f.seek(0)
return FileResponse(f, as_attachment=True, filename=f.name)
The temporary file will be closed when the read is complete and therefore deleted.
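The delete-on-close behaviour this relies on can be observed with the standard library alone. In this sketch the explicit read() and close() calls stand in for what the response would do when it sends and then closes the file; that mapping is an assumption for illustration, and no Django code is involved.

```python
import os
import tempfile

# No `with` block: the handle stays open after writing.
f = tempfile.NamedTemporaryFile(mode="w+b")
f.write(b"processed data")
f.seek(0)  # rewind so a reader starts at the beginning

path = f.name
existed_while_open = os.path.exists(path)
content = f.read()   # stands in for the response reading the file
f.close()            # stands in for the response closing it when done
exists_after_close = os.path.exists(path)
```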
Non-disposable files
For those who stumble across this and do not have an automatically disposable file handle.
From the other answers, signals seemed to be a reasonable solution however passing data required altering protected members. I was unsure how supported it would be in the future. I also found that whp's solution did not work in the current version of Django. The most future-proof version I could come up with was monkey patching the file output so the file is deleted on close. Django closes the file handles at the end of sending the file and I can't see that changing.
def my_view(request):
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        # write file tmp (remember to close if re-opening)
        # after write close the file (if not closed)
        stream_file = open(tmp.name, 'rb')

        # monkey patch the file
        original_close = stream_file.close

        def new_close():
            original_close()
            os.remove(tmp.name)

        stream_file.close = new_close

        # return the result
        return FileResponse(stream_file, as_attachment=True, filename='out.txt')
    except Exception:
        # clean up the temporary file if anything above fails
        os.remove(tmp.name)
        raise
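A variation on the same idea, swapping the monkey patch for a small subclass whose close() removes the file, might look like the sketch below. `DeleteOnCloseFile` is a hypothetical helper, not something from Django or from the answer above, and it has only been reasoned through against plain file operations here, not against FileResponse.

```python
import io
import os
import tempfile


class DeleteOnCloseFile(io.FileIO):
    """Binary file that removes itself from disk when closed (hypothetical helper)."""

    def close(self):
        super().close()
        path = getattr(self, "name", None)
        if isinstance(path, str):
            try:
                os.remove(path)
            except FileNotFoundError:
                # close() can be called more than once; the file is already gone.
                pass


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"report")

stream = DeleteOnCloseFile(demo_path, "rb")
data = stream.read()
stream.close()
gone = not os.path.exists(demo_path)
```

The design trade-off is the same as with the monkey patch: deletion is tied to whoever closes the handle, so it only happens if the consumer actually calls close().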

Access files from web.py urls

I'm using web.py for a small project and I have files I want the user to be able to access in the /files directory on the server. I can't seem to find how to return a file on a GET request, so I can't work out how to do this.
What I want to do is essentially:
urls = ('/files/+', 'files')

class files:
    def GET(self):
        #RETURN SOME FILE
        pass
Is there a simple way to return a file from a GET request?
Playing around I came up with this webpy GET method:
def GET(self):
    request = web.input( path=None )
    getPath = request.path
    if os.path.exists( getPath ):
        # open() rather than the Python 2-only file() builtin
        getFile = open( getPath, 'rb' )
        web.header('Content-type','application/octet-stream')
        web.header('Content-transfer-encoding','base64')
        return base64.standard_b64encode( getFile.read( ) )
    else:
        raise web.notfound( )
Other respondents are correct when they advise you to consider the security implications carefully. In my case, we will include code like this in an administrative web service that will be (should be!) available only within our internal LAN.
You can read the contents of a file and stream them down to the user, but I don't believe that a file handle is serializable.
It would seem to be a potential security hole to allow users to access and modify files on the server or to copy files down to their own machine. I think you should reassess what you're trying to accomplish.
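One common way to narrow the security concern raised here is to resolve the requested path and refuse anything that lands outside the directory being served. The following is a rough sketch of that check; `resolve_inside` and the `/srv/files` base directory are hypothetical and not part of web.py.

```python
import os


def resolve_inside(base_dir, requested):
    """Return the real path of `requested` under `base_dir`, or raise ValueError."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the served directory")
    return target


base = os.path.realpath(os.path.join(os.sep, "srv", "files"))
inside = resolve_inside(base, "reports/a.txt")

try:
    resolve_inside(base, "../../etc/passwd")
    escaped = True
except ValueError:
    escaped = False
```

A handler could call a check like this before opening the file and answer with a not-found error on ValueError. It does not replace access control; it only stops path traversal.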
This is how I do it by using generator and not reading the whole file into memory:
web.header("Content-Disposition", "attachment; filename=%s" % doc.filename)
web.header("Content-Type", doc.filetype)
web.header("Transfer-Encoding","chunked")
f = open(os.path.join(config.upload_dir, doc.path, doc.filename), "rb")
while 1:
    buf = f.read(1024 * 8)
    if not buf:
        break
    yield buf

Serving dynamically generated ZIP archives in Django

How to serve users a dynamically generated ZIP archive in Django?
I'm making a site, where users can choose any combination of available books and download them as ZIP archive. I'm worried that generating such archives for each request would slow my server down to a crawl. I have also heard that Django doesn't currently have a good solution for serving dynamically generated files.
The solution is as follows.
Use Python's zipfile module to create the zip archive, but pass a StringIO object as the file (the ZipFile constructor requires a file-like object). Add the files you want to compress. Then, in your Django application, return the content of the StringIO object in an HttpResponse with the MIME type set to application/x-zip-compressed (or at least application/octet-stream). If you want, you can set the Content-Disposition header, but this should not really be required.
But beware: creating zip archives on each request is a bad idea and may kill your server (not counting timeouts if the archives are large). A better-performing approach is to cache the generated output somewhere in the filesystem and regenerate it only if the source files have changed. An even better idea is to prepare the archives in advance (e.g. with a cron job) and have your web server serve them as ordinary static files.
Here's a Django view to do this:
import os
import zipfile
import StringIO

from django.http import HttpResponse


def getfiles(request):
    # Files (local path) to put in the .zip
    # FIXME: Change this (get paths from DB etc)
    filenames = ["/tmp/file1.txt", "/tmp/file2.txt"]

    # Folder name in ZIP archive which contains the above files
    # E.g [thearchive.zip]/somefiles/file2.txt
    # FIXME: Set this to something better
    zip_subdir = "somefiles"
    zip_filename = "%s.zip" % zip_subdir

    # Open StringIO to grab in-memory ZIP contents
    s = StringIO.StringIO()

    # The zip compressor
    zf = zipfile.ZipFile(s, "w")

    for fpath in filenames:
        # Calculate path for file in zip
        fdir, fname = os.path.split(fpath)
        zip_path = os.path.join(zip_subdir, fname)

        # Add file, at correct path
        zf.write(fpath, zip_path)

    # Must close zip for all contents to be written
    zf.close()

    # Grab ZIP file from in-memory, make response with correct MIME-type
    # (content_type replaces the mimetype argument, which Django 1.7 removed)
    resp = HttpResponse(s.getvalue(), content_type = "application/x-zip-compressed")
    # ..and correct content-disposition
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename

    return resp
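The comment "Must close zip for all contents to be written" in the view above can be checked with the standard library. This sketch uses io.BytesIO and writestr with made-up member names, which are assumptions for illustration; it compares the buffer before and after close().

```python
import io
import zipfile

buffer = io.BytesIO()
zf = zipfile.ZipFile(buffer, "w")
zf.writestr("somefiles/file1.txt", "hello")

# Before close(): member data is in the buffer, but the archive index is not.
valid_before_close = zipfile.is_zipfile(io.BytesIO(buffer.getvalue()))

zf.close()

valid_after_close = zipfile.is_zipfile(io.BytesIO(buffer.getvalue()))
buffer_still_open = not buffer.closed
names = zipfile.ZipFile(io.BytesIO(buffer.getvalue())).namelist()
```

Closing the ZipFile writes the central directory but leaves the buffer it was given open, so getvalue() can still be called afterwards.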
Many answers here suggest using a StringIO or BytesIO buffer. However, this is not needed, as HttpResponse is already a file-like object:
response = HttpResponse(content_type='application/zip')
with zipfile.ZipFile(response, 'w') as zip_file:
    for filename in filenames:
        zip_file.write(filename)
response['Content-Disposition'] = 'attachment; filename={}'.format(zipfile_name)
return response
Note that closing zip_file (done here by the with block) is what writes the archive's central directory. It does not close response, because ZipFile only closes file objects that it opened itself.
I used Django 2.0 and Python 3.6.
import zipfile
import os
from io import BytesIO

from django.http import HttpResponse


def download_zip_file(request):
    filelist = ["path/to/file-11.txt", "path/to/file-22.txt"]

    byte_data = BytesIO()
    zip_file = zipfile.ZipFile(byte_data, "w")

    for file in filelist:
        filename = os.path.basename(os.path.normpath(file))
        zip_file.write(file, filename)
    zip_file.close()

    response = HttpResponse(byte_data.getvalue(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=files.zip'

    # Print list files in zip_file
    zip_file.printdir()

    return response
For Python 3 I use io.BytesIO to achieve this, since the Python 2 StringIO module no longer exists there. Hope it helps.
import io
import zipfile

from django.http import HttpResponse


def my_downloadable_zip(request):
    zip_io = io.BytesIO()
    with zipfile.ZipFile(zip_io, mode='w', compression=zipfile.ZIP_DEFLATED) as backup_zip:
        backup_zip.write('file_name_loc_to_zip')  # you can also use a list of file locations
                                                  # and iterate over it
    response = HttpResponse(zip_io.getvalue(), content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename=%s' % 'your_zipfilename' + ".zip"
    response['Content-Length'] = zip_io.tell()
    return response
Django doesn't directly handle the generation of dynamic content (specifically Zip files). That work would be done by Python's standard library. You can take a look at how to dynamically create a Zip file in Python here.
If you're worried about it slowing down your server you can cache the requests if you expect to have many of the same requests. You can use Django's cache framework to help you with that.
Overall, zipping files can be CPU intensive but Django shouldn't be any slower than another Python web framework.
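For the caching suggestion, one ingredient is a key that stays the same for the same selection of files and changes when any of them changes. A minimal sketch of such a key, built from each file's path, size and modification time, follows; `archive_cache_key` is a hypothetical helper and is not part of Django's cache framework.

```python
import hashlib
import os
import tempfile


def archive_cache_key(paths):
    """Hypothetical helper: key that changes when any source file changes."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        info = os.stat(path)
        entry = "{}|{}|{}\n".format(path, info.st_size, info.st_mtime_ns)
        digest.update(entry.encode("utf-8"))
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    book_a = os.path.join(workdir, "a.txt")
    book_b = os.path.join(workdir, "b.txt")
    for path in (book_a, book_b):
        with open(path, "w") as f:
            f.write("chapter one")

    key_ab = archive_cache_key([book_a, book_b])
    key_ba = archive_cache_key([book_b, book_a])

    # Appending changes the file size, so the key changes as well.
    with open(book_a, "a") as f:
        f.write(" and two")
    key_changed = archive_cache_key([book_a, book_b])
```

A key like this could serve as the cache key or as the filename of a stored archive, so that a request for the same unchanged set of books reuses the existing ZIP.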
Shameless plug: you can use django-zipview for the same purpose.
After a pip install django-zipview:
from zipview.views import BaseZipView

from reviews import Review


class CommentsArchiveView(BaseZipView):
    """Download at once all comments for a review."""

    def get_files(self):
        document_key = self.kwargs.get('document_key')
        reviews = Review.objects \
            .filter(document__document_key=document_key) \
            .exclude(comments__isnull=True)

        return [review.comments.file for review in reviews if review.comments.name]
I suggest using a separate model for storing those temporary zip files. You can create the zip on the fly, save it to a model with a FileField, and finally send the URL to the user.
Advantages:
Serving static zip files with django media mechanism (like usual uploads).
Ability to cleanup stale zip files by regular cron script execution (which can use date field from zip file model).
A lot of contributions were made to the topic already, but since I came across this thread when I first researched this problem, I thought I'd add my own two cents.
Integrating your own zip creation is probably not as robust and optimized as web-server-level solutions. At the same time, we're using Nginx and it doesn't come with a module out of the box.
You can, however, compile Nginx with the mod_zip module (see here for a docker image with the latest stable Nginx version, and an alpine base making it smaller than the default Nginx image). This adds the zip stream capabilities.
Then Django just needs to serve a list of files to zip, all done!
It is a little more reusable to use a library for this file list response, and django-zip-stream offers just that.
Sadly it never really worked for me, so I started a fork with fixes and improvements.
You can use it in a few lines:
import os

from django.conf import settings


def download_view(request, name=""):
    from django_zip_stream.responses import FolderZipResponse
    path = settings.STATIC_ROOT
    path = os.path.join(path, name)
    return FolderZipResponse(path)
You need a way to have Nginx serve all files that you want to archive, but that's it.
Can't you just write a link to a "zip server" or whatnot? Why does the zip archive itself need to be served from Django? A 90's era CGI script to generate a zip and spit it to stdout is really all that's required here, at least as far as I can see.
